Sentence Function Recognition Based on Active Learning
Guo Chen1,2, Tianxiang Xu1
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094, China
[Objective] This paper builds a sentence function classification model from structured abstracts and a small amount of manual annotation, using active learning to reduce the dependence on manually labeled corpora.
[Methods] First, we trained SVM, CNN and Bi-LSTM classifiers on sentences whose functions are labeled in structured abstracts. Second, with the help of active learning, we predicted the functions of a large number of unlabeled sentences from ordinary (unstructured) abstracts. Third, we automatically identified the most uncertain samples for manual annotation and used them to optimize the initial classifiers. Finally, we repeated this active learning process iteratively to further improve classifier performance.
[Results] We evaluated the proposed method on Library and Information Science literature. The precision, recall and F1 values were 84.65%, 84.49% and 84.57%, respectively, which were 3.25%, 3.24% and 3.25% higher than those of the traditional methods.
[Limitations] We conducted only five iterations to avoid a heavy manual annotation workload.
[Conclusions] The active learning method can effectively identify the differences between the unlabeled corpus and the existing training corpus, and it also reduces manual labeling costs. The proposed method could be applied to citation analysis and full-text analysis.
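The loop described in the Methods section (train on labeled sentences from structured abstracts, score unlabeled sentences, send the most uncertain ones to an annotator, retrain) can be sketched as pool-based active learning. This is a minimal illustration under stated assumptions, not the authors' implementation: a multinomial Naive Bayes classifier is swapped in for the paper's SVM, CNN and Bi-LSTM models so the sketch runs on the Python standard library alone; uncertainty is measured by least confidence (1 minus the top posterior), since the abstract does not name the measure; sentences are whitespace-tokenized; and all function names, the `oracle` callback (standing in for the human annotator), the toy sentences and the `batch` size are hypothetical. The default `rounds=5` mirrors the five iterations mentioned under Limitations.

```python
import math
from collections import Counter, defaultdict

# Stand-in classifier (assumption): multinomial Naive Bayes with add-one
# smoothing, used instead of SVM/CNN/Bi-LSTM so the sketch needs only stdlib.
def train_nb(samples):
    """samples: list of (tokens, label) pairs. Returns a simple model tuple."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_proba(model, tokens):
    """Return {label: posterior probability} for one tokenized sentence."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    # Turn log-scores into probabilities, subtracting the max for stability.
    top = max(log_scores.values())
    exps = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def least_confidence(model, tokens):
    """Uncertainty = 1 - highest posterior; larger means less certain."""
    return 1.0 - max(predict_proba(model, tokens).values())

def active_learning(seed, pool, oracle, rounds=5, batch=2):
    """Pool-based active learning with least-confidence sampling.

    seed:   labeled (tokens, label) pairs, e.g. from structured abstracts
    pool:   unlabeled token lists, e.g. sentences from ordinary abstracts
    oracle: callable returning a label; stands in for manual annotation
    """
    labeled, pool = list(seed), list(pool)
    model = train_nb(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Rank the pool from most to least uncertain and query the top batch.
        pool.sort(key=lambda t: least_confidence(model, t), reverse=True)
        queried, pool = pool[:batch], pool[batch:]
        labeled.extend((t, oracle(t)) for t in queried)
        model = train_nb(labeled)  # retrain on the enlarged training set
    return model, labeled

# Toy demo with hypothetical sentences and labels.
seed = [("we propose a new method".split(), "Methods"),
        ("results show higher precision".split(), "Results"),
        ("this paper aims to reduce cost".split(), "Objective")]
gold = {"experiments show improved recall": "Results",
        "we design a classifier": "Methods",
        "the goal is to lower annotation effort": "Objective"}

def oracle(tokens):
    # Simulated annotator: looks up the gold label of the sentence.
    return gold[" ".join(tokens)]

model, labeled = active_learning(seed, [s.split() for s in gold], oracle)
print(len(labeled), "labeled sentences after active learning")
```

With a real classifier, only `train_nb` and `predict_proba` would change; the selection step just needs per-class probabilities (or another uncertainty score) for each unlabeled sentence, which is why the query strategy is kept separate from the model.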