Sentence Function Recognition Based on Active Learning
Guo Chen1,2, Tianxiang Xu1
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094, China
[Objective] This paper uses active learning, structured abstracts, and a small number of manual annotations to build a classification model for sentence functions, aiming to reduce the dependence on manually labeled corpora. [Methods] First, we trained SVM, CNN, and Bi-LSTM classifiers with function sentences from structured abstracts. Second, with the help of active learning techniques, we predicted the functions of a large number of unlabeled sentences from ordinary abstracts. Third, we automatically identified uncertain samples for manual annotation, which were used to optimize the initial classifiers. Finally, we repeated this active learning process to improve the performance of the classifiers. [Results] We examined the new method with Library and Information Science literature. The precision, recall, and F1 values were 84.65%, 84.49%, and 84.57%, which were 3.25%, 3.24%, and 3.25% higher than those of the traditional methods. [Limitations] We conducted only five iterations to avoid a massive manual annotation workload. [Conclusions] Active learning can effectively discover the differences between the unlabeled corpus and the existing training corpus, and it reduces manual labeling costs. The proposed method might also be used in citation and full-text analysis.
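The step of identifying uncertain samples for manual annotation can be illustrated with a minimal sketch. The abstract does not state which uncertainty criterion was used, so the entropy-based ranking below is an assumption, and the function names `entropy` and `select_uncertain`, the probability values, and the three example classes are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_uncertain(prob_rows, k):
    """Return indices of the k rows with the highest entropy, most uncertain first."""
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: entropy(prob_rows[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical classifier outputs for four unlabeled abstract sentences over
# three illustrative function classes (e.g. objective, method, result).
predicted = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, so very uncertain
    [0.60, 0.30, 0.10],
    [0.50, 0.49, 0.01],
]

# Sentences chosen for manual annotation in this round (assumed batch size 2).
to_annotate = select_uncertain(predicted, k=2)
```

In a full loop, the sentences selected this way would be labeled by hand, added to the training set, and the classifier retrained before the next round. Margin-based or least-confidence criteria would be equally plausible substitutes for entropy.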
Guo Chen, Tianxiang Xu. Sentence Function Recognition Based on Active Learning[J]. Data Analysis and Knowledge Discovery, 2019, 3(8): 53-61.