Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (8): 53-61    DOI: 10.11925/infotech.2096-3467.2018.1198
Sentence Function Recognition Based on Active Learning
Guo Chen1,2, Tianxiang Xu1
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094, China
Abstract  

[Objective] This paper combines active learning, structured abstracts, and a small number of manual annotations to build a classification model for sentence functions, aiming to reduce the dependence on manually labeled corpora. [Methods] First, we trained SVM, CNN, and Bi-LSTM classifiers on function-labeled sentences from structured abstracts. Second, within an active learning framework, we predicted the functions of a large number of unlabeled sentences from ordinary (unstructured) abstracts. Third, we automatically identified the uncertain samples for manual annotation and used them to optimize the initial classifiers. Finally, we repeated this active learning process to improve classifier performance. [Results] We evaluated the new method on Library and Information Science literature. The precision, recall, and F1 values reached 84.65%, 84.49%, and 84.57%, which were 3.25, 3.24, and 3.25 percentage points higher than those of the traditional methods. [Limitations] We ran only five iterations to limit the manual annotation workload. [Conclusions] Active learning can effectively identify the differences between the unlabeled corpus and the existing training corpus while reducing manual labeling costs. The proposed method might also be applied to citation and full-text analysis.
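
The loop described in the abstract (train, predict on unlabeled sentences, pick uncertain samples, annotate, retrain) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the abstract does not say which uncertainty measure was used, so least-confidence sampling is an assumption here, and `train`, `predict_proba`, `annotate`, and the batch size `k` are hypothetical placeholders for the classifier and the human annotator. The default of five rounds mirrors the five iterations reported above.

```python
from typing import Any, Callable, List, Sequence, Tuple


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score (assumed measure): 1 minus the top class probability."""
    return 1.0 - max(probs)


def select_uncertain(prob_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k most uncertain samples, most uncertain first."""
    ranked = sorted(
        range(len(prob_rows)),
        key=lambda i: least_confidence(prob_rows[i]),
        reverse=True,
    )
    return ranked[:k]


def active_learning(
    train: Callable[[List[Tuple[Any, Any]]], Any],        # hypothetical: fits a classifier
    predict_proba: Callable[[Any, Any], Sequence[float]],  # hypothetical: class probabilities
    annotate: Callable[[Any], Any],                        # hypothetical: human labeling step
    labeled: List[Tuple[Any, Any]],
    unlabeled: List[Any],
    rounds: int = 5,
    k: int = 2,
):
    """Iteratively move the k most uncertain samples from the pool to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict_proba(model, x) for x in pool]
        picked = set(select_uncertain(probs, k))
        # Ask the annotator for labels of the selected samples only.
        labeled += [(pool[i], annotate(pool[i])) for i in sorted(picked)]
        pool = [x for i, x in enumerate(pool) if i not in picked]
        model = train(labeled)  # retrain on the enlarged training set
    return model, labeled, pool
```

In this sketch the initial `labeled` list would hold the sentences taken from structured abstracts, and `unlabeled` the sentences from ordinary abstracts; only the selected uncertain samples ever reach the annotator, which is where the saving in labeling effort comes from.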

Keywords: Structured Abstract; Sentence Function Recognition; Active Learning; Short Text Classification
Received: 29 October 2018      Published: 29 September 2019
ZTFLH:  TP391  
Corresponding Authors: Guo Chen     E-mail: delphi1987@qq.com

Cite this article:

Guo Chen,Tianxiang Xu. Sentence Function Recognition Based on Active Learning. Data Analysis and Knowledge Discovery, 2019, 3(8): 53-61.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.1198     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I8/53

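Function type           Structured-abstract labels
Purpose/Significance    Purpose/Significance (目的/意义), Purpose (目的)
Method/Process          Method/Process (方法/过程), Process/Method (过程/方法), Method/Content (方法/内容), Method (方法), Process (过程)
Result/Conclusion       Conclusion/Result (结论/结果), Result/Conclusion (结果/结论), Result (结果), Conclusion (结论)
Limitations             Limitations (局限)
Application background  Application background (应用背景)
Literature scope        Literature scope (文献范围)
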
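Category                Feature words
Purpose/Significance    important (重要), aims to (旨在), significance (意义), in order to (以期), problem (问题)
Method/Process          carry out (进行), analyze (分析), provide (提供), adopt (采用), through (通过)
Result/Conclusion       results show (结果表明), find (发现), can (能够), indicate (表明), results display (结果显示)
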
No.    SVM                    CNN                    Bi-LSTM
       P      R      F1       P      R      F1       P      R      F1
1      91.66  91.12  91.39    92.75  92.45  92.60    91.93  92.07  92.00
2      91.80  91.62  91.71    92.56  92.51  92.53    93.01  93.20  93.10
3      91.21  91.12  91.16    92.73  92.60  92.66    93.12  93.07  93.09
4      90.96  90.77  90.86    89.50  91.41  90.44    93.68  93.48  93.58
5      92.39  92.21  92.29    92.54  92.52  92.53    94.35  94.19  94.27
6      91.19  91.05  91.11    90.30  90.36  90.32    93.35  93.38  93.36
7      90.10  90.62  90.35    93.23  93.18  93.20    93.81  93.97  93.89
8      91.87  90.68  91.27    93.41  93.21  93.31    93.23  93.52  93.37
9      91.39  91.36  91.37    92.11  91.12  92.11    92.23  92.13  92.18
10     89.88  89.91  89.89    91.01  91.11  91.06    91.68  91.48  91.58
Mean   91.24  91.05  91.14    92.01  92.05  92.03    93.04  93.05  93.04

Method    P      R      F1
SVM       81.62  81.19  81.40
CNN       81.21  81.12  81.16
Bi-LSTM   81.40  81.25  81.32

Iteration  SVM                    CNN                    Bi-LSTM
           P      R      F1       P      R      F1       P      R      F1
1          82.94  81.21  82.07    81.80  82.57  82.18    83.07  82.22  82.64
2          83.14  83.18  83.16    82.87  82.70  82.78    83.90  83.80  83.85
3          83.46  83.46  83.46    82.85  82.70  82.77    83.70  83.32  83.51
4          83.37  83.39  83.38    83.38  83.18  83.28    84.29  83.94  84.11
5          83.31  83.32  83.31    83.94  83.80  83.87    84.65  84.49  84.57
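
As a quick arithmetic check, the gains quoted in the abstract appear to correspond to the Bi-LSTM row after the fifth iteration minus the Bi-LSTM row of the three-method table above. The sketch below assumes that pairing (the page itself does not state which baseline the comparison uses) and copies the numbers directly from the two tables; the variable and function names are illustrative.

```python
# Values copied from the tables above (percent). Which rows the abstract
# compares is an assumption: Bi-LSTM before vs. after five iterations.
baseline_bilstm = {"P": 81.40, "R": 81.25, "F1": 81.32}
round5_bilstm = {"P": 84.65, "R": 84.49, "F1": 84.57}


def gains(after: dict, before: dict) -> dict:
    """Absolute difference per metric, in percentage points."""
    return {m: round(after[m] - before[m], 2) for m in after}


def f1(p: float, r: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


improvement = gains(round5_bilstm, baseline_bilstm)
print(improvement)
print(round(f1(round5_bilstm["P"], round5_bilstm["R"]), 2))
```

Under this pairing the differences come out to 3.25, 3.24, and 3.25 percentage points, and the harmonic mean of the final precision and recall rounds to the reported F1 of 84.57.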