New Technology of Library and Information Service  2016, Vol. 32 Issue (3): 67-73    DOI: 10.11925/infotech.1003-3513.2016.03.09
Original Article
A Scientific Research Object Labeling System Based on Active Learning
He Huixin, Liu Lijuan
Tongfang Knowledge Network Technology Co., Ltd. (Beijing), Beijing 100192, China
Abstract  

[Objective] This study aims to identify research object attribute instances in paper titles and to maximize the accuracy of research object recognition with a limited number of labeled samples. [Methods] First, building on the conditional random field (CRF) sequence labeling algorithm, we analyzed the grammatical features of scientific research objects. Second, we recognized and extracted research objects using a small number of labeled samples. Finally, we introduced an active learning iterative labeling system that draws on unlabeled data to improve recognition accuracy. [Results] The proposed method made efficient use of the unlabeled data and raised the accuracy of research object recognition to 78.3%. [Limitations] The proposed algorithm needs further optimization to improve its efficiency. [Conclusions] The proposed method performed well at identifying research object attributes, which lays the foundation for further mining the knowledge system and structure of science and technology literature.
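
The sample-selection step of an active learning loop of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which query strategy was used, so least-confidence sampling is assumed here, and the names `least_confidence`, `select_for_labeling`, and `predict_marginals` are hypothetical. The per-token label probabilities are assumed to come from a trained CRF; a lookup table stands in for the model.

```python
from typing import Callable, List, Sequence

# Per-token label probability distributions for one title (assumed CRF output).
Marginals = List[List[float]]


def least_confidence(marginals: Marginals) -> float:
    """Uncertainty of one title: 1 minus the mean of each token's top label probability."""
    if not marginals:
        return 0.0
    top = [max(dist) for dist in marginals]
    return 1.0 - sum(top) / len(top)


def select_for_labeling(
    pool: Sequence[str],
    predict_marginals: Callable[[str], Marginals],
    batch_size: int,
) -> List[str]:
    """Return the batch_size unlabeled titles the current model is least sure about."""
    scored = [(least_confidence(predict_marginals(title)), title) for title in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:batch_size]]


# Hypothetical stand-in for a trained CRF: fixed marginals for three titles.
toy_marginals = {
    "title A": [[0.9, 0.1], [0.8, 0.2]],  # model is confident
    "title B": [[0.5, 0.5], [0.6, 0.4]],  # model is uncertain
    "title C": [[0.7, 0.3], [0.7, 0.3]],
}
picked = select_for_labeling(list(toy_marginals), toy_marginals.__getitem__, 2)
```

In a full loop, the selected titles would be labeled manually, added to the training set, and the model retrained before the next round of selection.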

Key words: Scientific literature; Research objects; Conditional Random Fields; Iterative labeling system; Active learning
Received: 13 October 2015      Published: 12 April 2016

Cite this article:

He Huixin, Liu Lijuan. A Scientific Research Object Labeling System Based on Active Learning. New Technology of Library and Information Service, 2016, 32(3): 67-73.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.03.09     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I3/67
