A Scientific Research Object Labeling System Based on Active Learning
He Huixin, Liu Lijuan
Tongfang Knowledge Network Technology Co., Ltd. (Beijing), Beijing 100192, China
Abstract [Objective] This study aims to identify research object attribute instances in paper titles and, with only a limited number of labeled samples, to maximize the accuracy of research object recognition. [Methods] We first analyzed the grammatical features of scientific research objects using a conditional random field (CRF) sequence labeling algorithm. Second, we recognized and extracted research objects using a small number of labeled samples. Finally, we introduced an iterative active learning labeling system that draws on unlabeled data to improve recognition accuracy. [Results] The proposed method made efficient use of the unlabeled data and raised the accuracy of research object recognition to 78.3%. [Limitations] The efficiency of the proposed algorithm needs further optimization. [Conclusions] The proposed method performed well at identifying research object attributes, which lays a foundation for further mining the knowledge system and structure of science and technology literature.
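The sample-selection step of an active learning loop like the one described in the Methods can be illustrated with a small sketch. The abstract does not state how samples are chosen for manual labeling, so the least-confidence criterion below is an assumption, as are the names `sequence_confidence` and `select_for_annotation` and the toy confidence values. In a real system the per-token probabilities would come from the trained CRF model rather than being written by hand.

```python
from typing import Dict, List, Sequence

# Hypothetical input: for each unlabeled title, the probability of the most
# likely tag at every token, as a sequence labeler such as a CRF might report.
TokenConfidences = Sequence[float]


def sequence_confidence(token_probs: TokenConfidences) -> float:
    """Score a title by its least certain token (an assumed heuristic).

    An empty sequence scores 0.0, so it is treated as maximally uncertain.
    """
    return min(token_probs) if token_probs else 0.0


def select_for_annotation(pool: Dict[str, TokenConfidences], batch_size: int) -> List[str]:
    """Return the `batch_size` titles the model is least confident about."""
    ranked = sorted(pool, key=lambda title: sequence_confidence(pool[title]))
    return ranked[:batch_size]


# Toy pool of unlabeled titles with made-up confidence values.
pool = {
    "title_a": [0.99, 0.97, 0.95],
    "title_b": [0.90, 0.41, 0.88],
    "title_c": [0.72, 0.80, 0.93],
}

# Titles to send to a human annotator in this iteration.
queried = select_for_annotation(pool, batch_size=2)
print(queried)
```

In each iteration, the selected titles would be labeled by hand, added to the training set, and the model retrained before the pool is scored again.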
Received: 13 October 2015
Published: 12 April 2016