[Objective] This study aims to identify research object attribute instances from paper titles, maximizing the accuracy of research object recognition with only a limited number of labeled samples. [Methods] We first analyzed the grammatical features of scientific research objects based on a conditional random field (CRF) sequence labeling algorithm. Second, we recognized and extracted research objects using a small number of samples. Finally, we introduced an active learning iterative labeling system that draws on unlabeled data to improve the accuracy of research object recognition. [Results] The proposed method made efficient use of the unlabeled data and increased the accuracy of research object recognition to 78.3%. [Limitations] The proposed algorithm needs further optimization to improve its efficiency. [Conclusions] The proposed method performed well at identifying research object attributes, which lays the foundation for further mining of the knowledge system and structure of science and technology literature.
He Huixin, Liu Lijuan. A Scientific Research Object Labeling System Based on Active Learning[J]. New Technology of Library and Information Service, 2016, 32(3): 67-73. DOI: 10.11925/infotech.1003-3513.2016.03.09.
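The iterative labeling loop described in the Methods can be pictured roughly as follows. This is a minimal sketch and not the authors' system. The abstract does not state which query strategy was used, so least-confidence sampling is assumed here. A toy per-token frequency tagger stands in for the CRF, and all function names, the BIO-style tag names, and the sample titles are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(labeled):
    """Count tag frequencies per token. A toy stand-in for a CRF (assumption)."""
    counts = defaultdict(Counter)
    for tokens, tags in labeled:
        for tok, tag in zip(tokens, tags):
            counts[tok.lower()][tag] += 1
    return counts


def token_confidence(model, token):
    """Relative frequency of the token's most common tag; 0.0 if unseen."""
    tag_counts = model.get(token.lower())
    if not tag_counts:
        return 0.0
    return max(tag_counts.values()) / sum(tag_counts.values())


def title_confidence(model, tokens):
    """Mean per-token confidence; low values mean the model is unsure."""
    if not tokens:
        return 0.0
    return sum(token_confidence(model, t) for t in tokens) / len(tokens)


def active_learning(labeled, pool, oracle, rounds=2, batch_size=1):
    """Pool-based active learning with least-confidence sampling (assumed)."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train_unigram_tagger(labeled)
        # Put the titles the current model is least confident about first.
        pool.sort(key=lambda toks: title_confidence(model, toks))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The oracle (a human annotator in practice) supplies the tags.
        labeled.extend((toks, oracle(toks)) for toks in batch)
    return train_unigram_tagger(labeled), labeled, pool


# Hypothetical data: one labeled seed title and three unlabeled titles.
seed = [(("Recognition", "of", "protein", "interactions"),
         ["O", "O", "B-OBJ", "I-OBJ"])]
gold = {
    "Extraction of protein complexes": ["O", "O", "B-OBJ", "I-OBJ"],
    "Mining causality knowledge": ["O", "B-OBJ", "I-OBJ"],
    "Recognition of protein terms": ["O", "O", "B-OBJ", "I-OBJ"],
}
pool = [tuple(title.split()) for title in gold]


def oracle(tokens):
    """Look up the hypothetical gold tags for a title."""
    return gold[" ".join(tokens)]


model, labeled, remaining = active_learning(seed, pool, oracle)
print(len(labeled), "labeled titles;", len(remaining), "left in the pool")
```

In each round the loop retrains on everything labeled so far and asks the annotator about the title the model is least sure of, so annotation effort goes to titles that share the fewest tokens with the existing labeled set.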