[Objective] This study aims to identify research object attribute instances in paper titles and to maximize the accuracy of research object recognition using only a limited number of labeled samples. [Methods] We first analyzed the grammatical features of scientific research objects based on a conditional random field (CRF) sequence labeling algorithm. Second, we recognized and extracted research objects using a small number of labeled samples. Finally, we introduced an iterative active learning labeling system that draws on unlabeled data to improve recognition accuracy. [Results] The proposed method made efficient use of the unlabeled data and raised the accuracy of research object recognition to 78.3%. [Limitations] The proposed algorithm needs further optimization to improve its efficiency. [Conclusions] The proposed method performed well at identifying research object attributes, which lays a foundation for further mining of the knowledge system and structure of science and technology literature.
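The iterative labeling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a per-token tag-frequency model stands in for the CRF, least-confidence sampling is an assumed query strategy (the abstract does not name one), and the tag set (`OBJ`/`O`), the function names, and the dictionary-based `oracle` that simulates a human annotator are all hypothetical.

```python
from collections import Counter, defaultdict

def train(labeled):
    """Count how often each token carries each tag (a toy stand-in for a CRF)."""
    counts = defaultdict(Counter)
    for tokens, tags in labeled:
        for tok, tag in zip(tokens, tags):
            counts[tok][tag] += 1
    return counts

def token_confidence(counts, tok):
    """Share of the most frequent tag; unseen tokens get 0.5 (two tags assumed)."""
    seen = counts.get(tok)
    if not seen:
        return 0.5
    return max(seen.values()) / sum(seen.values())

def title_uncertainty(counts, tokens):
    """Least-confidence score: 1 minus the mean per-token confidence."""
    if not tokens:
        return 0.0
    mean_conf = sum(token_confidence(counts, t) for t in tokens) / len(tokens)
    return 1.0 - mean_conf

def active_learning(labeled, unlabeled, oracle, rounds=2, batch=1):
    """Retrain, pick the most uncertain titles, have them labeled, repeat."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        # Most uncertain titles first; ties keep their original order.
        pool.sort(key=lambda toks: title_uncertainty(model, toks), reverse=True)
        chosen, pool = pool[:batch], pool[batch:]
        labeled.extend((toks, oracle(toks)) for toks in chosen)
    return train(labeled), labeled, pool

# Hypothetical data: one labeled seed title and two unlabeled titles.
OBJ_WORDS = {"protein", "names", "gene", "relations"}

def oracle(tokens):
    """Simulated annotator: tags a token OBJ if it is in OBJ_WORDS."""
    return ["OBJ" if t in OBJ_WORDS else "O" for t in tokens]

seed = [(["recognition", "of", "protein", "names"], ["O", "O", "OBJ", "OBJ"])]
pool = [["extraction", "of", "gene", "relations"], ["recognition", "of", "names"]]

model, labeled, remaining = active_learning(seed, pool, oracle, rounds=1)
print(labeled[-1][0])  # the title the loop chose to send for annotation
```

With this toy data the title containing unseen tokens is selected first, because the model is least confident about it. Swapping `train` and `token_confidence` for a real CRF and its marginal probabilities would keep the loop itself unchanged.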
He Huixin, Liu Lijuan. A Scientific Research Object Labeling System Based on Active Learning[J]. New Technology of Library and Information Service, 2016, 32(3): 67-73.