[Objective] This article extracts concept attribute instances from innovation sentences in order to explore the relationships between concepts. [Methods] A method for recognizing core concepts and concept attribute instances from dependency trees is presented. The method builds on the results of semantic role labeling and dependency parsing and uses the properties of classes in a domain ontology. To account for the characteristics of dependency parsing, a concept combination module and a conjunction relationship detection module are designed to improve the recognition of concept attribute instances. [Results] The F value of core concept recognition is 77.94%, and the average F value of concept attribute instance recognition is around 90%. [Limitations] The Stanford parsing tool sometimes produces incorrect parses, which may lead to inaccurate recognition. The Properties or Attributes class in NCIt is not well filtered or standardized. [Conclusions] The method can effectively extract core concepts and concept attribute instances from innovation sentences.
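The two modules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sentence, the hand-written parse, and the function names `combine_concept`, `expand_conjuncts`, and `attribute_instances` are all assumptions made for illustration, and the relation labels follow the Stanford basic dependency scheme (`amod`, `nn`, `conj`, `dobj`, `nsubj`).

```python
# Toy sentence and a hand-written dependency parse (illustrative only,
# not produced by a real parser).
TOKENS = ["The", "coated", "nanoparticles", "reduced",
          "toxicity", "and", "inflammation"]

# Edges are (head index, relation, dependent index) triples.
EDGES = [
    (3, "nsubj", 2),
    (2, "amod", 1),
    (2, "det", 0),
    (3, "dobj", 4),
    (4, "cc", 5),
    (4, "conj", 6),
]

# Relations treated as concept-internal modifiers (an assumption).
MODIFIER_RELS = {"amod", "nn"}


def children(head, rels):
    """Return dependents of `head` attached by any relation in `rels`."""
    return [dep for h, rel, dep in EDGES if h == head and rel in rels]


def combine_concept(head):
    """Concept combination: join a head noun with its modifiers in sentence order."""
    indices = sorted([head] + children(head, MODIFIER_RELS))
    return " ".join(TOKENS[i] for i in indices)


def expand_conjuncts(head):
    """Conjunction detection: the head plus every token coordinated with it."""
    return [head] + children(head, {"conj"})


def attribute_instances(predicate, rel="dobj"):
    """Collect combined concepts filling `rel` of `predicate`, including conjuncts."""
    found = []
    for argument in children(predicate, {rel}):
        for item in expand_conjuncts(argument):
            found.append(combine_concept(item))
    return found


if __name__ == "__main__":
    print(attribute_instances(3, "nsubj"))
    print(attribute_instances(3, "dobj"))
```

Here the subject of "reduced" comes back as the combined concept "coated nanoparticles", and the object list contains both "toxicity" and its conjunct "inflammation", which a plain `dobj` lookup would miss.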
Zhang Fan, Le Xiaoqiu. Research on Recognition of Concept Attribute Instances in Innovation Sentences of Scientific Research Paper [J]. New Technology of Library and Information Service, 2015, 31(5): 15-23.