Information Extraction from Chinese Plant Species Diversity Description Text
Yufeng Duan, Sisi Huang
Business School, East China Normal University, Shanghai 200241, China
Abstract [Objective] To extract information from Chinese-language descriptive text on plant species diversity. [Methods] Using a plant species diversity domain ontology as the foundation, the study adopts a strategy of stepwise selection and annotation at the paragraph, sentence, and concept levels. [Results] The approach was tested on a sample containing 4,734 information points; extraction precision, recall, and F-measure reached 0.86, 0.85, and 0.85, respectively. [Limitations] The rule set needs further improvement to resolve the remaining problems in extracting information from description text. [Conclusions] The proposed scheme effectively extracts information from Chinese plant species diversity description text.
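The abstract reports three evaluation figures but does not state how the F-measure was computed. As a minimal sketch, assuming the balanced F-measure (the harmonic mean of precision and recall, with beta = 1), the reported values can be checked for consistency as follows. The function name `f_measure` and the `beta` parameter are illustrative and do not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean.

    Assumption: the paper uses the balanced (beta = 1) form. The
    abstract does not say which variant was applied.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero; define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values reported in the abstract.
reported_precision = 0.86
reported_recall = 0.85

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))
```

Under this assumption the computed value rounds to the 0.85 given in the abstract, so the three reported figures are mutually consistent.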
Received: 14 September 2015
Published: 04 February 2016