New Technology of Library and Information Service  2016, Vol. 32 Issue (1): 87-96    DOI: 10.11925/infotech.1003-3513.2016.01.13
Original Article
Information Extraction from Chinese Plant Species Diversity Description Text
Yufeng Duan, Sisi Huang
Business School, East China Normal University, Shanghai 200241, China

[Objective] To extract information from Chinese plant species diversity description text. [Methods] Taking a plant species diversity domain ontology as the foundation, the study adopts a stepwise strategy that selects and annotates content at the paragraph, sentence and concept levels in turn. [Results] The method was tested on a sample containing 4,734 information points; extraction precision, recall and F-measure reached 0.86, 0.85 and 0.85, respectively. [Limitations] The rule set needs further improvement to resolve the remaining problems in extracting information from description text. [Conclusions] The proposed scheme extracts information from Chinese plant species diversity description text effectively.
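
The three reported scores can be cross-checked against one another. The following is a minimal sketch, assuming that the F-measure is the balanced F1 (the harmonic mean of precision and recall) and that the reported accuracy rate corresponds to precision; the abstract states neither explicitly, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta; beta=1 gives F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was extracted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Scores reported in the abstract (assumption: "accuracy rate" means precision).
reported_precision = 0.86
reported_recall = 0.85

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.85
```

Under these assumptions the computed value rounds to the reported F-measure of 0.85, so the three figures are mutually consistent.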

Keywords: Information extraction; Plant species diversity description text; Chinese information processing; Ontology
Received: 14 September 2015      Published: 04 February 2016

Cite this article:

Yufeng Duan, Sisi Huang. Information Extraction from Chinese Plant Species Diversity Description Text. New Technology of Library and Information Service, 2016, 32(1): 87-96.

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938