Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 1: 37-46. DOI: 10.11925/infotech.2096-3467.2017.01.05
Original Article
Extracting Semantic Knowledge from Plant Species Diversity Collections
Liu Jianhua1,2, Wang Ying1, Zhang Zhixiong1, Li Chuanxi3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2University of Chinese Academy of Sciences, Beijing 100049, China
3China Great Wall Asset Management Co., Ltd, Beijing 100045, China
Abstract  

[Objective] This paper aims to extract semantic knowledge from biodiversity studies. [Methods] We propose a new species-oriented knowledge extraction framework that covers a variety of entity types and the relations among them, and we evaluate it on several specialized databases. [Results] The species-oriented framework successfully extracted semantic information about the target entities and the relations among them, expanding the scope of knowledge extraction practice in the biodiversity field. [Limitations] The recall and precision of the method depend on the dictionaries and rules it uses. Further study is needed on semantic relations among named entities beyond co-occurrence, hierarchical, and simple syntactic relations. [Conclusions] The proposed method extends the content and methods of knowledge extraction in biodiversity research and supports semantic information retrieval and computation.
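The abstract states that extraction relies on dictionaries and rules, and that the relations recovered are co-occurrence, hierarchical, and simple syntactic ones. The Python sketch below illustrates the first two relation kinds under stated assumptions; it is not the authors' implementation. The dictionary entries are hypothetical, matching is greedy longest-match over lower-cased word tokens, sentences are split on periods and semicolons, and the hierarchical rule (a binomial species name belongs to the genus named by its first word) is an illustrative assumption rather than a rule published in the paper.

```python
import re
from itertools import combinations

# Hypothetical toy dictionary: surface string -> entity type.
# The type labels mimic the paper's entity types; the entries are illustrative only.
DICTIONARY = {
    "rhododendron": "Genus",
    "rhododendron simsii": "species",
    "ericaceae": "family",
    "shrub": "habit",
    "red": "plantFlowerColor",
    "yunnan": "province",
}


def tag_entities(sentence, dictionary=DICTIONARY):
    """Greedy longest-match dictionary lookup over lower-cased word tokens."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    max_len = max(len(key.split()) for key in dictionary)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest possible n-gram first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                entities.append((candidate, dictionary[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return entities


def cooccurrence_relations(text):
    """Pair every two distinct entities that appear in the same sentence."""
    relations = set()
    for sentence in re.split(r"[.;]\s*", text):
        ents = sorted(set(tag_entities(sentence)))
        for a, b in combinations(ents, 2):
            relations.add((a, "co-occurs-with", b))
    return relations


def hierarchical_relations(entities, dictionary=DICTIONARY):
    """Assumed rule: a binomial species name belongs to the genus named by its first word."""
    relations = set()
    for name, etype in entities:
        genus = name.split()[0]
        if etype == "species" and dictionary.get(genus) == "Genus":
            relations.add(((name, "species"), "member-of", (genus, "Genus")))
    return relations
```

For example, running these functions on "Rhododendron simsii is a shrub of Ericaceae with red flowers. It grows in Yunnan." yields five entities, six co-occurrence pairs from the first sentence (none involving Yunnan, which sits in a separate sentence), and one member-of link from the species to the genus Rhododendron. Longest-match keeps "rhododendron" from also being tagged as a genus inside the species name, trading recall of nested mentions for fewer overlapping tags; this is one way the dependence on dictionaries and rules noted under [Limitations] shows up in practice.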

Key words: Plant Species Diversity; Plant Species; Knowledge Extraction; Relation Extraction
Received: 14 April 2016      Published: 22 February 2017
CLC Number: G250

Cite this article:

Liu Jianhua, Wang Ying, Zhang Zhixiong, Li Chuanxi. Extracting Semantic Knowledge from Plant Species Diversity Collections. Data Analysis and Knowledge Discovery, 2017, 1(1): 37-46.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.01.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I1/37

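Entity type | Count
Taxon-genus (Genus) | 115 698
Taxon-family (family) | 25 332
Habit (habit) | 13 510
Flower color (plantFlowerColor) | 12 649
Ecological environment (cultivatedHabitat) | 12 277
Plant stem type (plantStemType) | 10 306
Taxon-species (species) | 9 478
Longevity (longevity) | 8 233
Plant fruit type (plantFruitType) | 6 489
Plant gynoecium carpel fusion (plantGynoeciumCarpelFusion) | 4 875
Plant stamen arrangement (planAndroeciumStamenArrangement) | 4 793
Plant leaf arrangement (plantLeafArrangement) | 3 908
Plant leaf shape (plantLeafShape) | 3 609
Plant leaf margin (plantLeafMargin) | 3 268
Inflorescence form (plantInflorescenceForm) | 3 268
Plant leaf surface (plantLeafSurface) | 2 859
Number of floral structures (plantNumbersOfFloralStructure) | 2 815
Plant leaf division (plantLeafDivision) | 2 615
Term of undetermined type (Term) | 2 482
Photosynthesis (photosynthesis) | 2 282
Plant flower sexuality (plantFlowerSexuality) | 2 222
Plant stamen type (plantAndroeciumStamenType) | 2 152
Plant stem (plantStemForm) | 1 983
Province (province) | 1 845
Flowering time (plantFlowerTime) | 1 773
Plant root type (plantRootType) | 1 725
Chemical compound (chemicalCompound) | 1 637
Pollination system (plantPollinationSystem) | 1 509
Gene (gene) | 1 270
Country (country) | 1 227
Taxon-order (order) | 1 088
Flower symmetry (plantFlowerSymmetry) | 1 043
Chemical element (ChemicalElement) | 736
Experimental materials and tools (Tool) | 722
Physical environment (PhysicalEnvironment) | 717
Plant perianth (plantFlowerPerianthForm) | 621
Plant leaf structure (plantLeafStructure) | 510
Organ (Organ) | 780
Organization (Organization) | 323
Culture environment (culturedHabitat) | 264
Taxon-phylum (phylum) | 252
Plant leaf (plantLeaf) | 244
Taxon-class (class) | 153
Plant root structure (plantRootStructure) | 127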
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn