Extracting Semantic Knowledge from Plant Species Diversity Collections
Liu Jianhua1,2, Wang Ying1, Zhang Zhixiong1, Li Chuanxi3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China; 2University of Chinese Academy of Sciences, Beijing 100049, China; 3China Great Wall Asset Management Co., Ltd, Beijing 100045, China
[Objective] This paper aims to extract semantic knowledge from biodiversity studies. [Methods] We proposed a new species-oriented knowledge extraction framework that covers various entities as well as the relationships among them. The new method was then examined with various specialized databases. [Results] The species-oriented knowledge extraction framework successfully retrieved semantic information about the target entities and the relations among them. This method expanded the scope of knowledge extraction practice in the biodiversity field. [Limitations] The recall and precision of the new method were affected by the dictionaries and rules. More studies are needed to examine semantic relationships among the named entities beyond co-occurrence, hierarchical, and simple syntactic relations. [Conclusions] The proposed method expands the content and methods of knowledge extraction in biodiversity research. It supports semantic information retrieval and computation.