Extracting Semantic Knowledge from Plant Species Diversity Collections
Liu Jianhua1,2, Wang Ying1, Zhang Zhixiong1, Li Chuanxi3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2University of Chinese Academy of Sciences, Beijing 100049, China
3China Great Wall Asset Management Co., Ltd, Beijing 100045, China
Abstract [Objective] This paper aims to extract semantic knowledge from biodiversity studies. [Methods] We proposed a new species-centered knowledge extraction framework that covers various entities as well as the relationships among them. The new method was then examined with various specialized databases. [Results] The species-oriented knowledge extraction framework successfully retrieved semantic information about the target entities and the relations among them. This method expanded the scope of knowledge extraction practice in the biodiversity field. [Limitations] The recall and precision of the new method were affected by the dictionaries and rules. More studies are needed to examine semantic relationships among the named entities beyond co-occurrence, hierarchical, and simple syntactic relations. [Conclusions] The proposed method expands the contents and methods of knowledge extraction in biodiversity research. It supports semantic information retrieval and computation.
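To make the dictionary dependence noted under [Limitations] concrete, the following is a minimal illustrative sketch of dictionary-based entity tagging followed by sentence-level co-occurrence relation extraction. It is not the authors' implementation: the dictionary entries, the entity type labels, the `co-occurs-with` relation name, and the function names `tag_entities` and `cooccurrence_relations` are all assumptions introduced here for illustration.

```python
import re
from itertools import combinations

# Hypothetical toy dictionary (assumption): surface form -> entity type.
# The dictionaries and rules actually used in the paper are not given here.
DICTIONARY = {
    "Ginkgo biloba": "Species",
    "Ginkgoaceae": "Family",
    "Zhejiang": "Location",
}


def tag_entities(sentence):
    """Return (surface form, entity type) pairs found by exact dictionary match."""
    found = []
    for term, entity_type in DICTIONARY.items():
        # Word boundaries keep a term from matching inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", sentence):
            found.append((term, entity_type))
    return found


def cooccurrence_relations(text):
    """Pair up all entities that appear together in the same sentence."""
    relations = []
    # Naive sentence split on whitespace following ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = tag_entities(sentence)
        for (first, _), (second, _) in combinations(entities, 2):
            relations.append((first, "co-occurs-with", second))
    return relations


text = (
    "Ginkgo biloba belongs to Ginkgoaceae. "
    "Ginkgo biloba grows wild in Zhejiang."
)
relations = cooccurrence_relations(text)
for relation in relations:
    print(relation)
```

In this sketch, a name missing from the dictionary is never tagged, which lowers recall, while an ambiguous dictionary entry would be tagged wherever it appears, which lowers precision. Co-occurrence alone also says nothing about the type of relation, which is why hierarchical and syntactic rules are needed on top of it.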
Received: 14 April 2016
Published: 22 February 2017