Extracting Name Entities from Ecological Restoration Literature with Bi-LSTM+CRF
Ma Jianxia, Yuan Hui, Jiang Xiang
The Northwest Institute of Eco-Environment and Resources, Library and Information Center, Chinese Academy of Sciences, Lanzhou 730000, China
Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This study extracts named entities from ecological restoration literature, such as technologies for governing fragile ecosystems, implementation sites, and implementation times. [Methods] We combined a Bi-LSTM+CRF model with a feature-based named entity knowledge base to automatically extract the required data from CNKI documents. [Results] For ecological governance technology entities, the precision (P), recall (R), and F1 values were 74.34%, 64.04%, and 68.81%, respectively. Compared with the classic CRF method, the new model improved P and F1 by 9.41% and 4.26%, while R remained essentially unchanged. [Limitations] The accuracy of Chinese word segmentation tools may affect the model's performance, and the relationships among entities require further study. [Conclusions] The proposed model could support resource and environment information analysis based on fine-grained content.
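As a quick consistency check on the reported figures, the sketch below recomputes F1 from the P and R values given in the abstract. It assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R for ecological governance technology entities, in percent,
# as reported in the abstract.
p, r = 74.34, 64.04
f1 = f1_score(p, r)
print(round(f1, 2))  # 68.81
```

Under that assumption, the recomputed value agrees with the reported F1 of 68.81%.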
Received: 08 January 2019
Published: 26 April 2020
Corresponding Authors:
Ma Jianxia
E-mail: majx@lzb.ac.cn