Extracting Name Entities from Ecological Restoration Literature with Bi-LSTM+CRF
Ma Jianxia, Yuan Hui, Jiang Xiang
The Northwest Institute of Eco-Environment and Resources, Library and Information Center, Chinese Academy of Sciences, Lanzhou 730000, China; Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This study extracts named entities from scientific text, including fragile ecological governance technologies, implementation sites, and implementation times. [Methods] We combined a Bi-LSTM+CRF model with a feature-based named entity knowledge base to automatically extract the needed data from CNKI documents. [Results] For entities describing ecological governance technology, the precision (P), recall (R), and F1 values were 74.34%, 64.04%, and 68.81%, respectively. Compared with the classic CRF method, the new model improved P and F1 by 9.41% and 4.26%, while R was essentially unchanged. [Limitations] The accuracy of Chinese word segmentation tools may affect the model's performance. More research is needed on the relationships among entities. [Conclusions] The proposed model could support resource and environment information analysis based on fine-grained content.
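The reported scores can be cross-checked with a short calculation. The sketch below assumes the abstract uses the standard definition of F1 as the harmonic mean of precision and recall; the helper name `f1_score` is hypothetical and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# P and R reported in the abstract for ecological governance technology
# entities, in percent. Assumption: F1 = 2PR / (P + R).
p, r = 74.34, 64.04
f1 = f1_score(p, r)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 68.81%"
```

Under that assumption the computed value agrees with the 68.81% given in the abstract.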
Ma Jianxia, Yuan Hui, Jiang Xiang. Extracting Name Entities from Ecological Restoration Literature with Bi-LSTM+CRF[J]. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 78-88.