Automatic Recognition of Produce Entities from Local Chronicles with Deep Learning
Xu Chenfei1,2, Ye Haiying2, Bao Ping1
1 Institution of Chinese Agricultural Civilization, Nanjing Agricultural University, Nanjing 210095, China
2 Economics and Management School, Nantong University, Nantong 226019, China
Abstract [Objective] This paper aims to automatically identify produce aliases, related persons, places of origin, and cited books in ancient local chronicles, with the goal of building a knowledge base of traditional products. [Methods] First, we chose Local Chronicle of Yunnan: Produce as the base corpus, preprocessed its text, and annotated it. Then, we applied four deep learning models (Bi-RNN, Bi-LSTM, Bi-LSTM-CRF, and BERT) to recognize the target entities. Finally, we compared the outputs of these models. [Results] The precision (P) and F-value of the Bi-LSTM model were 5.54% and 3.51% higher than those of the Bi-LSTM-CRF model. The recall (R) of the BERT model reached 83.36%, the highest among all models. The Bi-LSTM-CRF model performed best on cited-book entities (F-value = 89.71%), and the BERT model performed best on person entities (F-value = 87.90%). [Limitations] Because of the linguistic characteristics of ancient local chronicles and the domain knowledge needed to identify the related entities, the annotated corpus may contain tagging errors. [Conclusions] Deep learning can effectively identify the target entities in ancient local chronicles.
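The abstract compares models by precision (P), recall (R) and F-value but does not spell out how these are computed. The following is a minimal sketch of one common convention, entity-level exact-match scoring over BIO tags, and is not necessarily the procedure the authors used. The tag names (ALIAS, BOOK, PER), the BIO scheme, the helper names, and the toy sequences are all assumptions made for illustration; none of them come from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scores: a predicted span counts only if boundaries and type agree."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy sequences (invented): the predicted BOOK span is one token too short.
gold_tags = ["B-ALIAS", "I-ALIAS", "O", "B-BOOK", "I-BOOK", "I-BOOK", "O", "B-PER", "I-PER"]
pred_tags = ["B-ALIAS", "I-ALIAS", "O", "B-BOOK", "I-BOOK", "O", "O", "B-PER", "I-PER"]

p, r, f = precision_recall_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")
```

Under this convention a span with a wrong boundary counts as both a false positive and a false negative, so the truncated BOOK span lowers P and R together. Per-type scores, such as the F-value reported for cited books, would follow by filtering both span sets to a single entity type before scoring.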
Received: 08 January 2020
Published: 05 June 2020
Corresponding Author: Bao Ping
E-mail: baoping@njau.edu.cn