Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (8): 86-97    DOI: 10.11925/infotech.2096-3467.2020.0032
Automatic Recognition of Produce Entities from Local Chronicles with Deep Learning
Xu Chenfei1,2, Ye Haiying2, Bao Ping1
1Institution of Chinese Agricultural Civilization, Nanjing Agricultural University, Nanjing 210095, China
2Economics and Management School, Nantong University, Nantong 226019, China
Abstract  

[Objective] This paper aims to automatically identify produce aliases, related persons, places of origin, and cited books in ancient local chronicles, in order to build a knowledge base of traditional produce. [Methods] First, we chose Local Chronicle of Yunnan: Produce as the base corpus, then preprocessed and tagged its texts. Next, we applied four deep learning models (Bi-RNN, Bi-LSTM, Bi-LSTM-CRF and BERT) to identify the target entities. Finally, we compared the outputs of these models. [Results] The precision (P) and F-value of the Bi-LSTM-CRF model were 5.54 and 3.51 percentage points higher than those of the Bi-LSTM model. The recall (R) of the BERT model reached 83.36%, the best among all models. The Bi-LSTM-CRF model performed best on cited-book entities (F-value=89.71%), and the BERT model performed best on person entities, with an F-value of 87.90%. [Limitations] Because of the linguistic characteristics of ancient local chronicles and the domain knowledge required to identify the related entities, the tagging may contain errors. [Conclusions] Deep learning can effectively identify the target entities in ancient local chronicles.

Key words: Deep Learning; Local Chronicle: Produce; Named Entity Recognition; Models Construction; Digital Humanities
Received: 08 January 2020      Published: 05 June 2020
CLC Number (ZTFLH): G255
Corresponding Authors: Bao Ping     E-mail: baoping@njau.edu.cn

Cite this article:

Xu Chenfei, Ye Haiying, Bao Ping. Automatic Recognition of Produce Entities from Local Chronicles with Deep Learning. Data Analysis and Knowledge Discovery, 2020, 4(8): 86-97.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0032     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I8/86

Entity Recognition Model of Local Chronicle: Produce Based on RNN
Entity Recognition Model of Local Chronicle: Produce Based on Bi-LSTM
Entity Recognition Model of Local Chronicle: Produce Based on Bi-LSTM-CRF
Entity Recognition Model of Local Chronicle: Produce Based on BERT
Input Representation of Local Chronicle: Produce Based on BERT
10 Randomly Selected Produce Items
No. Word Tag
1 B-PN
2 E-PN
3 B-PC
4 I-PC
5 I-PC
6 I-PC
7 E-PC
8 O
9 O
10 O
11 O
12 O
13 B-PA
14 I-PA
15 E-PA
Processing Results of Ancient Local Chronicles
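
The tag column above follows a B/I/E/O boundary scheme: B- opens an entity, I- continues it, E- closes it, and O marks text outside any entity. As a minimal sketch, the tagged positions can be grouped back into entity spans as shown below. The function name `decode_spans` is hypothetical, the labels PN, PC and PA are copied from the table without expanding them, and the handling of inconsistent tags (silently dropped) is an assumption rather than the authors' procedure.

```python
from typing import List, Tuple


def decode_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group B-/I-/E- tags into (label, start, end) spans.

    Positions are 1-based and inclusive, matching the "No." column.
    """
    spans = []
    start, current = None, None
    for pos, tag in enumerate(tags, start=1):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            # A B- tag always opens a new entity.
            start, current = pos, label
        elif prefix == "I" and current == label:
            # Inside the entity that is currently open.
            continue
        elif prefix == "E" and current == label:
            # An E- tag closes the open entity of the same label.
            spans.append((label, start, pos))
            start, current = None, None
        else:
            # "O", or a tag that does not fit the open entity
            # (assumption: such fragments are discarded).
            start, current = None, None
    return spans


# Tag sequence copied from the table (positions 1-15).
TAGS = [
    "B-PN", "E-PN",
    "B-PC", "I-PC", "I-PC", "I-PC", "E-PC",
    "O", "O", "O", "O", "O",
    "B-PA", "I-PA", "E-PA",
]

SPANS = decode_spans(TAGS)
print(SPANS)  # [('PN', 1, 2), ('PC', 3, 7), ('PA', 13, 15)]
```

Under these assumptions the 15 tagged positions yield three entities: a two-character PN span, a five-character PC span, and a three-character PA span.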
Hyper-parameter Value
Bi-LSTM/Bi-RNN layers 2
Hidden layer size 256
Learning rate 0.001
Batch-size 64
Dropout rate 0.5
Clip gradient 5
Hyper-parameters of Experiment
Hyper-parameter Value
BERT layers 2
Hidden layer size 128
Learning rate 2e-5
Batch-size 32
Train-epochs 10
Hyper-parameters of Experiment (BERT)
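
For reproduction purposes, the two hyper-parameter tables can be captured as plain configuration objects. The sketch below only records the listed values. The class and field names (`RecurrentRunConfig`, `BertRunConfig`, `clip_gradient`, and so on) are hypothetical and do not come from the paper or from any library, and settings the tables do not list (optimizer, sequence length, embedding size) are deliberately left out.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RecurrentRunConfig:
    """Values listed for the Bi-RNN, Bi-LSTM and Bi-LSTM-CRF experiments."""
    num_layers: int = 2
    hidden_size: int = 256
    learning_rate: float = 1e-3
    batch_size: int = 64
    dropout: float = 0.5
    clip_gradient: float = 5.0


@dataclass(frozen=True)
class BertRunConfig:
    """Values listed for the BERT experiment."""
    num_layers: int = 2
    hidden_size: int = 128
    learning_rate: float = 2e-5
    batch_size: int = 32
    train_epochs: int = 10


RECURRENT = RecurrentRunConfig()
BERT = BertRunConfig()

# asdict() gives a plain dictionary that is easy to log next to results.
print(asdict(RECURRENT))
print(asdict(BERT))
```

Freezing the dataclasses keeps a run's settings from being changed after the experiment starts, which makes logged configurations easier to trust.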
Model P(%) R(%) F(%)
Bi-RNN 69.91 75.10 72.38
Bi-LSTM 76.33 76.73 76.51
Bi-LSTM-CRF 81.87 78.30 80.02
BERT 76.61 83.36 79.83
Results of Different Models of Ancient Local Chronicles: Produce
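
The table can be checked with a few lines of arithmetic. The sketch below assumes that F is the usual harmonic mean of P and R. The page does not state how F was averaged, so a small gap between the recomputed and the reported values is expected. The names `RESULTS`, `f_measure` and `gain` are introduced here for illustration only.

```python
# (P, R, F) in percent, copied from the results table.
RESULTS = {
    "Bi-RNN": (69.91, 75.10, 72.38),
    "Bi-LSTM": (76.33, 76.73, 76.51),
    "Bi-LSTM-CRF": (81.87, 78.30, 80.02),
    "BERT": (76.61, 83.36, 79.83),
}


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * p * r / (p + r)


def gain(model: str, baseline: str) -> tuple:
    """Percentage-point differences (P, R, F) of `model` over `baseline`."""
    return tuple(
        round(a - b, 2) for a, b in zip(RESULTS[model], RESULTS[baseline])
    )


# Recomputed F stays within 0.05 points of every reported F.
for name, (p, r, f) in RESULTS.items():
    print(f"{name}: reported F={f:.2f}, recomputed F={f_measure(p, r):.2f}")

# Adding the CRF layer to Bi-LSTM: differences in P, R and F.
print(gain("Bi-LSTM-CRF", "Bi-LSTM"))  # (5.54, 1.57, 3.51)

# Model with the highest recall.
BEST_RECALL = max(RESULTS, key=lambda m: RESULTS[m][1])
print(BEST_RECALL)  # BERT
```

Read this way, the CRF layer mainly improves precision (5.54 points) over the plain Bi-LSTM, while BERT trades some precision for the highest recall.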
The Results of Identifying Different Entities of Bi-RNN and Bi-LSTM
The Results of Identifying Different Entities of Bi-LSTM and Bi-LSTM-CRF
The Results of Identifying Different Entities of Bi-LSTM-CRF and BERT

The Retrieval Results of "Youtanbo"
The Detailed Page of Knowledge Base of Local Chronicles: Produce

Linked Data Visualization of "Youtanbo"

Spatio-temporal Distribution of the Produce "Tobacco"