Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (8): 86-97    DOI: 10.11925/infotech.2096-3467.2020.0032
Automatic Recognition of Produce Entities from Local Chronicles with Deep Learning
Xu Chenfei1,2, Ye Haiying2, Bao Ping1
1Institution of Chinese Agricultural Civilization, Nanjing Agricultural University, Nanjing 210095, China
2Economics and Management School, Nantong University, Nantong 226019, China
[Objective] This paper automatically identifies produce aliases, related persons, places of origin and cited books in ancient local chronicles, with the aim of building a knowledge base of traditional produce. [Methods] We chose Local Chronicle of Yunnan: Produce as the base corpus, preprocessed its text and annotated it. We then applied four deep learning models (Bi-RNN, Bi-LSTM, Bi-LSTM-CRF and BERT) to recognize the target entities and compared their outputs. [Results] The precision (P) and F-value of the Bi-LSTM-CRF model were 5.54 and 3.51 percentage points higher than those of the Bi-LSTM model. The recall (R) of the BERT model reached 83.36%, the best among all models. The Bi-LSTM-CRF model performed best on cited-book entities (F-value = 89.71%), and the BERT model performed best on person entities, with an F-value of 87.90%. [Limitations] Because of the linguistic characteristics of ancient local chronicles and the domain knowledge needed to identify the related entities, the annotated corpus may contain tagging errors. [Conclusions] Deep learning can effectively identify the target entities in ancient local chronicles.

Key words: Deep Learning; Local Chronicle: Produce; Named Entity Recognition; Models Construction; Digital Humanities
Received: 08 January 2020      Published: 05 June 2020
Chinese Library Classification (ZTFLH): G255
Xu Chenfei, Ye Haiying, Bao Ping. Automatic Recognition of Produce Entities from Local Chronicles with Deep Learning. Data Analysis and Knowledge Discovery, 2020, 4(8): 86-97.

Entity Recognition Model of Local Chronicle: Produce Based on RNN
Entity Recognition Model of Local Chronicle: Produce Based on Bi-LSTM
Entity Recognition Model of Local Chronicle: Produce Based on Bi-LSTM-CRF
Entity Recognition Model of Local Chronicle: Produce Based on BERT
Input Representation of Local Chronicle: Produce Based on BERT
10 Randomly Selected Produce Items
No. Word Tag
1 B-PN
2 E-PN
3 B-PC
4 I-PC
5 I-PC
6 I-PC
7 E-PC
8 O
9 O
10 O
11 O
12 O
13 B-PA
14 I-PA
15 E-PA
Processing Results of Ancient Local Chronicles
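The tag column above follows a begin/inside/end/outside pattern (B-, I-, E-, O). As a minimal sketch of how such a tag sequence can be turned back into entity spans, the Python below decodes the 15 tags listed in the table. The function name `decode_spans` is hypothetical, the handling of malformed sequences (dropping the partial entity) is an assumption, and the labels PN, PC and PA are used as opaque strings because the table does not expand them.

```python
from typing import List, Tuple


def decode_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) spans, 0-based and inclusive, from B/I/E/O tags."""
    spans = []
    start, current = None, None  # state of the entity being read, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            start, current = None, None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            # A B- tag always opens a new entity.
            start, current = i, label
        elif prefix == "E" and current == label and start is not None:
            # An E- tag closes the entity opened by the matching B- tag.
            spans.append((label, start, i))
            start, current = None, None
        elif prefix == "I" and current == label:
            continue  # still inside the current entity
        else:
            # Assumption: a malformed sequence discards the partial entity.
            start, current = None, None
    return spans


# The tag column of the table, rows 1 to 15.
tags = [
    "B-PN", "E-PN",
    "B-PC", "I-PC", "I-PC", "I-PC", "E-PC",
    "O", "O", "O", "O", "O",
    "B-PA", "I-PA", "E-PA",
]
spans = decode_spans(tags)
print(spans)
```

With 0-based positions, the three entities cover rows 1-2 (PN), 3-7 (PC) and 13-15 (PA) of the table.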
Bi-LSTM/Bi-RNN layers   2
Hidden layer size       256
Learning rate           0.001
Batch size              64
Dropout rate            0.5
Gradient clipping       5
Hyper-parameters of Experiment
BERT layers             2
Hidden layer size       128
Learning rate           2e-5
Batch size              32
Training epochs         10
Hyper-parameters of Experiment (BERT)
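For readers who want to reuse these settings, the two hyper-parameter tables can be recorded as plain configuration objects. This is only a sketch: the class and field names (`RecurrentRunConfig`, `BertRunConfig` and their attributes) are hypothetical, and only the values come from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecurrentRunConfig:
    """Values from the Bi-RNN / Bi-LSTM hyper-parameter table (names are hypothetical)."""
    num_layers: int = 2
    hidden_size: int = 256
    learning_rate: float = 0.001
    batch_size: int = 64
    dropout: float = 0.5
    clip_gradient: float = 5.0


@dataclass(frozen=True)
class BertRunConfig:
    """Values from the BERT hyper-parameter table (names are hypothetical)."""
    num_layers: int = 2
    hidden_size: int = 128
    learning_rate: float = 2e-5
    batch_size: int = 32
    train_epochs: int = 10


recurrent_cfg = RecurrentRunConfig()
bert_cfg = BertRunConfig()
print(recurrent_cfg)
print(bert_cfg)
```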
Model         P(%)   R(%)   F(%)
Bi-RNN        69.91  75.10  72.38
Bi-LSTM       76.33  76.73  76.51
Bi-LSTM-CRF   81.87  78.30  80.02
BERT          76.61  83.36  79.83
Results of Different Models of Ancient Local Chronicles: Produce
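The comparisons quoted in the abstract can be checked directly against this table. The sketch below assumes that the F-value is the usual harmonic mean of precision and recall, which the page does not state explicitly; the recomputed values differ from the reported ones by a few hundredths of a point, which may reflect rounding or averaging over entity types. The variable and function names are hypothetical.

```python
# (P, R, F) in percent, copied from the results table.
results = {
    "Bi-RNN": (69.91, 75.10, 72.38),
    "Bi-LSTM": (76.33, 76.73, 76.51),
    "Bi-LSTM-CRF": (81.87, 78.30, 80.02),
    "BERT": (76.61, 83.36, 79.83),
}


def f_value(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of the F-value)."""
    return 2 * p * r / (p + r)


for name, (p, r, f) in results.items():
    print(f"{name}: reported F = {f:.2f}, recomputed F = {f_value(p, r):.2f}")

# Gains of Bi-LSTM-CRF over Bi-LSTM, in percentage points.
p_gain = round(results["Bi-LSTM-CRF"][0] - results["Bi-LSTM"][0], 2)
f_gain = round(results["Bi-LSTM-CRF"][2] - results["Bi-LSTM"][2], 2)

# Models with the highest recall and the highest reported F-value.
best_recall = max(results, key=lambda m: results[m][1])
best_f = max(results, key=lambda m: results[m][2])

print(p_gain, f_gain, best_recall, best_f)
```

On these figures, adding the CRF layer to Bi-LSTM raises precision by 5.54 points and the F-value by 3.51 points, BERT has the highest recall, and Bi-LSTM-CRF has the highest overall F-value.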
The Results of Identifying Different Entities of Bi-RNN and Bi-LSTM
The Results of Identifying Different Entities of Bi-LSTM and Bi-LSTM-CRF
The Results of Identifying Different Entities of Bi-LSTM-CRF and BERT

The Retrieval Results of "Youtanbo"
The Detailed Page of Knowledge Base of Local Chronicles: Produce

Linked Data Visualization of "Youtanbo"

Spatiotemporal Display of the Produce "Tobacco"
