Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (3): 57-65    DOI: 10.11925/infotech.2096-3467.2018.0213
The Comparative Study of Different Tagging Sets on Entity Extraction of Classical Books
Yue Yuan1, Dongbo Wang1,2, Shuiqing Huang1,2, Bin Li3
1College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
2Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095, China
3School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China
[Objective] In the context of digital humanities, this study examines how part-of-speech tagging sets of different sizes affect entity extraction models, so that knowledge can be mined more deeply and accurately from Pre-Qin literature. [Methods] The training and test corpora consist of “Zuo Zhuan” and “Guo Yu”, labeled by machine and checked manually. Three tagging sets of different sizes were built, with the Pre-Qin part-of-speech tagging set of Nanjing Normal University as the core, supplemented by the part-of-speech tagging sets of Peking University, the Institute of Computing Technology of the Chinese Academy of Sciences, and the Ministry of Education. Entity extraction results on the same corpus were then compared using conditional random fields and feature templates. [Results] Comparative experiments with the three part-of-speech tagging sets were carried out on the Pre-Qin classics “Zuo Zhuan” and “Guo Yu”. The F values of the three models were 82.53%, 83.42% and 84.07%, respectively. [Limitations] Feature selection needs further refinement, and the training results leave room for improvement. [Conclusions] The results help with extracting named entities from ancient literature of the Pre-Qin period. The part-of-speech tagging set constructed is suitable for part-of-speech tagging of ancient Chinese.

Key words: Digital Humanities; Ancient Chinese Character Information Processing; Parts of Speech Tagging; Named Entity Extraction
Received: 27 February 2018      Published: 17 April 2019

Yue Yuan, Dongbo Wang, Shuiqing Huang, Bin Li. The Comparative Study of Different Tagging Sets on Entity Extraction of Classical Books. Data Analysis and Knowledge Discovery, 2019, 3(3): 57-65.
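
The abstract describes conditional random fields driven by feature templates over part-of-speech tags, evaluated by F value. The following is a minimal sketch of those two ideas only, not the authors' implementation: the function names, the window size, the `<PAD>` marker, and the toy tokens and tags are all assumptions made for illustration, and the CRF training step itself is omitted.

```python
from typing import Dict, List, Set, Tuple

# An entity is identified by (start index, end index, entity type).
Entity = Tuple[int, int, str]


def template_features(tokens: List[str], pos_tags: List[str],
                      i: int, window: int = 1) -> Dict[str, str]:
    """Expand a simple unigram template around position i.

    For each offset in [-window, window], emit the token and its
    part-of-speech tag. Positions outside the sentence get "<PAD>".
    (Hypothetical template; the paper's actual templates are not given here.)
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"tok[{offset}]"] = tokens[j] if inside else "<PAD>"
        feats[f"pos[{offset}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


def f_value(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Harmonic mean of precision and recall over exactly matching entities."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy sentence with invented tags, for illustration only.
tokens = ["t0", "t1", "t2"]
pos_tags = ["nr", "v", "ns"]
features_at_start = template_features(tokens, pos_tags, 0)

gold = {(0, 0, "PER"), (2, 2, "LOC")}
predicted = {(0, 0, "PER"), (1, 2, "LOC")}
score = f_value(gold, predicted)
```

Under this sketch, changing the size of the tagging set changes only the values that appear in the `pos[...]` features, which is one way to picture why the same corpus and templates can yield different F values for different tagging sets.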

