%A Yue Yuan,Dongbo Wang,Shuiqing Huang,Bin Li %T The Comparative Study of Different Tagging Sets on Entity Extraction of Classical Books %0 Journal Article %D 2019 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.2096-3467.2018.0213 %P 57-65 %V 3 %N 3 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4629.shtml} %8 2019-03-25 %X

[Objective] In the context of digital humanities, in order to excavate the corresponding knowledge from the Pre-Qin literature more deeply and accurately, for different parts of the set of lexicon in the class of entity extraction model on the differences in the study. [Methods] Based on the training and testing corpora consisting of “Zuo Zhuan” and “Guo Yu” which have been manually labeled by the machine, three tagging sets of different sizes are formed, with the Pre-Qin part-of-speech tagging set of Nanjing normal university as the main part, supplemented by the part-of-speech tagging sets of Peking University, the Institute of Computing Technology of Chinese Academy of Sciences and the Ministry of Education. The differences between the results of the entity extraction on the same corpus were compared by using the conditional random field and the feature templates. [Results] Comparative experiments were carried out on three part-of-speech tagging sets of different sizes in the Pre-Qin classics “Zuo Zhuan” and “Guo Yu”. The F values of the three models were 82.53%, 83.42% and 84.07%, respectively. [Limitations] Feature selection needs further improvement, and training results can be improved. [Conclusions] The result is helpful for the extraction of the named entities in the ancient literature of the Pre-Qin period. The set of part-of-speech tags constructed is suitable for the part-of-speech tagging of ancient Chinese.