Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 43-51    DOI: 10.11925/infotech.2096-3467.2020.0238
Automatic Classification Method Based on Multi-factor Algorithm
Li Jiao1, Huang Yongwen1, Luo Tingting1, Zhao Ruixue1,2, Xian Guojian1,2
1Agricultural Information Institute of CAAS, Beijing 100081, China
2Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
Abstract

[Objective] This paper develops an automatic method for classification indexing, aiming to better manage massive information resources and support knowledge discovery. [Methods] First, we analyzed the relationship between keywords (e.g., subject terms/concepts) and classification numbers. Then, we designed a multi-factor weighted algorithm. Finally, we proposed a scheme for automatic classification indexing. [Results] We evaluated the method on annotated corpora from authoritative domains and on standard data sets. For literature with a single subject classification number, the precision, recall and F values were 84.1%, 79.8%, and 81.9%, respectively. For literature with two subject classification numbers, the precision, recall and F values were 83.4%, 78.8%, and 81.0%, respectively. [Limitations] The accuracy and completeness of our method rely on high-quality corpora, and the indexing of interdisciplinary literature needs to be improved. [Conclusions] The proposed method can effectively complete classification indexing tasks.
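
The reported F values can be related to the precision and recall figures with a short calculation. The sketch below assumes that "F value" means the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name and dictionary labels are illustrative and do not come from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "F value" is this balanced form (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F %) as given in the abstract.
reported = {
    "single classification number": (84.1, 79.8, 81.9),
    "two classification numbers": (83.4, 78.8, 81.0),
}

for label, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{label}: computed F = {f:.2f}%, reported F = {f_reported}%")
```

Under that assumption, the computed values agree with the reported F values to within rounding.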

Received: 24 March 2020      Published: 04 December 2020
Corresponding Authors: Xian Guojian     E-mail: xianguojian@caas.cn