[Objective] This paper develops a labor-saving, widely applicable method for automatic classification indexing, aiming to support the classified management of massive information resources and the disclosure of their subject areas.
[Methods] We analyzed the correspondence between classification numbers and the terms, concepts, and other keywords that represent subject concepts in the literature. On that basis, we designed a multi-factor weighted algorithm and proposed a full-process scheme for automatic classification indexing.
[Results] Experiments on authoritative multi-domain annotated corpora and standards sets showed that, for literature with a single subject classification number, the precision, recall, and F values were 84.1%, 79.8%, and 81.9%, respectively. For literature with two subject classification numbers, they were 83.4%, 78.8%, and 81.0%, respectively.
[Limitations] The accuracy and completeness of subject classification indexing depend on high-quality annotated corpora, and the indexing of interdisciplinary literature needs further improvement.
[Conclusions] The proposed automatic classification indexing method, based on a multi-factor weighted algorithm, is highly operable and has practical application value.
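
The abstract does not say how the F value is computed. As a quick consistency check, the sketch below assumes it is the balanced F measure (the harmonic mean of precision and recall) and recomputes it from the precision and recall reported in [Results]. The function name `f_measure` and the dictionary layout are illustrative choices, not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure: harmonic mean of precision and recall.

    Assumption: the paper's "F value" is this balanced form; the
    abstract does not state the formula explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F value as reported in [Results], in percent.
reported = {
    "one classification number": (84.1, 79.8, 81.9),
    "two classification numbers": (83.4, 78.8, 81.0),
}

for label, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{label}: recomputed F = {f:.1f}% (reported {f_reported}%)")
```

Under this assumption, both recomputed values agree with the reported F values to one decimal place.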