Automatic Classification Method Based on Multi-factor Algorithm
Li Jiao1, Huang Yongwen1, Luo Tingting1, Zhao Ruixue1,2, Xian Guojian1,2
1 Agricultural Information Institute of CAAS, Beijing 100081, China
2 Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
[Objective] This paper develops an automatic classification indexing method to support the management of massive information resources and knowledge discovery. [Methods] First, we analyzed the relationship between keywords (e.g., subject terms and concepts) and classification numbers. Then, we designed a multi-factor weighted algorithm. Finally, we proposed a scheme for automatic classification indexing. [Results] We evaluated the method on annotated corpora from authoritative domains and on standard data sets. For literature with a single subject classification number, the precision, recall, and F values were 84.1%, 79.8%, and 81.9%, respectively. For literature with two subject classification numbers, they were 83.4%, 78.8%, and 81.0%, respectively. [Limitations] The accuracy and completeness of the method rely on high-quality corpora, and the indexing of interdisciplinary literature needs improvement. [Conclusions] The proposed method can effectively perform automatic classification indexing.
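The abstract does not say which F measure is reported. As a minimal sketch, assuming it is the balanced F value (the harmonic mean of precision and recall), the reported figures can be checked as follows. The function name `f_measure` is illustrative and does not come from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F value: harmonic mean of precision and recall.

    Assumption: the abstract's "F value" is this balanced form;
    the paper's abstract does not state the weighting explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
single = f_measure(84.1, 79.8)  # one subject classification number
double = f_measure(83.4, 78.8)  # two subject classification numbers

print(f"{single:.1f} {double:.1f}")  # prints "81.9 81.0"
```

Under this assumption the computed values match the reported 81.9% and 81.0%, which suggests the three figures in each result are internally consistent.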