Automatic Classification Method Based on Multi-factor Algorithm
Li Jiao¹, Huang Yongwen¹, Luo Tingting¹, Zhao Ruixue¹,², Xian Guojian¹,²
¹ Agricultural Information Institute of CAAS, Beijing 100081, China
² Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
Abstract [Objective] This paper develops an automatic classification indexing method to better manage massive information resources and support knowledge discovery. [Methods] First, we analyzed the relationship between keywords (e.g., subject terms and concepts) and classification numbers. Then, we designed a multi-factor weighted algorithm. Finally, we proposed a scheme for automatic classification indexing. [Results] We evaluated the method on authoritative annotated domain corpora and standard data sets. For literature with a single subject classification number, the precision, recall, and F values were 84.1%, 79.8%, and 81.9%, respectively. For literature with two subject classification numbers, the precision, recall, and F values were 83.4%, 78.8%, and 81.0%, respectively. [Limitations] The accuracy and completeness of the method rely on high-quality corpora, and the indexing of interdisciplinary literature needs improvement. [Conclusions] The proposed method can effectively perform classification tasks.
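As a quick consistency check on the reported scores, the sketch below recomputes F from the stated precision and recall. It assumes that F denotes the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not say which F variant was used, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed F variant)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
reported = {
    "single classification number": (84.1, 79.8),
    "two classification numbers": (83.4, 78.8),
}

for label, (p, r) in reported.items():
    print(f"{label}: F = {f_measure(p, r):.1f}%")
```

Under this assumption, both recomputed values agree with the reported F values of 81.9% and 81.0%.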
Received: 24 March 2020
Published: 04 December 2020
Corresponding Author: Xian Guojian, E-mail: xianguojian@caas.cn