New Technology of Library and Information Service  2014, Vol. 30 Issue (3): 80-87    DOI: 10.11925/infotech.1003-3513.2014.03.12
The Application of Machine-Learning in the Research on Automatic Categorization of Chinese Periodical Articles
Wang Hao, Ye Peng, Deng Sanhong
School of Information Management, Nanjing University, Nanjing 210093, China
Abstract  

[Objective] To show that, under a machine-learning approach, feature weighting combined with shallow hierarchical classification can effectively assign Chinese Library Classification (CLC) classes to periodical articles. [Context] Manual classification is reaching its limits in the era of "Big Data", and as periodicals move to electronic publication, automatic classification techniques can relieve the workload of manual classification. [Methods] This paper introduces machine learning into the automatic classification of periodical articles. It compares the performance of the Support Vector Machine (SVM) and the BP Neural Network (BPNN) algorithms in automatic classification, transforms the CLC into a three-level classification system following the idea of hierarchical classification, and sets feature weights based on the source of each classification feature. [Results] The classification experiments show that SVM is better suited than BPNN to large-scale sparse data; that the accuracy rates at the three levels reach 95.05%, 92.89% and 89.02%, with an integrated accuracy rate close to 80%; and that feature weights drawn from multiple sources lead to better classification results than single-source weights. [Conclusions] The study demonstrates that a machine-learning model with feature weighting and shallow hierarchical classification is feasible, reasonable and effective for the automatic classification of periodical articles, and it offers a new approach to this task.
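
As a rough illustration of two ideas in the abstract, the sketch below builds a source-weighted feature vector and estimates the accuracy of a three-level cascade. It is not the authors' implementation. The weight values, the field names (`title`, `keywords`, `abstract`), the English tokens and both function names are hypothetical, since the abstract gives none of them. The cascade estimate also assumes that each level's accuracy is conditional on the level above being correct, which the abstract does not state.

```python
from collections import Counter
from math import prod

# Hypothetical source weights: the abstract says weights depend on where a
# feature comes from, but it does not report the values used.
SOURCE_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}


def weighted_features(fields, weights=SOURCE_WEIGHTS):
    """Sum term counts over all fields, scaling each count by its source weight."""
    vector = Counter()
    for source, tokens in fields.items():
        weight = weights.get(source, 1.0)  # unknown sources get weight 1.0
        for token in tokens:
            vector[token] += weight
    return dict(vector)


def cascade_accuracy(level_accuracies):
    """Overall accuracy of a top-down cascade.

    Assumes each rate is conditional on the previous level being correct,
    so the overall rate is the product of the per-level rates.
    """
    return prod(level_accuracies)


# Toy article with pre-tokenized fields (illustrative tokens only).
article = {
    "title": ["text", "classification"],
    "keywords": ["svm", "classification"],
    "abstract": ["svm", "text", "sparse"],
}

features = weighted_features(article)
overall = cascade_accuracy([0.9505, 0.9289, 0.8902])

print(features)
print(f"{overall:.4f}")
```

Under that independence assumption, the product of the three reported rates is about 0.786, which is in line with the "close to 80%" integrated accuracy. A vector like `features` could then be passed to a classifier such as a linear SVM, with one classifier per level of the hierarchy.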

Key words: Machine-Learning; Periodical article; Automatic text categorization; Feature weighting; Hierarchy classification
Received: 02 September 2013      Published: 15 April 2014
CLC number: TP391

Cite this article:

Wang Hao, Ye Peng, Deng Sanhong. The Application of Machine-Learning in the Research on Automatic Categorization of Chinese Periodical Articles. New Technology of Library and Information Service, 2014, 30(3): 80-87.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.03.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I3/80


  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn