New Technology of Library and Information Service  2014, Vol. 30 Issue (9): 74-80    DOI: 10.11925/infotech.1003-3513.2014.09.10
Study on Improvement of Text Classification Using HS-SVM
Hu Jiming, Chen Guo
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] To address the problems of changing and overlapping class feature vectors, this paper improves the classification algorithm based on the hyper-sphere support vector machine (HS-SVM). [Methods] After reviewing the operating mechanisms of LDA and HS-SVM and the related studies, this paper constructs a text classification model based on LDA and HS-SVM. The traditional HS-SVM is improved by incorporating incremental learning and intensive degree, so that the support vectors of each hyper-sphere class can be updated dynamically and the decision function of the hyper-sphere support vector machine can be calculated accurately. [Results] Comparative experiments show that the proposed method is feasible and effective: it improves text classification in terms of precision and recall. In addition, it reduces modeling time with little effect on prediction accuracy. [Limitations] The proposed method is more complex than the original algorithm and needs further refinement, and the results need to be verified on more data sets. The improvement to the core of the algorithm is not yet optimal and requires further study. [Conclusions] This study helps improve accuracy and reduce training time in large-scale text categorization, and thereby improves the efficiency and performance of text classification.
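
The general idea of hyper-sphere classification with incremental updates can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each class sphere is approximated by the centroid of its training vectors and the largest distance to that centroid, rather than by solving the HS-SVM optimization, and it does not model intensive degree. The class `HyperSphere`, the function `classify`, and the three-dimensional topic vectors (standing in for LDA document-topic distributions) are all hypothetical.

```python
import math


class HyperSphere:
    """One class region, approximated by a centroid and a radius (assumption)."""

    def __init__(self, points):
        self.n = len(points)
        dim = len(points[0])
        # Centre: mean of the training vectors of this class.
        self.center = [sum(p[i] for p in points) / self.n for i in range(dim)]
        # Radius: distance from the centre to the farthest training vector.
        self.radius = max(self.distance(p) for p in points)

    def distance(self, x):
        """Euclidean distance from x to the sphere centre."""
        return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, self.center)))

    def add(self, x):
        """Incremental update with one new labelled vector.

        The centre moves by a running mean and the radius grows if the
        new vector lies outside. Earlier vectors are not re-checked, so
        this is only an approximation of retraining.
        """
        self.n += 1
        self.center = [c + (a - c) / self.n for a, c in zip(x, self.center)]
        self.radius = max(self.radius, self.distance(x))


def classify(spheres, x):
    """Return the label whose sphere has the smallest relative distance d/R."""
    def relative_distance(label):
        sphere = spheres[label]
        return sphere.distance(x) / max(sphere.radius, 1e-12)
    return min(spheres, key=relative_distance)


# Hypothetical topic vectors for two classes.
spheres = {
    "sports": HyperSphere([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]),
    "finance": HyperSphere([[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.9, 0.0]]),
}

label = classify(spheres, [0.75, 0.15, 0.1])
spheres[label].add([0.75, 0.15, 0.1])  # fold the new document into its class
```

Using the relative distance d/R instead of the raw distance is one simple way to handle a vector that falls inside two overlapping spheres or outside all of them; the paper's own decision function may differ.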

Key words: LDA topic model; Hyper-Sphere Support Vector Machine (HS-SVM); Incremental learning; Intensive degree decision function
Received: 04 December 2013      Published: 20 October 2014
CLC Number: TP391

Cite this article:

Hu Jiming, Chen Guo. Study on Improvement of Text Classification Using HS-SVM. New Technology of Library and Information Service, 2014, 30(9): 74-80.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.09.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I9/74

