New Technology of Library and Information Service  2014, Vol. 30 Issue (9): 74-80    DOI: 10.11925/infotech.1003-3513.2014.09.10
Information Analysis and Research
Study on Improvement of Text Classification Using HS-SVM
Hu Jiming, Chen Guo
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract

[Objective] To address changing and overlapping class feature vectors in text classification, this paper improves the hyper-sphere support vector machine (HS-SVM) classification algorithm. [Methods] After reviewing the operating mechanisms of LDA and HS-SVM and the related studies, the paper constructs a text classification model based on LDA and HS-SVM. The original HS-SVM is improved with incremental learning and a density-based decision function, so that the support vectors of each class hyper-sphere can change dynamically and the decision function used to construct the HS-SVM can be computed accurately. [Results] In comparative text classification experiments against the original HS-SVM, the proposed method achieves better precision and recall, and it reduces modeling time with little effect on prediction accuracy. [Limitations] The proposed method is more complex than the original algorithm and needs further refinement, and the results should be validated on more types of data sets. The improvement also does not reach the underlying classification algorithm, which needs further study. [Conclusions] The study helps improve accuracy and reduce training time in large-scale text classification, and thereby improves overall classification performance.

Key words: LDA topic model; Hyper-Sphere Support Vector Machine (HS-SVM); Incremental learning; Density (intensive degree) decision function
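
The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes each class sphere can be approximated by the class centroid and the largest sample distance (instead of solving the HS-SVM optimization), and it assumes a simple density measure, the inverse mean distance to the k nearest samples of a class, for points that fall in overlapping spheres or outside every sphere. The names `SphereClass`, `density`, and `classify` are hypothetical.

```python
import math


class SphereClass:
    """One class modeled as a sphere (assumption: centroid center, max-distance radius)."""

    def __init__(self, dim):
        self.center = [0.0] * dim
        self.radius = 0.0
        self.points = []

    def add(self, x):
        # Incremental step: update the running-mean center, then recompute the radius.
        self.points.append(list(x))
        n = len(self.points)
        self.center = [c + (xi - c) / n for c, xi in zip(self.center, x)]
        self.radius = max(math.dist(self.center, p) for p in self.points)

    def density(self, x, k=3):
        # Hypothetical density: inverse mean distance from x to its k nearest class samples.
        nearest = sorted(math.dist(x, p) for p in self.points)[:k]
        return 1.0 / (sum(nearest) / len(nearest) + 1e-12)


def classify(spheres, x):
    """Return the label for x, given a dict mapping label -> SphereClass."""
    inside = [label for label, s in spheres.items()
              if math.dist(x, s.center) <= s.radius]
    if len(inside) == 1:
        return inside[0]
    # Overlap (several spheres) or outside all spheres: decide by local density.
    candidates = inside if inside else list(spheres)
    return max(candidates, key=lambda label: spheres[label].density(x))


# Toy data: class "a" has an outlier at (4, 0) that inflates its sphere over class "b".
spheres = {"a": SphereClass(2), "b": SphereClass(2)}
for p in [(0, 0), (1, 0), (0, 1), (4, 0)]:
    spheres["a"].add(p)
for p in [(3, 0), (3, 1), (4, 1), (3.5, 0.5)]:
    spheres["b"].add(p)

print(classify(spheres, (0.5, 0.5)))  # lies inside sphere "a" only
print(classify(spheres, (3.4, 0.4)))  # lies inside both spheres, resolved by density

radius_before = spheres["b"].radius
spheres["b"].add((6, 0.5))            # a new sample arrives; the sphere is updated in place
radius_after = spheres["b"].radius
```

This sketch keeps every sample so that the radius can be recomputed exactly; an incremental SVM would normally retain only the support vectors, which is where the reduction in modeling time mentioned in the abstract would come from.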
Received: 2013-12-04
CLC number: TP391
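Funding: This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Topic Mining and Semantic Classification of Information Content in Social Network Environments" (No. 13YJC870008) and the National Natural Science Foundation of China Youth Fund project "Research on Information Recommendation Based on User-Resource Associations in Social Network Environments" (No. 71303178).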

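Corresponding author: Hu Jiming, E-mail: whuhujiming@qq.com
Author contributions: Hu Jiming proposed the research idea, designed the research plan, carried out the research, and wrote and revised the paper. Chen Guo collected, cleaned, and analyzed the data and ran the comparative experiments.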
Cite this article:
Hu Jiming, Chen Guo. Study on Improvement of Text Classification Using HS-SVM [J]. New Technology of Library and Information Service, 2014, 30(9): 74-80. DOI: 10.11925/infotech.1003-3513.2014.09.10.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.09.10
