New Technology of Library and Information Service  2014, Vol. 30 Issue (9): 74-80     https://doi.org/10.11925/infotech.1003-3513.2014.09.10
Information Analysis and Research
Study on Improvement of Text Classification Using HS-SVM
Hu Jiming, Chen Guo
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
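Abstract (translated from the Chinese)

[Objective] To address problems such as changing and overlapping class feature vectors in text classification, the hyper-sphere support vector machine (HS-SVM) classification algorithm is improved. [Methods] The original HS-SVM is improved on the basis of incremental learning and a density decision function, so that the support vectors of each hyper-sphere class change dynamically and the decision function used to construct the HS-SVM is computed accurately, thereby improving text classification. [Results] Comparative text classification experiments against the original HS-SVM show that the proposed method outperforms the other schemes in precision and recall, and that modeling time is reduced with little effect on prediction accuracy. [Limitations] Experiments on several types of data sets are needed to establish how widely the improvement applies; in addition, the improvement to the underlying classification algorithm is limited and needs further exploration. [Conclusions] This study helps improve the accuracy of large-scale text classification and reduce training time, and thereby improves text classification performance.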

Keywords: LDA topic model; hyper-sphere support vector machine; incremental learning; density decision function
Abstract

[Objective] To address changing and overlapping class feature vectors in text classification, this paper improves the hyper-sphere support vector machine (HS-SVM) classification algorithm. [Methods] After reviewing the operating mechanisms of LDA and HS-SVM and the related studies, this paper constructs a text classification model based on LDA and HS-SVM. The original HS-SVM is improved using incremental learning and a density-based ("intensive degree") decision function, so that the support vectors of each hyper-sphere class can change dynamically and the decision function used to construct the HS-SVM can be computed accurately. [Results] Comparative experiments show that the proposed method is feasible and effective: it improves precision and recall, reduces modeling time, and has little effect on prediction accuracy. [Limitations] The proposed method is more complex than the original algorithm and needs further refinement, and the results should be validated on more data sets. The improvement to the underlying algorithm is also limited and needs further study. [Conclusions] This study helps improve accuracy and reduce training time in large-scale text categorization, and thereby improves the efficiency and performance of text classification.

Key words: LDA topic model; Hyper-Sphere Support Vector Machine (HS-SVM); Incremental learning; Intensive degree decision function
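
The mechanism summarized in the abstract (one hyper-sphere per class, class information that changes as new samples arrive, and a density-based rule where spheres overlap) can be illustrated with a deliberately simplified Python sketch. This is not the authors' algorithm: the abstract gives no formulas, so everything here is an assumption made for illustration. Each sphere is approximated by a centroid and a farthest-member radius rather than by solving the kernel minimum-enclosing-ball problem, the "incremental" step simply absorbs a new batch and recomputes, and the density is a neighbour count within a hypothetical radius `h`. The names `Sphere`, `classify`, and `h` are invented for this example.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Sphere:
    """One class region: centroid plus farthest-member radius (a simplification)."""

    def __init__(self, dim):
        self.points = []
        self.center = [0.0] * dim
        self.radius = 0.0

    def add(self, batch):
        """Naive incremental step: absorb a batch, then refresh center and radius."""
        self.points.extend(batch)
        n = len(self.points)
        dim = len(self.center)
        self.center = [sum(p[i] for p in self.points) / n for i in range(dim)]
        self.radius = max(dist(p, self.center) for p in self.points)

    def density(self, x, h):
        """Number of stored members lying within distance h of x."""
        return sum(1 for p in self.points if dist(p, x) <= h)


def classify(spheres, x, h=1.0):
    """Assign x to a class using containment, then density, then relative distance."""
    inside = [c for c, s in spheres.items() if dist(x, s.center) <= s.radius]
    if len(inside) == 1:
        return inside[0]
    if len(inside) > 1:
        # Overlap region: prefer the class whose members are densest around x.
        return max(inside, key=lambda c: spheres[c].density(x, h))
    # Outside every sphere: smallest distance relative to the sphere's radius.
    return min(
        spheres,
        key=lambda c: dist(x, spheres[c].center) / max(spheres[c].radius, 1e-12),
    )


spheres = {"A": Sphere(2), "B": Sphere(2)}
spheres["A"].add([(0, 0), (2, 2), (0, 2), (2, 0), (1.5, 1.5)])
spheres["B"].add([(1.5, 1.5), (4.5, 4.5), (1.5, 4.5), (4.5, 1.5)])

x = (1.8, 1.8)                 # lies inside both spheres
print(classify(spheres, x))    # A: two class-A members near x, one class-B member

spheres["B"].add([(1.6, 2.0), (2.0, 1.6)])  # new batch arrives for class B
print(classify(spheres, x))    # B: class B is now denser around x
```

In this toy data the same point changes class after a new batch arrives, which is the kind of behaviour a dynamically updated sphere combined with a density rule is meant to capture. A faithful implementation would keep only support vectors between batches and work in a kernel-induced feature space.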
Received: 2013-12-04      Published: 2014-10-20
CLC number: TP391
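Funding:

This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Topic Mining and Semantic Classification of Information Content in the Social Network Environment" (No. 13YJC870008) and the National Natural Science Foundation of China Youth Fund project "Research on Information Recommendation Based on User-Resource Association in the Social Network Environment" (No. 71303178).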

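Corresponding author: Hu Jiming     E-mail: whuhujiming@qq.com
Author contributions: Hu Jiming proposed the research idea, designed the research plan, carried out the research, and wrote and revised the paper; Chen Guo collected, cleaned, and analyzed the data and conducted the comparative experiments.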
Cite this article:
Hu Jiming, Chen Guo. Study on Improvement of Text Classification Using HS-SVM. New Technology of Library and Information Service, 2014, 30(9): 74-80.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.09.10      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I9/74

