Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (5): 71-81     https://doi.org/10.11925/infotech.2096-3467.2017.05.09
Research Paper
Classifying Sentiments Based on BPSO Random Subspace
Zhang Qingqing 1,2, Liu Xilin 2
1 School of Management, Xi’an Polytechnic University, Xi’an 710048, China
2 School of Management, Northwestern Polytechnical University, Xi’an 710129, China
Abstract

[Objective] This paper addresses the high dimensionality of text feature vectors in machine-learning-based sentiment classification by proposing RS_BPSO, a selective ensemble algorithm that combines Binary Particle Swarm Optimization (BPSO) with the Random Subspace (RS) method. [Methods] Building on the principles of BPSO and Random Subspace, we present the model framework and algorithm flow of RS_BPSO. We then convert a Chinese review corpus into feature vectors and use RS_BPSO for experimental validation and analysis. [Results] By varying the subspace rate, we compare the standard Random Subspace method with the RS_BPSO selective ensemble in terms of classification accuracy and ensemble diversity. RS_BPSO scores higher than the standard RS method on both. [Limitations] The model has not yet been validated on English data. [Conclusions] Applying BPSO to the Random Subspace method yields a novel selective ensemble model that handles the high dimensionality of the feature vector space, improves classification accuracy and generalization, and offers an effective method for Chinese text sentiment classification.

Keywords: Random Subspace; BPSO; Text Sentiment Classification; Subspace Rate
Received: 2017-03-28      Published: 2017-06-06
CLC number:  TP391.1
Cite this article:
Zhang Qingqing, Liu Xilin. Classifying Sentiments Based on BPSO Random Subspace. Data Analysis and Knowledge Discovery, 2017, 1(5): 71-81.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.05.09      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I5/71
Figure: Algorithm flow of the BPSO random subspace method
Figure: Schematic of the base learner selection structure
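
The base learner selection named in the two figure captions above can be illustrated with a binary particle swarm, where each particle is a 0/1 mask over the trained base learners. The following is only a minimal sketch, not the authors' implementation. Several parts are assumptions that this page does not specify: the fitness function (majority-vote accuracy on held-out data), the inertia weight `w`, the coefficients `c1` and `c2`, the velocity clamp `v_max`, and the function names. Only the sigmoid transfer rule follows the standard discrete binary PSO formulation.

```python
import math
import random

def majority_vote_accuracy(mask, predictions, labels):
    """Accuracy of a majority vote over the base learners selected by mask.

    predictions[m][i] is the 0/1 class predicted by learner m for sample i.
    Assumed fitness function; the paper's actual fitness is not given here.
    """
    chosen = [p for bit, p in zip(mask, predictions) if bit]
    if not chosen:
        return 0.0  # an empty ensemble cannot classify anything
    correct = 0
    for i, y in enumerate(labels):
        votes_for_one = sum(p[i] for p in chosen)
        vote = 1 if 2 * votes_for_one > len(chosen) else 0  # ties go to class 0
        correct += (vote == y)
    return correct / len(labels)

def bpso_select(predictions, labels, n_particles=10, n_iter=30,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Search for a learner subset with binary PSO (hypothetical parameters)."""
    rng = random.Random(seed)
    dim = len(predictions)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [majority_vote_accuracy(p, predictions, labels) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    g = max(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[g][:], fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # Sigmoid transfer: the velocity sets the probability of bit = 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = majority_vote_accuracy(pos[i], predictions, labels)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = [
        [1, 0, 1, 0, 0, 1, 1, 0],
        [0, 0, 1, 1, 1, 0, 1, 0],
        [1, 1, 0, 1, 0, 0, 0, 0],
        [1, 0, 1, 1, 0, 1, 1, 1],
        [0, 1, 0, 0, 1, 1, 0, 1],
    ]
    mask, fitness = bpso_select(predictions, labels)
    print("selected learners:", mask, "fitness:", fitness)
```

In practice the fitness would be computed on a validation split that is separate from both the training data and the final test data, so that the selected subset is not tuned to the test set.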
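| Dataset | Dependency-relation triples |
| --- | --- |
| Hotel | 140,911 |
| Books | 66,297 |
| Laptops | 28,932 |

Table: Number of dependency-relation triple features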
| Measure | Formula | No. |
| --- | --- | --- |
| Q statistic | $Q_{ij}=\frac{N^{11}N^{00}-N^{10}N^{01}}{N^{11}N^{00}+N^{10}N^{01}}$ | (6) |
| Correlation coefficient $\rho$ | $\rho_{ij}=\frac{N^{11}N^{00}-N^{10}N^{01}}{\sqrt{(N^{11}+N^{10})(N^{01}+N^{00})(N^{11}+N^{01})(N^{10}+N^{00})}}$ | (7) |
| Disagreement measure dis | $dis_{ij}=\frac{N^{10}+N^{01}}{N}$ | (8) |
| Double-fault measure DF | $DF_{ij}=\frac{N^{00}}{N}$ | (9) |

Table: Diversity measures for the ensemble system
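
The four pairwise measures in the table above can be computed directly from the correctness vectors of two base learners. The sketch below assumes the usual convention, which this page does not define: $N^{ab}$ counts the samples where learner $i$ is correct ($a=1$) or wrong ($a=0$) and learner $j$ is correct ($b=1$) or wrong ($b=0$), and $N$ is the total number of samples. The function names are illustrative, and inputs are assumed to be non-empty.

```python
import math

def pairwise_counts(correct_i, correct_j):
    """Return (N11, N10, N01, N00) for two 0/1 correctness vectors."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1      # both learners correct
        elif a:
            n10 += 1      # only learner i correct
        elif b:
            n01 += 1      # only learner j correct
        else:
            n00 += 1      # both learners wrong
    return n11, n10, n01, n00

def diversity(correct_i, correct_j):
    """Q statistic, correlation rho, disagreement dis, and double fault DF."""
    n11, n10, n01, n00 = pairwise_counts(correct_i, correct_j)
    n = n11 + n10 + n01 + n00
    numerator = n11 * n00 - n10 * n01
    q_den = n11 * n00 + n10 * n01
    rho_den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return {
        # Q and rho are undefined when their denominators vanish.
        "Q": numerator / q_den if q_den else float("nan"),
        "rho": numerator / rho_den if rho_den else float("nan"),
        "dis": (n10 + n01) / n,
        "DF": n00 / n,
    }

if __name__ == "__main__":
    learner_i = [1, 1, 1, 0, 0, 1, 0, 1]
    learner_j = [1, 0, 1, 0, 1, 1, 0, 0]
    print(diversity(learner_i, learner_j))
```

For this toy pair the counts are $N^{11}=3$, $N^{10}=2$, $N^{01}=1$, $N^{00}=2$, so $Q = (6-2)/(6+2) = 0.5$, $dis = 3/8$, and $DF = 2/8$.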
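| k | Hotel | Books | Laptops |
| --- | --- | --- | --- |
| k=0.01 | 1,409 | 663 | 289 |
| k=0.02 | 2,818 | 1,326 | 579 |
| k=0.03 | 4,227 | 1,989 | 868 |
| k=0.05 | 7,046 | 3,315 | 1,447 |
| Total features | 140,911 | 66,297 | 28,932 |

Table: Feature subset dimensionality under the random subspace method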
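
The dimensions in the table above are consistent with multiplying the total feature count by the subspace rate k and rounding to the nearest integer. The sketch below reproduces them under that assumed rounding rule and shows how one random subspace could be drawn. Uniform sampling without replacement and the names `subspace_dim` and `draw_subspace` are illustrative choices, not details confirmed by this page.

```python
import random

# Total feature counts per dataset, taken from the table above.
TOTAL_FEATURES = {"hotel": 140911, "books": 66297, "laptops": 28932}

def subspace_dim(total, k):
    """Subspace size for subspace rate k (assumed: round to nearest integer)."""
    return int(total * k + 0.5)

def draw_subspace(total, k, rng):
    """Sample the feature indices for one base learner, without replacement."""
    return sorted(rng.sample(range(total), subspace_dim(total, k)))

if __name__ == "__main__":
    for k in (0.01, 0.02, 0.03, 0.05):
        print(k, [subspace_dim(n, k) for n in TOTAL_FEATURES.values()])
    indices = draw_subspace(TOTAL_FEATURES["laptops"], 0.01, random.Random(0))
    print(len(indices), "feature indices drawn for one base learner")
```

Each base learner would then be trained only on the columns listed in its own index set.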
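| k | RS | RS_BPSO |
| --- | --- | --- |
| 0.01 | 0.6825 | 0.8342 (17) |
| 0.02 | 0.7183 | 0.8013 (14) |
| 0.03 | 0.7717 | 0.8293 (13) |
| 0.05 | 0.8075 | 0.8429 (19) |

Table: Classification accuracy on the hotel review data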
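| k | RS | RS_BPSO |
| --- | --- | --- |
| 0.01 | 0.6867 | 0.8270 (19) |
| 0.02 | 0.7033 | 0.8434 (19) |
| 0.03 | 0.7633 | 0.8208 (20) |
| 0.05 | 0.785 | 0.8325 (21) |

Table: Classification accuracy on the book review data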
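| k | RS | RS_BPSO |
| --- | --- | --- |
| 0.01 | 0.7867 | 0.8517 (24) |
| 0.02 | 0.8267 | 0.8762 (29) |
| 0.03 | 0.8067 | 0.8717 (28) |
| 0.05 | 0.8233 | 0.8634 (22) |

Table: Classification accuracy on the laptop review data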
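| k | DF (RS) | DF (RS_BPSO) | dis (RS) | dis (RS_BPSO) | Q statistic (RS) | Q statistic (RS_BPSO) | $\rho$ (RS) | $\rho$ (RS_BPSO) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.01 | 0.3668 | 0.3715 | 0.4378 | 0.466 | 0.1507 | 0.0127 | 0.0972 | 0.0263 |
| 0.02 | 0.4396 | 0.4437 | 0.3759 | 0.4153 | 0.3794 | 0.1699 | 0.1958 | 0.0864 |
| 0.03 | 0.4677 | 0.4862 | 0.3718 | 0.379 | 0.3612 | 0.2837 | 0.179 | 0.136 |
| 0.05 | 0.5289 | 0.5452 | 0.333 | 0.3266 | 0.4448 | 0.4434 | 0.2144 | 0.2099 |

Table: Ensemble diversity on the hotel review data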
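| k | DF (RS) | DF (RS_BPSO) | dis (RS) | dis (RS_BPSO) | Q statistic (RS) | Q statistic (RS_BPSO) | $\rho$ (RS) | $\rho$ (RS_BPSO) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.01 | 0.321 | 0.3174 | 0.4701 | 0.4963 | 0.0667 | -0.0321 | 0.048 | -0.0099 |
| 0.02 | 0.3751 | 0.3834 | 0.4383 | 0.4585 | 0.1594 | 0.0477 | 0.0903 | 0.0351 |
| 0.03 | 0.4094 | 0.4079 | 0.409 | 0.44 | 0.2615 | 0.1071 | 0.1368 | 0.0589 |
| 0.05 | 0.4543 | 0.4576 | 0.3895 | 0.4115 | 0.2935 | 0.1663 | 0.1448 | 0.079 |

Table: Ensemble diversity on the book review data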
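| k | DF (RS) | DF (RS_BPSO) | dis (RS) | dis (RS_BPSO) | Q statistic (RS) | Q statistic (RS_BPSO) | $\rho$ (RS) | $\rho$ (RS_BPSO) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.01 | 0.3284 | 0.3271 | 0.4722 | 0.4986 | 0.0422 | -0.0616 | 0.0399 | -0.021 |
| 0.02 | 0.3753 | 0.3796 | 0.4559 | 0.4629 | 0.0482 | 0.0233 | 0.061 | 0.0265 |
| 0.03 | 0.4114 | 0.4073 | 0.428 | 0.441 | 0.1462 | 0.077 | 0.0875 | 0.057 |
| 0.05 | 0.4731 | 0.4764 | 0.3879 | 0.3909 | 0.2504 | 0.2225 | 0.1276 | 0.1146 |

Table: Ensemble diversity on the laptop review data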
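Figure: Diversity comparison across the datasets
Figure: Fitness value versus number of iterations on the hotel review data
Figure: Fitness value versus number of iterations on the book review data
Figure: Fitness value versus number of iterations on the laptop review data
Figure: Classification accuracy versus number of particles on the laptop dataset (k=0.01)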