Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 2: 98-107. DOI: 10.11925/infotech.2096-3467.2018.0578
Research Paper
Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text
Cuiqing Jiang (1,2), Yibo Guo (1), Yao Liu (1)
(1) School of Management, Hefei University of Technology, Hefei 230009, China
(2) Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China

Abstract

[Objective] This study discovers previously unrecognized sentiment words in user-generated content on Chinese social media and uses them to construct a domain sentiment lexicon for sentiment analysis of automotive reviews. [Methods] Words from the HowNet sentiment lexicon serve as seeds, and real automotive reviews serve as the corpus. The PMI and Word2Vec algorithms each identify the sentiment polarity of candidate new words, and ensemble rules combine the two results into a final judgment. Sentiment classification experiments then compare the resulting lexicon against baseline lexicons to demonstrate the effectiveness of the method. [Results] The lexicon constructed with the proposed method achieves an accuracy 21.6% higher than that of the HowNet sentiment lexicon, and 3.7% and 2.1% higher than lexicons built with PMI alone and Word2Vec alone, respectively. The numbers of both positive and negative sentiment words also increase substantially. [Limitations] The corpus comes from a single source, so the method's applicability to other domains is limited. [Conclusions] The sentiment lexicon constructed with this method can be applied effectively to sentiment analysis of social media texts.
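
The pipeline summarized in the abstract (seed words, two independent polarity scorers, then a combination rule) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a document-level SO-PMI score in the style of Turney and Littman, a seed-similarity score over word vectors, and a simple agreement rule standing in for the paper's ensemble rules, which the abstract does not specify. The toy corpus, the hand-written two-dimensional vectors (placeholders for trained Word2Vec embeddings), and all function names are hypothetical.

```python
import math

def pmi(a, b, docs, smooth=0.01):
    """Document-level PMI of words a and b; `smooth` avoids log(0)."""
    n = len(docs)
    count_a = sum(1 for d in docs if a in d)
    count_b = sum(1 for d in docs if b in d)
    count_ab = sum(1 for d in docs if a in d and b in d)
    if count_a == 0 or count_b == 0:
        return 0.0  # no evidence for an unseen word
    return math.log2((count_ab + smooth) * n / (count_a * count_b))

def so_pmi(word, docs, pos_seeds, neg_seeds):
    """Semantic orientation: association with positive minus negative seeds."""
    return (sum(pmi(word, s, docs) for s in pos_seeds)
            - sum(pmi(word, s, docs) for s in neg_seeds))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def so_vec(word, vectors, pos_seeds, neg_seeds):
    """Mean cosine similarity to positive seeds minus that to negative seeds."""
    def mean_sim(seeds):
        sims = [cosine(vectors[word], vectors[s]) for s in seeds if s in vectors]
        return sum(sims) / len(sims) if sims else 0.0
    return mean_sim(pos_seeds) - mean_sim(neg_seeds)

def label(score, margin=0.0):
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"

def ensemble_polarity(pmi_score, vec_score, margin=0.0):
    """Assumed rule: accept a polarity only when both scorers agree."""
    a, b = label(pmi_score, margin), label(vec_score, margin)
    return a if a == b else "undecided"

# Toy, already-segmented reviews; each document is a set of tokens.
docs = [
    {"good", "spacious"},
    {"good", "spacious", "quiet"},
    {"bad", "noisy"},
    {"bad", "noisy", "cramped"},
]
# Placeholder vectors; real ones would come from a trained Word2Vec model.
vectors = {
    "good": [1.0, 0.0], "bad": [-1.0, 0.0],
    "spacious": [0.9, 0.1], "noisy": [-0.8, 0.2],
}
pos_seeds, neg_seeds = ["good"], ["bad"]

for candidate in ("spacious", "noisy"):
    p = so_pmi(candidate, docs, pos_seeds, neg_seeds)
    v = so_vec(candidate, vectors, pos_seeds, neg_seeds)
    print(candidate, ensemble_polarity(p, v))
```

In this sketch a candidate word enters the lexicon only when the co-occurrence evidence and the embedding evidence point the same way. Words on which the two scorers disagree are left as "undecided", which favors precision over coverage.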

Keywords: Social Media; Sentiment Analysis; Sentiment Lexicon; PMI; Word2Vec
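Received: 2018-05-23
Funding: This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Methods for Discovering Product Innovation Needs Based on Social Media User-Generated Content" (Grant No. 71571059), the Key Program of the National Natural Science Foundation of China project "Research on Micro-level Credit Evaluation Theory and Methods in the Big Data Environment" (Grant No. 71731005), and the Anhui Provincial Department of Education Major Research Project in Humanities and Social Sciences for Universities "Research on the Evolution Mechanism and Control of Corporate Public Opinion in the Social Media Environment" (Grant No. SK2014ZD054).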
Cite this article:
Cuiqing Jiang, Yibo Guo, Yao Liu. Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 98-107. DOI: 10.11925/infotech.2096-3467.2018.0578.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0578