Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (2): 98-107     https://doi.org/10.11925/infotech.2096-3467.2018.0578
Research Paper
Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text*
Cuiqing Jiang1,2, Yibo Guo1, Yao Liu1
1School of Management, Hefei University of Technology, Hefei 230009, China
2Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China

Abstract

[Objective] This study discovers previously unrecognized sentiment words in user-generated content on Chinese social media and uses them to construct a domain sentiment lexicon for sentiment analysis of automotive reviews. [Methods] Words from the HowNet sentiment lexicon serve as seeds, and real automotive reviews serve as the corpus. The PMI and Word2Vec algorithms are each used to identify the sentiment polarity of candidate words, and the two results are then combined into a final judgment according to ensemble rules. Sentiment classification experiments compare the resulting lexicon against the baselines to demonstrate the effectiveness of the proposed method. [Results] The lexicon constructed with the proposed method achieves an accuracy 21.6% higher than that of the HowNet sentiment lexicon, and 3.7% and 2.1% higher than the lexicons built with PMI alone and Word2Vec alone, respectively. The numbers of positive and negative sentiment words also increase substantially. [Limitations] The corpus comes from a single source, so the method's applicability to other domains is limited. [Conclusions] The sentiment lexicon constructed with this method can be applied effectively to sentiment analysis of social media text.
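
The pipeline described in the abstract (seed words, two polarity estimators, an ensemble rule) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact PMI formulation, the Word2Vec scoring, or the ensemble rules, so the document-level SO-PMI with additive smoothing, the mean-cosine orientation score, the zero threshold, and the agree-or-defer rule in `combine` are all assumptions chosen for illustration. The toy corpus, the seed sets, and the two-dimensional vectors are hypothetical stand-ins for segmented automotive reviews, HowNet seeds, and trained Word2Vec embeddings.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.5):
    """Score each non-seed word by SO-PMI using document-level co-occurrence."""
    n_docs = len(docs)
    df = Counter()  # number of documents containing each word
    co = Counter()  # number of documents containing each unordered word pair
    for doc in docs:
        words = sorted(set(doc))
        df.update(words)
        co.update(frozenset(pair) for pair in combinations(words, 2))

    def pmi(w, s):
        # Additive smoothing (an assumption) keeps the log finite for unseen pairs.
        joint = co[frozenset((w, s))] + smoothing
        return math.log2(joint * n_docs / ((df[w] + smoothing) * (df[s] + smoothing)))

    seeds = set(pos_seeds) | set(neg_seeds)
    return {
        w: sum(pmi(w, s) for s in pos_seeds if s in df)
        - sum(pmi(w, s) for s in neg_seeds if s in df)
        for w in df
        if w not in seeds
    }


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embedding_orientation(vectors, pos_seeds, neg_seeds):
    """Mean cosine similarity to positive seeds minus mean similarity to negative seeds.

    Assumes at least one positive and one negative seed has a vector.
    """
    pos = [vectors[s] for s in pos_seeds if s in vectors]
    neg = [vectors[s] for s in neg_seeds if s in vectors]
    seeds = set(pos_seeds) | set(neg_seeds)
    return {
        w: sum(cosine(v, p) for p in pos) / len(pos)
        - sum(cosine(v, n) for n in neg) / len(neg)
        for w, v in vectors.items()
        if w not in seeds
    }


def polarity(score, threshold=0.0):
    """Map a score to +1 (positive), -1 (negative), or 0 (undecided)."""
    if score > threshold:
        return 1
    if score < -threshold:
        return -1
    return 0


def combine(a, b):
    """Hypothetical ensemble rule: keep agreement, defer to the decided side, drop conflicts."""
    if a == b:
        return a
    if a == 0:
        return b
    if b == 0:
        return a
    return 0


# Hypothetical toy data standing in for segmented reviews and trained embeddings.
docs = [
    ["engine", "good", "smooth"],
    ["handling", "good", "smooth"],
    ["seat", "bad", "noisy"],
    ["brake", "bad", "noisy"],
]
pos_seeds, neg_seeds = {"good"}, {"bad"}
vectors = {
    "good": (1.0, 0.0),
    "bad": (-1.0, 0.0),
    "smooth": (0.9, 0.1),
    "noisy": (-0.8, 0.2),
}

pmi_scores = so_pmi(docs, pos_seeds, neg_seeds)
w2v_scores = embedding_orientation(vectors, pos_seeds, neg_seeds)
lexicon = {
    w: combine(polarity(pmi_scores[w]), polarity(w2v_scores[w]))
    for w in pmi_scores
    if w in w2v_scores
}
print(lexicon)
```

In this sketch a word enters the lexicon with a nonzero polarity only when the two estimators agree or when one of them is undecided; conflicting judgments are mapped to 0 and would be left out of the final lexicon. A real run would replace the toy vectors with embeddings trained on the review corpus and would likely use a nonzero threshold.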

Keywords: Social Media; Sentiment Analysis; Sentiment Lexicon; PMI; Word2Vec
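Received: 2018-05-23      Published: 2019-03-27
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Methods for Discovering Product Innovation Needs Based on Social Media User-Generated Content" (No. 71571059), the National Natural Science Foundation of China Key Program project "Research on Theories and Methods of Micro-level Credit Evaluation in the Big Data Environment" (No. 71731005), and the Anhui Provincial Department of Education Major Research Project in Humanities and Social Sciences for Universities "Research on the Evolution Mechanism and Management of Enterprise Public Opinion in the Social Media Environment" (No. SK2014ZD054).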
Cite this article:
Cuiqing Jiang, Yibo Guo, Yao Liu. Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 98-107.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0578      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I2/98
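Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal Code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn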