Opinion Confrontation Analysis of Topic Interaction Data Based on Topic Model and Sentiment Analysis <sup>*</sup>

doi:10.11925/infotech.2096-3467.2018.1362

Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 7: 110-117 https://doi.org/10.11925/infotech.2096-3467.2018.1362

Research Paper



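徐红霞 (Xu Hongxia), 于倩倩 (Yu Qianqian), 钱力 (Qian Li)

Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190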

Studying Content Interaction Data with Topic Model and Sentiment Analysis

Xu Hongxia, Yu Qianqian, Qian Li

National Science Library, Chinese Academy of Sciences, Beijing 100190, China;Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China


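Abstract (Chinese version)

[Objective] To study methods for mining confrontational opinions from topic interaction data in open online communities. [Methods] We construct a model for mining opinions and their sentiment confrontation, based on sentiment analysis and a topic model. The model takes into account the characteristics of the Zhihu community, its topics, and its interaction data, and adds interaction-data filtering and keyword filtering. The AlphaGo topic on Zhihu serves as the empirical case. [Results] The method effectively mines opinions and their sentiment confrontation. In the AlphaGo discussion, the confrontation between “pro-AlphaGo” and “anti-AlphaGo” opinions is significant. “Pro-AlphaGo” opinions mainly concern human intelligence, competition, and ability; “anti-AlphaGo” opinions mainly concern AI companies and their products, and comprehension ability. [Limitations] The empirical analysis covers only the AlphaGo topic, so the generalizability of the model still needs to be verified. [Conclusions] The method is operable and interpretable. It can uncover latent confrontational information in interaction data, making opinion-mining results more targeted, and offers a reference for intelligence analysis and opinion mining.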
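The pipeline described in the abstract (filter the texts, split them by sentiment polarity, then run a topic model on each polarity group) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract only says "topic model", so LDA with collapsed Gibbs sampling is an assumption here, and the lexicons, stopword list, and example comments are all hypothetical toy data written in English for readability.

```python
import random
from collections import Counter

# Hypothetical toy lexicons and stopwords (assumptions, not from the paper).
# A real study would use a full sentiment dictionary or a trained classifier.
POSITIVE = {"brilliant", "impressive", "win", "creative"}
NEGATIVE = {"overhyped", "fail", "cannot", "marketing"}
STOPWORDS = {"the", "is", "a", "and", "of", "it", "at"}


def polarity(tokens):
    """Label a tokenised text by counting lexicon hits."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def lda_top_words(docs, n_topics=2, iters=100, alpha=0.1, beta=0.01,
                  top_n=3, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the top words of each topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # count of topic k in doc d
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # total tokens in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        assign.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token, then resample its topic from the
                # conditional distribution given all other assignments.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [w for w, c in topic_word[k].most_common(top_n) if c > 0]
        for k in range(n_topics)
    ]


# Invented example comments standing in for Zhihu interaction data.
comments = [
    "alphago is brilliant and the match shows impressive ability",
    "impressive match and a creative move",
    "alphago is overhyped marketing of a product",
    "the product cannot understand go it is marketing",
]

groups = {"positive": [], "negative": []}
for text in comments:
    tokens = [t for t in text.split() if t not in STOPWORDS]  # keyword filtering
    label = polarity(tokens)
    if label in groups:  # neutral texts are dropped
        groups[label].append(tokens)

# One topic model per polarity group; compare the two sides by their top words.
topics = {label: lda_top_words(docs) for label, docs in groups.items()}
print(topics)
```

Fitting a separate topic model per polarity group is one simple way to set the positive and negative sides next to each other; how the paper actually pairs opposing topics is not stated in the abstract.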


Keywords: opinion mining, sentiment analysis, confrontation analysis

Abstract:

[Objective] This paper explores methods for mining confrontational opinions from the interaction data of online communities. [Methods] First, we constructed a model to analyze opinions and their emotional confrontation based on sentiment analysis and a topic model. Then, we incorporated the characteristics of the Zhihu community, its topics, and its interaction data into the model, adding interaction-data filtering and keyword filtering. Finally, we conducted an empirical study on the AlphaGo topic on Zhihu. [Results] There were significant confrontations between “Pro-AlphaGo” and “Anti-AlphaGo” opinions online. The “Pro-AlphaGo” opinions concerned human intelligence, competition, and ability. The “Anti-AlphaGo” opinions concerned AI companies, their products, and comprehension ability. [Limitations] We only examined the proposed model with the topic of AlphaGo, so its generalizability remains to be verified. [Conclusions] The proposed method is operable and interpretable, and benefits intelligence analysis and opinion mining.

Key words: Opinion Mining; Sentiment Analysis; Confrontation Analysis

Received: 2018-12-03 Published: 2020-07-25

CLC number (ZTFLH): TP391

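Funding: *This work is one of the research outcomes of the National Defense Science and Technology Innovation Special Zone project “Problem Solving Based on Topic Interaction Data of Innovative Ideas” (JK1702-3) and the Young Talent Frontier Project of the National Science Library, Chinese Academy of Sciences, “Research on Name Normalization Methods Based on Deep Learning” (G180171001).

Corresponding author: Yu Qianqian (于倩倩), E-mail: yuqianqian@mail.las.ac.cn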

Cite this article:

徐红霞, 于倩倩, 钱力. 基于主题模型和情感分析的话题交互数据观点对抗性分析[J]. 数据分析与知识发现, 2020, 4(7): 110-117.
Xu Hongxia, Yu Qianqian, Qian Li. Studying Content Interaction Data with Topic Model and Sentiment Analysis. Data Analysis and Knowledge Discovery, 2020, 4(7): 110-117.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1362 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I7/110

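Fig.1 Technical workflow

Fig.2 Interaction data network

Table 1 Zhihu interaction data (partial)

Fig.3 Visualization of the topic distribution of positive texts

Fig.4 Visualization of the topic distribution of negative texts

Table 2 Opinion confrontation table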


