New Technology of Library and Information Service  2015, Vol. 31 Issue (11): 33-40     https://doi.org/10.11925/infotech.1003-3513.2015.11.06
Research Paper
Hot Topic Extraction from E-commerce Microblog Based on EM-LDA Integrated Model
Wu Wankun1, Wu Qinglie1, Gu Jinjiang1,2
1 School of Economics and Management, Southeast University, Nanjing 211189, China
2 Department of Information Technology, Jiangsu Institute of Commerce, Nanjing 211168, China
Abstract

[Objective] To mine hot topics from e-commerce microblogs accurately and effectively in a social marketing setting. [Methods] This paper proposes an integrated model, EM-LDA (E-commerce Microblog-LDA), for topic mining on e-commerce microblog text. EM-LDA comprises two submodels: ET-LDA, which mines topics from microblogs that contain hashtags, and IT-LDA, which mines topics from microblogs that do not. [Results] After a suitable number of topics is determined, both the standard LDA model and the EM-LDA integrated model are applied to e-commerce microblog text to extract hot topics. Compared with the standard LDA model, EM-LDA extracts hot words with higher accuracy and effectiveness and yields more interpretable topics. [Limitations] The ET-LDA model does not consider the relationships between microblog contacts, that is, no user features are introduced into the model. The IT-LDA model does not address e-commerce microblogs that are both retweets and conversational posts. [Conclusions] By adapting the standard LDA model to the characteristics of the data, the EM-LDA integrated model improves the accuracy of hot topic identification in e-commerce microblogs.
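
The abstract states that microblogs with hashtags go to the ET-LDA submodel and those without go to IT-LDA, but it does not say how hashtags are detected. The following is a minimal sketch of that routing step only, under stated assumptions: the regular expression (Sina Weibo style `#topic#` plus Twitter style `#topic`) and the helper names `has_hashtag` and `split_by_hashtag` are hypothetical and are not taken from the paper.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a hashtag is either a Weibo-style "#topic#" span without
# whitespace, or a Twitter-style "#topic" run of word characters.
HASHTAG_PATTERN = re.compile(r"#[^#\s]+#|#\w+")


def has_hashtag(text: str) -> bool:
    """Return True if the post contains at least one hashtag."""
    return HASHTAG_PATTERN.search(text) is not None


def split_by_hashtag(posts: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition posts into (with hashtag, without hashtag), keeping order."""
    with_tag: List[str] = []
    without_tag: List[str] = []
    for post in posts:
        if has_hashtag(post):
            with_tag.append(post)
        else:
            without_tag.append(post)
    return with_tag, without_tag
```

In a pipeline of this shape, the first list would be passed to a hashtag-aware model such as ET-LDA and the second to IT-LDA; neither submodel is implemented here.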

Received: 2015-05-27      Published: 2016-04-06
CLC Number: TP393; G356
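Funding:

This work is one of the research outputs of the Key Project of Philosophy and Social Sciences in Jiangsu Universities, "Research on the Current Status of and Countermeasures for the Development of Jiangsu's Network Economy" (Grant No. 2013ZDIXM017).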

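Corresponding author: Wu Wankun, ORCID: 0000-0002-7872-6536, E-mail: wuwankunseu@qq.com
Author contributions: Wu Wankun: literature review, refinement of the research direction and technical approach, proposal of the improved ET-LDA model, experimental design, data cleaning, analysis of experimental results, drafting and final revision of the paper; Wu Qinglie: proposal of the research direction and ideas, design of the research plan and technical route, construction of the IT-LDA model, revision of the paper; Gu Jinjiang: data collection, programming, analysis of experimental results, revision of the paper.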
Cite this article:
Wu Wankun, Wu Qinglie, Gu Jinjiang. Hot Topic Extraction from E-commerce Microblog Based on EM-LDA Integrated Model. New Technology of Library and Information Service, 2015, 31(11): 33-40.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.11.06      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I11/33

