Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (3): 95-101    DOI: 10.11925/infotech.2096-3467.2018.0625
Topic Evolutionary Analysis of Short Text Based on Word Vector and BTM
Peiyao Zhang, Dongsu Liu
School of Economics and Management, Xidian University, Xi’an 710126, China

[Objective] This paper proposes a topic evolution method for microblog text that tracks how topics develop over time, which is important for public opinion early warning. [Methods] First, the Skip-gram model is used to train word vectors on the text set. The text of each time slice is then fed into the Biterm Topic Model (BTM) to obtain candidate topics, and a topic word vector is constructed for each candidate topic in the BTM topic dimension. Second, the k-means algorithm clusters the topic word vectors to obtain fused topics, from which the topic evolution of the text set across time slices is established. [Results] Experiments show that the F value of this method is 75%, about 10% higher than that of the topic model, which demonstrates the feasibility of the proposed method. [Limitations] There is no established standard for measuring topic evolution, and the study does not compare different topic evolution methods. [Conclusions] The proposed method effectively extracts topics at each stage and offers a practical approach to online public opinion analysis.
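The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the topic word vector is built, so the sketch assumes a probability-weighted mean of the embeddings of each candidate topic's top words. The names `topic_vector`, `kmeans`, `embeddings`, and `candidates` are hypothetical, and the two-dimensional embeddings and topic-word probabilities are toy values standing in for real Skip-gram and BTM output.

```python
import random

def topic_vector(top_words, embeddings):
    """Assumed construction: probability-weighted mean of top-word embeddings."""
    dim = len(next(iter(embeddings.values())))
    # Normalise over the words that actually have an embedding.
    total = sum(prob for word, prob in top_words if word in embeddings)
    vec = [0.0] * dim
    for word, prob in top_words:
        if word in embeddings:
            for i, x in enumerate(embeddings[word]):
                vec[i] += prob / total * x
    return vec

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: mean of each cluster (keep old centre if empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Toy stand-in for Skip-gram output (hypothetical values).
embeddings = {
    "rain": [1.0, 0.1], "flood": [0.9, 0.0], "rescue": [0.8, 0.2],
    "match": [0.1, 1.0], "goal": [0.0, 0.9], "team": [0.2, 0.8],
}
# Toy stand-in for BTM candidate topics: (time slice, topic id) -> top words.
candidates = {
    ("t1", 0): [("rain", 0.6), ("flood", 0.4)],
    ("t1", 1): [("match", 0.5), ("team", 0.5)],
    ("t2", 0): [("flood", 0.5), ("rescue", 0.5)],
    ("t2", 1): [("goal", 0.7), ("team", 0.3)],
}

keys = list(candidates)
vectors = [topic_vector(candidates[key], embeddings) for key in keys]
labels = kmeans(vectors, k=2)

# Candidate topics sharing a label form one fused topic across time slices.
for key, label in zip(keys, labels):
    print(key, "-> fused topic", label)
```

Candidate topics from different time slices that receive the same label are treated as one fused topic, so reading the labels in time-slice order gives a simple picture of how that topic persists or changes. In practice the embeddings would come from a trained Skip-gram model and the candidates from BTM, and the number of clusters would have to be chosen for the data.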

Key words: Biterm Topic Model; Word Embedding; Topic Similarity; Topic Evolution
Received: 06 June 2018      Published: 17 April 2019

Cite this article:

Peiyao Zhang, Dongsu Liu. Topic Evolutionary Analysis of Short Text Based on Word Vector and BTM. Data Analysis and Knowledge Discovery, 2019, 3(3): 95-101.

