Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (3): 95-101    DOI: 10.11925/infotech.2096-3467.2018.0625
Topic Evolutionary Analysis of Short Text Based on Word Vector and BTM
Peiyao Zhang, Dongsu Liu
School of Economics and Management, Xidian University, Xi’an 710126, China

[Objective] This paper constructs a method for analyzing the evolution of microblog topics in order to track topic development trends accurately, which is of great significance for public opinion early warning. [Methods] First, a word vector model is trained on the text collection with the Skip-gram model. The texts of each time slice are then fed into the Biterm Topic Model (BTM) to obtain candidate topics, and a topic word vector is constructed for each candidate topic in the BTM topic dimension. Second, the k-means algorithm clusters these topic word vectors to obtain fused topics, from which the topic evolution of the text collection across time slices is established. [Results] The proposed method achieves an F value of 75%, about 10% higher than that of the topic model, which demonstrates its feasibility. [Limitations] There is no definitive evaluation standard for topic evolution, and the proposed method is not compared with other topic evolution methods. [Conclusions] The proposed method can effectively extract the topics of each stage and provides an effective approach for online public opinion analysis.
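The fusion step described in the abstract (build a vector for each candidate topic, cluster the vectors with k-means, then read off which time slices each fused topic spans) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say exactly how topic word vectors are built, so the sketch assumes a probability-weighted average of the word vectors of each topic's top words. `WORD_VECS` and `CANDIDATE_TOPICS` are hypothetical toy data standing in for a trained Skip-gram model and for BTM output, and the helper names and the farthest-first k-means initialisation are likewise illustrative choices.

```python
import math

# Hypothetical 3-d word vectors standing in for a trained Skip-gram model.
WORD_VECS = {
    "earthquake": [0.9, 0.1, 0.0], "rescue": [0.8, 0.2, 0.1],
    "donation": [0.1, 0.9, 0.0], "charity": [0.2, 0.8, 0.1],
    "rumor": [0.0, 0.1, 0.9], "police": [0.1, 0.0, 0.8],
}

# Hypothetical BTM output: per time slice, each candidate topic's top (word, probability) pairs.
CANDIDATE_TOPICS = {
    "t1": [[("earthquake", 0.6), ("rescue", 0.4)], [("rumor", 0.7), ("police", 0.3)]],
    "t2": [[("rescue", 0.5), ("earthquake", 0.5)], [("donation", 0.6), ("charity", 0.4)]],
    "t3": [[("charity", 0.7), ("donation", 0.3)], [("police", 0.6), ("rumor", 0.4)]],
}


def topic_vector(top_words, word_vecs):
    """Assumed construction: probability-weighted average of the topic's word vectors."""
    dim = len(next(iter(word_vecs.values())))
    vec, total = [0.0] * dim, 0.0
    for word, prob in top_words:
        if word not in word_vecs:  # skip out-of-vocabulary words
            continue
        total += prob
        for i, x in enumerate(word_vecs[word]):
            vec[i] += prob * x
    return [x / total for x in vec] if total > 0 else vec


def kmeans(points, k, iters=100):
    """Plain k-means (Euclidean distance) with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):  # move each centroid to the mean of its members
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def topic_evolution(candidates, word_vecs, k):
    """Fuse all candidate topics into k clusters and list the slices each fused topic appears in."""
    slice_ids, points = [], []
    for slice_id, topics in candidates.items():
        for top_words in topics:
            slice_ids.append(slice_id)
            points.append(topic_vector(top_words, word_vecs))
    labels = kmeans(points, k)
    evolution = {j: [] for j in range(k)}
    for slice_id, lab in zip(slice_ids, labels):
        if slice_id not in evolution[lab]:
            evolution[lab].append(slice_id)
    return evolution


evolution = topic_evolution(CANDIDATE_TOPICS, WORD_VECS, k=3)
for fused, slices in evolution.items():
    print(f"fused topic {fused}: {' -> '.join(slices)}")
```

With this toy data the three fused topics each span two time slices (for example, the rescue topic appears in `t1` and `t2`, while the donation topic emerges in `t2` and continues into `t3`). In a real pipeline the word vectors might instead come from a library such as gensim, whose `Word2Vec` class selects the Skip-gram architecture with `sg=1`, and the number of fused topics `k` would have to be chosen rather than fixed at 3 as assumed here.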

Key words: Biterm Topic Model; Word Embedding; Topic Similarity; Topic Evolution
Received: 06 June 2018      Published: 17 April 2019

Cite this article:

Peiyao Zhang, Dongsu Liu. Topic Evolutionary Analysis of Short Text Based on Word Vector and BTM. Data Analysis and Knowledge Discovery, 2019, 3(3): 95-101.

