Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (9): 31-41    DOI: 10.11925/infotech.2096-3467.2018.0068
Analyzing News Topic Evolution with Convolutional Neural Networks and Topic2Vec
Xu Yuemei1, Lv Sining1, Cai Lianqiao1, Zhang Xiaoya2
1Department of Computer Science, Beijing Foreign Studies University, Beijing 100089, China
2School of International Journalism and Communication, Beijing Foreign Studies University, Beijing 100089, China

[Objective] This study analyzes how news topics evolve, aiming to identify public opinion and media coverage of specific events. [Methods] We propose a distributed word representation method based on Topic2Vec to better capture the semantic distance between topics. We then use a convolutional neural network model to learn the topic vectors and cluster similar topics. Finally, we obtain the topics’ evolution trends, focus events, and related key sub-topics. [Results] We evaluated the proposed method on news reports about China collected from the CNN website between 2015 and 2017; it effectively revealed the evolution of topics and sentiments. [Limitations] We did not explore the impact of the time window length. [Conclusions] Compared with previous models, the proposed method improves the accuracy of topic clustering by 10% and helps explore the topic evolution of news.

Key words: News Topic; Convolutional Neural Networks; Topic Evolution; Topic2Vec
Received: 18 January 2018      Published: 25 October 2018
CLC number: TP393

Cite this article:

Xu Yuemei, Lv Sining, Cai Lianqiao, Zhang Xiaoya. Analyzing News Topic Evolution with Convolutional Neural Networks and Topic2Vec. Data Analysis and Knowledge Discovery, 2018, 2(9): 31-41.


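Method | How time is introduced | Representative model | Number of topics | Advantages | Disadvantages
Method 1 | Time as an observable continuous variable | ToT | Fixed | Yields a continuous-time distribution of topics; no need to consider ... | Number of topics is fixed; requires topics to be distributed over all time
Method 2 | Discretized by time after modeling | Topic Entropy | Fixed | Captures global topic information; relatively comprehensive | Number of topics is fixed; cannot detect new topics
Method 3 | Discretized by time before modeling | ODTM | Not fixed | Number of topics can vary; can detect new topics | Requires choosing a suitable time granularity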
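
The third approach in the table discretizes the corpus by time before any topic modeling. A minimal sketch of that windowing step follows, assuming monthly windows and a list of (date, text) pairs. The helper name `split_into_windows`, the window length, and the toy documents are illustrative assumptions, not details from the paper.

```python
from collections import defaultdict
from datetime import date

def split_into_windows(docs, months_per_window=1):
    """Group (date, text) pairs into consecutive time windows.

    The window length is an assumption. A topic model would then be fitted
    to each window separately, so the number of topics may differ per window.
    """
    if not docs:
        return {}
    start = min(d for d, _ in docs)
    windows = defaultdict(list)
    for d, text in docs:
        # Whole months elapsed since the earliest document.
        elapsed = (d.year - start.year) * 12 + (d.month - start.month)
        windows[elapsed // months_per_window + 1].append(text)
    return dict(sorted(windows.items()))

# Hypothetical toy corpus, not data from the study.
docs = [
    (date(2015, 1, 5), "stock market falls"),
    (date(2015, 1, 20), "economy growth slows"),
    (date(2015, 3, 2), "navy patrol near island"),
]
windows = split_into_windows(docs)
```

Windows with no documents are simply absent from the result. Choosing `months_per_window` is the time-granularity decision that the table lists as the drawback of this approach.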
T5,2 | T5,3 | T10,1 | T7,4
Livelihood (spacecraft launch) | Military (South China Sea issue) | Military (South China Sea issue) | Economy (stock market)
space 0.0671 | sea 0.0215 | sea 0.0199 | market 0.0233
mission 0.0288 | south 0.0184 | island 0.0189 | economy 0.0191
shenzhou 0.0205 | navy 0.0136 | south 0.0147 | stock 0.01637
astronaut 0.0192 | military 0.0136 | build 0.0121 | growth 0.0130
Chinese 0.0091 | island 0.0132 | warn 0.0079 | month 0.0103
launch 0.0071 | flight 0.0110 | aircraft 0.0079 | rate 0.0088
opportunity 0.0071 | aircraft 0.0088 | dispute 0.0073 | currency 0.0085
center 0.0064 | dispute 0.0071 | issue 0.0068 | global 0.0075
station 0.0064 | claim 0.0066 | military 0.0068 | bank 0.0075
spaceflight 0.0052 | reef 0.0066 | surveillance 0.0063 | treasury 0.0067

T10,4 | T6,6 | T9,4 | T11,1
Economy (stock market) | Livelihood (Yangtze River shipwreck) | Livelihood (mixed) | Politics (Xi-Ma meeting)
market 0.0412 | ship 0.0563 | GDP 0.0475 | china 0.0438
stock 0.0293 | yangtze 0.0228 | dinosaur 0.0368 | taiwan 0.0421
economy 0.0208 | river 0.0196 | Sale 0.0245 | ma 0.0193
bank 0.0117 | sink 0.0149 | quarter 0.0231 | xi 0.0158
rate 0.0107 | eastern 0.0146 | smartphone 0.0223 | meeting 0.0152
government 0.0096 | cruise 0.0142 | brand 0.0192 | party 0.0123
month 0.0096 | capsize 0.0138 | democracy 0.0169 | president 0.0117
economist 0.0085 | report 0.0102 | Status 0.0138 | leader 0.0094
crash 0.0081 | rescue 0.0095 | product 0.0101 | relation 0.0088
fall 0.0061 | passenger 0.0081 | feather 0.0101 | close 0.0070
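
Topics from different time windows that carry the same label, such as T5,3 and T10,1, share many of their top words. The sketch below compares the listed word weights with cosine similarity. Cosine similarity is an illustrative choice here and is not confirmed as the paper's measure, and only the ten listed words per topic are used, so the vectors are truncated.

```python
import math

# Top-10 word weights copied from the table above.
t5_3 = {"sea": 0.0215, "south": 0.0184, "navy": 0.0136, "military": 0.0136,
        "island": 0.0132, "flight": 0.0110, "aircraft": 0.0088,
        "dispute": 0.0071, "claim": 0.0066, "reef": 0.0066}
t10_1 = {"sea": 0.0199, "island": 0.0189, "south": 0.0147, "build": 0.0121,
         "warn": 0.0079, "aircraft": 0.0079, "dispute": 0.0073,
         "issue": 0.0068, "military": 0.0068, "surveillance": 0.0063}
t7_4 = {"market": 0.0233, "economy": 0.0191, "stock": 0.01637,
        "growth": 0.0130, "month": 0.0103, "rate": 0.0088,
        "currency": 0.0085, "global": 0.0075, "bank": 0.0075,
        "treasury": 0.0067}

def cosine(a, b):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

same_event = cosine(t5_3, t10_1)   # both labeled South China Sea
different = cosine(t5_3, t7_4)     # military vs. stock market
```

The two South China Sea topics score well above the military and stock-market pair, which share no listed words and therefore score zero.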
Activation function | ReLU
Dropout | 0.6
Batch size | 15
Filter sliding-window size h | 3, 4, 5
Number of training iterations | 50
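
To make the filter window size h concrete, the sketch below applies one filter per window size (3, 4, 5) to a toy sequence of vectors, followed by ReLU and max-over-time pooling, in the style of a text CNN. The vector dimension, sequence length, and random weights are assumptions for illustration, and training, dropout, and batching are left out.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def conv_max_pool(seq, filt, bias=0.0):
    """Slide one filter of window size h = len(filt) over a sequence of
    vectors, apply ReLU, and keep the maximum activation (one feature)."""
    h = len(filt)
    if len(seq) < h:
        raise ValueError("sequence shorter than the filter window")
    activations = []
    for start in range(len(seq) - h + 1):
        window = seq[start:start + h]
        # Dot product between the h stacked vectors and the filter weights.
        total = bias + sum(
            w * x
            for vec, weights in zip(window, filt)
            for x, w in zip(vec, weights)
        )
        activations.append(relu(total))
    return max(activations)

# Toy input: 7 positions, each a 4-dimensional vector (sizes are assumptions).
rng = random.Random(0)
dim, length = 4, 7
seq = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(length)]

# One randomly initialised filter per window size from the table.
features = []
for h in (3, 4, 5):
    filt = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(h)]
    features.append(conv_max_pool(seq, filt))
```

`features` holds one pooled value per window size. A full model would use many filters per size and pass the concatenated features through dropout and a classifier.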
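Test-set split | Model | 4-class accuracy | 3-class accuracy | 2-class accuracy
Random split | Word2Vec | 60.67% | 68.89% | 83.33%
Random split | SVM-LDA | 57.98% | 63.54% | 82.76%
Random split | Topic2Vec | 73.33% | 82.22% | 95.00%
Split by time | Word2Vec | 53.33% | 54.55% | 85.71%
Split by time | SVM-LDA | 56.79% | 64.23% | 83.47%
Split by time | Topic2Vec | 66.67% | 72.72% | 100.00%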
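
One way to read the accuracy table is as percentage-point gains of Topic2Vec over each baseline. The sketch below computes those differences from the values listed above. The dictionary layout and the helper name `gain_over` are illustrative, and the calculation makes no claim about how the 10% figure in the abstract was derived.

```python
# Accuracies (%) copied from the table: (split, model) -> (4-class, 3-class, 2-class).
accuracy = {
    ("random", "Word2Vec"): (60.67, 68.89, 83.33),
    ("random", "SVM-LDA"): (57.98, 63.54, 82.76),
    ("random", "Topic2Vec"): (73.33, 82.22, 95.00),
    ("by_time", "Word2Vec"): (53.33, 54.55, 85.71),
    ("by_time", "SVM-LDA"): (56.79, 64.23, 83.47),
    ("by_time", "Topic2Vec"): (66.67, 72.72, 100.00),
}

def gain_over(baseline, split):
    """Percentage-point gain of Topic2Vec over a baseline for one split."""
    ours = accuracy[(split, "Topic2Vec")]
    base = accuracy[(split, baseline)]
    return tuple(round(o - b, 2) for o, b in zip(ours, base))

gains = {
    (split, baseline): gain_over(baseline, split)
    for split in ("random", "by_time")
    for baseline in ("Word2Vec", "SVM-LDA")
}
smallest_gain = min(g for row in gains.values() for g in row)
```

Every entry in `gains` is positive, so Topic2Vec scores highest in each column of the table for both splits.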