School of Cyber Science and Engineering, Wuhan University, Wuhan 430075, China; CETC Key Laboratory of Aerospace Information Applications, Shijiazhuang 050081, China
[Objective] This paper analyzes and summarizes recent research on topic evolution on social media, focusing on the relevant analysis techniques. [Coverage] Relevant publications were collected from DBLP, Semantic Scholar and CNKI using the keywords "Social" and "Topic Evolution"; in total, 83 representative publications are cited. [Methods] Topic evolution techniques are analyzed according to their research objects and their topic extraction methods. [Results] The techniques are divided into two categories and six subcategories, and methods for predicting topic trends are also analyzed. [Limitations] This paper does not provide a detailed comparative analysis of how these techniques incorporate time. [Conclusions] This paper analyzes and summarizes topic evolution techniques on social media and identifies the challenges and future directions of this research area.
Liu Qian, Li Chenliang. A Survey of Topic Evolution on Social Media[J]. Data Analysis and Knowledge Discovery, 2020, 4(8): 1-14.
[1] Ipsos. Social Media Usage Report[R/OL]. (2019-09-04). https://www.ipsos.com/en/social-media-usage-report
[2] Allan J, Carbonell J G, Doddington G, et al. Topic Detection and Tracking Pilot Study: Final Report[C]// Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop. 1998: 194-218.
[3] Zhang Yangsen, Duan Yuxiang, Huang Gaijuan, et al. A Survey on Topic Detection and Tracking Methods in Social Media[J]. Journal of Chinese Information Processing, 2019, 33(7): 1-10, 30.
[4] Shan Bin, Li Fang. A Survey of Topic Evolution Based on LDA[J]. Journal of Chinese Information Processing, 2010, 24(6): 43-49, 68.
[5] Zhou H K, Yu H M, Hu R, et al. A Survey on Trends of Cross-media Topic Evolution Map[J]. Knowledge-Based Systems, 2017, 124(C): 164-175. DOI: 10.1016/j.knosys.2017.03.009.
[6] Fiscus J G, Doddington G R. Topic Detection and Tracking Evaluation Overview[A]// Topic Detection and Tracking: Event-Based Information Organization[M]. Kluwer Academic Publishers, 2002: 17-31.
[7] Srijith P K, Hepple M, Bontcheva K, et al. Sub-story Detection in Twitter with Hierarchical Dirichlet Processes[J]. Information Processing & Management, 2016, 53(4): 989-1003. DOI: 10.1016/j.ipm.2016.10.004.
[8] Dehghani N, Asadpour M. SGSG: Semantic Graph-based Storyline Generation in Twitter[J]. Journal of Information Science, 2019, 45(3): 304-321. DOI: 10.1177/0165551518775304.
[9] Wang P, Zhang P, Zhou C, et al. Hierarchical Evolving Dirichlet Processes for Modeling Nonlinear Evolutionary Traces in Temporal Data[J]. Data Mining and Knowledge Discovery, 2017, 31(1): 32-64. DOI: 10.1007/s10618-016-0454-1.
[10] Zhang X C, Zhao L, Chen Z Q, et al. Trendi: Tracking Stories in News and Microblogs via Emerging, Evolving and Fading Topics[C]// Proceedings of 2017 IEEE International Conference on Big Data. IEEE, 2017: 1590-1599.
[11] İlhan N, Öğüdücü Ş G. Predicting Community Evolution Based on Time Series Modeling[C]// Proceedings of 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2015: 1509-1516.
[12] Abulaish M, Fazil M. Modeling Topic Evolution in Twitter: An Embedding-based Approach[J]. IEEE Access, 2018, 6: 64847-64857. DOI: 10.1109/ACCESS.2018.2878494.
[13] Momeni R E, Karunasekera S, Goyal P, et al. Modeling Evolution of Topics in Large-scale Temporal Text Corpora[C]// Proceedings of the 12th International AAAI Conference on Web and Social Media. 2018.
[14] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3(4/5): 993-1022.
[15] Blei D M, Lafferty J D. Dynamic Topic Models[C]// Proceedings of the 23rd International Conference on Machine Learning. 2006: 113-120.
[16] Wang X R, McCallum A. Topics Over Time: A Non-Markov Continuous-Time Model of Topical Trends[C]// Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006: 424-433.
[17] AlSumait L, Barbará D, Domeniconi C. Online LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking[C]// Proceedings of the 8th IEEE International Conference on Data Mining. IEEE, 2008: 3-12.
[18] Wang Y, Agichtein E, Benzi M. TM-LDA: Efficient Online Modeling of Latent Topic Transitions in Social Media[C]// Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012: 123-131.
[19] Sasaki K, Yoshikawa T, Furuhashi T. Online Topic Model for Twitter Considering Dynamics of User Interests and Topic Trends[C]// Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1977-1985.
[20] Liang S S, Yilmaz E, Kanoulas E. Dynamic Clustering of Streaming Short Documents[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 995-1004.
[21] Alam M H, Ryu W J, Lee S K. Hashtag-based Topic Evolution in Social Media[J]. World Wide Web, 2017, 20(6): 1527-1549. DOI: 10.1007/s11280-017-0451-3.
[22] Yan X H, Guo J F, Lan Y Y, et al. A Biterm Topic Model for Short Texts[C]// Proceedings of the 22nd International Conference on World Wide Web. 2013: 1445-1456.
[23] Huang J J, Peng M, Wang H, et al. A Probabilistic Method for Emerging Topic Tracking in Microblog Stream[J]. World Wide Web, 2017, 20(2): 325-350. DOI: 10.1007/s11280-016-0390-4.
[24] Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]// Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[25] Song J, Huang Y, Qi X, et al. Discovering Hierarchical Topic Evolution in Time-Stamped Documents[J]. Journal of the Association for Information Science and Technology, 2016, 67(4): 915-927.
[26] Ahmed A, Xing E P. Dynamic Non-parametric Mixture Models and the Recurrent Chinese Restaurant Process: With Applications to Evolutionary Clustering[C]// Proceedings of the SIAM International Conference on Data Mining. 2008.
[27] Zhang Y H, Mao W J, Lin J J. Modeling Topic Evolution in Social Media Short Texts[C]// Proceedings of 2017 IEEE International Conference on Big Knowledge. 2017.
[28] Zhang Y H, Mao W J, Zeng D, et al. Topic Evolution Modeling in Social Media Short Texts Based on Recurrent Semantic Dependent CRP[C]// Proceedings of 2017 IEEE International Conference on Intelligence and Security Informatics. 2017: 119-124.
[29] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735.
[30] Lu Z Y, Tan H H, Li W J, et al. An Evolutionary Context-aware Sequential Model for Topic Evolution of Text Stream[J]. Information Sciences, 2019: 166-177.
[31] Yan X H, Guo J F, Liu S H, et al. Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix[C]// Proceedings of 2013 SIAM International Conference on Data Mining. 2013: 749-757.
[32] Saha A, Sindhwani V. Learning Evolving and Emerging Topics in Social Media: A Dynamic NMF Approach with Temporal Regularization[C]// Proceedings of the 5th ACM International Conference on Web Search and Data Mining. 2012: 693-702.
[33] Chen Y, Zhang H, Wu J J, et al. Modeling Emerging, Evolving and Fading Topics Using Dynamic Soft Orthogonal NMF with Sparse Representation[C]// Proceedings of 2015 IEEE International Conference on Data Mining (ICDM). IEEE, 2015: 61-70.
[34] Bahargam S, Papalexakis E E. A Constrained Coupled Matrix-tensor Factorization for Learning Time-evolving and Emerging Topics[OL]. arXiv Preprint, arXiv: 1807.00122.
[35] Singh A P, Gordon G J. Relational Learning via Collective Matrix Factorization[C]// Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008: 650-658.
[36] Kalyanam J, Mantrach A, Saez-Trumper D, et al. Leveraging Social Context for Modeling Topic Evolution[C]// Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015: 517-526.
[37] Yu W R, Aggarwal C C, Ma S, et al. On Anomalous Hotspot Discovery in Graph Streams[C]// Proceedings of 2013 IEEE International Conference on Data Mining (ICDM). IEEE, 2013: 1271-1276.
[38] Palla G, Derenyi I, Farkas I J, et al. Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society[J]. Nature, 2005, 435(7043): 814-818. DOI: 10.1038/nature03607.
[39] Lu Z Y, Yu W R, Zhang R C, et al. Discovering Event Evolution Chain in Microblog[C]// Proceedings of the 17th International Conference on High Performance Computing and Communications. 2015: 635-640.
[40] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv: 1301.3781.
[41] Liu Y P, Peng H, Li J X, et al. Event Detection and Evolution in Multi-lingual Social Streams[J]. Frontiers of Computer Science, 2020, 14(5): 145612. DOI: 10.1007/s11704-019-8201-6.
[42] Blondel V D, Guillaume J-L, Lambiotte R, et al. Fast Unfolding of Communities in Large Networks[J]. Journal of Statistical Mechanics: Theory and Experiment, 2008(10): P10008.
[43] Fedoryszak M, Frederick B, Rajaram V, et al. Real-time Event Detection on Social Data Streams[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019: 2774-2782.
[44] Hashimoto T, Okamoto H, Kuboyama T, et al. Topic Life Cycle Extraction from Big Twitter Data Based on Community Detection in Bipartite Networks[C]// Proceedings of 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017: 2740-2745.
[45] Ester M, Kriegel H, Sander J, et al. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]// Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. 1996: 226-231.
[46] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations[C]// Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967: 281-297.
[47] Zhang T, Ramakrishnan R, Livny M. BIRCH: An Efficient Data Clustering Method for Very Large Databases[C]// Proceedings of 1996 ACM SIGMOD International Conference on Management of Data. 1996: 103-114.
[48] Fisher D H. Knowledge Acquisition via Incremental Conceptual Clustering[J]. Machine Learning, 1987, 2(2): 139-172.
[49] Cai H Y, Huang Z, Srivastava D, et al. Indexing Evolving Events from Tweet Streams[J]. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(11): 3001-3015. DOI: 10.1109/TKDE.2015.2445773.
[50] Feng W, Zhang C, Zhang W, et al. STREAMCUBE: Hierarchical Spatio-temporal Hashtag Clustering for Event Exploration over the Twitter Stream[C]// Proceedings of the 31st IEEE International Conference on Data Engineering. 2015: 1561-1572.
[51] Alsaedi N, Burnap P, Rana O F, et al. Can We Predict a Riot? Disruptive Event Detection Using Twitter[J]. ACM Transactions on Internet Technology, 2017, (2): Article No. 18.
[52] Hasan M, Orgun M A, Schwitter R. Real-time Event Detection from the Twitter Data Stream Using the TwitterNews+ Framework[J]. Information Processing & Management, 2019, 55(3): 1146-1165.
[53] Ozdikis O, Karagoz P, Oğuztüzün H, et al. Incremental Clustering with Vector Expansion for Online Event Detection in Microblogs[J]. Social Network Analysis and Mining, 2017, 7(1): Article No. 56.
[54] Comito C, Forestiero A, Pizzuti C. Bursty Event Detection in Twitter Streams[J]. ACM Transactions on Knowledge Discovery from Data, 2019, 13(4): 1-28.
[55] Becker H, Naaman M, Gravano L. Beyond Trending Topics: Real-world Event Identification on Twitter[C]// Proceedings of the 5th International Conference on Weblogs and Social Media. 2011: 438-441.
[56] Zhou Y W, Kanhabua N, Cristea A I. Real-time Timeline Summarisation for High-impact Events in Twitter[C]// Proceedings of the 22nd European Conference on Artificial Intelligence. 2016: 1158-1166.
[57] Erkan G, Radev D R. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization[J]. Journal of Artificial Intelligence Research, 2004, 22(1): 457-479. DOI: 10.1613/jair.1523.
[58] Wang Z H, Shou L D, Chen K, et al. On Summarization and Timeline Generation for Evolutionary Tweet Streams[J]. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(5): 1301-1315. DOI: 10.1109/TKDE.2014.2345379.
[59] Chang Y, Tang J L, Yin D W, et al. Timeline Summarization from Social Media with Life Cycle Models[C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. 2016: 3698-3704.
[60] Friedman J H. Greedy Function Approximation: A Gradient Boosting Machine[J]. Annals of Statistics, 2001, 29(5): 1189-1232.
[61] Kleinberg J M. Hubs, Authorities, and Communities[J]. ACM Computing Surveys, 1999, 31(4). DOI: 10.1145/345966.345982.
[62] Charikar M, Chekuri C, Cheung T, et al. Approximation Algorithms for Directed Steiner Problems[J]. Journal of Algorithms, 1999, 33(1): 73-91. DOI: 10.1006/jagm.1999.1042.
[63] Sun W J, Wang Y H, Gao Y Q, et al. Comprehensive Event Storyline Generation from Microblogs[C]// Proceedings of ACM Multimedia Asia. 2019: Article No. 48.
[64] Guo B, Ouyang Y, Zhang C, et al. CrowdStory: Fine-grained Event Storyline Generation by Fusion of Multi-modal Crowdsourced Data[J]. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2017, 1(3): Article No. 55. DOI: 10.1145/3130936.
[65] Ansah J, Liu L, Kang W, et al. A Graph is Worth a Thousand Words: Telling Event Stories Using Timeline Summarization Graphs[C]// Proceedings of the 28th International Conference on World Wide Web. 2019: 2565-2571.
[66] Goyal P, Kaushik P, Gupta P, et al. Multilevel Event Detection, Storyline Generation, and Summarization for Tweet Streams[J]. IEEE Transactions on Computational Social Systems, 2020, 7(1): 8-23.
[67] Lin C, Lin C, Li J X, et al. Generating Event Storylines from Microblogs[C]// Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, 2012: 175-184.
[68] Endo Y, Toda H, Koike Y. What’s Hot in the Theme: Query Dependent Emerging Topic Extraction from Social Streams[C]// Proceedings of the 24th International Conference on World Wide Web. 2015: 31-32.
[69] Zhao L, Chen F, Lu C T, et al. Dynamic Theme Tracking in Twitter[C]// Proceedings of 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015: 561-570.
[70] Tonon A, Cudre-Mauroux P, Blarer A, et al. ArmaTweet: Detecting Events by Semantic Tweet Analysis[C]// The Semantic Web: Proceedings of the Extended Semantic Web Conference. 2017: 138-153.
[71] Bhardwaj A, Blarer A, Cudre-Mauroux P, et al. Event Detection on Microposts: A Comparison of Four Approaches[J]. IEEE Transactions on Knowledge and Data Engineering, 2019. DOI: 10.1109/TKDE.2019.2944815.
[72] Brigadir I, Greene D, Cunningham P. Adaptive Representations for Tracking Breaking News on Twitter[OL]. arXiv Preprint, arXiv: 1403.2923.
[73] Lu R, Xu Z H, Zhang Y, et al. Trends Predicting of Topics on Twitter Based on MACD[C]// Proceedings of the 4th International Conference on Machine Learning and Computing. 2012: 44-49.
[74] Liu W W, Deng Z H, Gong X W, et al. Effectively Predicting Whether and When a Topic Will Become Prevalent in a Social Network[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015: 210-216.
[75] Ma X, Gao X F, Chen G H. BEEP: A Bayesian Perspective Early Stage Event Prediction Model for Online Social Networks[C]// Proceedings of 2017 IEEE International Conference on Data Mining (ICDM). IEEE, 2017. DOI: 10.1109/ICDM.2017.124.
[76] Wang C K, Xin X, Shang J W. When to Make a Topic Popular Again? A Temporal Model for Topic Re-hotting Prediction in Online Social Networks[J]. IEEE Transactions on Signal and Information Processing over Networks, 2017, 4(1): 202-216.
[77] Zhang X M, Chen X M, Chen Y, et al. Event Detection and Popularity Prediction in Microblogging[J]. Neurocomputing, 2015, 149: 1469-1480. DOI: 10.1016/j.neucom.2014.08.045.
[78] Fang A J, Ounis I, MacDonald C, et al. An Effective Approach for Modelling Time Features for Classifying Bursty Topics on Twitter[C]// Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018: 1547-1550.
[79] Wang X, Wang C, Ding Z Y, et al. Predicting the Popularity of Topics Based on User Sentiment in Microblogging Websites[J]. Journal of Intelligent Information Systems, 2018, 51(1): 97-114. DOI: 10.1007/s10844-017-0486-z.
[80] Wu Q T, Yang C Q, Zhang H R, et al. Adversarial Training Model Unifying Feature Driven and Point Process Perspectives for Event Popularity Prediction[C]// Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018: 517-526.
[81] Chen G D, Kong Q C, Mao W J. An Attention-based Neural Popularity Prediction Model for Social Media Events[C]// Proceedings of 2017 IEEE International Conference on Intelligence & Security Informatics. IEEE, 2017. DOI: 10.1109/ISI.2017.8004898.
[82] Huang J Y, Tang Y Y, Hu Y, et al. Predicting the Active Period of Popularity Evolution: A Case Study on Twitter Hashtags[J]. Information Sciences, 2019. DOI: 10.1016/j.ins.2019.04.028.
[83] Yu H, Hu Y, Shi P. A Prediction Method of Peak Time Popularity Based on Twitter Hashtags[J]. IEEE Access, 2020, 8: 61453-61461.