Topic Clustering for Social Media Texts with Heterogeneous Graph Neural Networks
Feng Xiaodong, Hui Kangxin
School of Public Affairs and Administration, University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract [Objective] This paper develops an effective topic clustering method that addresses the semantic sparsity and multiple interaction relationships of social media texts. [Methods] We modeled the multiple interaction relationships between social media users and online content as a heterogeneous information network. First, we used a word embedding method to obtain text representations as the initial input features. Then, we propagated and aggregated node representations with a heterogeneous graph neural network. Finally, we trained the model on the representations of text nodes and performed unsupervised clustering to identify topics. [Results] On an English benchmark dataset, the model's NMI reached 0.8372 for original posts and 0.8689 for comments, higher than that of traditional LDA and of direct clustering on word or text embedding vectors from Word2Vec, Doc2Vec, or GloVe. [Limitations] Owing to data limitations, we did not examine social relationships among users or online multimedia content. [Conclusions] The proposed model effectively improves topic clustering for social media texts.
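
The abstract does not give the model's equations, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates two ideas named above: one round of type-wise mean neighbor aggregation (a simplified stand-in for heterogeneous graph neural network propagation, with no learned weights or attention) and the NMI score used in the results, here normalized by the arithmetic mean of the two entropies. The function names, the node identifiers, and the node types are all hypothetical.

```python
import math
from collections import Counter


def aggregate_by_type(features, neighbors):
    """One round of type-wise mean aggregation (illustrative assumption).

    features:  {node_id: [float, ...]} initial embeddings, e.g. averaged word vectors
    neighbors: {node_id: {node_type: [neighbor_id, ...]}}
    Each node's new vector is the mean of its own vector and one mean
    vector per neighbor type that has at least one neighbor.
    """
    updated = {}
    for node, vec in features.items():
        dim = len(vec)
        parts = [vec]
        for ids in neighbors.get(node, {}).values():
            if ids:
                parts.append(
                    [sum(features[n][d] for n in ids) / len(ids) for d in range(dim)]
                )
        updated[node] = [sum(p[d] for p in parts) / len(parts) for d in range(dim)]
    return updated


def nmi(labels_true, labels_pred):
    """Normalized mutual information, arithmetic-mean normalization."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    if h_true == 0 and h_pred == 0:
        return 1.0  # both partitions are a single cluster
    return mutual_info / ((h_true + h_pred) / 2)


# Toy graph: one post linked to one comment and one user (hypothetical data).
toy_features = {"p1": [1.0, 0.0], "c1": [0.0, 1.0], "u1": [1.0, 1.0]}
toy_neighbors = {"p1": {"comment": ["c1"], "user": ["u1"]}}
toy_updated = aggregate_by_type(toy_features, toy_neighbors)
```

In a full pipeline the aggregated text-node vectors would be passed to a clustering algorithm and the resulting cluster labels compared against reference topics with `nmi`; that clustering step is omitted here.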
Received: 13 January 2022
Published: 16 November 2022
Fund: Humanities and Social Sciences Foundation of the Ministry of Education, China (20YJAZH027); National Natural Science Foundation of China (72004021)
Corresponding Authors:
Feng Xiaodong, ORCID:0000-0001-9975-9807
E-mail: fengxd1988@hotmail.com