1 Key Laboratory of Trustworthy Distributed Computing and Service, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
3 School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China
4 School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
5 Beijing University of Posts and Telecommunications Library, Beijing 100876, China
6 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
[Objective] This paper proposes a graph neural network-based classification model for sensitive texts in online communities, supporting public opinion governance and information security. [Methods] First, we constructed a heterogeneous graph from the sensitive entities and words of the texts, incorporating existing knowledge about sensitive information in online public opinion. Second, we adopted BERT and a GCN to capture the high-level semantic information and global co-occurrence features of the texts. Third, we combined the complementary strengths of the pre-trained and graph models to address the heterogeneity caused by structural differences between long and short texts. Finally, we classified sensitive texts based on the features of online public opinion. [Results] We evaluated the proposed model on a self-constructed dataset of sensitive online public opinion texts. Its accuracy reached 70.80%, which was 3.52% higher than that of other models. [Limitations] Large heterogeneous graphs built from long texts reduce computing speed. [Conclusions] The proposed model can effectively identify and classify sensitive content in different types of online texts.
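The combination of a GCN over the heterogeneous text graph with BERT predictions can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the symmetric adjacency normalization follows Kipf and Welling's GCN, and the weighted interpolation of the two predictive distributions follows the BertGCN fusion scheme; the interpolation weight `lam`, matrix shapes, and function names are all assumptions for illustration.

```python
import numpy as np

def normalize_adj(A):
    # Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2},
    # where A is the (entity/word/document) graph adjacency matrix.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X, W):
    # One graph convolution: propagate node features along edges,
    # project with weight matrix W, then apply ReLU.
    return np.maximum(A_norm @ X @ W, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_predictions(p_gcn, p_bert, lam=0.7):
    # BertGCN-style fusion: a convex combination of the GCN and BERT
    # class distributions; lam is a tunable hyperparameter (assumed value).
    return lam * p_gcn + (1.0 - lam) * p_bert
```

In a full pipeline, `X` would hold BERT embeddings of document and entity nodes, and `p_bert` would come from a fine-tuned BERT classifier; here both inputs are left abstract.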
[1]
Maron M E. Automatic Indexing: An Experimental Inquiry[J]. Journal of the ACM, 1961, 8(3): 404-417.
doi: 10.1145/321075.321084
[2]
Cover T, Hart P. Nearest Neighbor Pattern Classification[J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
doi: 10.1109/TIT.1967.1053964
[3]
Drucker H, Wu D, Vapnik V N. Support Vector Machines for Spam Categorization[J]. IEEE Transactions on Neural Networks, 1999, 10(5): 1048-1054.
doi: 10.1109/72.788645
pmid: 18252607
[4]
Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.
[5]
Liu P F, Qiu X P, Huang X J. Recurrent Neural Network for Text Classification with Multi-Task Learning[OL]. arXiv Preprint, arXiv:1605.05101.
[6]
Tai K S, Socher R, Manning C D. Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks[OL]. arXiv Preprint, arXiv:1503.00075.
[7]
Lai S W, Xu L H, Liu K, et al. Recurrent Convolutional Neural Networks for Text Classification[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015: 2267-2273.
[8]
Wu Z H, Pan S R, Chen F W, et al. A Comprehensive Survey on Graph Neural Networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24.
doi: 10.1109/TNNLS.2020.2978386
[9]
Kipf T N, Welling M. Semi-Supervised Classification with Graph Convolutional Networks[OL]. arXiv Preprint, arXiv:1609.02907.
[10]
Yao L, Mao C S, Luo Y. Graph Convolutional Networks for Text Classification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019: 7370-7377.
[11]
Huang L Z, Ma D H, Li S J, et al. Text Level Graph Neural Network for Text Classification[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 3444-3450.
[12]
Zhang Y F, Yu X L, Cui Z Y, et al. Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks[OL]. arXiv Preprint, arXiv:2004.13826.
[13]
Hu L M, Yang T C, Shi C, et al. Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 4821-4830.
[14]
Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[15]
Lin Y X, Meng Y X, Sun X F, et al. BertGCN: Transductive Text Classification by Combining GCN and BERT[OL]. arXiv Preprint, arXiv:2105.05727.
[16]
Yu Z X, Wu X, Xie X Q, et al. Hot Event Detection for Social Media Based on Keyword Semantic Information[C]// Proceedings of 2019 IEEE 4th International Conference on Data Science in Cyberspace. 2019: 410-415.
[17]
Gao L, Wu X, Wu J C, et al. Sensitive Image Information Recognition Model of Network Community Based on Content Text[C]// Proceedings of 2021 IEEE 6th International Conference on Data Science in Cyberspace. 2021: 47-52.
[18]
Chen Zuqin, Jiang Xun, Ge Jike. Emergency Scenario Analysis Based on Sensitive Information of Online Public Opinion[J]. Journal of Modern Information, 2021, 41(5): 25-32.
doi: 10.3969/j.issn.1008-0821.2021.05.003
[19]
Zhang Zefeng, Mao Cunli, Yu Zhengtao, et al. Sensitive Judicial Public Opinion Information Recognition with the Domain Terminology Dictionary[J]. Journal of Chinese Information Processing, 2022, 36(9): 76-83, 92.
[20]
Zeng J C, Li J, Song Y, et al. Topic Memory Networks for Short Text Classification[C]// Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2018: 3120-3131.
[21]
Wang X, Chen R H, Jia Y, et al. Short Text Classification Using Wikipedia Concept Based Document Representation[C]// Proceedings of the International Conference on Information Technology and Applications. 2013: 471-474.
[22]
Lan G, Li Y, Hu M T, et al. Knowledge Graph Integrated Graph Neural Networks for Chinese Medical Text Classification[C]// Proceedings of IEEE International Conference on Bioinformatics and Biomedicine. 2021: 682-687.
[23]
Li Q M, Han Z C, Wu X M. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018.
[24]
Zhou P, Shi W, Tian J, et al. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2016: 207-212.
[25]
Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2. 2017: 427-431.
[26]
Johnson R, Zhang T. Deep Pyramid Convolutional Neural Networks for Text Categorization[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 562-570.
[27]
Kingma D P, Ba J. Adam: A Method for Stochastic Optimization[OL]. arXiv Preprint, arXiv:1412.6980.