Online Sensitive Text Classification Model Based on Heterogeneous Graph Convolutional Network

doi:10.11925/infotech.2096-3467.2022.1250

Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue (11): 26-36


Gao Haoxin¹,², Sun Lijuan¹,³, Wu Jingchen¹,⁴, Gao Yutong⁶, Wu Xu¹,²,⁵

¹Key Laboratory of Trustworthy Distributed Computing and Service, Beijing University of Posts and Telecommunications, Beijing 100876, China
²School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
³School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China
⁴School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
⁵Beijing University of Posts and Telecommunications Library, Beijing 100876, China
⁶School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China


Abstract

[Objective] This paper proposes a graph-neural-network-based model for classifying sensitive texts in online communities, in support of public opinion governance and information security. [Methods] First, we constructed a heterogeneous graph from the texts' sensitive entities and words, incorporating existing knowledge about sensitive information in online public opinion. Second, we used BERT and a GCN to capture the high-level semantic information of the text and its global co-occurrence features, respectively. Third, we combined the complementary strengths of the pre-trained model and the graph model to address the heterogeneity caused by structural differences between long and short texts. Finally, we classified sensitive texts based on the features of online public opinion. [Results] We evaluated the proposed model on a self-built dataset of sensitive online public opinion texts. Its accuracy reached 70.80%, which was 3.52% higher than that of the other models compared. [Limitations] Large heterogeneous graphs built from long texts reduce computing speed. [Conclusions] The proposed model can effectively identify and classify sensitive content in different kinds of online texts.
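
The graph-based steps in [Methods] can be illustrated with a minimal sketch. It is not the authors' STC-HGCN implementation: the function names, the toy documents, the one-entry sensitive lexicon, the 0/1 edge weights, and the fixed weight matrix are all assumptions made for illustration, and one-hot node features stand in for the BERT representations the paper uses. It shows only how document, word, and sensitive-entity nodes can share one graph and how a single GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W) as defined by Kipf and Welling, mixes their features.

```python
import math


def build_hetero_graph(docs, sensitive_entities):
    """Build a document-word-entity graph from tokenised documents.

    `sensitive_entities` is a hypothetical lexicon, not the paper's.
    Edges are unweighted (0/1); the abstract does not state the
    weighting scheme the authors use.
    """
    words = sorted({t for d in docs for t in d if t not in sensitive_entities})
    ents = sorted({t for d in docs for t in d if t in sensitive_entities})
    nodes = (
        [("doc", i) for i in range(len(docs))]
        + [("word", w) for w in words]
        + [("entity", e) for e in ents]
    )
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i, doc in enumerate(docs):
        for token in doc:
            kind = "entity" if token in sensitive_entities else "word"
            j = index[(kind, token)]
            adj[i][j] = adj[j][i] = 1.0  # undirected document-token edge
    return nodes, adj


def normalize(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, feats, weight):
    """One GCN propagation step: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(a_hat, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy input (assumed): two tokenised texts and a one-entry sensitive lexicon.
docs = [["exam", "leak", "school"], ["school", "lunch"]]
nodes, adj = build_hetero_graph(docs, {"leak"})
a_hat = normalize(adj)

n = len(nodes)
feats = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot stand-in
weight = [[0.5, -0.5] for _ in range(n)]  # fixed illustrative weights, not learned
hidden = gcn_layer(a_hat, feats, weight)
```

In a trained model the weight matrix would be learned and the document-node features would come from a pre-trained encoder; the dense list-of-lists adjacency used here also makes the limitation noted in [Limitations] visible, since the matrix grows quadratically with the number of nodes.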

Key words: Graph Convolutional Network; Sensitive Text Classification; Heterogeneous Graph; BERT

Received: 23 November 2022 Published: 22 March 2023

ZTFLH: TP183; G350

Fund: National Natural Science Foundation of China (72293583); China Postdoctoral Science Foundation (2022M710463)

Corresponding Author: Wu Xu, ORCID: 0000-0002-1297-2726, E-mail: wux@bupt.edu.cn


Cite this article:

Gao Haoxin, Sun Lijuan, Wu Jingchen, Gao Yutong, Wu Xu. Online Sensitive Text Classification Model Based on Heterogeneous Graph Convolutional Network. Data Analysis and Knowledge Discovery, 2023, 7(11): 26-36.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1250 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I11/26

STC-HGCN Model

An Example of Heterogeneous Graph

Schematic of GCN for Sensitive Text

Classification Scheme of Public Opinion Sensitive Texts in the Field of Education

Examples of Sensitive Text

Dataset Information

Results of Contrast Experiment

Results of Ablation Study


