¹School of Information Management, Nanjing University, Nanjing 210023, China
²Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
³Jinling Library, Nanjing 210023, China
[Objective] This paper models the contexts of texts and the polysemy of words to improve the performance of automatic text classification. [Methods] We proposed a GCN model for long-text classification that incorporates associated information. First, we used BERT to obtain initial word-vector features for the long texts. Second, we fed these initial features into a BiLSTM model to capture the semantic relationships among words. Third, we represented the word features as nodes of the graph convolutional network SGCN. Fourth, we connected the nodes with edges based on the vector similarity between words, constructing a graph structure. Finally, we fed the long-text representation produced by SGCN into fully connected layers to complete the classification task. [Results] We evaluated our model on Chinese scientific literature covering multiple subjects. Our model achieved an accuracy of 0.83409, outperforming the benchmark model. [Limitations] We treated each text as having only a single topic, handling the task as multi-class classification. [Conclusions] The proposed model, which combines BERT, BiLSTM and SGCN, can effectively classify long texts.
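The graph part of the pipeline (steps three to five) can be illustrated with a minimal pure-Python sketch. It assumes the BiLSTM outputs are already available as one vector per word and does not reproduce BERT or BiLSTM. The use of cosine similarity as the edge weight, the similarity threshold, the Kipf-Welling symmetric normalization, mean pooling over nodes, and all function names are illustrative assumptions; the paper's SGCN layer may differ in its details.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_adjacency(features, threshold=0.5):
    """Weighted adjacency over word nodes (assumption: keep edge i-j when
    cosine similarity >= threshold, weighted by that similarity; use threshold > 0).
    Self-loops get weight 1.0 so each node keeps its own features."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0
            else:
                sim = cosine(features[i], features[j])
                if sim >= threshold:
                    adj[i][j] = sim
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, with D the row sums of A."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]


def mean_pool(h):
    """Average node features into a single text vector (assumed readout)."""
    n = len(h)
    return [sum(row[j] for row in h) / n for j in range(len(h[0]))]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def classify(features, w_gcn, w_fc, b_fc, threshold=0.5):
    """Word vectors -> similarity graph -> GCN layer -> pooled text vector -> FC + softmax."""
    a_hat = normalize(build_adjacency(features, threshold))
    h = gcn_layer(a_hat, features, w_gcn)
    text_vec = mean_pool(h)
    logits = [sum(t * w_fc[k][c] for k, t in enumerate(text_vec)) + b_fc[c]
              for c in range(len(b_fc))]
    return softmax(logits)


if __name__ == "__main__":
    random.seed(0)
    dim, hidden, classes = 8, 4, 3
    # Stand-in for BiLSTM outputs: 5 random word vectors (untrained, for shape only).
    words = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
    w_gcn = [[random.gauss(0, 0.5) for _ in range(hidden)] for _ in range(dim)]
    w_fc = [[random.gauss(0, 0.5) for _ in range(classes)] for _ in range(hidden)]
    probs = classify(words, w_gcn, w_fc, [0.0] * classes)
    print(probs, probs.index(max(probs)))
```

In a trained model the weights `w_gcn`, `w_fc` and `b_fc` would be learned jointly with the upstream encoders; here they are random and serve only to show how the data flows from word vectors to class probabilities.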
Zhou Zeyu, Wang Hao, Zhao Zibo, Li Yueyan, Zhang Xiaoqin. Construction and Application of GCN Model for Text Classification with Associated Information[J]. Data Analysis and Knowledge Discovery, 2021, 5(9): 31-41.