1. School of Information Management, Nanjing University, Nanjing 210023, China
2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
3. Jinling Library, Nanjing 210023, China
[Objective] This paper aims to improve automatic text classification by learning the context of a text and the polysemy of its words.
[Methods] We propose a GCN model for long text classification that incorporates associated information. First, we use BERT to obtain initial word-vector features for the long texts. Second, we feed these features into a BiLSTM model to capture their semantic relationships. Third, we represent the word features as nodes of the graph convolutional network SGCN. Fourth, we use the vector similarity between words as edges connecting the nodes, which yields a graph structure. Finally, we feed the long-text representation produced by SGCN into fully connected layers to complete the classification task.
[Results] We evaluated the model on Chinese scientific literature covering multiple subjects. It achieved an accuracy of 0.83409, higher than that of the benchmark model.
[Limitations] We treated each text as having a single topic and addressed only the multi-class classification task.
[Conclusions] The proposed model, which combines BERT, BiLSTM and SGCN, can effectively classify long texts.
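The graph-construction and classification steps in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: random vectors stand in for the BERT + BiLSTM word features, the similarity threshold of 0.5 is a hypothetical choice, the layer uses the standard GCN propagation rule ReLU(D^-1/2 A D^-1/2 H W) rather than the SGCN variant (whose details the abstract does not give), and all weights are untrained. All function names are illustrative.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (one row per word)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    return Xn @ Xn.T


def build_adjacency(X, threshold=0.5):
    """Connect two word nodes when their similarity exceeds `threshold`.

    The threshold is a hypothetical parameter; the edge weight is the
    similarity itself, and every node gets a self-loop of weight 1.
    """
    S = cosine_similarity_matrix(X)
    A = np.where(S > threshold, S, 0.0)
    np.fill_diagonal(A, 1.0)
    return A


def normalize_adjacency(A):
    """Symmetric normalization D^-1/2 A D^-1/2 used by standard GCNs."""
    deg = A.sum(axis=1)  # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def classify_document(X, W_gcn, W_fc, b_fc, threshold=0.5):
    """Word features -> similarity graph -> GCN -> mean pooling -> dense layer."""
    A_hat = normalize_adjacency(build_adjacency(X, threshold))
    H = gcn_layer(A_hat, X, W_gcn)
    doc_vec = H.mean(axis=0)  # pool node features into one document vector
    return softmax(doc_vec @ W_fc + b_fc)


# Toy run with random, untrained parameters (illustration only).
rng = np.random.default_rng(0)
n_words, feat_dim, hidden_dim, n_classes = 6, 8, 4, 3
X = rng.normal(size=(n_words, feat_dim))  # stand-in for BERT + BiLSTM output
W_gcn = rng.normal(size=(feat_dim, hidden_dim))
W_fc = rng.normal(size=(hidden_dim, n_classes))
b_fc = np.zeros(n_classes)

probs = classify_document(X, W_gcn, W_fc, b_fc)
print(probs.shape, float(probs.sum()))
```

Mean pooling is only one way to turn node features into a document representation; the abstract does not say which readout the model uses, so treat that step as a placeholder as well.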
Zhou Zeyu, Wang Hao, Zhao Zibo, Li Yueyan, Zhang Xiaoqin. Construction and Application of GCN Model for Text Classification with Associated Information[J]. Data Analysis and Knowledge Discovery, 2021, 5(9): 31-41.