Construction and Application of GCN Model for Text Classification with Associated Information
Zhou Zeyu1,2, Wang Hao1,2 (corresponding author), Zhao Zibo1,2, Li Yueyan1,2, Zhang Xiaoqin3
1 School of Information Management, Nanjing University, Nanjing 210023, China
2 Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
3 Jinling Library, Nanjing 210023, China
Abstract [Objective] This paper aims to improve the performance of automatic text classification by learning the context of texts and the polysemy of words. [Methods] We propose a GCN model for long text classification with associated information. First, we use BERT to obtain initial word-vector features for the long texts. Second, we feed these initial features into a BiLSTM model to capture the semantic relationships among them. Third, we represent the resulting word features as nodes of the graph convolutional network SGCN. Fourth, we use the vector similarity between words as edges connecting the nodes, which yields the graph structure. Finally, we feed the long text representation produced by SGCN into fully connected layers to complete the classification task. [Results] We evaluated the model on Chinese scientific literature covering multiple subjects. Its accuracy is 0.83409, which is higher than that of the benchmark model. [Limitations] We treated each text as a single-topic document in a multi-class classification task. [Conclusions] The proposed model, based on BERT, BiLSTM and SGCN, can effectively classify long texts.
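The graph construction and propagation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not expand "SGCN", so the sketch assumes a simplified, weight-free propagation over a symmetrically normalized adjacency matrix. The toy two-dimensional vectors stand in for BERT and BiLSTM outputs, and the function names, the cosine measure, and the 0.5 similarity threshold are hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_adjacency(features, threshold=0.5):
    """Link word nodes whose similarity reaches `threshold` (assumed non-negative).

    The edge weight is the similarity itself; every node gets a self-loop.
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0
        for j in range(i + 1, n):
            sim = cosine(features[i], features[j])
            if sim >= threshold:
                adj[i][j] = adj[j][i] = sim
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are >= 1 due to self-loops."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(norm_adj, features, steps=2):
    """Multiply the node features by the normalized adjacency `steps` times."""
    h = features
    n, dim = len(features), len(features[0])
    for _ in range(steps):
        h = [[sum(norm_adj[i][k] * h[k][d] for k in range(n)) for d in range(dim)]
             for i in range(n)]
    return h


def mean_pool(h):
    """Average node features into one text-level vector for a downstream classifier."""
    n = len(h)
    return [sum(row[d] for row in h) / n for d in range(len(h[0]))]


# Toy word features standing in for BERT + BiLSTM outputs (hypothetical values).
word_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adjacency = build_adjacency(word_features)
hidden = propagate(normalize(adjacency), word_features)
text_vector = mean_pool(hidden)
```

In this toy input the first two words are similar enough to be linked, while the third stays isolated and keeps its own features after propagation. A trained model would pass `text_vector` through learned fully connected layers, which are omitted here.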
Received: 16 March 2021
Published: 15 October 2021
Fund: National Natural Science Foundation of China (72074108); 2020 Wuxi Association for Science and Technology Soft Science Research Project (KT-20-C058); Innovative Research Project for Doctoral Candidates of Nanjing University (CXYJ21-69)
Corresponding Author: Wang Hao, E-mail: ywhaowang@nju.edu.cn