Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (9): 21-30    DOI: 10.11925/infotech.2096-3467.2021.0282
Short-Text Classification Method with Text Features from Pre-trained Models
Chen Jie, Ma Jing, Li Xiaofeng
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
[Objective] This paper uses word vectors from different pre-trained models to enhance the text semantics offered by Word2Vec, BERT and other representations, and thereby improve news classification. [Methods] We used the BERT and ERNIE models to extract contextual semantics, together with prior knowledge of entities and phrases, through Domain-Adaptive Pretraining. Combined with the TextCNN model, the proposed method generates high-order text feature vectors and fuses them to achieve semantic enhancement and better short-text classification. [Results] We evaluated the proposed method on public datasets from Today's Headline News and THUCNews. Compared with the traditional Word2Vec word vector representation, the accuracy of the new model improved by 6.37 and 3.50 percentage points, respectively. Compared with the BERT and ERNIE methods, accuracy improved by 1.98 and 1.51 percentage points, respectively. [Limitations] The news corpus used in this study needs to be expanded further. [Conclusions] The proposed method can effectively classify massive short-text data, which is valuable for subsequent text mining.
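As a rough illustration of the fusion step described in the abstract, the sketch below merges two per-text feature vectors, one from each pre-trained branch, into a single representation. Concatenation is an assumption here (the abstract only says the features are merged), and `fuse_features` and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def fuse_features(*branches: Sequence[float]) -> List[float]:
    """Merge per-branch feature vectors by concatenation (assumed fusion rule)."""
    if not branches:
        raise ValueError("at least one feature vector is required")
    fused: List[float] = []
    for vec in branches:
        fused.extend(vec)
    return fused


# Toy stand-ins for the high-order features of the BERT and ERNIE branches.
bert_branch = [0.12, -0.40, 0.88]
ernie_branch = [0.05, 0.31, -0.27]

fused = fuse_features(bert_branch, ernie_branch)
print(len(fused))  # the fused length is the sum of the branch lengths
```

With concatenation, the classifier that follows sees both views of the text side by side and can weight them independently.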

Key words: BERT; ERNIE; Short Text Classification; Text Feature Fusion; Domain-Adaptive Pretraining
Received: 22 March 2021      Published: 29 June 2021
ZTFLH (Chinese Library Classification number): TP393
Fund: National Social Science Fund of China (20ZDA092); Fundamental Research Fund for the Central Universities (NW2020001); Fund for Graduate Innovation Base (Laboratory) (kfjj20200905)
Chen Jie, Ma Jing, Li Xiaofeng. Short-Text Classification Method with Text Features from Pre-trained Models. Data Analysis and Knowledge Discovery, 2021, 5(9): 21-30.

Structure Diagram of BERT Model
ERNIE's Knowledge Masking Strategies
Difference of Random Masking Strategies Between BERT and ERNIE
Research Framework
Method of Extracting and Fusing Text Feature Vector

Actual \ Predicted    Positive               Negative
Positive              True Positive (TP)     False Negative (FN)
Negative              False Positive (FP)    True Negative (TN)
Confusion Matrix of Binary Classification Problem
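The four cells of the confusion matrix are enough to derive the usual evaluation metrics. A minimal sketch using the standard definitions of accuracy, precision, recall and F1; the function name and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard metrics derived from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the paper.
m = binary_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The zero checks only guard against empty denominators, for example when a class is never predicted.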
Parameter                        BERT      ERNIE
Number of Encoder Layers         12        12
Hidden Size                      768       768
Attention Heads                  12        12
Vocab Size                       21,128    18,000
Hidden Activation (Hidden_act)   ReLU      GELU
Padding Size                     32        32
Parameters of BERT and ERNIE
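The padding size of 32 suggests that every short text is brought to a fixed length of 32 tokens before encoding. Below is a minimal sketch of that step, assuming shorter inputs are right-padded and longer ones truncated. `pad_or_truncate`, the padding id 0 and the sample token ids are illustrative assumptions, not values from the paper. The sketch also derives the per-head width implied by the hidden size and the number of heads.

```python
from typing import List

PADDING_SIZE = 32   # from the parameter table
HIDDEN_SIZE = 768   # from the parameter table
NUM_HEADS = 12      # from the parameter table


def pad_or_truncate(token_ids: List[int],
                    size: int = PADDING_SIZE,
                    pad_id: int = 0) -> List[int]:
    """Return exactly `size` token ids; pad_id=0 is an assumed padding id."""
    clipped = token_ids[:size]
    return clipped + [pad_id] * (size - len(clipped))


head_dim = HIDDEN_SIZE // NUM_HEADS  # width of each attention head

short_text = pad_or_truncate([101, 2769, 4263, 102])  # hypothetical token ids
long_text = pad_or_truncate(list(range(1, 51)))       # 50 ids, cut to 32
print(len(short_text), len(long_text), head_dim)
```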
Parameter           Value
Filter Size         (2, 3, 4)
Number of Filters   256
Batch Size          128
Dropout             0.4
Learning Rate       5E-4
Optimizer           Adam
Parameters of TextCNN Network
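To make the filter settings concrete, the following dependency-free sketch applies convolution kernels of heights 2, 3 and 4 to a token-embedding matrix, keeps the maximum response of each kernel (max-over-time pooling), and concatenates the results. It follows the generic TextCNN recipe with a ReLU activation. The toy dimensions, random weights and function names are assumptions for illustration, not the authors' implementation.

```python
import random
from typing import Dict, List

FILTER_SIZES = (2, 3, 4)  # from the parameter table
NUM_FILTERS = 256         # from the parameter table

Matrix = List[List[float]]


def conv_max_pool(tokens: Matrix, kernel: Matrix) -> float:
    """Slide one kernel (height x embed_dim) over the tokens, apply ReLU, keep the max."""
    height = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(tokens) - height + 1):
        window = tokens[start:start + height]
        score = sum(w * x
                    for k_row, t_row in zip(kernel, window)
                    for w, x in zip(k_row, t_row))
        best = max(best, score)
    return best


def textcnn_features(tokens: Matrix,
                     kernels_by_size: Dict[int, List[Matrix]]) -> List[float]:
    """Concatenate the pooled response of every kernel, ordered by kernel height."""
    return [conv_max_pool(tokens, kernel)
            for size in sorted(kernels_by_size)
            for kernel in kernels_by_size[size]]


rng = random.Random(0)
seq_len, embed_dim, toy_filters = 6, 4, 2  # toy sizes, not the paper's
tokens = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(seq_len)]
kernels = {h: [[[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(h)]
               for _ in range(toy_filters)]
           for h in FILTER_SIZES}

features = textcnn_features(tokens, kernels)
print(len(features))                    # 3 filter sizes x 2 toy filters
print(len(FILTER_SIZES) * NUM_FILTERS)  # feature width with the table's settings
```

With the table's settings, three filter heights and 256 filters each yield a 768-dimensional pooled vector per text.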
Method      Today's Headline News dataset    THUCNews dataset
Method 1    81.73%                           87.93%
Method 2    86.55%                           89.92%
Method 3    86.12%                           89.99%
Method 4    88.06%                           91.43%
Method 5    88.10%                           91.13%
Test-Set Accuracy of Five Methods on Two Datasets
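The gains quoted in the abstract can be checked against the accuracy table with a few lines of arithmetic. The sketch below computes, for each dataset, the best accuracy minus that of Method 1. This excerpt does not say which model each method number denotes, so treating Method 1 as the baseline is an inference from the matching figures, and `gain_over` is a hypothetical helper.

```python
# Accuracy (%) copied from the table: (Today's Headline News, THUCNews).
accuracy = {
    "Method 1": (81.73, 87.93),
    "Method 2": (86.55, 89.92),
    "Method 3": (86.12, 89.99),
    "Method 4": (88.06, 91.43),
    "Method 5": (88.10, 91.13),
}
datasets = ("Today's Headline News", "THUCNews")


def gain_over(baseline: str, column: int) -> float:
    """Best accuracy in a column minus the baseline's, in percentage points."""
    best = max(scores[column] for scores in accuracy.values())
    return round(best - accuracy[baseline][column], 2)


for col, name in enumerate(datasets):
    print(f"{name}: +{gain_over('Method 1', col):.2f} points over Method 1")
```

The differences are absolute percentage points, not relative improvements.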
F1 of Each Category and F1 Average in Today's Headlines Dataset
F1 of Each Category and F1 Average in THUCNews Dataset
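Per-category F1 scores and their average, as named in the two captions above, can be computed from true and predicted labels in a one-vs-rest fashion. In this sketch the unweighted (macro) mean is an assumption about how the average is taken, and the category names and labels are hypothetical.

```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """One-vs-rest F1 for every label that appears in y_true or y_pred."""
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores: Dict[str, float]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging rule)."""
    return sum(scores.values()) / len(scores)


# Hypothetical labels, not drawn from either dataset.
y_true = ["sports", "sports", "finance", "finance", "tech", "tech"]
y_pred = ["sports", "finance", "finance", "finance", "tech", "sports"]

scores = per_class_f1(y_true, y_pred)
for label, value in scores.items():
    print(f"{label}: {value:.4f}")
print(f"macro F1: {macro_f1(scores):.4f}")
```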
