Classifying Short-texts with Class Feature Extension
Yunfei Shao, Dongsu Liu
School of Economics and Management, Xidian University, Xi’an 710126, China
Abstract [Objective] This paper proposes a short-text classification method based on category feature extension to address the sparse content of short texts. [Methods] We used an improved TF-IDF model and an LDA topic model to construct a keyword set and a topic distribution set, both based on category features. We then expanded the content and vector representations of the short texts. Finally, we classified the short texts with a convolutional neural network. [Results] The proposed method improved classification precision by 3.0% and recall by 4.1%. [Limitations] The new method was examined only with a convolutional neural network. [Conclusions] The proposed method can improve the effectiveness of short-text classification.
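The keyword-based part of the extension step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the improved TF-IDF weighting, so plain TF-IDF computed over classes stands in for it, and the LDA topic distributions and the convolutional classifier are left out. The function names `class_keywords` and `extend_text`, the parameters `top_k` and `min_overlap`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def class_keywords(docs, labels, top_k=2):
    """Rank words per class with plain class-level TF-IDF (an assumption,
    not the paper's improved weighting) and keep the top_k words."""
    # Pool all documents of one class into a single bag of words.
    class_counts = {}
    for tokens, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(tokens)

    # Count in how many classes each word occurs.
    class_freq = Counter()
    for counts in class_counts.values():
        class_freq.update(counts.keys())

    n_classes = len(class_counts)
    keywords = {}
    for label, counts in class_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_classes / class_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[label] = ranked[:top_k]
    return keywords


def extend_text(tokens, keywords, min_overlap=1):
    """Append the keywords of each class that shares at least
    min_overlap words with the short text."""
    extended = list(tokens)
    for label in sorted(keywords):
        if len(set(tokens) & set(keywords[label])) >= min_overlap:
            extended += [w for w in keywords[label] if w not in extended]
    return extended


# Toy corpus, invented for illustration only.
docs = [
    ["stock", "market", "rises"],
    ["stock", "market", "falls"],
    ["team", "wins", "match"],
    ["team", "loses", "match"],
]
labels = ["finance", "finance", "sports", "sports"]

keywords = class_keywords(docs, labels)
print(keywords)
print(extend_text(["market", "update"], keywords))
```

In this sketch a short text gains extra words only from classes whose keywords it already contains, which is one simple way to add content to a sparse text before it is vectorized.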
Received: 18 December 2018
Published: 23 October 2019