Automatic Classification of Ancient Classics with Entity Features
Heran Qin (1), Liu Liu (1, 2), Bin Li (3), Dongbo Wang (1, 2)
(1) College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
(2) Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095, China
(3) College of Literature, Nanjing Normal University, Nanjing 210097, China
Abstract [Objective] This paper augments traditional statistical feature-word algorithms with entity features in order to classify ten ancient Chinese classics. [Methods] Using a support vector machine model, we calculated feature words with traditional TF-IDF, information gain, chi-square test and mutual information. We then introduced named entities as features and compared the classification results. [Results] The highest accuracy of the proposed classifier reached 98.7%. Compared with feature calculation by traditional information gain, TF-IDF, mutual information and chi-square test, accuracy improved by 12.4%, 12.4%, 12.3% and 22.8%, respectively. [Limitations] Entities must be re-labeled before entity features can be applied to other texts. [Conclusions] Entity features can improve the effectiveness of text categorization models.
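The abstract does not spell out how the statistical feature words and the entity features are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It scores terms with the standard chi-square statistic on a 2x2 term/class table, keeps the top-ranked terms, and then appends entity terms to the selected list. The function names, the "append entities" rule, and the toy tokens are all hypothetical; the SVM training step is omitted.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, top_k):
    """Rank terms by their best chi-square score over all classes."""
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        df = sum(1 for s in doc_sets if term in s)
        best = 0.0
        for cls, size in class_sizes.items():
            n11 = sum(1 for s, y in zip(doc_sets, labels)
                      if y == cls and term in s)
            n10 = df - n11
            n01 = size - n11
            n00 = n_docs - size - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def add_entity_features(selected, entity_terms):
    """Append entity terms not already selected (assumed combination rule)."""
    return selected + [t for t in sorted(entity_terms) if t not in selected]


# Toy, pre-tokenized documents (hypothetical data, two classes).
docs = [["king", "war"], ["king", "army"], ["rite", "music"], ["rite", "war"]]
labels = ["history", "history", "ritual", "ritual"]

statistical = select_features(docs, labels, top_k=2)
combined = add_entity_features(statistical, {"zhou", "king"})
print(statistical, combined)
```

In this sketch the entity terms would come from a named entity annotation of the corpus, which is the step the Limitations note says has to be redone for other texts. The combined list would then define the feature space passed to a classifier such as an SVM.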
Received: 30 January 2019
Published: 23 October 2019