New Technology of Library and Information Service  2015, Vol. 31 Issue (4): 18-25    DOI: 10.11925/infotech.1003-3513.2015.04.03
Feature Weighting Method Affected by Part of Speech in Text Classification
Lu Yonghe, Wang Hongbin
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  

[Objective] This paper improves the feature weighting method for text classification by introducing the effect of part of speech, with the aim of achieving higher precision. [Methods] Feature weighting that incorporates part of speech is compared with classical TF-IDF in text classification. In the part-of-speech approach, a weight assigned to each part of speech enters the feature weight calculation, and Particle Swarm Optimization is used to search for the best part-of-speech weights. All comparative tests use an SVM classifier. [Results] The improved feature weighting method performs better than classical TF-IDF: classification precision improves by 2% to 6% across feature spaces of different dimensions. [Limitations] Because of limited experimental conditions, the weights determined in the experiment only approximate the best weights; a larger data set and more iterations are needed to obtain better weights. [Conclusions] Introducing part of speech into text classification yields higher precision. In decreasing order of influence, the parts of speech are nouns, verbs, and strings. The modified feature weighting method is applicable not only to a particular corpus but also to general ones.
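
The idea of letting part of speech influence a feature weight can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the multiplicative form below (term frequency times inverse document frequency times a part-of-speech weight, followed by L2 normalization) is an assumption, as are the function name `pos_weighted_tfidf`, the tag names, and the placeholder weight values. In the paper the weights are found by Particle Swarm Optimization rather than set by hand.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights. The paper searches for these values
# with Particle Swarm Optimization; the numbers here are placeholders only.
POS_WEIGHTS = {"noun": 1.0, "verb": 0.6, "string": 0.3}


def pos_weighted_tfidf(docs, pos_weights=POS_WEIGHTS):
    """Return one {term: weight} dict per document.

    docs: list of documents, each a list of (term, pos_tag) pairs.
    Assumed form: weight = tf * idf * pos_weight, then L2-normalized.
    Tags missing from pos_weights default to a weight of 1.0.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update({term for term, _ in doc})

    vectors = []
    for doc in docs:
        tf = Counter(term for term, _ in doc)
        pos_of = {term: pos for term, pos in doc}  # last tag seen wins
        vec = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            vec[term] = count * idf * pos_weights.get(pos_of[term], 1.0)

        # L2-normalize so that documents of different length are comparable.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors


docs = [
    [("market", "noun"), ("rise", "verb")],
    [("market", "noun"), ("team", "noun"), ("win", "verb")],
]
vectors = pos_weighted_tfidf(docs)
```

With these placeholder weights, a noun and a verb that have the same frequency and the same document frequency end up with weights in the ratio 1.0 to 0.6, so the noun contributes more to the document vector passed to the classifier.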

Key words: Text classification; Part of speech; Feature weighting; Particle swarm optimization
Received: 29 September 2014      Published: 21 May 2015
CLC Number: TP391

Cite this article:

Lu Yonghe, Wang Hongbin. Feature Weighting Method Affected by Part of Speech in Text Classification. New Technology of Library and Information Service, 2015, 31(4): 18-25.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.04.03     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I4/18

