New Technology of Library and Information Service  2016, Vol. 32 Issue (2): 59-66    DOI: 10.11925/infotech.1003-3513.2016.02.08
A New Automatic Categorization Method with Documents Based on HowNet
Li Xiangdong1,2, Liu Kang1, Ding Cong1, Gao Fan1
1School of Information Management, Wuhan University, Wuhan 430072, China
2Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
[Objective] This paper aims to solve the feature mismatch problem caused by differing document types and to improve the performance of automatic classification. [Methods] We propose a method that uses documents of a type different from the documents to be classified as the training corpus, and introduces the third-party resource HowNet to extend their semantic features. [Results] Compared with classification without feature extension, the proposed method increased the F-measure by 1.2% to 11.0% in our classification experiments. The four document types used in our study were webpages, books, non-academic periodicals, and academic journals. [Limitations] Not every document type was tested with a publicly accessible corpus; more tests are therefore needed to examine the generality and objectivity of the new method. [Conclusions] Our study shows that the proposed method is feasible. Through corpus construction and feature extension, it can effectively reduce the semantic differences among various types of collections and improve the performance of automatic text classification.

Key words: Third-party resource; HowNet; Feature extension; Semantic difference
Received: 12 August 2015      Published: 08 March 2016

Li Xiangdong, Liu Kang, Ding Cong, Gao Fan. A New Automatic Categorization Method with Documents Based on HowNet. New Technology of Library and Information Service, 2016, 32(2): 59-66.

