New Technology of Library and Information Service  2013, Vol. Issue (6): 42-48    DOI: 10.11925/infotech.1003-3513.2013.06.07
A New Method of Keywords Extraction for Chinese Short-text Classification
Hu Yongjun1, Jiang Jiaxin2, Chang Huiyou3
1. Business School, Sun Yat-Sen University, Guangzhou 510275, China;
2. School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510006, China;
3. School of Software, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  Short texts differ from traditional documents in their brevity and sparseness. Feature extension can ease the high sparsity of the vector space model, but it inevitably introduces noise. To resolve this problem, this paper proposes a high-frequency word expansion method based on LDA. The method extracts high-frequency words from each category as the feature space, uses LDA to derive latent topics from the corpus, and extends each short text with the topic words. Extensive experiments on Chinese short messages and news titles show that the proposed method achieves higher classification performance than conventional classification methods.
Key words  Short-text classification      High frequency words      LDA      Feature expansion
Received: 05 April 2013      Published: 24 July 2013
CLC Number:  TP391
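
The pipeline outlined in the abstract (per-category high-frequency words as the feature space, then expansion of each short text with topic words) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `top_n` and `words_per_topic` parameters, and the toy data are all hypothetical. The LDA step is not run here: `topics` stands in for topic-word distributions that an LDA model would produce, and restricting the added words to the feature space is an assumption about how noise might be limited.

```python
from collections import Counter


def high_frequency_features(docs_by_category, top_n):
    """Return the union of the top_n most frequent words of each category."""
    features = set()
    for docs in docs_by_category.values():
        counts = Counter(word for doc in docs for word in doc)
        features.update(word for word, _ in counts.most_common(top_n))
    return features


def expand_with_topic_words(tokens, topics, features, words_per_topic=2):
    """Append the strongest words of the best-matching topic to a short text.

    `topics` is a list of {word: weight} dicts, assumed to come from a
    trained LDA model (the training itself is outside this sketch).
    """
    def score(topic):
        # Summed topic weight of the words that occur in the short text.
        return sum(topic.get(token, 0.0) for token in tokens)

    best = max(topics, key=score)
    if score(best) == 0:
        return list(tokens)  # no topic matches, so leave the text unchanged

    ranked = sorted(best, key=best.get, reverse=True)
    # Assumption: only add words that belong to the high-frequency feature space.
    extra = [w for w in ranked if w in features and w not in tokens]
    return list(tokens) + extra[:words_per_topic]


# Toy, already-segmented corpus (hypothetical data).
docs_by_category = {
    "sports": [["team", "win", "match"], ["team", "match", "goal"], ["team", "coach"]],
    "finance": [["stock", "market", "rise"], ["stock", "bank", "market"], ["stock", "fund"]],
}
# Hypothetical topic-word weights standing in for LDA output.
topics = [
    {"team": 0.4, "match": 0.3, "goal": 0.2, "win": 0.1},
    {"stock": 0.5, "market": 0.3, "bank": 0.2},
]

features = high_frequency_features(docs_by_category, top_n=2)
expanded = expand_with_topic_words(["goal", "win"], topics, features)
print(sorted(features))
print(expanded)
```

In this toy run, the short text `["goal", "win"]` contains no feature-space words of its own, so its vector would be empty before expansion. After expansion it gains two feature-space words from the matching topic, which is the sparsity relief the abstract describes.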

Cite this article:

Hu Yongjun, Jiang Jiaxin, Chang Huiyou. A New Method of Keywords Extraction for Chinese Short-text Classification. New Technology of Library and Information Service, 2013, (6): 42-48.

