New Technology of Library and Information Service  2014, Vol. 30 Issue (7): 101-106    DOI: 10.11925/infotech.1003-3513.2014.07.14
Classification of Multi Topic Extraction Based on Chinese Short Information Text Message Flow
Zhang Yongjun, Liu Jinling, Ma Jialin
Chinese Information Processing Laboratory, Huaiyin Institute of Technology, Huai'an 223003, China
[Objective] A topic classification extraction model named SM_F_HT is proposed to find multiple topics more effectively in Chinese SMS text message flow (SM_F). [Methods] In this model, SM_F is divided into SMS text subsets. TF-IDF, combined with hierarchical Dirichlet processes for information extraction, is used to build multiple probability distributions over the SMS text vector set. Finally, the topic classification of SM_F is extracted using Gibbs sampling together with the probability that characteristic words belong to a local topic. [Results] Experimental results show that SM_F_HT outperforms the CCLDA and CCMix models in perplexity and log-likelihood ratio. [Limitations] The SMS text preprocessing and keyword extraction steps of this algorithm still need further optimization. [Conclusions] The SM_F_HT scheme is effective for multi-topic classification extraction from SM_F.

Key words: Short message text; Message flow; Topic extract; Dirichlet; Gibbs sample
Received: 05 January 2014      Published: 20 October 2014
CLC Number: TP391.1

