New Technology of Library and Information Service  2016, Vol. 32 Issue (1): 48-54    DOI: 10.11925/infotech.1003-3513.2016.01.08
An Improved Topic Model Integrating Extra-Features
Ruyi Yang, Dongsu Liu, Hui Li
School of Economics and Management, Xidian University, Xi’an 710126, China
[Objective] To reveal the relationships among the contents, topics, and authors of documents, this paper presents the Dynamic Author Topic (DAT) model, which extends the LDA model. [Context] Extracting features from large-scale text collections is an important task for informatics researchers. [Methods] First, NIPS conference papers are collected as the data set and preprocessed. The data set is then divided into time slices by publication date, and the slices form a first-order Markov chain. Perplexity is used to determine the number of topics. Finally, Gibbs sampling is used to estimate the author-topic and topic-word distributions in each time slice. [Results] The experiments show that each document can be represented by topic-word and author-topic probability distributions, and that the evolution of authors and topics can be observed along the time dimension. [Conclusions] The DAT model integrates content and extra features efficiently and supports text mining.
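
The time-slicing and perplexity steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_by_time` and `perplexity`, the document layout (a dict with `year`, `authors`, and `words`), and the assumption that a document's word probability is the average over its authors' topic mixtures are all hypothetical choices made for illustration. The distributions `theta` (author-topic) and `phi` (topic-word) are assumed to have been estimated already, for example by Gibbs sampling.

```python
import math
from collections import defaultdict


def split_by_time(docs, slice_of):
    """Group documents into time slices, ordered by slice key (assumed layout)."""
    slices = defaultdict(list)
    for doc in docs:
        slices[slice_of(doc)].append(doc)
    return [slices[key] for key in sorted(slices)]


def perplexity(docs, theta, phi):
    """Perplexity of docs under an author-topic mixture (illustrative assumption).

    theta[a][k] is P(topic k | author a); phi[k][w] is P(word w | topic k).
    Each word's probability is averaged uniformly over the document's authors.
    """
    num_topics = len(phi)
    log_likelihood = 0.0
    num_tokens = 0
    for doc in docs:
        authors = doc["authors"]
        for w in doc["words"]:
            p = sum(
                theta[a][k] * phi[k][w]
                for a in authors
                for k in range(num_topics)
            ) / len(authors)
            log_likelihood += math.log(p)
            num_tokens += 1
    return math.exp(-log_likelihood / num_tokens)


# Toy data, invented purely for illustration.
docs = [
    {"year": 1988, "authors": ["a1"], "words": [0, 1, 2]},
    {"year": 1987, "authors": ["a1", "a2"], "words": [3, 3]},
    {"year": 1988, "authors": ["a2"], "words": [1]},
]
theta = {"a1": [0.7, 0.3], "a2": [0.2, 0.8]}
phi = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]

time_slices = split_by_time(docs, lambda d: d["year"])
slice_perplexities = [perplexity(s, theta, phi) for s in time_slices]
```

Lower perplexity indicates a better fit, so one common way to choose the number of topics is to repeat this calculation on held-out documents for several candidate topic counts and compare the results.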

Key words: LDA model; DAT model; Text mining; Gibbs sampling
Received: 17 July 2015      Published: 04 February 2016

Ruyi Yang, Dongsu Liu, Hui Li. An Improved Topic Model Integrating Extra-Features. New Technology of Library and Information Service, 2016, 32(1): 48-54.

