New Technology of Library and Information Service  2014, Vol. 30 Issue (5): 18-25    DOI: 10.11925/infotech.1003-3513.2014.05.03
Research and Implementation of Bibliographic Information Classification System in LDA Model
Li Xiangdong1, Liao Xiangpeng1, Huang Li2
1 School of Information Management, Wuhan University, Wuhan 430072, China;
2 Wuhan University Library, Wuhan 430072, China
[Objective] To improve the classification of bibliographic information for books, journal articles, and similar records. [Context] Classification performance under the traditional Vector Space Model (VSM) is unsatisfactory, and the LDA model can effectively improve it by mining latent semantic information. [Methods] Each text is represented by latent topics using the LDA model, the optimal number of topics is determined from the classification results, and the SVM algorithm is then used for classification. [Results] Experiments show that Macro_F1 reaches 95.5% and 93.5% on the Fudan and Sogou corpora, respectively, and 77.4% and 87.6% on real data from a library catalogue and an electronic journal database, respectively. [Conclusions] Compared with VSM, classification performance on the real data improves by 10% and 3%, respectively, which reaches a practical level.
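
The Macro_F1 figures reported above can be read as an average of per-class F1 scores in which every category counts equally, regardless of its size. The abstract does not say which variant of macro-averaging was used, so the following is only a minimal sketch assuming the common definition (unweighted mean of per-class F1). The function name `macro_f1` and the labels "A", "B", "C" are hypothetical and are not taken from the article.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C"]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, a small category that is classified poorly lowers this score as much as a large one would, which is one reason the measure is often used for corpora with uneven category sizes.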

Key words: Latent Dirichlet Allocation; Text categorization; Vector Space Model; Gibbs sampling; Support Vector Machine
Received: 02 January 2014      Published: 06 June 2014
CLC number: TP181

Cite this article:

Li Xiangdong, Liao Xiangpeng, Huang Li. Research and Implementation of Bibliographic Information Classification System in LDA Model. New Technology of Library and Information Service, 2014, 30(5): 18-25.

