New Technology of Library and Information Service  2012, Vol. Issue (12): 58-65    DOI: 10.11925/infotech.1003-3513.2012.12.11
Review on the LDA-based Techniques Detection for the Field Emerging Topic
Fan Yunman1,2, Ma Jianxia1
1. The Lanzhou Branch of National Science Library, Chinese Academy of Sciences, Lanzhou 730000, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  This paper reviews the development of the Latent Dirichlet Allocation (LDA) model and several extensions of LDA for detecting emerging topics in a field. It describes two parameter inference algorithms, variational inference and Gibbs sampling, and reviews recent improvements to LDA, including models of topic evolution, models that jointly model document content and metadata, models with online learning, and topic evolution methods that combine LDA with citation analysis. It then compares and analyzes these improved models in detail. The paper also reviews several major visualization techniques, such as NIH-VB, TIARA, and VxInsight. Finally, it discusses the key research problems in detecting emerging topics with LDA.
Key words: Topic model; LDA; Citation analysis; Topical visualization
Received: 15 October 2012      Published: 12 March 2013
CLC Number: TP393
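
As a rough illustration of the Gibbs sampling inference mentioned in the abstract, the following is a minimal sketch of collapsed Gibbs sampling for LDA. It is not the authors' implementation. The function name `lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`), the iteration count, and the toy corpus are all assumptions made for this example.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics
    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Smoothed estimates from the final sample: document-topic (theta)
    # and topic-word (phi) distributions.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi

# Toy corpus (hypothetical): word ids 0-1 and 2-3 tend to co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=4, iters=50)
```

Each row of `theta` is one document's topic mixture and each row of `phi` is one topic's word distribution. The sketch reads both from the final sample only; averaging over several samples after burn-in is a common refinement that is omitted here.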

Cite this article:

Fan Yunman, Ma Jianxia. Review on the LDA-based Techniques Detection for the Field Emerging Topic. New Technology of Library and Information Service, 2012, (12): 58-65.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.12.11     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V/I12/58
