New Technology of Library and Information Service  2014, Vol. 30 Issue (10): 63-69     https://doi.org/10.11925/infotech.1003-3513.2014.10.10
Information Analysis and Research
A Survey of the Approach of Topic Evolution Model Based on Topic Model
Zhao Yingguang, Hong Na, An Xinying
Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
Abstract

[Objective] To review and analyze topic evolution methods based on topic models, and to summarize the strengths and weaknesses of each method and its applicability to information analysis. [Coverage] Literature was retrieved from Google Scholar and Web of Science using the keywords/topics "Topic/Theme Evolution", "Time Topic Model" and "Dynamic Topic Model", supplemented by citation searching; after reading, 25 papers were selected as the references of this paper. [Methods] Using literature analysis, the implementation mechanisms and functional characteristics of the models are compared, and the strengths, weaknesses and application areas of each type of model are summarized. [Results] Current topic evolution models are developed mainly along three dimensions: a variable number of topics, support for online analysis, and continuous time windows. Most models provide one or two of these capabilities and can largely meet the needs of information analysis applications. [Limitations] The specific implementations of some models are not analyzed in depth. [Conclusions] Evolution analysis over different sources, granularities and time windows should start from the specific application requirements and apply the topic-model-based evolution method whose characteristics match them.

Key words: Topic model; LDA; Topic evolution
Received: 2014-05-05      Published: 2014-11-28
Chinese Library Classification (CLC): TP391
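Funding:

This work is one of the research outcomes of the National Key Technology R&D Program of the 12th Five-Year Plan project "STKOS-Based Science and Technology Monitoring Application Demonstration" (No. 2011BAH10B06-02), the National Natural Science Foundation of China project "Semantics-Based Research on Frontier Knowledge Discovery and Evolution Mechanisms in the Medical Domain" (No. 71303259), and the Ministry of Education Humanities and Social Sciences Research General Project "Research on Decision-Tree-Based Hot Topic Identification and Trend Prediction Methods" (No. 11YJC870008).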

Corresponding author: An Xinying     E-mail: an.xinying@imicams.ac.cn
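Author contributions: Hong Na: proposed the research idea and designed the research plan; Zhao Yingguang: conducted the literature survey and collation and drafted the paper; An Xinying: revised the final version.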
Cite this article:
Zhao Yingguang, Hong Na, An Xinying. A Survey of the Approach of Topic Evolution Model Based on Topic Model. New Technology of Library and Information Service, 2014, 30(10): 63-69.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.10.10      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I10/63

[1] Deerwester S, Dumais S T, Furnas G W, et al. Indexing by Latent Semantic Analysis [J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
[2] Hofmann T. Probabilistic Latent Semantic Indexing [C]. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1999: 50-57.
[3] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation [J]. Journal of Machine Learning Research, 2003, 3(4-5): 993-1022.
[4] Rosen-Zvi M, Griffiths T, Steyvers M, et al. The Author-Topic Model for Authors and Documents [C]. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2004.
[5] Blei D M, Lafferty J D. Correlated Topic Models [C]. In: Proceedings of the 23rd International Conference on Machine Learning. 2006.
[6] Shan Bin, Li Fang. A Survey of Topic Evolution Based on LDA [J]. Journal of Chinese Information Processing, 2010, 24(6): 43-49. (in Chinese)
[7] Elshamy W S. Continuous-time Infinite Dynamic Topic Models [D]. Manhattan, Kansas: Kansas State University, 2013.
[8] Daud A, Li J, Zhou L, et al. Knowledge Discovery Through Directed Probabilistic Topic Models: A Survey [J]. Frontiers of Computer Science in China, 2010, 4(2): 280-301.
[9] Wang X, McCallum A. Topics Over Time: A Non-Markov Continuous-Time Model of Topical Trends [C]. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006: 424-433.
[10] Ding W, Chen C. Dynamic Topic Detection and Tracking: A Comparison of HDP, C-word, and Cocitation Methods [J]. Journal of the Association for Information Science and Technology, 2014. DOI: 10.1002/asi.23134.
[11] Griffiths T L, Steyvers M. Finding Scientific Topics [J]. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(S1): 5228-5235.
[12] Blei D M, Lafferty J D. Dynamic Topic Models [C]. In: Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006: 113-120.
[13] Chu Keming, Li Fang. LDA Model-Based News Topic Evolution [J]. Computer Applications and Software, 2011, 28(4): 4-7. (in Chinese)
[14] Hu Jiming, Chen Guo. Mining and Evolution of Content Topics Based on Dynamic LDA [J]. Library and Information Service, 2014, 58(2): 138-142. (in Chinese)
[15] Ahmed A, Xing E P. Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: With Applications to Evolutionary Clustering [C]. In: Proceedings of the SIAM International Conference on Data Mining, Atlanta, Georgia, USA. 2008: 219-230.
[16] Ahmed A, Xing E P. Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream [C]. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2010.
[17] Teh Y W, Jordan M I, Beal M J, et al. Hierarchical Dirichlet Processes [J]. Journal of the American Statistical Association, 2006, 101(476): 1566-1581.
[18] Cui W, Liu S, Tan L, et al. Textflow: Towards Better Understanding of Evolving Topics in Text [J]. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(12): 2412-2421.
[19] Xu T, Zhang Z, Yu P S, et al. Dirichlet Process Based Evolutionary Clustering [C]. In: Proceedings of the 8th International Conference on Data Mining. 2008: 648-657.
[20] Wang C, Blei D, Heckerman D. Continuous Time Dynamic Topic Models [OL]. arXiv: 1206.3298.
[21] Wei X, Sun J, Wang X. Dynamic Mixture Models for Multiple Time-Series [C]. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India. 2007: 2909-2914.
[22] AlSumait L, Barbará D, Domeniconi C. On-Line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking [C]. In: Proceedings of the 8th IEEE International Conference on Data Mining. IEEE, 2008: 3-12.
[23] Hu Yanli, Bai Liang, Zhang Weiming. Modeling and Analyzing Topic Evolution [J]. Acta Automatica Sinica, 2012, 38(10): 1690-1697. (in Chinese)
[24] Iwata T, Yamada T, Sakurai Y, et al. Online Multiscale Dynamic Topic Models [C]. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2010: 663-672.
[25] Wang C, Paisley J W, Blei D M. Online Variational Inference for the Hierarchical Dirichlet Process [C]. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. 2011: 752-760.

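Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn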