Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (11): 64-72    DOI: 10.11925/infotech.2096-3467.2018.0292
Analyzing Scientific Literature with Content Similarity - Topics over Time Model
Weilin He, Guohe Feng, Hongling Xie
School of Economics & Management, South China Normal University, Guangzhou 510006, China
Abstract  

[Objective] This paper studies the topics of scientific literature and tracks how they change over time. [Methods] We used an improved model, CSToT (Content Similarity - Topics over Time), to analyze scholarly papers published from 2012 to 2016 in 9 information science journals in China. [Results] The CSToT model effectively revealed the subject structure of scientific literature and the evolution of topics. We also found that the majority of current information science research covers information services, online public opinion, and data mining. The topics' evolution trends include rising, falling, stable, and fluctuating patterns, which are particularly prominent in information services research. [Limitations] The training data set needs to be expanded. [Conclusions] The CSToT model can effectively identify the topics of scientific literature and their evolutionary trends, which provides new directions for future research.

Key words: Topics over Time Topic Model; Topic Extraction; Topic Evolution
Received: 16 March 2018      Published: 11 December 2018

Cite this article:

Weilin He, Guohe Feng, Hongling Xie. Analyzing Scientific Literature with Content Similarity - Topics over Time Model. Data Analysis and Knowledge Discovery, 2018, 2(11): 64-72.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0292     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I11/64
