Please wait a minute...
Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (3): 98-106    DOI: 10.11925/infotech.2096-3467.2017.1058
Current Issue | Archive | Adv Search |
Visualizing Document Correlation Based on LDA Model
Li Wang(),Lixue Zou,Xiwen Liu
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China
Download: PDF(4133 KB)   HTML ( 2
Export: BibTeX | EndNote (RIS)      

[Objective] This paper tries to construct data analysis model for the topics of scientific research based on machine learning. [Methods] First, we clustered data with the Latent Dirichlet Allocation model. Then, we investigated the correlation among year, institution and research types with the help of Python modules. Finally, we revealed and visualized the key research areas of every year or institution. [Results] We analyzed 101,813 papers and patents of graphene industray research. The proposed method finished the topic identification, correlation analysis, and visualization in about two miniutes. [Limitations] More research is needed to explore the network analysis issues. [Conclusions] Machine learning provides enormous potentiality for intelligence studies, especially the large volume text analytics and visualization.

Key wordsLDA Model      Data Analysis      Machine Learning      Python      Data Visualization     
Received: 24 October 2017      Published: 03 April 2018

Cite this article:

Li Wang,Lixue Zou,Xiwen Liu. Visualizing Document Correlation Based on LDA Model. Data Analysis and Knowledge Discovery, 2018, 2(3): 98-106.

URL:     OR

[1] Blei M D, Ng Y A, Jordan I M.Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[2] Lee W S, Han E J, Sohn S Y.Predicting the Pattern of Technology Convergence Using Big-Data Technology on Large-Scale Triadic Patents[J]. Technological Forecasting & Social Change, 2015, 100: 317-329.
[3] 王博, 刘盛博, 丁堃, 等. 基于LDA 主题模型的专利内容分析方法[J]. 科研管理, 2015, 36(3):111-117.
[3] (Wang Bo, Liu Shengbo, Ding Kun, et al.Patent Content Analysis Method Based on LDA Topic Model[J]. Science Research Management, 2015, 36(3): 111-117.)
[4] 任智军, 乔晓东, 张江涛. 新兴技术发现模型研究[J]. 现代图书情报技术, 2016(8): 60-69.
[4] (Ren Zhijun, Qiao Xiaodong, Zhang Jiangtao.Discover Emerging Technologies with LDA Model[J]. New Technology of Library and Information Service, 2016(8): 60-69.)
[5] 杨超, 朱东华, 汪雪锋, 等. 专利技术主题分析: 基于SAO 结构的LDA 主题模型方法[J]. 图书情报工作, 2017, 61(3):86-96.
[5] (Yang Chao, Zhu Donghua, Wang Xuefeng, et al.Technical Topic Analysis in Patents: SAO-based LDA Modeling[J]. Library and Information Service, 2017, 61(3): 86-96.)
[6] Suominen A, Toivanen H, Sepp?nen M.Firms’ Knowledge Profiles: Mapping Patent Data with Unsupervised Learning[J]. Technological Forecasting & Social Change, 2017, 115: 131-142.
[1] Jiahui Hu,An Fang,Wanqing Zhao,Chenliu Yang,Huiling Ren. Annotating Chinese E-Medical Record for Knowledge Discovery[J]. 数据分析与知识发现, 2019, 3(7): 123-132.
[2] Xiaozhou Dong,Xinkang Chen. E-Coupon and Economic Performance of E-commerce[J]. 数据分析与知识发现, 2019, 3(6): 42-49.
[3] Jinzhu Zhang,Yiming Hu. Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning[J]. 数据分析与知识发现, 2019, 3(5): 68-76.
[4] Zhiqiang Liu,Yuncheng Du,Shuicai Shi. Extraction of Key Information in Web News Based on Improved Hidden Markov Model[J]. 数据分析与知识发现, 2019, 3(3): 120-128.
[5] Hongxia Xu,Chunwang Li. Review of Knowledge Extraction of Scientific Literature[J]. 数据分析与知识发现, 2019, 3(3): 14-24.
[6] Zixuan Zhang,Hao Wang,Liping Zhu,Sanhong eng. Identifying Risks of HS Codes by China Customs[J]. 数据分析与知识发现, 2019, 3(1): 72-84.
[7] Lina Liu,Jiayin Qi,Zhenping Zhang,Dan Zeng. Analyzing Impacts of Brand Reputation on Online Sales Based on Massive Commodity Reviews and Brand[J]. 数据分析与知识发现, 2018, 2(9): 10-21.
[8] Yanhua Xu,Yujie Miao,Lin Miao,Xueqiang Lv. Generating HSK Writing Essays with LDA Model[J]. 数据分析与知识发现, 2018, 2(9): 80-87.
[9] Longjia Jia,Bangzuo Zhang. Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo[J]. 数据分析与知识发现, 2018, 2(7): 55-62.
[10] Wei Lu,Mengqi Luo,Heng Ding,Xin Li. Image Annotation Tags by Deep Learning and Real Users: A Comparative Study[J]. 数据分析与知识发现, 2018, 2(5): 1-10.
[11] Jingqi Wang,Rui Li,Huayi Wu. The Evolution of Online Public Opinion Based on Spatial Autocorrelation[J]. 数据分析与知识发现, 2018, 2(2): 64-73.
[12] Xinyue Fan,Lei Cui. Predicting Antineoplastic Drug Targets Based on Network Properties[J]. 数据分析与知识发现, 2018, 2(12): 98-108.
[13] Yang Zhao,Xini Yuan,Yawen Chen,Liqiang Wu. Predicting Conversion Rate of APP Advertising with Machine Learning[J]. 数据分析与知识发现, 2018, 2(11): 2-9.
[14] Xin Wang,Wen’gang Feng. Review of Techniques Detecting Online Extremism and Radicalization[J]. 数据分析与知识发现, 2018, 2(10): 2-8.
[15] Zhen Li,Shengchun Ding,Nan Wang. Identifying Topics of Online Public Opinion[J]. 数据分析与知识发现, 2017, 1(8): 18-30.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938