Visualizing Document Correlation Based on LDA Model
Wang Li, Zou Lixue, Liu Xiwen
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
Abstract [Objective] This paper aims to construct a machine-learning-based data analysis model for identifying the topics of scientific research. [Methods] First, we clustered the data with the Latent Dirichlet Allocation (LDA) model. Then, we investigated the correlation among publication year, institution, and research topic with the help of Python modules. Finally, we identified and visualized the key research areas of each year and each institution. [Results] We analyzed 101,813 papers and patents on graphene industry research. The proposed method completed topic identification, correlation analysis, and visualization in about two minutes. [Limitations] More research is needed to explore network analysis issues. [Conclusions] Machine learning offers considerable potential for intelligence studies, especially for large-volume text analytics and visualization.
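The workflow summarized in [Methods] can be illustrated with a minimal sketch. The abstract does not name the Python modules used, so the code below relies only on the standard library and should be read as an illustration under stated assumptions, not as the authors' implementation. It fits LDA by collapsed Gibbs sampling and then averages the document-topic proportions per year. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the six toy records are all hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word v in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions; each row sums to 1.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


# Hypothetical toy records: (publication year, text).
records = [
    (2015, "graphene oxide membrane water filtration membrane"),
    (2015, "graphene battery anode lithium battery"),
    (2016, "graphene membrane water desalination filtration"),
    (2016, "lithium battery graphene anode electrode"),
    (2017, "battery electrode lithium anode capacity"),
    (2017, "water membrane filtration desalination oxide"),
]
docs = [text.split() for _, text in records]
theta = lda_gibbs(docs, n_topics=2)

# Average the topic proportions of all documents published in the same year.
by_year = defaultdict(list)
for (year, _), row in zip(records, theta):
    by_year[year].append(row)
year_topic = {
    year: [sum(col) / len(rows) for col in zip(*rows)]
    for year, rows in by_year.items()
}

for year in sorted(year_topic):
    print(year, [round(p, 2) for p in year_topic[year]])
```

The resulting `year_topic` table is the kind of year-by-topic matrix that a plotting library could render as a heatmap. Grouping by institution instead of year would follow the same pattern with a different key.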
Received: 24 October 2017
Published: 03 April 2018