[Objective] This paper aims to construct a machine-learning-based model for analyzing the topics of scientific research. [Methods] First, we clustered the data with the Latent Dirichlet Allocation (LDA) model. Then, we investigated the correlations among year, institution, and research type using Python modules. Finally, we identified and visualized the key research areas for each year and each institution. [Results] We analyzed 101,813 papers and patents from graphene industry research. The proposed method completed topic identification, correlation analysis, and visualization in about two minutes. [Limitations] Network analysis remains to be explored in future research. [Conclusions] Machine learning offers enormous potential for intelligence studies, especially for large-volume text analytics and visualization.
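The Methods summary names two core steps: clustering documents into topics with LDA, then relating those topics to metadata such as year or institution. The sketch below is a minimal, assumed illustration of that pipeline, not the authors' implementation. It substitutes a plain-Python collapsed Gibbs sampler for whatever Python modules the study actually used. The toy corpus, the hyperparameters (`alpha`, `beta`, `n_iter`), and the helper names `lda_gibbs`, `top_words`, and `topic_by_group` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical helper).

    docs: list of token lists. Returns (doc-topic counts, topic-word counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]  # times word v is assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, n=3):
    """Return the n most frequent words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


def topic_by_group(ndk, groups):
    """Count each document's dominant topic per metadata value (e.g. year)."""
    table = defaultdict(Counter)
    for counts, g in zip(ndk, groups):
        dominant = max(range(len(counts)), key=counts.__getitem__)
        table[g][dominant] += 1
    return table


# Hypothetical toy corpus standing in for paper/patent abstracts.
docs = [
    "graphene oxide film graphene film coating".split(),
    "graphene film coating oxide transparent film".split(),
    "lithium battery electrode anode battery lithium".split(),
    "battery anode electrode lithium capacity battery".split(),
]
years = [2014, 2015, 2015, 2016]

ndk, nkw, vocab = lda_gibbs(docs, n_topics=2)
print(top_words(nkw, vocab))
print({y: dict(c) for y, c in topic_by_group(ndk, years).items()})
```

The per-year table of dominant topics is one simple way to express a topic-to-year correlation. The same `topic_by_group` call with institution names in place of `years` gives the per-institution view. Feeding either table to a plotting library would cover the visualization step.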