Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (3): 98-106    DOI: 10.11925/infotech.2096-3467.2017.1058
Visualizing Document Correlation Based on LDA Model
Wang Li, Zou Lixue, Liu Xiwen
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This paper aims to construct a machine-learning-based data analysis model for the topics of scientific research. [Methods] First, we clustered the data with the Latent Dirichlet Allocation (LDA) model. Then, we investigated the correlations among year, institution, and research type with the help of Python modules. Finally, we revealed and visualized the key research areas of each year or institution. [Results] We analyzed 101,813 papers and patents on graphene industry research. The proposed method completed topic identification, correlation analysis, and visualization in about two minutes. [Limitations] More research is needed to explore network analysis issues. [Conclusions] Machine learning offers great potential for intelligence studies, especially for large-volume text analytics and visualization.

Key words: LDA Model; Data Analysis; Machine Learning; Python; Data Visualization
Received: 24 October 2017      Published: 03 April 2018
ZTFLH:  TP393  
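
The Methods steps in the abstract (LDA clustering, then correlating topics with year or institution) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the years, the hyperparameters, and the helper names `fit_lda` and `topic_share_by_group` are hypothetical, and the small collapsed Gibbs sampler written with the Python standard library stands in for whichever LDA implementation the authors actually used. Visualization is left out.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA. Returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]            # document-topic counts
    n_kw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                             # tokens assigned to each topic
    z = []                                           # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed document-topic (theta) and topic-word (phi) distributions.
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
             for row, doc in zip(n_dk, docs)]
    phi = [[(c + beta) / (n_k[k] + n_words * beta) for c in n_kw[k]]
           for k in range(n_topics)]
    return theta, phi, vocab


def topic_share_by_group(theta, groups):
    """Average the document-topic distributions within each group (e.g. year)."""
    sums, counts = {}, Counter()
    for row, g in zip(theta, groups):
        acc = sums.setdefault(g, [0.0] * len(row))
        for k, p in enumerate(row):
            acc[k] += p
        counts[g] += 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


# Hypothetical corpus: each document is a list of indexing terms.
docs = [
    ["band gap", "nanoribbons", "density of states", "band structure"],
    ["nanoribbons", "band structure", "fermi level", "band gap"],
    ["secondary batteries", "battery anodes", "carbon black", "composites"],
    ["battery anodes", "secondary batteries", "battery cathodes", "carbon black"],
]
years = [2010, 2010, 2015, 2015]  # hypothetical publication years

theta, phi, vocab = fit_lda(docs, n_topics=2)
share = topic_share_by_group(theta, years)
# The "key" topic of a year is taken here as the one with the largest average share.
key_topic = {year: max(range(len(row)), key=row.__getitem__)
             for year, row in share.items()}

for k, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {k}:", [(w, round(p, 3)) for p, w in top])
print("key topic per year:", key_topic)
```

Replacing `years` with a list of institutions gives the per-institution view in the same way. Averaging topic shares per group is only one possible reading of "correlation"; the abstract does not specify the measure used.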

Cite this article:

Wang Li,Zou Lixue,Liu Xiwen. Visualizing Document Correlation Based on LDA Model. Data Analysis and Knowledge Discovery, 2018, 2(3): 98-106.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1058     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I3/98

Topic 1: Graphene modeling and simulation | Topic 2: Electrochemical properties of graphene | Topic 3: Graphene FET devices | Topic 4: Anomalous Hall effect in graphene
Feature word | Probability | Feature word | Probability | Feature word | Probability | Feature word | Probability
monolayers | 0.040 | surface structure | 0.075 | electric current-potential relationship | 0.100 | magnetic field effects | 0.037
simulation and modeling | 0.033 | nanoparticles | 0.073 | electric conductivity | 0.052 | electric conductivity | 0.030
multilayers | 0.024 | cyclic voltammetry | 0.051 | electric resistance | 0.050 | electron transport | 0.027
electric field effects | 0.022 | nanocomposites | 0.049 | electric capacitance | 0.043 | nanoribbons | 0.026
phonon | 0.017 | glassy carbon electrodes | 0.033 | electrodes | 0.043 | band structure | 0.025
electric conductivity | 0.016 | Nano sheets | 0.032 | field effect transistors | 0.038 | fermi level | 0.023
electric current carriers | 0.015 | electron transfer | 0.022 | double layer capacitors | 0.035 | quantum hall effect | 0.020
optical transmission | 0.014 | electric impedance | 0.021 | raman spectra | 0.034 | landau level | 0.018
semiconductor materials | 0.014 | x-ray diffraction | 0.020 | solar cells | 0.023 | magnetization | 0.017
dielectric constant | 0.013 | ph | 0.019 | electric impedance | 0.019 | tight-binding method | 0.015

Topic 5: Graphene/carbon nanotube composites | Topic 6: Biocompatibility of graphene | Topic 7: Graphene oxide | Topic 8: Graphene-polymer composites
Feature word | Probability | Feature word | Probability | Feature word | Probability | Feature word | Probability
films | 0.045 | human | 0.031 | reduction | 0.039 | polyesters | 0.035
chemical vapor deposition | 0.036 | electronic device fabrication | 0.021 | oxidation | 0.036 | carbon nanotubes | 0.034
carbon nanotubes | 0.034 | surface treatment | 0.016 | adsorption | 0.033 | epoxy resins | 0.022
annealing | 0.026 | homo sapiens | 0.014 | surface area | 0.033 | polysiloxanes | 0.020
electric conductors | 0.025 | chemically modified electrodes | 0.014 | Nano sheets | 0.017 | polyimides | 0.020
etching | 0.020 | ph | 0.014 | nanostructured materials | 0.017 | polyurethanes | 0.019
metals | 0.020 | quantum dots | 0.013 | sonication | 0.017 | polyamides | 0.018
coating process | 0.020 | stability | 0.013 | exfoliation | 0.016 | polyoxyalkylenes | 0.015
electrodes | 0.018 | fluorescence | 0.012 | pore size distribution | 0.016 | polyethers | 0.015
sheet resistance | 0.018 | nanoscale surface modification | 0.012 | pore size | 0.015 | coating materials | 0.014

Topic 9: Graphene nanoribbons | Topic 10: Optical properties of graphene composites | Topic 11: Mechanical properties of graphene composites | Topic 12: Graphene energy-storage batteries
Feature word | Probability | Feature word | Probability | Feature word | Probability | Feature word | Probability
density of states | 0.057 | raman spectra | 0.093 | nanocomposites | 0.049 | secondary batteries | 0.063
band gap | 0.051 | microstructure | 0.044 | thermal conductivity | 0.031 | carbon nanotubes | 0.057
nanoribbons | 0.044 | x-ray photoelectron spectra | 0.041 | thermal stability | 0.030 | composites | 0.053
density functional theory | 0.043 | nanoparticles | 0.041 | polymer morphology | 0.028 | fluoropolymers | 0.047
band structure | 0.036 | nanocomposites | 0.038 | young's modulus | 0.028 | carbon black | 0.035
electronic structure | 0.035 | surface structure | 0.037 | electric conductivity | 0.024 | battery anodes | 0.034
binding energy | 0.026 | nanostructures | 0.036 | carbon nanotubes | 0.020 | lithium-ion secondary batteries | 0.023
electron density | 0.023 | uv and visible spectra | 0.032 | tensile strength | 0.020 | heat treatment | 0.020
fermi level | 0.020 | nanosheets | 0.029 | molecular dynamics simulation | 0.020 | battery cathodes | 0.015
band structure | 0.020 | ir spectra | 0.025 | strain | 0.017 | carbon fibers | 0.015
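
One simple way to read the topic tables above is to check which feature words recur across topics, since shared high-probability words hint at related research areas. The sketch below does this for Topics 4, 9 and 11 using the probabilities listed above. It is an illustration only, not the correlation method of the paper, and the helper name `shared_words` is hypothetical.

```python
from itertools import combinations

# Feature words and probabilities copied from the tables above.
# "band structure" is listed twice under Topic 9 (0.036 and 0.020);
# only the first value is kept here.
topics = {
    "Topic 4": {
        "magnetic field effects": 0.037, "electric conductivity": 0.030,
        "electron transport": 0.027, "nanoribbons": 0.026,
        "band structure": 0.025, "fermi level": 0.023,
        "quantum hall effect": 0.020, "landau level": 0.018,
        "magnetization": 0.017, "tight-binding method": 0.015,
    },
    "Topic 9": {
        "density of states": 0.057, "band gap": 0.051,
        "nanoribbons": 0.044, "density functional theory": 0.043,
        "band structure": 0.036, "electronic structure": 0.035,
        "binding energy": 0.026, "electron density": 0.023,
        "fermi level": 0.020,
    },
    "Topic 11": {
        "nanocomposites": 0.049, "thermal conductivity": 0.031,
        "thermal stability": 0.030, "polymer morphology": 0.028,
        "young's modulus": 0.028, "electric conductivity": 0.024,
        "carbon nanotubes": 0.020, "tensile strength": 0.020,
        "molecular dynamics simulation": 0.020, "strain": 0.017,
    },
}


def shared_words(a, b):
    """Feature words that appear in the top-word lists of both topics."""
    return sorted(set(a) & set(b))


# Compare every pair of topics once.
overlap = {
    (s, t): shared_words(topics[s], topics[t])
    for s, t in combinations(topics, 2)
}

for pair, words in overlap.items():
    print(pair, words)
```

Topics 4 and 9 share three band-theory terms, while Topic 11 overlaps with Topic 4 only through "electric conductivity" and not at all with Topic 9.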