Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue (3): 35-44. DOI: 10.11925/infotech.2096-3467.2019.1030
Topic Mining and Evolution Analysis of Medical Sci-Tech Reports with TWE Model
Shen Si 1, Li Qinyu 1, Ye Yuan 1, Sun Hao 1, Ye Wenhao 2
1 School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China
2 School of Information Management, Nanjing University, Nanjing 210023, China
[Objective] This paper uses word embedding representations to better uncover implicit associations among topics in medical science and technology reports, aiming to improve methods for analyzing the evolution of medical topics. [Methods] We adopted the TWE (Topical Word Embeddings) model to analyze the latent semantic associations among topics in oncology studies, as well as their evolution. [Results] We found splitting associations between topics in 2006 and 2007, and merging associations between topics in 2011 and 2012. However, these TWE-based associations were not fully reflected in the topic evolution generated by the traditional LDA method. For 2009 and 2010, the results yielded by traditional LDA and by word embeddings were completely different. [Limitations] Our sample size is limited because we only collected Chinese reports. More research is needed to examine the proposed method with other medical research topics. [Conclusions] Topic mining and evolution analysis based on the word embedding representation model can highlight the impact of deep learning on topic association, and it provides better results for topic evolution analysis of medical Sci-Tech reports.

Key words: Word Embeddings Representation; Topic Evolution; Sci-Tech Report; Medical Field
Received: 11 September 2019; Published: 12 April 2021
ZTFLH (Chinese Library Classification): G255
Fund: Natural Science Foundation of Jiangsu Province (BK20190450); National Natural Science Foundation of China (71974094); National Social Science Fund of China (19FTQB015)
Shen Si, Li Qinyu, Ye Yuan, Sun Hao, Ye Wenhao. Topic Mining and Evolution Analysis of Medical Sci-Tech Reports with TWE Model. Data Analysis and Knowledge Discovery, 2021, 5(3): 35-44.

Research Framework
Semantic Computing Framework for the TWE Model
Number Distribution of Technical Reports
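Topic    Topic words
Topic0   cell, important, tumor, effect, gene, treatment, further, method, disease, vaccine
Topic1   molecule, related, protein (蛋白质), effect, protein (蛋白), method, provide, technology, gene, tumor
Topic2   technology, gene, regulation, method, treatment, function, model, detection, protein, drug
Topic3   technology, T cell, cell, tumor, gastric cancer, detection, gene, treatment, platform, disease
Topic4   technology, clinical, tumor, detection, system, gene, platform, molecule, activity, related
LDA Topic Extraction Results (2006)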
Document ID   Word-topic assignments (word ID:topic ID)
1             0:2 1:2 2:2 3:2 5:2 6:2 7:2 8:3
2             34:4 35:3 36:0 37:3 38:0 39:0 40:1 41:0
3             76:4 77:2 33:4 78:3 79:2 80:4 81:1 82:4
4             190:0 191:2 192:4 193:4 194:3 26:2 195:4 191:0
5             251:4 252:4 253:1 254:0 255:0 256:0 257:0 258:0
6             172:3 296:0 297:1 298:3 265:4 299:3 300:3 301:2
Word-Topic Distribution (Partial)
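Each row of the word-topic table appears to be a sequence of word ID and topic ID pairs for one document. The following is a minimal sketch of how such a row could be read into pairs before being passed to a topical embedding step. The function name `parse_assignments` is hypothetical, and the reading of each token as "word ID, then topic ID" is an assumption based on the topic IDs falling in the range 0 to 4.

```python
from collections import Counter


def parse_assignments(line):
    """Split a row of 'word_id:topic_id' tokens into (word_id, topic_id) integer pairs.

    Accepts both the ASCII colon and the ratio sign (U+2236) as the separator.
    """
    pairs = []
    for token in line.replace("\u2236", ":").split():
        word_id, topic_id = token.split(":")
        pairs.append((int(word_id), int(topic_id)))
    return pairs


# Document 1 from the partial table.
doc1 = "0:2 1:2 2:2 3:2 5:2 6:2 7:2 8:3"
pairs = parse_assignments(doc1)

# Count how many tokens in the document were assigned to each topic.
topic_counts = Counter(topic_id for _, topic_id in pairs)
print(pairs)
print(topic_counts)
```

Keeping the pairs as integers makes it straightforward to look up the word string in a vocabulary file and to group tokens by topic afterwards.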
Parameter                           Value
Initial learning rate (alpha)       0.025
Sliding window size                 10
Word vector dimension               400
hs                                  1
Minimum word frequency threshold    5
TWE Model Parameter Setting List
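The parameter names in this table match those of word2vec-style trainers. As an illustration only, the settings could be written as keyword arguments for gensim 4.x `Word2Vec`. This is an assumption: the excerpt does not say which implementation was used, and the dictionary name `TWE_PARAMS` is hypothetical.

```python
# Table values mapped to gensim 4.x Word2Vec keyword names (assumed trainer,
# not confirmed by the source).
TWE_PARAMS = {
    "alpha": 0.025,      # initial learning rate
    "window": 10,        # sliding window size
    "vector_size": 400,  # word vector dimension
    "hs": 1,             # 1 enables hierarchical softmax
    "min_count": 5,      # minimum word frequency threshold
}

# Hypothetical usage, if gensim is installed and `sentences` is an iterable
# of token lists:
# from gensim.models import Word2Vec
# model = Word2Vec(sentences, **TWE_PARAMS)
```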
Annual Topical Word Statistics
Topic   1       2       3       4       5
1       0.519   0.156   0.887   -0.029  0.705
2       0.613   0.347   0.820   0.084   0.897
3       0.667   0.505   0.936   0.164   0.895
4       0.366   0.272   0.502   -0.018  0.195
5       0.480   0.288   0.459   -0.033  0.235
2006-2007 Topic Association Results Represented by Word Embedding (Partial)
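One way to read the association matrix is to flag topic pairs whose score exceeds a cut-off and then look for one-to-many and many-to-one patterns. The sketch below assumes that the scores are cosine similarities between topic vectors, that rows are topics of the earlier year and columns are topics of the later year, and that 0.8 is a reasonable cut-off. None of these is stated in the excerpt, and the function names are hypothetical. Under those assumptions, a row linked to several columns would be a candidate split, and a column linked to several rows would be a candidate merge.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Scores copied from the partial 2006-2007 table; rows and columns are topics 1-5.
SCORES = [
    [0.519, 0.156, 0.887, -0.029, 0.705],
    [0.613, 0.347, 0.820, 0.084, 0.897],
    [0.667, 0.505, 0.936, 0.164, 0.895],
    [0.366, 0.272, 0.502, -0.018, 0.195],
    [0.480, 0.288, 0.459, -0.033, 0.235],
]
THRESHOLD = 0.8  # hypothetical cut-off, not taken from the paper


def strong_links(scores, threshold):
    """Group 1-based (row, column) pairs with score >= threshold by row and by column."""
    by_row, by_col = {}, {}
    for i, row in enumerate(scores, start=1):
        for j, score in enumerate(row, start=1):
            if score >= threshold:
                by_row.setdefault(i, []).append(j)
                by_col.setdefault(j, []).append(i)
    return by_row, by_col


by_row, by_col = strong_links(SCORES, THRESHOLD)
one_to_many = {i: cols for i, cols in by_row.items() if len(cols) > 1}
many_to_one = {j: rows for j, rows in by_col.items() if len(rows) > 1}
print("one-to-many:", one_to_many)
print("many-to-one:", many_to_one)

# How a single score might be produced from two toy topic vectors.
print(round(cosine([1.0, 0.0], [1.0, 1.0]), 3))
```

The cut-off controls how many links appear, so any split or merge read off this way should be checked against the topic words themselves.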
LDA Topic Evolution Diagram
Topic Evolution Diagram Based on Topical Word Embeddings
2006    2007
Some Thematic Terms in 2006 and 2007
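Year   Topic    Topic words
2011   Topic1   effect, cell, technology, mechanism, detection, liver cancer, tumor, protein, mouse, related
2012   Topic2   cell, cancer stem cell, tumor, effect, provide, gene, clinical, regulation, system, treatment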
Some Thematic Terms in 2011 and 2012
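Year   Topic    Topic words
2009   Topic1   drug, in vivo, antibody, effect, toxicity, tumor, animal, compound, dose, provide
2010   Topic3   gene, molecule, function, protein, important, regulation, specificity, basis, biology, detection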
Some Thematic Terms in 2009 and 2010
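Year   Topic    Topic words
2006   Topic5   technology, clinical, tumor, detection, system, gene, platform, molecule, activity, related
2007   Topic5   technology, miR, gene, method, effect, detection, function, tumor, protein, activity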
Some Thematic Terms in 2006 and 2007
