Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 3: 35-44. DOI: 10.11925/infotech.2096-3467.2019.1030
Topic Mining and Evolution Analysis of Medical Sci-Tech Reports with TWE Model
Shen Si1, Li Qinyu1, Ye Yuan1, Sun Hao1 (corresponding author), Ye Wenhao2
1School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China
2School of Information Management, Nanjing University, Nanjing 210023, China
Abstract  

[Objective] This paper uses word embedding representations to better uncover implicit associations among topics in medical science and technology (Sci-Tech) reports, with the aim of improving methods for analyzing medical topic evolution. [Methods] We adopted the Topical Word Embeddings (TWE) model to analyze latent semantic associations among oncology research topics and their evolution. [Results] We identified topic splitting between 2006 and 2007 and topic merging between 2011 and 2012. However, these TWE-based associations were not fully reflected in the topic evolution produced by the traditional LDA method. For 2009 and 2010, the results of traditional LDA and of the word embedding approach were completely different. [Limitations] The sample size is limited because only Chinese reports were collected. Further research is needed to test the proposed method on other medical research topics. [Conclusions] Topic mining and evolution analysis based on a word embedding representation model can highlight the impact of deep learning on topic association analysis, and it yields better results for topic evolution analysis of medical Sci-Tech reports.

Key words: Word Embeddings Representation; Topic Evolution; Sci-Tech Report; Medical Field
Received: 11 September 2019      Published: 12 April 2021
CLC Number (ZTFLH): G255
Fund: Natural Science Foundation of Jiangsu Province (BK20190450); National Natural Science Foundation of China (71974094); National Social Science Fund of China (19FTQB015)
Corresponding Author: Sun Hao     E-mail: 117107010889@njust.edu.cn

Cite this article:

Shen Si,Li Qinyu,Ye Yuan,Sun Hao,Ye Wenhao. Topic Mining and Evolution Analysis of Medical Sci-Tech Reports with TWE Model. Data Analysis and Knowledge Discovery, 2021, 5(3): 35-44.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.1030     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I3/35

Research Framework
Semantic Computing Framework for the TWE Model
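Since the framework figure itself is not reproduced here, the TWE-1 variant proposed by Liu et al. (2015) can be summarized as follows. This is our summary of that model, not a transcription of this paper's figure, and the excerpt does not say which TWE variant (TWE-1, TWE-2 or TWE-3) the study used. Given tokens $w_1,\dots,w_M$, their LDA topic assignments $z_1,\dots,z_M$ and window size $k$, TWE-1 extends Skip-Gram so that each token's topic also predicts the context words, and the topical word embedding is formed by concatenation:

```latex
\begin{aligned}
\mathcal{L}(D) &= \frac{1}{M}\sum_{i=1}^{M}\ \sum_{\substack{-k \le c \le k \\ c \neq 0}}
  \Big[\log \Pr(w_{i+c}\mid w_i) + \log \Pr(w_{i+c}\mid z_i)\Big] \\
\Pr(w_c \mid w_i) &= \frac{\exp(\mathbf{w}'_{c}\cdot \mathbf{w}_i)}{\sum_{w \in W}\exp(\mathbf{w}'\cdot \mathbf{w}_i)} \\
\mathbf{w}^{z} &= \mathbf{w} \oplus \mathbf{z}
\end{aligned}
```

Here $\mathbf{w}'$ denotes output (context) vectors, $\Pr(w_{i+c}\mid z_i)$ is defined analogously with the topic vector $\mathbf{z}_i$ in place of $\mathbf{w}_i$, and $\oplus$ is vector concatenation. In practice the full softmax is approximated, for example by hierarchical softmax.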
Distribution of the Number of Technical Reports
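Topic | Topic words
Topic0 | cell, important, tumor, effect, gene, treatment, further, method, disease, vaccine
Topic1 | molecule, related, protein (蛋白质), effect, protein (蛋白), method, provide, technology, gene, tumor
Topic2 | technology, gene, regulation, method, treatment, function, model, detection, protein, drug
Topic3 | technology, T cell, cell, tumor, gastric cancer, detection, gene, treatment, platform, disease
Topic4 | technology, clinical, tumor, detection, system, gene, platform, molecule, activity, related
LDA Topic Extraction Results (2006)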
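The ten words listed per topic are presumably the highest-probability words of each LDA topic. A minimal sketch of that ranking step, assuming a topic-word matrix `phi` (a hypothetical name) in which `phi[k][v]` is the probability of word `v` in topic `k`; the toy vocabulary and probabilities are invented for illustration and are not from the study.

```python
from typing import Dict, List


def top_words(phi: List[List[float]], vocab: List[str], n: int = 10) -> Dict[int, List[str]]:
    """Return the n most probable words of each topic.

    Assumes phi[k][v] = P(word v | topic k), e.g. the topic-word matrix of a
    fitted LDA model (the name `phi` is hypothetical here).
    """
    result: Dict[int, List[str]] = {}
    for k, row in enumerate(phi):
        # Rank word indices by descending probability; sorted() is stable, so
        # words with equal probability keep their vocabulary order.
        ranked = sorted(range(len(row)), key=lambda v: row[v], reverse=True)
        result[k] = [vocab[v] for v in ranked[:n]]
    return result


if __name__ == "__main__":
    # Toy data invented for illustration; not taken from the study.
    vocab = ["cell", "tumor", "gene", "vaccine"]
    phi = [
        [0.4, 0.3, 0.2, 0.1],
        [0.1, 0.2, 0.3, 0.4],
    ]
    print(top_words(phi, vocab, n=2))  # {0: ['cell', 'tumor'], 1: ['vaccine', 'gene']}
```

With n = 10 this would produce ten-word lists of the kind shown in the table.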
Document ID | Word-topic assignments (word ID:topic ID)
1 | 0:2 1:2 2:2 3:2 5:2 6:2 7:2 8:3
2 | 34:4 35:3 36:0 37:3 38:0 39:0 40:1 41:0
3 | 76:4 77:2 33:4 78:3 79:2 80:4 81:1 82:4
4 | 190:0 191:2 192:4 193:4 194:3 26:2 195:4 191:0
5 | 251:4 252:4 253:1 254:0 255:0 256:0 257:0 258:0
6 | 172:3 296:0 297:1 298:3 265:4 299:3 300:3 301:2
Word-Topic Distribution (Partial)
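In our reading, each row lists one document's tokens as `word ID:topic ID` pairs, similar to the assignment files written by Gibbs-sampling LDA tools such as GibbsLDA++; such assignments are the input TWE builds on. A minimal parsing sketch follows. The function names are hypothetical, and the `word#topic` pseudo-word encoding follows the TWE-2 variant of Liu et al. (2015), which may not be the variant used in this study.

```python
from collections import Counter
from typing import List, Tuple

RATIO_SIGN = "\u2236"  # '∶', a separator that may appear in extracted tables


def parse_assignments(line: str) -> Tuple[int, List[Tuple[int, int]]]:
    """Parse a row such as '1 0:2 1:2 8:3' into (doc_id, [(word_id, topic_id), ...])."""
    doc_field, *pairs = line.split()
    assignments = []
    for pair in pairs:
        # Accept both ASCII ':' and the ratio sign as the separator.
        word, topic = pair.replace(RATIO_SIGN, ":").split(":")
        assignments.append((int(word), int(topic)))
    return int(doc_field), assignments


def topic_proportions(assignments: List[Tuple[int, int]], num_topics: int) -> List[float]:
    """Share of a document's tokens assigned to each topic (all zeros if empty)."""
    total = len(assignments)
    if total == 0:
        return [0.0] * num_topics
    counts = Counter(topic for _, topic in assignments)
    return [counts[k] / total for k in range(num_topics)]


def pseudo_words(assignments: List[Tuple[int, int]]) -> List[str]:
    """Encode each token as a 'word#topic' pseudo-word (TWE-2 style)."""
    return [f"{word}#{topic}" for word, topic in assignments]


if __name__ == "__main__":
    doc_id, tokens = parse_assignments("1 0:2 1:2 2:2 3:2 5:2 6:2 7:2 8:3")
    print(doc_id, topic_proportions(tokens, num_topics=5))  # 1 [0.0, 0.0, 0.875, 0.125, 0.0]
    print(pseudo_words(tokens)[:3])  # ['0#2', '1#2', '2#2']
```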
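Parameter | Value
Initial learning rate (alpha) | 0.025
Sliding window size | 10
Word vector dimension | 400
hs (hierarchical softmax; 1 = enabled) | 1
Minimum word frequency threshold | 5
TWE Model Parameter Settings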
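The parameters are word2vec-style hyperparameters (TWE builds on the Skip-Gram model). The sketch below records the table's values and shows, under simplifying assumptions, how a fixed window and the minimum-frequency threshold shape training pairs in the TWE-1 formulation, where a token's topic also predicts the surrounding words. `TWEParams` and `training_pairs` are hypothetical names; real word2vec implementations also shrink the window randomly and subsample frequent words, which this sketch omits.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[str, int]  # (word, LDA topic id)


@dataclass(frozen=True)
class TWEParams:
    # Values from the parameter table. Field names mirror word2vec-style
    # options and are hypothetical, not a confirmed TWE interface.
    alpha: float = 0.025     # initial learning rate (unused in this sketch)
    window: int = 10         # sliding window size
    vector_size: int = 400   # word vector dimension (unused in this sketch)
    hs: int = 1              # 1 = hierarchical softmax (unused in this sketch)
    min_count: int = 5       # minimum word frequency threshold


def training_pairs(docs: List[List[Token]], params: TWEParams) -> List[Tuple[str, str]]:
    """Build TWE-1-style (input, context) pairs.

    Words rarer than min_count are dropped first; then, within a fixed window,
    each word and the topic assigned to it both predict every context word.
    """
    freq = Counter(word for doc in docs for word, _ in doc)
    pairs: List[Tuple[str, str]] = []
    for doc in docs:
        kept = [(w, z) for w, z in doc if freq[w] >= params.min_count]
        for i, (word, topic) in enumerate(kept):
            start = max(0, i - params.window)
            stop = min(len(kept), i + params.window + 1)
            for j in range(start, stop):
                if j == i:
                    continue
                context = kept[j][0]
                pairs.append((word, context))              # word -> context
                pairs.append((f"topic:{topic}", context))  # topic -> context
    return pairs


if __name__ == "__main__":
    toy = [[("tumor", 0), ("cell", 1), ("gene", 0)]]  # invented example
    for pair in training_pairs(toy, TWEParams(window=1, min_count=1)):
        print(pair)
```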
Annual Topical Word Statistics
Topic ID | 1 | 2 | 3 | 4 | 5
1 | 0.519 | 0.156 | 0.887 | -0.029 | 0.705
2 | 0.613 | 0.347 | 0.820 | 0.084 | 0.897
3 | 0.667 | 0.505 | 0.936 | 0.164 | 0.895
4 | 0.366 | 0.272 | 0.502 | -0.018 | 0.195
5 | 0.480 | 0.288 | 0.459 | -0.033 | 0.235
2006-2007 Topic Association Results Based on Word Embeddings (Partial)
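The association values range from about -0.03 to 0.94, which is consistent with a cosine similarity between topic vectors, although the excerpt states neither the measure nor which axis holds which year. One common construction, assumed here rather than taken from the paper, averages the topical word embeddings of a topic's top words into a topic vector and compares topics of two years by cosine similarity. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def topic_vector(word_vectors: Sequence[Vector]) -> List[float]:
    """Average the topical word embeddings of a topic's top words."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def association_matrix(year_a: Sequence[Vector], year_b: Sequence[Vector]) -> List[List[float]]:
    """Rows index topics of year A, columns index topics of year B."""
    return [[cosine(a, b) for b in year_b] for a in year_a]


if __name__ == "__main__":
    # Invented 2-dimensional vectors; real embeddings would be far longer.
    t_a = [topic_vector([[1.0, 0.0], [0.8, 0.2]]), topic_vector([[0.0, 1.0]])]
    t_b = [[1.0, 0.1], [0.1, 1.0]]
    for row in association_matrix(t_a, t_b):
        print([round(x, 3) for x in row])
```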
LDA Topic Evolution Diagram
Topic Evolution Diagram Based on Topical Word Embeddings
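Evolution diagrams of this kind typically draw a link between two topics in consecutive years when their association is strong enough; a topic linked to several later topics is read as splitting, and several topics linked to one later topic as merging. A minimal sketch under that assumption follows. The threshold and toy matrix are invented, `evolution_events` is a hypothetical name, and the paper's own linking criterion is not given in this excerpt.

```python
from typing import Dict, List, Tuple


def evolution_events(sim: List[List[float]], threshold: float):
    """Detect splitting and merging from a topic association matrix.

    sim[i][j] is the association between topic i of year t and topic j of
    year t+1; a pair is linked when sim[i][j] >= threshold.
    Returns (links, splits, merges): splits maps a year-t topic to the
    year-(t+1) topics it links to (if more than one); merges maps a
    year-(t+1) topic to the year-t topics linked to it (if more than one).
    """
    links: List[Tuple[int, int]] = [
        (i, j) for i, row in enumerate(sim) for j, s in enumerate(row) if s >= threshold
    ]
    splits: Dict[int, List[int]] = {}
    merges: Dict[int, List[int]] = {}
    for i, j in links:
        splits.setdefault(i, []).append(j)
        merges.setdefault(j, []).append(i)
    splits = {i: targets for i, targets in splits.items() if len(targets) > 1}
    merges = {j: sources for j, sources in merges.items() if len(sources) > 1}
    return links, splits, merges


if __name__ == "__main__":
    toy = [  # invented values, not the study's results
        [0.90, 0.85, 0.10],
        [0.20, 0.30, 0.95],
        [0.10, 0.20, 0.90],
    ]
    links, splits, merges = evolution_events(toy, threshold=0.8)
    print(splits)  # {0: [0, 1]}
    print(merges)  # {2: [1, 2]}
```

The threshold controls the trade-off: lowering it adds weak links and therefore more apparent splits and merges.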
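2006
Topic1: cell, important, tumor, effect, gene, treatment, further, method, disease, vaccine
Topic2: molecule, related, protein (蛋白质), effect, protein (蛋白), method, provide, technology, gene, tumor
2007
Topic1: liver cancer, technology, regulation, target, in vivo, preliminary, preparation, drug, process, miRNA
Topic3: molecule, function, technology, protein, tumor, cell, treatment, stem cell, related, drug
Topic5: technology, miRNA, gene, method, effect, detection, function, tumor, protein, activity
Some Thematic Terms in 2006 and 2007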
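Since the abstract contrasts TWE with traditional LDA-based evolution, a purely lexical comparison of topic-word lists is a useful point of reference. The sketch below computes the Jaccard coefficient of two lists from this table; the pair (2006 Topic2, 2007 Topic3) is chosen only to illustrate the arithmetic, and the paper's own LDA linking measure is not stated in this excerpt. The original Chinese terms are kept because 蛋白质 and 蛋白 (both "protein") are distinct tokens.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two term sets (0.0 if both empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Word lists copied from the table (original Chinese terms).
TOPIC2_2006 = ["分子", "相关", "蛋白质", "作用", "蛋白", "方法", "提供", "技术", "基因", "肿瘤"]
TOPIC3_2007 = ["分子", "功能", "技术", "蛋白", "肿瘤", "细胞", "治疗", "干细胞", "相关", "药物"]

if __name__ == "__main__":
    # 5 shared terms (分子, 相关, 蛋白, 技术, 肿瘤) out of 15 distinct terms
    print(round(jaccard(TOPIC2_2006, TOPIC3_2007), 3))  # 0.333
```

A lexical measure like this cannot see that two different words are semantically close, which is the gap embedding-based association is meant to address.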
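2011 | 2012
Topic1: effect, cell, technology, mechanism, detection, liver cancer, tumor, protein, mouse, related | Topic2: cell, cancer stem cell, tumor, effect, provide, gene, clinical, regulation, system, treatment
Topic5: cell, lung cancer, related, provide, regulation, protein, construction, important, imaging, nucleic acid
Some Thematic Terms in 2011 and 2012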
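2009 | 2010
Topic1: drug, in vivo, antibody, effect, toxicity, tumor, animal, compound, dose, provide | Topic3: gene, molecule, function, protein, important, regulation, specificity, basis, biology, detection
Some Thematic Terms in 2009 and 2010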
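2006 | 2007
Topic5: technology, clinical, tumor, detection, system, gene, platform, molecule, activity, related | Topic5: technology, miR, gene, method, effect, detection, function, tumor, protein, activity
Some Thematic Terms in 2006 and 2007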