Topic Mining and Evolution Analysis of Medical Sci-Tech Reports with the TWE Model
Shen Si¹, Li Qinyu¹, Ye Yuan¹, Sun Hao¹, Ye Wenhao²
¹School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China
²School of Information Management, Nanjing University, Nanjing 210023, China
[Objective] This paper uses word embedding representations to better discover implicit associations among topics in medical science and technology reports, aiming to improve methods for analyzing medical topic evolution. [Methods] We adopted the TWE (Topical Word Embeddings) model to analyze latent semantic associations among oncology research topics, as well as the evolution of those topics. [Results] We found topic-splitting correlations between 2006 and 2007, and topic-merging correlations between 2011 and 2012. However, these TWE correlations were not fully reflected in the topic evolution generated by the traditional LDA method. For 2009 and 2010, the results of traditional LDA and of the word embedding approach were completely different. [Limitations] The sample size is limited because we collected only Chinese reports. More research is needed to test the proposed method on other medical research topics. [Conclusions] Topic mining and evolution analysis based on a word embedding representation model can highlight the contribution of deep learning to detecting topic associations, and it provides better results for topic evolution analysis of medical sci-tech reports.
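The abstract does not specify how splitting and merging correlations are detected between adjacent periods. One common approach, assumed here rather than taken from the paper, is to compare topic vectors from consecutive periods by cosine similarity, link every pair above a threshold, and then label an earlier topic linked to several later topics as a split and a later topic linked to several earlier topics as a merge. TWE learns an embedding for each topic alongside the word embeddings, so such topic vectors are a natural input. In the minimal sketch below, plain Python lists stand in for TWE topic embeddings; the function names, the 0.5 threshold, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_topics(prev_topics, next_topics, threshold=0.5):
    """Return (earlier_id, later_id) pairs whose topic vectors are similar enough.

    prev_topics / next_topics: dict mapping a topic id to its embedding vector
    (hypothetically, TWE topic embeddings trained on each period's reports).
    threshold: hypothetical cut-off, not a value reported in the paper.
    """
    links = []
    for i, u in prev_topics.items():
        for j, v in next_topics.items():
            if cosine(u, v) >= threshold:
                links.append((i, j))
    return links


def classify_evolution(links):
    """Label splits (one earlier topic -> several later) and merges (several -> one later)."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for i, j in links:
        successors[i].add(j)
        predecessors[j].add(i)
    splits = {i: sorted(js) for i, js in successors.items() if len(js) > 1}
    merges = {j: sorted(srcs) for j, srcs in predecessors.items() if len(srcs) > 1}
    return splits, merges


if __name__ == "__main__":
    # Toy 3-dimensional "topic embeddings" for two adjacent periods (hypothetical data).
    prev = {"A": [1.0, 0.0, 0.0], "B": [0.0, 0.0, 1.0]}
    nxt = {"A1": [0.9, 0.3, 0.0], "A2": [0.8, -0.4, 0.0], "C": [0.0, 0.1, 1.0]}
    splits, merges = classify_evolution(link_topics(prev, nxt))
    print("splits:", splits)
    print("merges:", merges)
```

The threshold governs how readily topics are linked: a lower value produces more links and therefore more apparent splits and merges, so in practice it would need tuning against the corpus rather than being fixed in advance.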