[Objective] This paper uses machine learning techniques to analyze the semantic drift of domain terms: it identifies and visualizes interdisciplinary semantic drift and explores its patterns and causes. [Methods] We designed a deep learning framework for identifying and visualizing the semantic drift of domain terms. The framework combines an SBERT model, word embedding optimization, and hierarchical clustering to identify interdisciplinary semantic drift, and it uses principal component analysis together with Bokeh to visualize the drift. [Results] The proposed framework identified interdisciplinary semantic drift with an overall recognition accuracy (p) of 86.15% on the DT-Sentence dataset. [Limitations] The framework still needs to be validated on datasets from more disciplines. [Conclusions] This study supports the data mining and visualization of semantic drift. It also lays a technical foundation for research on semantic evolution, semantic understanding, and semantic modeling.
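The pipeline named in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. Synthetic vectors stand in for SBERT sentence embeddings (in practice these might come from the sentence-transformers library via `SentenceTransformer.encode`). The abstract does not say what "word embedding optimization" consists of, so the sketch substitutes simple mean-centering and L2 normalization. The purity score used to flag drift is likewise an illustrative assumption, not the paper's metric.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def center_and_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Placeholder for the unspecified embedding optimization step:
    subtract the mean vector, then scale each row to unit length."""
    centered = embeddings - embeddings.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)


def cluster_senses(embeddings: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """Agglomerative (hierarchical) clustering with average linkage on cosine distance."""
    tree = linkage(embeddings, method="average", metric="cosine")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def cluster_purity(cluster_ids: np.ndarray, disciplines: np.ndarray) -> float:
    """Share of sentences that belong to the majority discipline of their cluster.
    Hypothetical drift indicator: clusters that split along discipline lines
    suggest the term is used in different senses across disciplines."""
    matched = 0
    for cid in np.unique(cluster_ids):
        members = disciplines[cluster_ids == cid]
        matched += np.unique(members, return_counts=True)[1].max()
    return matched / len(disciplines)


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic stand-in for sentence embeddings of one term used in two disciplines.
rng = np.random.default_rng(0)
dim = 16
sense_a = rng.normal(size=dim)  # assumed sense of the term in discipline A
sense_b = rng.normal(size=dim)  # assumed sense of the term in discipline B
embeddings = np.vstack([
    sense_a + 0.1 * rng.normal(size=(20, dim)),
    sense_b + 0.1 * rng.normal(size=(20, dim)),
])
disciplines = np.array(["A"] * 20 + ["B"] * 20)

optimized = center_and_normalize(embeddings)
cluster_ids = cluster_senses(optimized)
purity = cluster_purity(cluster_ids, disciplines)
coords = pca_2d(optimized)  # one 2-D point per sentence
print(f"cluster purity vs. discipline: {purity:.2f}")
```

The two-dimensional coordinates in `coords` could then be handed to a plotting library such as Bokeh, which the abstract names, with points colored by discipline or by cluster.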
Li Nan, Wang Bo. Recognition and Visual Analysis of Interdisciplinary Semantic Drift[J]. Data Analysis and Knowledge Discovery, 2023, 7(10): 15-24.