Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (10): 15-24    DOI: 10.11925/infotech.2096-3467.2022.0635
Recognition and Visual Analysis of Interdisciplinary Semantic Drift
Li Nan, Wang Bo
Institute of Science and Technology Information, East China University of Science and Technology, Shanghai 200237, China
Abstract  

[Objective] This paper uses machine learning to analyze the semantic drift of domain terms. It identifies and visualizes interdisciplinary semantic drift and explores its patterns and causes. [Methods] We designed a deep learning framework for identifying and visualizing the semantic drift of domain terms. The framework combines an SBERT model, word embedding optimization, and hierarchical clustering to identify interdisciplinary semantic drift, and it uses principal component analysis and Bokeh to visualize the phenomenon. [Results] The proposed framework identifies interdisciplinary semantic drift with an overall precision (P) of 86.15% on the DT-Sentence dataset. [Limitations] The framework needs to be verified on datasets from more disciplines. [Conclusions] This study benefits data mining and visualization of semantic drift. It also lays a technical foundation for semantic evolution, understanding, and modeling.

Key words: Semantic Drift; Textual Analysis; BERT-Whitening
Received: 19 June 2022      Published: 22 March 2023
ZTFLH: TP391; G255
Fund: Fundamental Research Funds for the Central Universities of Ministry of Education of China (222202226002)
Corresponding Author: Wang Bo, ORCID: 0000-0001-5222-950X, E-mail: wang.bo.tianwen@qq.com

Cite this article:

Li Nan, Wang Bo. Recognition and Visual Analysis of Interdisciplinary Semantic Drift. Data Analysis and Knowledge Discovery, 2023, 7(10): 15-24.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0635     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I10/15

Semantic Drift Technology Framework
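The embedding optimization step of the framework can be illustrated with a minimal whitening sketch. It assumes that "word embedding optimization" refers to the whitening transform suggested by the BERT-Whitening keyword; the function names `fit_whitening` and `whiten` are hypothetical, and random vectors stand in for SBERT sentence embeddings, which the page does not provide.

```python
import numpy as np

def fit_whitening(embeddings: np.ndarray):
    """Estimate mean mu and matrix W so that (x - mu) @ W has identity covariance."""
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)            # (d, d) sample covariance
    u, s, _ = np.linalg.svd(cov)                 # cov = u @ diag(s) @ u.T
    w = u @ np.diag(1.0 / np.sqrt(s + 1e-12))    # small epsilon avoids division by zero
    return mu, w

def whiten(embeddings: np.ndarray, mu: np.ndarray, w: np.ndarray, k=None):
    """Apply the transform; optionally keep only the first k dimensions."""
    out = (embeddings - mu) @ w
    return out if k is None else out[:, :k]

rng = np.random.default_rng(0)
# Assumption: 200 correlated 8-dimensional vectors stand in for SBERT output.
raw = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
mu, w = fit_whitening(raw)
white = whiten(raw, mu, w)
reduced = whiten(raw, mu, w, k=4)
```

After the transform the dimensions are decorrelated and have unit variance, which tends to make cosine similarity between sentence vectors more discriminative. Truncating to the first `k` dimensions is optional.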
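| Term | Frequency | Term | Frequency |
|---|---|---|---|
| 图书馆 (library) | 12,974 | 情报 (intelligence) | 1,418 |
| 信息 (information) | 4,610 | 学科 (discipline) | 1,332 |
| 用户 (user) | 2,795 | 评价 (evaluation) | 1,290 |
| 高校 (university) | 2,793 | 图书 (book) | 1,124 |
| 网络 (network) | 2,278 | 期刊 (journal) | 998 |
| 文献 (literature) | 2,268 | 模式 (pattern) | 984 |
| 数据 (data) | 1,995 | 专利 (patent) | 982 |
| 资源 (resource) | 1,818 | 读者 (reader) | 949 |
| 数字 (digital) | 1,658 | 技术 (technology) | 915 |
| 模型 (model) | 1,556 | 舆情 (public opinion) | 913 |

Top 20 High-Frequency Candidate Terms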
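| Term | Definition | First-level discipline | Source |
|---|---|---|---|
| 半衰期 (half-life) | In a single radioactive decay, the time required for the radioactivity to fall to half of its original value | Metrology | 《计量学名词》 |
| 半衰期 (half-life) | The time required for the reactant concentration to fall to half of its initial concentration. In pharmacokinetics, it mainly refers to the time required for half of a drug to be absorbed or eliminated | Pharmacy | 《药学名词》 |
| 半衰期 (half-life) | The time required for the radioactivity of a sample containing only one radionuclide to fall to half of its initial value | Chemistry | 《化学名词》 |
| 本体 (ontology) | An explicit, formal, and shareable specification of a conceptual system. It defines the basic terms and relations that make up the vocabulary of a subject domain, as well as the rules for combining these terms and relations to define extensions of the vocabulary | Library, Information and Documentation Science | 《图书馆·情报与文献学名词》 |
| 本体 (ontology) | A shareable, formal, and explicit description of the concepts in a domain of discourse and of their relations and properties; models built with the ontology can be shared by other people or systems | Computer Science and Technology | 《计算机科学技术名词》 |
| 本体 (body) | The part of a saccate pollen grain or pseudosaccate spore other than the sacci or pseudosacci | Botany | 《植物学名词》 |

Example of Term Definition Data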
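| Term | Definition | Definition category | Number of categories |
|---|---|---|---|
| 外包 (epiboly) | During gastrulation of an animal blastula, a whole sheet of cells (usually ectodermal cells) spreads as a unit over the embryo surface and envelops the deeper cells of the embryo | 0 | 2 |
| 外包 (outsourcing) | An organization relies on external resources for component production and other value-adding activities, integrating and using external resources to reduce costs, strengthen core competencies, and improve competitive advantage | 1 | 2 |
| 外包 (outsourcing) | Subcontracting an organization's internal IT infrastructure, work, processes, or applications to an external organization that has the resources | 1 | 2 |
| 文献 (literature) | All carriers on which knowledge and information are recorded. It consists of four elements: the recorded knowledge and information, the symbols used to record them, the physical carrier used to record them, and the method or means of recording | 0 | 1 |
| 文献 (literature) | A carrier that records human knowledge by technical means such as text, graphics, symbols, audio, and video. It has historical significance or research value | 0 | 1 |

Examples of Manual Annotation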
Hierarchical Clustering Example
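The clustering step can be sketched as follows: the definition vectors of one term are grouped by agglomerative clustering, and a term whose definitions fall into more than one cluster is flagged as drifting. This is an illustrative sketch, not the authors' implementation. The average linkage, cosine distance, the `threshold` value, and the toy 2-dimensional vectors are all assumptions; the page does not state the actual settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_definitions(vectors: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Cluster definition vectors; return one integer label per definition."""
    if len(vectors) < 2:
        return np.ones(len(vectors), dtype=int)
    # Average-linkage agglomerative clustering on cosine distance (assumed settings).
    tree = linkage(vectors, method="average", metric="cosine")
    # Cut the dendrogram where the merge distance exceeds the threshold.
    return fcluster(tree, t=threshold, criterion="distance")

def is_drifting(vectors: np.ndarray, threshold: float = 0.5) -> bool:
    """A term is flagged as drifting if its definitions form more than one cluster."""
    return len(set(cluster_definitions(vectors, threshold))) > 1

# Toy stand-ins: one outlying definition and two similar ones,
# mirroring a term annotated with two definition categories.
three_defs = np.array([[0.0, 1.0], [1.0, 0.1], [0.9, 0.2]])
# Two near-identical definitions, mirroring a term with a single category.
two_defs = np.array([[1.0, 0.0], [0.95, 0.05]])

labels = cluster_definitions(three_defs)
```

With these toy inputs, `is_drifting(three_defs)` is true and `is_drifting(two_defs)` is false. In practice the threshold would need tuning against manually annotated data.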
| Method | P/% | R/% | F1/% |
|---|---|---|---|
| RoBERTa | 85.05 | 88.48 | 86.71 |
| DistilBERT | 85.77 | 88.97 | 87.33 |
| SBERT+Whitening (this paper) | 86.15 | 91.06 | 88.59 |

Experimental Results of Semantic Drift Recognition
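For readers unfamiliar with the P, R, and F1 columns, the sketch below computes them from binary labels using the standard definitions. The page does not say how the scores were averaged, so this is only an assumption about the metric. The function name and the `gold` and `pred` lists are hypothetical toy data, not the DT-Sentence dataset.

```python
def precision_recall_f1(gold, pred, positive=1):
    """Standard precision, recall, and F1 for one positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: 1 = drifting, 0 = stable (hypothetical).
gold = [1, 1, 1, 0, 0, 1, 0, 1]
pred = [1, 1, 0, 0, 1, 1, 0, 1]
p, r, f1 = precision_recall_f1(gold, pred)
```

Here there are four true positives, one false positive, and one false negative, so all three scores equal 0.8.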
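| Term type | Semantically stable terms (count) | Semantically stable terms (share/%) | Semantically drifting terms (count) | Semantically drifting terms (share/%) |
|---|---|---|---|---|
| Terms with two definitions | 123 | 64.06 | 69 | 35.94 |
| Terms with three definitions | 38 | 45.78 | 45 | 54.22 |
| Terms with four definitions | 13 | 32.50 | 27 | 67.50 |
| Terms with more than four definitions | 13 | 15.12 | 70 | 84.88 |

Distribution of Semantically Stable and Semantically Drifting Terms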
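The percentage columns follow from the counts by simple division, as the short sketch below shows. The function name `drift_shares` is hypothetical, and the counts are copied from the first three rows of the table.

```python
def drift_shares(stable: int, drifting: int):
    """Return (stable share, drifting share) in percent, rounded to two decimals."""
    total = stable + drifting
    return round(100 * stable / total, 2), round(100 * drifting / total, 2)

# (stable count, drifting count) for terms with two, three, and four definitions.
counts = {2: (123, 69), 3: (38, 45), 4: (13, 27)}
shares = {n: drift_shares(*pair) for n, pair in counts.items()}
```

These three rows reproduce the listed shares. Applying the same arithmetic to the last row (13 and 70) gives about 15.66% and 84.34% rather than the listed 15.12% and 84.88%, so that row may contain a transcription error in the source.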
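| Term | Semantic drift degree | Term | Semantic drift degree | Term | Semantic drift degree |
|---|---|---|---|---|---|
| 用户 (user) | 0.421 | 网站 (website) | 0.184 | 互联网 (Internet) | 0.292 |
| 文献 (literature) | 0.203 | 指标 (indicator) | 0.224 | 项目 (project) | 0.258 |
| 情报 (intelligence) | 0.391 | 协同 (collaboration) | 0.271 | 数据源 (data source) | 0.436 |
| 评价 (evaluation) | 0.322 | 可视化 (visualization) | 0.210 | 核心 (core) | 0.140 |
| 图书 (book) | 0.275 | 质量 (quality) | 0.493 | 版权 (copyright) | 0.252 |
| 技术 (technology) | 0.199 | 智库 (think tank) | 0.149 | 计量 (measurement) | 0.456 |
| 情报学 (information science) | 0.129 | 社会化 (socialization) | 0.340 | 标准 (standard) | 0.118 |
| 图书馆学 (library science) | 0.062 | 资源共享 (resource sharing) | 0.276 | 危机 (crisis) | 0.344 |
| 主题 (topic) | 0.292 | 视角 (perspective) | 0.213 | 主体 (subject) | 0.354 |
| 政府 (government) | 0.350 | 书目 (bibliography) | 0.269 | 隐性 (implicit) | 0.007 |

Semantic Drift of Terms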
Examples of Interdisciplinary Semantic Drift
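The abstract states that principal component analysis is used together with Bokeh for the visualization. The sketch below shows only the projection step, reducing definition embeddings to two coordinates that could then be passed to any scatter plot. The function name `pca_2d` and the random input vectors are assumptions; the plotting call is omitted because the page does not describe how the figure was built.

```python
import numpy as np

def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(1)
# Assumption: six 16-dimensional vectors stand in for definition embeddings.
definition_vectors = rng.normal(size=(6, 16))
coords = pca_2d(definition_vectors)   # shape (6, 2): x and y for each definition
```

Definitions of the same term that land far apart in this 2-D space would suggest different senses across disciplines, though distances after projection are only approximate.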
Subject Word Cloud Graph