Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (9): 133-144    DOI: 10.11925/infotech.2096-3467.2020.0192
Automatic Expression of Co-occurrence Clustering Based on Indexing Rules of Medical Subject Headings
Wu Jinming1, Hou Yuefang2, Cui Lei2
1Institute of Medical Information/Medical Library, Chinese Academy of Medical Science & Peking Union Medical College, Beijing 100020, China
2College of Medical Informatics, China Medical University, Shenyang 110122, China
Abstract  

[Objective] This study proposes an automatic procedure for presenting clustering results, aiming to promote the development of co-word clustering analysis. [Methods] First, we examined the indexing rules of neoplastic diagnosis and chose 10 common neoplasms as sample sets for co-occurrence clustering analysis. Second, we reviewed the results and combined them with the indexing rules to identify the semantic type/subheading combination patterns of high-frequency subject headings. Third, we developed a Python application to automatically interpret the clustering results for four groups of neoplasms. Finally, we invited 12 experts to evaluate the accuracy, comprehensiveness, practicality, comprehensibility, and simplicity of the presentation. [Results] We found 30 indexing patterns of neoplastic diagnosis as well as 98 semantic type/subheading combination patterns. The mean scores for accuracy, comprehensiveness, practicality, comprehensibility, and simplicity were 4.282, 4.435, 4.209, 4.457, and 4.206 out of 5, respectively. [Limitations] The proposed method had difficulty revealing the “hidden relations” among the subject headings. [Conclusions] Our new method could effectively present results of co-occurrence clustering analysis for medical records.

Key words: Co-word Analysis; Clustering Analysis; Cluster Description; Knowledge Expression; Automatic Description
Received: 16 March 2020      Published: 17 June 2020
ZTFLH:  G202  
Corresponding Authors: Cui Lei     E-mail: lcui@cmu.edu.cn

Cite this article:

Wu Jinming, Hou Yuefang, Cui Lei. Automatic Expression of Co-occurrence Clustering Based on Indexing Rules of Medical Subject Headings. Data Analysis and Knowledge Discovery, 2020, 4(9): 133-144.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0192     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I9/133

Research Framework
Retrieval strategy | Type | Records retrieved | High-frequency term threshold | High-frequency terms | Clusters | Rules
"Abdominal Neoplasms/diagnosis"[Majr] | Training set | 6,138 | 90 | 30 | 7 | 5
"Bone Neoplasms/diagnosis"[Majr] | Training set | 3,302 | 46 | 41 | 7 | 6
"Breast Neoplasms/diagnosis"[Majr] | Training set | 3,464 | 61 | 40 | 6 | 12
"Digestive System Neoplasms/diagnosis"[Majr] | Training set | 6,783 | 150 | 25 | 7 | 4
"Endocrine Gland Neoplasms/diagnosis"[Majr] | Training set | 8,283 | 173 | 25 | 6 | 5
"Eye Neoplasms/diagnosis"[Majr] | Training set | 5,713 | 84 | 30 | 6 | 3
"Head and Neck Neoplasms/diagnosis"[Majr] | Training set | 7,777 | 175 | 25 | 5 | 4
"Skin Neoplasms/diagnosis"[Majr] | Training set | 2,830 | 32 | 44 | 9 | 7
"Thoracic Neoplasms/diagnosis"[Majr] | Training set | 8,074 | 161 | 26 | 5 | 9
"Urogenital Neoplasms/diagnosis"[Majr] | Training set | 8,107 | 187 | 27 | 7 | 8
"Lung Neoplasms/diagnosis"[Majr] | Validation set | 5,159 | 83 | 40 | 9 | /
"Stomach Neoplasms/diagnosis"[Majr] | Validation set | 3,789 | 51 | 35 | 7 | /
"Prostatic Neoplasms/diagnosis"[Majr] | Validation set | 3,987 | 61 | 39 | 7 | /
"Thyroid Neoplasms/diagnosis"[Majr] | Validation set | 5,324 | 99 | 35 | 6 | /
Retrieval Strategy and Results of Training Set and Validation Set
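The table reports, for each search, a high-frequency term threshold and the number of high-frequency terms it yields. The sketch below illustrates how such a threshold might feed a co-occurrence count. It assumes each retrieved record is a list of subject heading/subheading strings and that a term is high-frequency when it occurs in at least `threshold` records. The function names, the comparison rule, and the toy records are illustrative assumptions, not details taken from the article.

```python
from collections import Counter
from itertools import combinations


def high_frequency_terms(records, threshold):
    """Return terms occurring in at least `threshold` records (assumed rule)."""
    counts = Counter(term for record in records for term in set(record))
    return sorted(term for term, n in counts.items() if n >= threshold)


def cooccurrence_counts(records, terms):
    """Count the records that contain each unordered pair of selected terms."""
    keep = set(terms)
    pairs = Counter()
    for record in records:
        present = sorted(keep & set(record))
        pairs.update(combinations(present, 2))
    return pairs


# Toy records (hypothetical), each a list of heading/subheading strings.
records = [
    ["Lung Neoplasms/diagnosis", "Biomarkers, Tumor/analysis"],
    ["Lung Neoplasms/diagnosis", "Biomarkers, Tumor/analysis", "Bronchoscopy/methods"],
    ["Lung Neoplasms/diagnosis", "Bronchoscopy/methods"],
]
terms = high_frequency_terms(records, threshold=2)
pairs = cooccurrence_counts(records, terms)
```

The resulting pair counts could then be arranged into a term-by-term matrix and passed to whichever clustering tool is in use.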
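Indexing rule | Meaning
(Organ neoplasm)/same subheading + (Histological type)/same subheading | A given aspect of an organ neoplasm of a given histological type
(Primary neoplasm)/pathology + (Metastatic neoplasm)/secondary + (Histological type)/secondary | Metastasis of a neoplasm
(Primary neoplasm)/pathology + (Histological type)/pathology + (Site invaded by the neoplasm)/pathology + Neoplasm Invasiveness | Invasion by a neoplasm
(Disease A)/diagnosis + (Disease B)/diagnosis + Diagnosis, Differential | Differential diagnosis of diseases A and B
(Disease)/diagnosis or a narrower subheading + (Specific diagnostic method)/methods | Diagnosing a disease with a given technique
(Disease)/diagnostic imaging + (Organ)/diagnostic imaging + (Specific diagnostic imaging technique) (NLM may leave this unindexed) | Diagnosing a disease at a given anatomical site with a given imaging technique
(Disease)/diagnosis + Biomarkers/corresponding subheading + (Endogenous substance)/corresponding subheading | An endogenous substance used as a biomarker to diagnose a disease
(Neoplasm)/pathology + Neoplasm Staging | Staging of a neoplasm
(Radionuclide imaging technique)/methods + (Radioisotope) + Radiopharmaceuticals | A radioisotope administered as a radiopharmaceutical for a given radionuclide imaging technique
(Disease)/diagnosis + Clinical Enzyme Tests + (Organ)/enzymology | Diagnosing a disease of a given organ with clinical enzyme tests
Indexing Patterns of Neoplastic Diagnosis (Partial)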
Rule Base Construction Process
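Formal rule | Semantic annotation
12000 (Organ neoplasm)/diagnosis + (Histological type)/diagnosis + (Diagnostic technique)/methods | Using (a diagnostic technique) to diagnose (an organ neoplasm) whose histological type is (a histological type).
12160 (Organ neoplasm A)/diagnostic imaging + (Histological type)/diagnostic imaging + (Diagnostic imaging technique)/methods + (Organ neoplasm A)/pathology | Usually means using (a diagnostic imaging technique) to diagnose (an organ neoplasm) whose histological type is (a histological type), in order to obtain pathological information about the neoplasm (such as its clinical stage, degree of malignancy, or extent of the lesion) or to carry out pathology-related research.
13000 (Organ neoplasm)/diagnosis + (Histological type)/diagnosis + Biomarkers, Tumor/corresponding subheading | Diagnosing (an organ neoplasm) whose histological type is (a histological type) by detecting tumor markers.
13200 (Organ neoplasm A)/diagnosis + (Histological type)/diagnosis + Biomarkers, Tumor/corresponding subheading + (Organ neoplasm A)/therapy | Usually means that a substance can serve as a marker for clinically diagnosing (an organ neoplasm) whose histological type is (a histological type), to assist clinical treatment, monitor treatment response, judge prognosis, support population follow-up, or serve other purposes.
02141 (Disease)/diagnostic imaging + (Radionuclide imaging technique)/methods + (Specific radiopharmaceutical) + Radiopharmaceuticals | Using the radiopharmaceutical (a specific drug) to assist (a radionuclide imaging technique) in diagnosing (a disease).
02180 (Neoplasm A)/diagnostic imaging + (Diagnostic imaging technique)/methods + (Neoplasm A)/therapy | Usually means using (a diagnostic imaging technique) to diagnose (a neoplasm) in order to guide treatment planning such as radiotherapy or surgery; or for early diagnosis of residual and/or recurrent disease after surgery or radiotherapy; or to evaluate treatment response and prognosis, or for other purposes.
03000 (Disease)/diagnosis + Biomarkers | Diagnosing (a disease) by detecting and analyzing biomarkers.
03100 (Disease)/diagnosis + Biomarkers + (Endogenous substance) | (An endogenous substance) can serve as a marker for clinically diagnosing (a disease).
08000 (Disease)/epidemiology + Mass Screening + Early Detection | Usually means early diagnosis and mass screening of (a disease), together with related epidemiological research.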
Semantic Type / Subheading Combination Patterns of Neoplastic Diagnosis (Partial)
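Each pattern in the table pairs a combination of semantic types and subheadings with a sentence template. The sketch below shows one way such a rule could be matched against a cluster and rendered as text. It is not the authors' application: `SEMANTIC_TYPE`, `RULE`, `split_term`, and `apply_rule` are hypothetical names, the semantic-type lookup is hard-coded for illustration, and the single rule is only loosely modeled on the "(disease)/diagnosis + tumor marker" pattern.

```python
# Hypothetical semantic-type lookup; a real system would derive this from MeSH.
SEMANTIC_TYPE = {
    "Mesothelioma": "disease",
    "Adenocarcinoma": "disease",
    "Biomarkers, Tumor": "tumor marker",
}

# Each slot maps a name to (semantic type, required subheading or None for any).
RULE = {
    "slots": {
        "disease": ("disease", "diagnosis"),
        "marker": ("tumor marker", None),
    },
    "template": "{disease} is diagnosed by detecting and analyzing tumor markers.",
}


def split_term(term):
    """Split 'Heading/subheading' into (heading, lowercase subheading or None)."""
    heading, _, subheading = term.partition("/")
    return heading.strip(), subheading.strip().lower() or None


def apply_rule(cluster, rule):
    """Return one sentence per matched disease if every slot of the rule is filled."""
    matches = {name: [] for name in rule["slots"]}
    for term in cluster:
        heading, subheading = split_term(term)
        sem_type = SEMANTIC_TYPE.get(heading)
        for name, (slot_type, slot_sub) in rule["slots"].items():
            if sem_type == slot_type and slot_sub in (None, subheading):
                matches[name].append(heading)
    if not all(matches.values()):
        return []  # at least one slot is empty, so the rule does not fire
    return [rule["template"].format(disease=d) for d in matches["disease"]]


cluster = [
    "Mesothelioma/diagnosis",
    "Biomarkers, Tumor/metabolism",
    "Adenocarcinoma/diagnosis",
]
sentences = apply_rule(cluster, RULE)
```

Keeping the rules as data rather than code means new patterns can be added without changing the matching logic, which suits a rule base that grows as more neoplasm groups are reviewed.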
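Cluster | Major subject headings/subheadings | Program output
Cluster 0 | Radiopharmaceuticals; Fluorodeoxyglucose F18; Positron-Emission Tomography; Tomography, X-Ray Computed; Multimodal Imaging | (1) The radiopharmaceutical fluorodeoxyglucose F18 is used when performing positron-emission tomography. (2) X-ray computed tomography is used in multimodal imaging. (3) Positron-emission tomography is used in multimodal imaging.
Cluster 1 | Algorithms; Radiographic Image Interpretation, Computer-Assisted/methods; Tomography, X-Ray Computed/methods; Solitary Pulmonary Nodule/diagnostic imaging | (1) X-ray computed tomography is used to diagnose solitary pulmonary nodules. (2) Computer-assisted radiographic image interpretation is used to diagnose solitary pulmonary nodules.
Cluster 2 | Mesothelioma/diagnosis; Biomarkers, Tumor/metabolism; Carcinoma, Squamous Cell/diagnosis; Adenocarcinoma/diagnosis | (1) Mesothelioma is diagnosed by detecting and analyzing tumor markers. (2) Squamous cell carcinoma is diagnosed by detecting and analyzing tumor markers. (3) Adenocarcinoma is diagnosed by detecting and analyzing tumor markers.
Cluster 3 | Lung Neoplasms/diagnosis; Carcinoma, Non-Small-Cell Lung/diagnosis; Lung Neoplasms/therapy; Biomarkers, Tumor/analysis; Lung Neoplasms/genetics | (1) A genetic material or related product can serve as a marker for clinically diagnosing lung neoplasms, to assist clinical treatment, monitor treatment response, judge prognosis, support population follow-up, or serve other purposes. (2) Non-small-cell lung carcinoma is diagnosed by detecting and analyzing tumor markers.
Cluster 4 | Early Detection of Cancer/methods; Mass Screening/methods; Early Detection of Cancer; Lung Neoplasms/epidemiology | (1) Early diagnosis and mass screening of lung neoplasms, together with related epidemiological research.
Cluster 5 | Lung Neoplasms/pathology; Carcinoma, Non-Small-Cell Lung/pathology; Carcinoma, Non-Small-Cell Lung/diagnostic imaging; Lung Neoplasms/diagnostic imaging; Lung Neoplasms/radiotherapy; Positron-Emission Tomography/methods | (1) Positron-emission tomography is used to diagnose lung neoplasms in order to obtain pathological information about the neoplasm (such as its clinical stage, degree of malignancy, or extent of the lesion) or to carry out pathology-related research. (2) Positron-emission tomography is used to diagnose non-small-cell lung carcinoma in order to obtain pathological information about the neoplasm (such as its clinical stage, degree of malignancy, or extent of the lesion) or to carry out pathology-related research. (3) Positron-emission tomography is used to diagnose lung neoplasms and to guide radiotherapy planning; or for early diagnosis of residual and/or recurrent disease after radiotherapy; or to evaluate treatment response and prognosis, or for other purposes.
Cluster 6 | Lung/diagnostic imaging; Lung/pathology; Bronchoscopy/methods; Bronchial Neoplasms/diagnosis | (1) Bronchial neoplasms are diagnosed by bronchoscopy in the lung and related sites in order to obtain pathological information about the neoplasm (such as its clinical stage, degree of malignancy, or extent of the lesion) or to carry out pathology-related research.
Cluster 7 | Solitary Pulmonary Nodule/diagnosis; Lung Neoplasms/surgery; Adenocarcinoma/diagnostic imaging; Lung Neoplasms/secondary | (1) Diagnosis of solitary pulmonary nodules, which may include examination, differential diagnosis, and prognosis. (2) Surgical treatment of lung neoplasms. (3) Diagnostic imaging of adenocarcinoma, including radiological and ultrasound diagnosis. (4) Lung neoplasms arising from the metastasis of neoplasms in other organs.
Cluster 8 | Biomarkers, Tumor/blood; Small Cell Lung Carcinoma/diagnosis; Lung Neoplasms/metabolism; Lung Neoplasms/drug therapy | (1) Small cell lung carcinoma is diagnosed by detecting and analyzing tumor markers. (2) A substance can serve as a marker for clinically diagnosing lung neoplasms; it may act as a therapeutic target for clinical drug therapy, or be used to monitor the response to drug therapy, judge prognosis, support population follow-up, or serve other purposes.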
Co-occurrence Clustering Results Presentation of Lung Neoplastic Diagnosis Subject Headings
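Topic | Accuracy | Comprehensiveness | Practicality | Comprehensibility | Simplicity
Lung neoplasm diagnosis | 4.444 | 4.519 | 4.074 | 4.629 | 4.111
Stomach neoplasm diagnosis | 4.333 | 4.571 | 4.238 | 4.381 | 4.476
Prostatic neoplasm diagnosis | 4.238 | 4.429 | 4.524 | 4.429 | 4.238
Thyroid neoplasm diagnosis | 4.111 | 4.222 | 4.000 | 4.389 | 4.000
Mean score | 4.282 | 4.435 | 4.209 | 4.457 | 4.206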
Expert Evaluation Results
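The mean row of the evaluation table can be recomputed from the four topic rows. The sketch below does so under two assumptions that the source does not state: that each mean is an unweighted average over the four topics, and that values are rounded half-up to three decimals. `Decimal` is used so that the rounding is exact rather than subject to binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

CRITERIA = ["Accuracy", "Comprehensiveness", "Practicality", "Comprehensibility", "Simplicity"]

# Per-topic scores copied from the table, in the order of CRITERIA.
SCORES = {
    "Lung": ["4.444", "4.519", "4.074", "4.629", "4.111"],
    "Stomach": ["4.333", "4.571", "4.238", "4.381", "4.476"],
    "Prostatic": ["4.238", "4.429", "4.524", "4.429", "4.238"],
    "Thyroid": ["4.111", "4.222", "4.000", "4.389", "4.000"],
}


def column_means(scores, criteria):
    """Unweighted mean per criterion, rounded half-up to three decimals (assumed)."""
    means = {}
    for i, name in enumerate(criteria):
        total = sum(Decimal(row[i]) for row in scores.values())
        mean = total / Decimal(len(scores))
        means[name] = mean.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    return means


means = column_means(SCORES, CRITERIA)
for name in CRITERIA:
    print(f"{name}: {means[name]}")
```

Under these assumptions the recomputed values agree with the mean row reported in the table.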
Cluster 0 of Co-occurrence Clustering Results of Lung Neoplastic Diagnosis Subject Headings