Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (6): 91-108    DOI: 10.11925/infotech.2096-3467.2019.1224
Measurement and Distribution of Index Quality in Research Topics from Academic Databases
Li Keyu, Wang Hao, Gong Lijuan, Tang Huihui
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper measures the quality of index terms for research topics in academic databases and explores how that quality is distributed. [Methods] First, we collected the index terms of research topics in the humanities, social sciences and natural sciences from Web of Science (WoS) and CNKI. Second, we constructed terminology spaces based on research topics, domains and databases. Third, we used term discriminative capacity (TDC) to evaluate term quality. Finally, we ran ANOVA tests to explore how index term quality is distributed across databases and domains. [Results] The index term quality of research topics followed the order “Abstract” > average level > “Keyword”. The quality of “Title” in CNKI and of “Keyword Plus” in WoS was lower than that of “Abstract”, while “Title” in WoS was below the average level. [Limitations] The number of research topics examined needs to be expanded. [Conclusions] The TDC measure is stable and reliable, and can help improve information retrieval services and the quality of index terms.
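The abstract does not define TDC, so the following is only a minimal sketch of the classical centroid-based term discrimination value (TDV) from the vector space model, offered as an illustration of how a term's discriminative power can be scored. It is not the authors' TDC formula. The function names, the raw term-frequency weighting and the three toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(docs):
    """Average term-weight vector of a non-empty document collection."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return {t: w / len(docs) for t, w in total.items()}


def density(docs):
    """Space density: mean similarity of each document to the centroid."""
    c = centroid(docs)
    return sum(cosine(doc, c) for doc in docs) / len(docs)


def discrimination_values(docs):
    """Density without the term minus density with it, for every term.

    A positive value means removing the term packs documents closer
    together, i.e. the term was helping to tell them apart.
    """
    base = density(docs)
    values = {}
    for term in set().union(*docs):
        reduced = [{t: w for t, w in doc.items() if t != term} for doc in docs]
        values[term] = density(reduced) - base
    return values


# Toy collection (hypothetical): raw term frequencies per document.
docs = [
    Counter("aristotle ethics virtue".split()),
    Counter("aristotle logic syllogism".split()),
    Counter("aristotle politics city".split()),
]
values = discrimination_values(docs)
for term, dv in sorted(values.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} {dv:+.4f}")
```

In this toy collection the term shared by every document receives a negative value and the terms unique to one document receive positive values, which is the sign convention behind splitting terms into positive and negative groups.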

Key words: Indexing Term; Term Discriminative Capability; ANOVA Testing; Search Fields; Distribution Characteristics of Term Quality
Received: 08 November 2019      Published: 07 July 2020
Chinese Library Classification (ZTFLH): TP391; G35
Li Keyu, Wang Hao, Gong Lijuan, Tang Huihui. Measurement and Distribution of Index Quality in Research Topics from Academic Databases. Data Analysis and Knowledge Discovery, 2020, 4(6): 91-108.

Research Framework
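Topic No. | Domain | Research topic | Retrieved documents | Valid documents | Selected documents | Number of terms
1 | A&HCI | Aristotle | 2,897 | 323 | 100 | 2,727
2 | A&HCI | Realism | 5,374 | 1,026 | 100 | 2,555
3 | A&HCI | Christianity | 4,364 | 743 | 100 | 3,247
4 | SSCI | Government failure | 4,254 | 1,871 | 100 | 3,184
5 | SSCI | Population urbanization | 3,882 | 2,782 | 100 | 3,463
6 | SSCI | Economic depression | 4,988 | 3,208 | 100 | 3,430
7 | SCI | Petrology | 6,887 | 4,542 | 100 | 4,740
8 | SCI | Rubella | 4,940 | 2,709 | 100 | 3,913
9 | SCI | Supersaturated solution | 5,072 | 2,745 | 100 | 3,377
10 | CSSCI_A | Literary criticism (文学批评) | 4,561 | 4,468 | 100 | 2,308
11 | CSSCI_A | Hegel (黑格尔) | 2,257 | 2,225 | 100 | 1,914
12 | CSSCI_A | Intangible cultural heritage (非物质文化遗产) | 2,405 | 2,334 | 100 | 1,958
13 | CSSCI_S | Inflation (通货膨胀) | 4,315 | 4,297 | 100 | 1,873
14 | CSSCI_S | Industrial agglomeration (产业集聚) | 4,324 | 4,316 | 100 | 1,732
15 | CSSCI_S | Economic crisis (经济危机) | 4,455 | 4,367 | 100 | 1,905
16 | CSCD | Particle swarm optimization algorithm (粒子群算法) | 4,553 | 4,552 | 100 | 1,889
17 | CSCD | Cell transplantation (细胞移植) | 5,912 | 5,799 | 100 | 2,021
18 | CSCD | Coordination complex (配合物) | 5,240 | 5,171 | 100 | 2,185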
Literature Search in WoS and CNKI
Factor | No. | Meaning | No. | Meaning
Field | 1 | Title | 1 | Title
Field | 2 | Keyword | 2 | Keyword
Field | 3 | Keyword Plus | 3 | Abstract
Field | 4 | Abstract | |
Domain | 1 | A&HCI | 1 | CSSCI_A
Symbolic Explanation
Frequency Histogram of TDV and TDC of Terms in A&HCI
Relationship Between TDV, TDC and DF in A&HCI
One-Way ANOVA Results of TDC for Each Research Topic in A&HCI
One-Way ANOVA Results of TDC for Each Research Topic in CSSCI_A
One-Way ANOVA Results of TDC for Each Research Topic in SSCI
One-Way ANOVA Results of TDC for Each Research Topic in CSSCI_S
One-Way ANOVA Results of TDC for Each Research Topic in SCI
One-Way ANOVA Results of TDC for Each Research Topic in CSCD
Distribution of Positive and Negative Terms of Research Topics and Domains in WoS
One-Way ANOVA Results of TDC for Each Domain in WoS
Distribution of Positive and Negative Terms of Research Topics and Domains in CNKI
One-Way ANOVA Results of TDC for Each Domain in CNKI
Distribution of Positive and Negative Terms of WoS and CNKI
One-Way ANOVA Results of Field Factors
One-Way ANOVA Results of Domain Factors
Two-Way ANOVA Results with Domain and Field Factors as Fixed Factors
The Relationship Between M_TDC and the Number of Terms of Horizontal and Vertical Factors in WoS
The Relationship Between M_TDC and the Number of Terms of Domains in WoS
The Relationship Between M_TDC and the Number of Terms of Topics in WoS
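
Several of the titles above report one-way ANOVA results for TDC. As a rough illustration of what such a test computes, the sketch below derives the F statistic that compares between-group and within-group variance across search fields. The function name and the TDC scores are hypothetical, made up for this example, and are not data from the paper. Turning F into a p-value needs the F distribution, which a statistics package such as SciPy (`scipy.stats.f_oneway`) provides.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)                      # number of groups (e.g. fields)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of group means around the grand mean, weighted by size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical TDC scores per search field (illustrative numbers only).
tdc_by_field = {
    "Title": [0.2, 0.4, 0.3],
    "Keyword": [0.1, 0.2, 0.0],
    "Abstract": [0.5, 0.6, 0.7],
}
f_stat, df_b, df_w = one_way_anova_f(list(tdc_by_field.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the critical value for the given degrees of freedom indicates that mean TDC differs across fields more than within-field scatter alone would explain.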
