Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 143-152    DOI: 10.11925/infotech.2096-3467.2019.0630
Impacts of Chinese Term Granularity on Measuring Term Discriminative Capacity
Xiong Xin1,2, Wang Hao1,2, Zhang Haichao1,2, Zhang Baolong1,2
1School of Information Management, Nanjing University, Nanjing 210023, China
2Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper explores the granularity of Chinese terms drawn from different fields and measures their Term Discriminative Capacity (TDC). [Methods] First, we used TDC to evaluate the quality of terms from four fields (Title, Abstract, Keywords, and Keywords Plus). Then, we examined differences in TDC across disciplines, fields, and term granularity. [Results] In the control group, mean TDC ranked Title > Abstract > Keywords Plus > Keywords. In the experimental group, the performance of Keywords Plus improved, giving Title > Keywords Plus > Abstract > Keywords. [Limitations] We only collected data from five disciplines in the humanities and social sciences. [Conclusions] Both Chinese term granularity and source field influence Term Discriminative Capacity. Term granularity should be standardized to reduce the impact of source fields.

Key words: Term Discriminative Capacity; Term Granularity; Academic Literature Retrieval System; Automatic Indexing
Received: 10 June 2019      Published: 26 April 2020
ZTFLH (Chinese Library Classification number): TP391
Xiong Xin, Wang Hao, Zhang Haichao, Zhang Baolong. Impacts of Chinese Term Granularity on Measuring Term Discriminative Capacity. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 143-152.

Research Framework
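No.  Discipline                                      Abbreviation  Documents  Effective documents  Effective share  Category
1    Philosophy                                      PHI           8,160      3,861                47.32%           Humanities
2    History                                         HIS           7,341      3,624                49.37%           Humanities
3    Economics                                       ECO           34,255     19,149               55.90%           Social sciences
4    Sociology                                       SOC           4,622      2,268                49.07%           Social sciences
5    Library, Information and Documentation Science  LIS           10,285     6,440                62.62%           Interdisciplinary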
Documents and Effective Documents in Disciplines
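The percentage column in the table above appears to be the ratio of effective documents to all documents in each discipline. The following minimal sketch checks that reading against the published figures. It assumes the first count in each row is the total and the second is the effective count, since the source header does not label these columns; the names `counts` and `effective_share` are illustrative only.

```python
# Counts copied from the table above as (all documents, effective documents).
# Assumption: the column order is total first, effective second.
counts = {
    "PHI": (8160, 3861),
    "HIS": (7341, 3624),
    "ECO": (34255, 19149),
    "SOC": (4622, 2268),
    "LIS": (10285, 6440),
}


def effective_share(total, effective):
    """Return effective documents as a percentage of all documents."""
    return 100.0 * effective / total


# Round to two decimals to match the precision used in the table.
shares = {abbr: round(effective_share(t, e), 2) for abbr, (t, e) in counts.items()}

for abbr, pct in shares.items():
    print(f"{abbr}: {pct:.2f}%")
```

Under this reading the computed shares agree with the listed percentages for all five disciplines.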
Field          No.  Abbreviation
Title          1    TI
Abstract       2    AB
Keywords       3    KW
Keywords Plus  4    KP
Serial Numbers and Abbreviations of Fields

Field               TI     AB     KW     KP     All
Control group       2,772  8,997  3,294  7,986  18,891
Experimental group  2,772  8,997  2,693  5,188  11,173
Numbers of Terms
Field               TI    AB    KW    KP    All
Control group       1.94  2.05  4.11  4.37  3.31
Experimental group  1.94  2.05  1.95  1.95  2.06
Average Length of Terms
Percentages of Short Terms
Scatter Plot of TDC by Field (Control Group)
Line Plots of One-way ANOVA Mean of TDC by Field (Control Group)
Scatter Plot of TDC and Number by Field (Experimental Group)
Line Plots of One-way ANOVA Mean of TDC by Field (Experimental Group)
Line Plots and Term Granularity of Two-way ANOVA Mean
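
The one-way ANOVA figures listed above compare mean TDC across fields. As a rough illustration of what such a test computes, the sketch below derives the F statistic by hand. The input values are hypothetical toy numbers, not TDC data from the paper, and the function name `one_way_anova_f` is illustrative; the authors' actual software and settings are not stated on this page.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                      # number of groups (e.g. fields)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical toy scores for three groups (NOT values from the paper).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(toy_groups)
print(f"F = {f_stat:.2f}")
```

A larger F means the group means differ more relative to the variation within groups; judging significance additionally requires the F distribution with (k - 1, n - k) degrees of freedom.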
