1 School of Information Management, Nanjing University, Nanjing 210023, China
2 Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper explores the granularity of Chinese terms from different fields and measures their Term Discriminative Capacity (TDC). [Methods] First, we used TDC to evaluate the quality of terms from four source fields: Title, Abstract, Keywords, and Keywords Plus. Then, we examined the differences in TDC across disciplines, fields, and term granularities. [Results] In the control group, the mean TDC ranked Title > Abstract > Keywords Plus > Keywords. In the experimental group, the performance of Keywords Plus improved, giving the order Title > Keywords Plus > Abstract > Keywords. [Limitations] We only collected data from five disciplines in the humanities and social sciences. [Conclusions] Both Chinese term granularity and source field influence Term Discriminative Capacity. Term granularity should be standardized to reduce the impact of fields.
熊欣,王昊,张海潮,张宝隆. 中文术语粒度对其区分能力测度的影响分析*[J]. 数据分析与知识发现, 2020, 4(2/3): 143-152.
Xiong Xin, Wang Hao, Zhang Haichao, Zhang Baolong. Impacts of Chinese Term Granularity on Measuring Term Discriminative Capacity. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 143-152.