Impacts of Chinese Term Granularity on Measuring Term Discriminative Capacity
Xiong Xin1,2, Wang Hao1,2, Zhang Haichao1,2, Zhang Baolong1,2
1School of Information Management, Nanjing University, Nanjing 210023, China
2Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China

Abstract [Objective] This paper explores the granularity of Chinese terms from different fields and measures their Term Discriminative Capacity (TDC). [Methods] First, we used TDC to evaluate the quality of terms from four indexes. Then, we examined the differences in TDC among disciplines, fields, and term granularities. [Results] In the control group, the order of mean TDC was Title > Abstract > Keywords Plus > Keywords. In the experimental group, the performance of Keywords Plus improved, giving the order Title > Keywords Plus > Abstract > Keywords. [Limitations] We only collected data from five disciplines in the humanities and social sciences. [Conclusions] Both Chinese term granularity and source field influence the Term Discriminative Capacity. Term granularity should be standardized to reduce the impact of fields.
Received: 10 June 2019
Published: 26 April 2020
Corresponding Author: Hao Wang
E-mail: ywhaowang@nju.edu.cn