Study on Solution to Redundancy of Scientific Literature Keywords
Xing Meifeng
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100049, China; Jinzhong University Library, Jinzhong 030600, China
Abstract Irregular keywords often cause high redundancy within the same research topic. To address this issue, this paper proposes an improved keyword selection algorithm based on similarity calculation. The algorithm re-segments keywords using a field dictionary and a common-sense knowledge database thesaurus. When the total semantic similarity of two compared keywords exceeds a given threshold, they are considered to express the same meaning; they are merged and only one of them is kept in the library, which reduces the dimension of the keyword set. Experimental results show the effectiveness of the method.
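The merge step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy dictionary, the thesaurus format, the forward-maximum-matching segmenter, the Jaccard overlap used as a stand-in for the semantic similarity, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy resources; the paper's actual dictionary and thesaurus are not shown.
# Glosses: 文本 = text, 分类 = classification, 归类 = categorization, 聚类 = clustering.
FIELD_DICTIONARY: Set[str] = {"文本", "分类", "归类", "聚类"}
# Maps a word to a canonical synonym (assumed format).
THESAURUS: Dict[str, str] = {"归类": "分类"}


def segment(keyword: str, dictionary: Set[str]) -> List[str]:
    """Re-segment a keyword by forward maximum matching against the dictionary."""
    max_len = max([len(word) for word in dictionary] + [1])
    tokens: List[str] = []
    i = 0
    while i < len(keyword):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(keyword) - i), 0, -1):
            piece = keyword[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def similarity(a: str, b: str, dictionary: Set[str], thesaurus: Dict[str, str]) -> float:
    """Jaccard overlap of canonical tokens (a stand-in for semantic similarity)."""
    concepts_a = {thesaurus.get(t, t) for t in segment(a, dictionary)}
    concepts_b = {thesaurus.get(t, t) for t in segment(b, dictionary)}
    if not concepts_a or not concepts_b:
        return 0.0
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)


def merge_keywords(keywords: List[str], dictionary: Set[str],
                   thesaurus: Dict[str, str], threshold: float = 0.8) -> List[str]:
    """Keep a keyword only if it is not too similar to one already kept."""
    kept: List[str] = []
    for keyword in keywords:
        is_duplicate = any(
            similarity(keyword, existing, dictionary, thesaurus) > threshold
            for existing in kept
        )
        if not is_duplicate:
            kept.append(keyword)
    return kept


merged = merge_keywords(["文本分类", "文本归类", "文本聚类"], FIELD_DICTIONARY, THESAURUS)
print(merged)
```

With these toy inputs, "文本归类" collapses into "文本分类" because the thesaurus maps 归类 to 分类, while "文本聚类" shares only one of three concepts and is kept as a separate keyword.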
Received: 25 October 2011
Published: 26 February 2012