New Technology of Library and Information Service  2012, Vol. 28 Issue (1): 34-39    DOI: 10.11925/infotech.1003-3513.2012.01.06
Study on Solution to Redundancy of Scientific Literature Keywords
Xing Meifeng
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100049, China; Jinzhong University Library, Jinzhong 030600, China
Abstract  Inconsistently assigned keywords often cause high redundancy within a single research topic. To address this issue, the paper proposes an improved keyword selection algorithm based on similarity calculation. The algorithm re-segments keywords using a domain dictionary and a common-sense knowledge base thesaurus. When the total semantic similarity of two compared keywords exceeds a given threshold, the keywords are considered to express the same meaning and are merged, with only one of them kept in the keyword library, which achieves dimension reduction. Experimental results show that the method is effective.
Key words: Scientific literature keywords; Redundancy; Semantic similarity; Feature reduction
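The merge step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the `SYNONYMS` table, the whitespace-based `segment` function, and the Jaccard-style `similarity` score are all hypothetical stand-ins for the domain dictionary, the thesaurus-based re-segmentation, and the semantic similarity measure used in the paper, and the threshold of 0.8 is an arbitrary assumption. The sketch only shows the control flow: compare each keyword with the representatives kept so far, and merge it into the first one whose similarity exceeds the threshold.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy synonym table; a stand-in for a domain dictionary or thesaurus.
SYNONYMS: Dict[str, str] = {
    "retrieval": "search",
    "literature": "document",
}


def segment(keyword: str) -> Set[str]:
    """Split a keyword on whitespace and map each token to a canonical form.

    Assumption: real re-segmentation would use a dictionary-based segmenter.
    """
    return {SYNONYMS.get(tok, tok) for tok in keyword.lower().split()}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of canonical token sets (an assumed proxy measure)."""
    ta, tb = segment(a), segment(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def merge_keywords(
    keywords: List[str],
    threshold: float = 0.8,  # assumed value, not taken from the paper
    sim: Callable[[str, str], float] = similarity,
) -> Dict[str, List[str]]:
    """Group keywords under the first representative they are similar to."""
    groups: Dict[str, List[str]] = {}
    for kw in keywords:
        for rep in groups:
            if sim(kw, rep) > threshold:
                groups[rep].append(kw)  # redundant keyword: merge into rep
                break
        else:
            groups[kw] = [kw]  # no similar representative: keep as new entry
    return groups


if __name__ == "__main__":
    sample = [
        "information retrieval",
        "Information Search",
        "text classification",
        "document retrieval",
        "literature search",
    ]
    for rep, members in merge_keywords(sample).items():
        print(rep, "<-", members)
```

Only the dictionary keys would be retained in the keyword library; the listed members record which variants were merged. The greedy first-match strategy makes the result depend on input order, which is a simplification chosen for brevity.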
Received: 25 October 2011      Published: 26 February 2012



Cite this article:

Xing Meifeng. Study on Solution to Redundancy of Scientific Literature Keywords. New Technology of Library and Information Service, 2012, 28(1): 34-39.

