A New Feature Selection Method Based on Term Contribution in Co-word Analysis
Hu Changping, Chen Guo
Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract: From the perspective of data dimension reduction, building the co-word matrix from high-frequency words leaves considerable room for improvement. By comparing co-word analysis with traditional text processing tasks, namely text categorization, text clustering and information retrieval, the authors introduce a new feature selection method based on term contribution and describe its algorithm. An experimental comparison shows that the new method noticeably improves both data quality and clustering results.
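The abstract does not spell out the algorithm. As a rough illustration only, the sketch below uses the term contribution (TC) measure as Liu et al. defined it for unsupervised feature selection in text clustering (ICML 2003), TC(t) = Σ_{i≠j} f(t, D_i)·f(t, D_j), where f is a tf-idf weight. It contrasts TC with the high-frequency baseline and then builds a co-word matrix from the selected terms. The authors' adaptation of term contribution to co-word analysis may differ. The function names, the plain log(N/df) idf, and the toy keyword records are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_weights(docs):
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def term_contribution(docs):
    """TC(t) = sum over ordered pairs (i, j), i != j, of f(t, D_i) * f(t, D_j).

    Computed in linear time via sum_{i != j} w_i * w_j = (sum w_i)^2 - sum w_i^2.
    A term found in only one document, or in every document (idf = 0), scores 0.
    """
    totals, squares = Counter(), Counter()
    for w in tfidf_weights(docs):
        for t, v in w.items():
            totals[t] += v
            squares[t] += v * v
    return {t: totals[t] * totals[t] - squares[t] for t in totals}


def high_frequency_terms(docs, k):
    """Baseline: the k terms with the highest document frequency."""
    df = Counter(t for doc in docs for t in set(doc))
    return sorted(df, key=lambda t: (-df[t], t))[:k]


def select_terms(scores, k):
    """The k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def coword_matrix(docs, terms):
    """Symmetric co-word matrix over `terms`.

    Off-diagonal cell (i, j) counts documents containing both terms;
    the diagonal holds each term's document frequency.
    """
    index = {t: i for i, t in enumerate(terms)}
    m = [[0] * len(terms) for _ in terms]
    for doc in docs:
        present = sorted({index[t] for t in doc if t in index})
        for i in present:
            m[i][i] += 1
        for i, j in combinations(present, 2):
            m[i][j] += 1
            m[j][i] += 1
    return m


# Hypothetical toy data: each record is the keyword list of one article.
docs = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "e"], ["a", "f"]]
tc_terms = select_terms(term_contribution(docs), 2)
print(high_frequency_terms(docs, 2))    # ['a', 'b']
print(tc_terms)                         # ['b', 'c']
print(coword_matrix(docs, tc_terms))    # [[2, 1], [1, 2]]
```

On this hypothetical data the high-frequency baseline keeps "a", which occurs in every record and so cannot separate topics. TC gives "a" a score of 0 and keeps "b" and "c", which recur across several records. Because the ordered-pair sum counts each unordered document pair twice, halving TC would not change the ranking.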
Received: 08 May 2013
Published: 02 September 2013