New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 89-93    DOI: 10.11925/infotech.1003-3513.2013.07-08.13
A New Feature Selection Method Based on Term Contribution in Co-word Analysis
Hu Changping, Chen Guo
Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  From the perspective of data dimensionality reduction, the common practice of building the co-word matrix from high-frequency words leaves considerable room for improvement. The authors compare co-word analysis with traditional text-processing tasks, including text categorization, text clustering and information retrieval. On that basis, they propose a new feature selection method based on term contribution and describe its algorithm. An experimental comparison shows that the new method clearly improves both data quality and clustering results.
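The abstract does not state the formula, so the sketch below rests on an assumption. It uses the term contribution (TC) measure from Liu et al.'s evaluation of feature selection for text clustering (ICML 2003), TC(t) = Σ_{i≠j} f(t, D_i) · f(t, D_j), where f(t, D_i) is the TF-IDF weight of term t in record D_i. It treats each paper's keyword list as one record and uses idf = log(N/df), which is only one common variant. The sketch ranks keywords by TC, ranks them by raw frequency (the high-frequency baseline the abstract criticizes), and then builds a co-word matrix from the selected terms. All function names (`tfidf_weights`, `term_contribution`, `select_terms`, `coword_matrix`) are hypothetical. This is an illustration, not the authors' published algorithm.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_weights(docs):
    """Return one {term: tf-idf} dict per record (assumed idf = log(N / df))."""
    n_docs = len(docs)
    # Document frequency: number of records in which each keyword appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def term_contribution(docs):
    """TC(t) = sum over ordered pairs i != j of f(t, D_i) * f(t, D_j).

    Uses the identity sum_{i != j} a_i * a_j = (sum a_i)^2 - sum a_i^2,
    so each term is scored in a single pass over the records.
    """
    sums, squares = Counter(), Counter()
    for w in tfidf_weights(docs):
        for t, v in w.items():
            sums[t] += v
            squares[t] += v * v
    return {t: sums[t] * sums[t] - squares[t] for t in sums}


def select_terms(docs, k, method="tc"):
    """Pick the top-k keywords by term contribution or by raw frequency."""
    if method == "tc":
        scores = term_contribution(docs)
    elif method == "freq":
        # Baseline: keep the most frequent keywords.
        scores = Counter(t for doc in docs for t in doc)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Highest score first; ties broken alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def coword_matrix(docs, terms):
    """Symmetric co-occurrence counts of the selected terms (diagonal left at 0)."""
    index = {t: i for i, t in enumerate(terms)}
    m = [[0] * len(terms) for _ in terms]
    for doc in docs:
        present = sorted({index[t] for t in doc if t in index})
        for i, j in combinations(present, 2):
            m[i][j] += 1
            m[j][i] += 1
    return m


docs = [
    ["co-word analysis", "clustering", "digital library"],
    ["co-word analysis", "clustering"],
    ["digital library", "metadata"],
    ["co-word analysis", "feature selection"],
]
print(select_terms(docs, 2))                 # ['clustering', 'digital library']
print(select_terms(docs, 2, method="freq"))  # ['co-word analysis', 'clustering']
print(coword_matrix(docs, select_terms(docs, 2)))  # [[0, 1], [1, 0]]
```

In this toy data the most frequent keyword ("co-word analysis") receives a lower TC score, because its idf is small. A keyword that appears in only one record scores zero, since it cannot link two records. Whether this ranking behaviour matches the paper's experiments is not claimed here.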
Key words: Co-word analysis; Clustering; Term contribution; Feature selection; Digital library
Received: 08 May 2013      Published: 02 September 2013
CLC number: TP391

 

Cite this article:

Hu Changping, Chen Guo. A New Feature Selection Method Based on Term Contribution in Co-word Analysis. New Technology of Library and Information Service, 2013, 29(7/8): 89-93.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.07-08.13     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I7/8/89
