New Technology of Library and Information Service, 2013, Vol. 29, Issue (7/8): 89-93. DOI: 10.11925/infotech.1003-3513.2013.07-08.13
A New Feature Selection Method Based on Term Contribution in Co-word Analysis
Hu Changping, Chen Guo
Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract: From the perspective of data dimensionality reduction, the common practice of building the co-word matrix from high-frequency words leaves considerable room for improvement. By comparing co-word analysis with traditional text processing tasks, including text categorization, text clustering and information retrieval, the authors propose a new feature selection method based on term contribution and describe its algorithm. An experimental comparison shows that the new method clearly improves both data quality and clustering results.
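The abstract does not give the scoring formula, so the sketch below assumes the term contribution (TC) definition from Liu et al. (2003), which scores a term t as TC(t) = Σ over document pairs i ≠ j of f(t, D_i) · f(t, D_j), where f(t, D) is the TF-IDF weight of t in document D. It ranks keywords by TC instead of raw frequency, keeps the top k, and builds the co-word matrix over those terms only. The helper names (tfidf_weights, term_contribution, select_terms, co_word_matrix), the raw-count tf with idf = log(N/df), and the sample keywords are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_weights(docs):
    """Return one {term: weight} dict per document (each doc is a keyword list).

    Assumption: tf is the raw count of the keyword in the document (usually 0/1
    for author keywords) and idf = log(N / df); the paper's weighting may differ.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def term_contribution(docs):
    """TC(t) = sum over ordered pairs i != j of f(t, D_i) * f(t, D_j).

    Uses the identity sum_{i != j} a_i * a_j = (sum a_i)^2 - sum a_i^2,
    so each term is scored in one pass instead of over all document pairs.
    """
    total = Counter()
    squares = Counter()
    for w in tfidf_weights(docs):
        for t, v in w.items():
            total[t] += v
            squares[t] += v * v
    return {t: total[t] * total[t] - squares[t] for t in total}


def select_terms(docs, k):
    """Keep the k terms with the highest TC (ties broken alphabetically)."""
    tc = term_contribution(docs)
    return sorted(tc, key=lambda t: (-tc[t], t))[:k]


def co_word_matrix(docs, terms):
    """Symmetric co-occurrence counts among the selected terms only."""
    index = {t: i for i, t in enumerate(terms)}
    m = [[0] * len(terms) for _ in terms]
    for doc in docs:
        present = sorted({index[t] for t in doc if t in index})
        for i, j in combinations(present, 2):
            m[i][j] += 1
            m[j][i] += 1
    return m


if __name__ == "__main__":
    # Hypothetical toy corpus: each paper is represented by its keyword list.
    docs = [
        ["digital library", "metadata", "ontology"],
        ["digital library", "metadata", "semantic web"],
        ["digital library", "ontology", "semantic web"],
        ["digital library", "information service"],
    ]
    terms = select_terms(docs, 3)
    print(terms)
    print(co_word_matrix(docs, terms))
```

With idf = log(N/df), a keyword present in every paper ("digital library" above) gets weight zero, and a keyword present in only one paper has no second document to pair with, so both score zero under TC even though the first would top a plain frequency ranking. This is one way a TC-based selection can differ from choosing high-frequency words.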
Key words: Co-word analysis; Clustering; Term contribution; Feature selection; Digital library
Received: 08 May 2013      Published: 02 September 2013
CLC Number: TP391

 

Cite this article:

Hu Changping, Chen Guo. A New Feature Selection Method Based on Term Contribution in Co-word Analysis. New Technology of Library and Information Service, 2013, 29(7/8): 89-93.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.07-08.13     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I7/8/89

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn