New Technology of Library and Information Service, 2008, Vol. 24, Issue 12: 43-47    DOI: 10.11925/infotech.1003-3513.2008.12.08
Text Clustering Research on the Max Term Contribution Dimension Reduction and Simulated Annealing Algorithm
Lu Guoli, Wang Xiaohua, Wang Rongbo
(Computer Application Technology Laboratory of Hangzhou Dianzi University, Hangzhou 310018, China)
Abstract  

This paper presents a new algorithm for text feature extraction and dimension reduction based on Max Term Contribution (MTC). The main idea is to compute the contribution of each term in the high-dimensional document base and then use a search algorithm to extract the terms with the maximum contribution, which form a low-dimensional document base. A modified K-means clustering method based on Simulated Annealing (SA) is then presented to cluster the low-dimensional document data produced by MTC. Finally, experiments show that the new method can improve clustering precision.
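The abstract does not give the MTC formula or the details of the SA-modified K-means, so the following Python sketch fills both in with stated assumptions. It takes the contribution of term t to be TC(t) = Σ_{i≠j} w(t, D_i)·w(t, D_j) over tf-idf weights (a definition used elsewhere in the text-clustering literature, which may differ from this paper's), keeps the k terms with the largest TC by a simple sort in place of the paper's unspecified search algorithm, and then clusters the reduced vectors with a generic simulated-annealing scheme: random single-document reassignments accepted by the Metropolis rule on the within-cluster sum of squared errors (SSE), geometric cooling, and a final round of K-means. All function names (`reduce_dimension`, `sa_kmeans`, etc.) and parameters (`t0`, `alpha`, `t_min`, `steps`) are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def term_contribution(vecs):
    """Assumed TC(t): sum of w(t,i)*w(t,j) over ordered pairs i != j."""
    total, squares = Counter(), Counter()
    for vec in vecs:
        for t, w in vec.items():
            total[t] += w
            squares[t] += w * w
    # (sum_i w)^2 - sum_i w^2 equals the pairwise sum without an O(n^2) loop.
    return {t: total[t] ** 2 - squares[t] for t in total}


def reduce_dimension(docs, k):
    """Keep the k highest-TC terms; a plain sort stands in for the
    paper's (unspecified) search algorithm. Ties break alphabetically."""
    vecs = tfidf_vectors(docs)
    tc = term_contribution(vecs)
    top = sorted(tc, key=lambda t: (-tc[t], t))[:k]
    return top, [[vec.get(t, 0.0) for t in top] for vec in vecs]


def centroids(points, labels, k):
    """Mean vector of each cluster, or None if the cluster is empty."""
    result = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        result.append([sum(col) / len(members) for col in zip(*members)]
                      if members else None)
    return result


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, labels, k):
    """Within-cluster sum of squared distances (the clustering cost)."""
    cents = centroids(points, labels, k)
    return sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))


def kmeans(points, labels, k, iters=50):
    """Standard K-means (Lloyd) iterations starting from the given labels."""
    for _ in range(iters):
        cents = centroids(points, labels, k)
        live = [c for c in range(k) if cents[c] is not None]
        new = [min(live, key=lambda c: sq_dist(p, cents[c])) for p in points]
        if new == labels:
            break
        labels = new
    return labels


def sa_kmeans(points, k, t0=1.0, alpha=0.9, t_min=1e-3, steps=50, seed=0):
    """Hypothetical SA/K-means hybrid (needs k >= 2), not the paper's exact method."""
    rng = random.Random(seed)
    labels = kmeans(points, [i % k for i in range(len(points))], k)
    cost = sse(points, labels, k)
    best, best_cost = labels, cost
    t = t0
    while t > t_min:
        for _ in range(steps):
            # Propose moving one random document to a different cluster.
            i = rng.randrange(len(points))
            cand = labels[:]
            cand[i] = rng.choice([c for c in range(k) if c != labels[i]])
            cand_cost = sse(points, cand, k)
            delta = cand_cost - cost
            # Metropolis rule: accept improvements, and worse moves with prob e^(-delta/t).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                labels, cost = cand, cand_cost
                if cost < best_cost:
                    best, best_cost = labels, cost
        t *= alpha  # geometric cooling schedule
    best = kmeans(points, best, k)  # final K-means polish; Lloyd steps never raise SSE
    return best, sse(points, best, k)
```

On six toy documents (three about pets, three about stocks), `reduce_dimension(docs, 4)` keeps `cat`, `pet`, `stock` and `trade`, and `sa_kmeans(points, 2)` separates the two topics. The final K-means pass is included because annealing deliberately accepts some worse partitions while exploring; polishing the best partition found with Lloyd iterations cannot increase its cost.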

Key words: Text clustering; Max term contribution; Feature extraction; Simulated annealing
Received: 02 September 2008    Published: 25 December 2008
CLC number (ZTFLH): TP391
Corresponding Author: Lu Guoli    E-mail: lgl@zjnu.cn

Cite this article:

Lu Guoli, Wang Xiaohua, Wang Rongbo. Text Clustering Research on the Max Term Contribution Dimension Reduction and Simulated Annealing Algorithm. New Technology of Library and Information Service, 2008, 24(12): 43-47.

URL: http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.12.08 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I12/43


Related articles:
[1] Huaming Zhao, Li Yu, Qiang Zhou. Determining Best Text Clustering Number with Mean Shift Algorithm[J]. Data Analysis and Knowledge Discovery, 2019, 3(9): 27-35.
[2] Quan Lu, Anqi Zhu, Jiyue Zhang, Jing Chen. Research on User Information Requirement in Chinese Network Health Community: Taking Tumor-forum Data of Qiuyi as an Example[J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 22-32.
[3] Zhang Tao, Ma Haiqun. Clustering Policy Texts Based on LDA Topic Model[J]. Data Analysis and Knowledge Discovery, 2018, 2(9): 59-65.
[4] Guan Qin, Deng Sanhong, Wang Hao. Chinese Stopwords for Text Clustering: A Comparative Study[J]. Data Analysis and Knowledge Discovery, 2017, 1(3): 72-80.
[5] Chen Dongyi, Zhou Zicheng, Jiang Shengyi, Wang Lianxi, Wu Jialin. A Framework for Customer Segmentation on Enterprises’ Microblog[J]. New Technology of Library and Information Service, 2016, 32(2): 43-51.
[6] Gong Kaile, Cheng Ying, Sun Jianjun. Clustering Blog Posts with Co-occurrence Analysis[J]. New Technology of Library and Information Service, 2016, 32(10): 50-58.
[7] Gu Xiaoxue, Zhang Chengzhi. Using Content and Tags for Web Text Clustering[J]. New Technology of Library and Information Service, 2014, 30(11): 45-52.
[8] Xu Xin, Hong Yunjia. Study on Text Visualization of Clustering Result for Domain Knowledge Base: Take Knowledge Base of Chinese Cuisine Culture as the Object[J]. New Technology of Library and Information Service, 2014, 30(10): 25-32.
[9] Deng Sanhong, Wan Jiexi, Wang Hao, Liu Xiwen. Experimental Study of Multilingual Text Clustering[J]. New Technology of Library and Information Service, 2014, 30(1): 28-35.
[10] Zhao Hui, Liu Huailiang. Research on Short Text Clustering Algorithm for User Generated Content[J]. New Technology of Library and Information Service, 2013, 29(9): 88-92.
[11] He Wenjing, He Lin. Research on Text Clustering Based on Social Tagging[J]. New Technology of Library and Information Service, 2013, 29(7/8): 49-54.
[12] Hong Yunjia, Xu Xin. Study on Multi-level Text Clustering for Knowledge Base Based on Domain Ontology: Taking Knowledge Base of Chinese Cuisine Culture as an Example[J]. New Technology of Library and Information Service, 2013, (12): 19-26.
[13] Bian Peng, Zhao Yan, Su Yuzhao. An Improved Method for Determining Optimal Number of Clusters in K-means Clustering Algorithm[J]. New Technology of Library and Information Service, 2011, 27(9): 34-40.
[14] Rao Yanghui, Ye Liang, Cheng Jie. Research on the Application of WordNet in Text Clustering[J]. New Technology of Library and Information Service, 2009, (10): 67-70.
[15] Wang Lianjun. An Analysis on Web-Based Text Mining[J]. New Technology of Library and Information Service, 2002, 18(6): 38-40.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn