New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 49-54    DOI: 10.11925/infotech.1003-3513.2013.07-08.07
Research on Text Clustering Based on Social Tagging
He Wenjing, He Lin
College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
Abstract  This paper uses the social tags that annotate resources as feature items for text clustering. Clustering is implemented with the K-means algorithm and carried out successfully on a small data set. The implementation of the key techniques in tag-based text clustering, such as tag filtering and the clustering algorithm, is discussed in detail. The experiment shows that text clustering with social tags as features outperforms clustering with keywords, and that social tags can improve clustering results.
Key words: Social tag; Feature selection; Clustering algorithm; Text clustering
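
The pipeline named in the abstract (tag filtering, tag-based features, K-means) can be illustrated with a minimal Python sketch. This page does not give the paper's filtering rules, feature weights, distance measure, or parameters, so everything below is an assumption made for illustration: tags are lowercased and dropped when they occur on fewer than `min_docs` documents, documents become unit-length binary tag vectors, and K-means uses squared Euclidean distance with a deterministic farthest-point initialization. The function names and the sample data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def filter_tags(doc_tags, min_docs=2):
    """Normalize tags and drop those used on fewer than min_docs documents.

    Assumed filtering rule; the paper's actual rules are not given here.
    """
    cleaned = {doc: {t.strip().lower() for t in tags if t.strip()}
               for doc, tags in doc_tags.items()}
    doc_freq = Counter(t for tags in cleaned.values() for t in tags)
    return {doc: {t for t in tags if doc_freq[t] >= min_docs}
            for doc, tags in cleaned.items()}


def vectorize(doc_tags):
    """Build unit-length binary tag vectors over a sorted vocabulary."""
    vocab = sorted({t for tags in doc_tags.values() for t in tags})
    vectors = {}
    for doc, tags in doc_tags.items():
        raw = [1.0 if t in tags else 0.0 for t in vocab]
        norm = math.sqrt(sum(raw)) or 1.0  # avoid dividing by zero
        vectors[doc] = [x / norm for x in raw]
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain K-means with deterministic farthest-point initialization."""
    docs = sorted(vectors)
    centroids = [vectors[docs[0]]]
    while len(centroids) < k:
        # Next seed: the document farthest from its nearest centroid.
        far = max(docs,
                  key=lambda d: min(sq_dist(vectors[d], c) for c in centroids))
        centroids.append(vectors[far])
    labels = {}
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = {
            d: min(range(k), key=lambda i: sq_dist(vectors[d], centroids[i]))
            for d in docs
        }
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [vectors[d] for d in docs if labels[d] == i]
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical tagged resources, invented for this illustration.
sample = {
    "doc1": ["Python", "programming", "tutorial"],
    "doc2": ["python", "programming", "toread"],
    "doc3": ["cooking", "recipes", "dinner"],
    "doc4": ["Cooking", "recipes", "myfavs"],
}
filtered = filter_tags(sample)
vocab, vectors = vectorize(filtered)
labels = kmeans(vectors, k=2)
print(vocab)
print(labels)
```

In this toy data the one-off tags ("tutorial", "toread", "dinner", "myfavs") are filtered out, which leaves the two programming documents and the two cooking documents with identical tag vectors within each pair. A keyword-based baseline of the kind the abstract compares against could reuse the same `vectorize` and `kmeans` functions with extracted keywords in place of tags.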
Received: 27 May 2013      Published: 02 September 2013
CLC Number: G250

Cite this article:

He Wenjing, He Lin. Research on Text Clustering Based on Social Tagging. New Technology of Library and Information Service, 2013, 29(7/8): 49-54.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.07-08.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I7/8/49
