New Technology of Library and Information Service  2015, Vol. 31 Issue (10): 22-29    DOI: 10.11925/infotech.1003-3513.2015.10.04
Clustering Machine-Generated Tags with Different Quality
Zhang Chengzhi1,2, Gu Xiaoxue1
1 School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094, China;
2 Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210093, China
Abstract  

[Objective] Conventional clustering of tags or words does not consider how the quality of the cluster members affects the clustering results. This paper analyzes how clustering results differ across machine-generated tags of different quality and suggests ways to improve clustering by incorporating tag quality. [Methods] First, the authors fetch Chinese and English blog posts from Engadget, preprocess the data to obtain candidate tags, and extract the tags' social and content features to calculate their weights. The authors then use two strategies to distinguish tags of different quality and obtain different tag sets. Finally, they calculate tag similarities for each set and apply the Affinity Propagation (AP) algorithm to obtain clustering results, which are compared and analyzed. [Results] The experimental results show that, for both Chinese and English tags, the Top5 tags cluster better than the Top5-10 tags, and tags weighted by their social attributes cluster better than unweighted tags. [Limitations] The method for distinguishing tag quality is relatively simple, and an effective method for evaluating tag quality is lacking. [Conclusions] High-quality machine-generated tags yield better clustering results than low-quality ones. Weighting tags by their social attributes can improve the clustering performance of machine-generated tags, and the social attributes of tags can also be used to evaluate their quality.
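
The first steps of the method described above (ranking each post's candidate tags by weight, splitting them into a Top5 set and a Top5-10 set, and computing pairwise tag similarities) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the toy weights, the representation of a tag as a vector of its weights over posts, and the use of cosine similarity are all hypothetical choices. The resulting similarity matrix is the kind of input an AP clustering step would consume.

```python
from math import sqrt


def split_by_quality(weighted_tags, top_n=5):
    """Rank one post's (tag, weight) pairs by weight, highest first,
    and return (top-N tags, the next N tags)."""
    ranked = sorted(weighted_tags, key=lambda pair: pair[1], reverse=True)
    high = [tag for tag, _ in ranked[:top_n]]
    low = [tag for tag, _ in ranked[top_n:2 * top_n]]
    return high, low


def tag_vectors(posts_tags):
    """Build a sparse vector per tag: {tag: {post_id: weight}}.
    Assumption: a tag is described by its weights across posts."""
    vectors = {}
    for post_id, tags in posts_tags.items():
        for tag, weight in tags:
            vectors.setdefault(tag, {})[post_id] = weight
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[key] for key, w in u.items() if key in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Return (sorted tag list, matrix of pairwise cosine similarities)."""
    tags = sorted(vectors)
    matrix = [[cosine(vectors[a], vectors[b]) for b in tags] for a in tags]
    return tags, matrix


# Toy data (hypothetical): two posts, each with weighted candidate tags.
posts = {
    "post1": [("android", 0.9), ("phone", 0.7), ("review", 0.2)],
    "post2": [("android", 0.8), ("tablet", 0.6), ("review", 0.1)],
}

# Keep only the top-2 tags of each post as the "high quality" set.
high_quality = {
    post_id: [(t, w) for t, w in tags if t in split_by_quality(tags, top_n=2)[0]]
    for post_id, tags in posts.items()
}
tags, sims = similarity_matrix(tag_vectors(high_quality))
```

Repeating the last step with the second element returned by `split_by_quality` would give the lower-ranked tag set, so the two similarity matrices can be clustered and compared separately.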

Received: 29 April 2014      Published: 06 April 2016
CLC Number: G250

Cite this article:

Zhang Chengzhi, Gu Xiaoxue. Clustering Machine-Generated Tags with Different Quality. New Technology of Library and Information Service, 2015, 31(10): 22-29.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.10.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I10/22

