New Technology of Library and Information Service  2015, Vol. 31 Issue (10): 30-39    DOI: 10.11925/infotech.1003-3513.2015.10.05
Combined with Annotated Content and User Attributes for Tag Clustering
Gu Xiaoxue1, Zhang Chengzhi1,2
1 School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094, China;
2 Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210093, China
Abstract  

[Objective] To explore how tags' annotated content, the attributes of the users who assign the tags, and the combination of the two affect tag clustering. [Methods] Using blog data from ScienceNet.cn, we extract tag features, build a vector space model, and calculate the similarities between tags, combining the content-based and user-based similarities with a linear weighting method and a Sigmoid weighting method; the tags are then clustered with the AP (Affinity Propagation) algorithm. [Results] The experimental evaluation shows that, under the subject classification, combining annotated content and user attributes improves the clustering results with both weighting methods, and the Sigmoid method performs best. Under the systematic classification, the combination does not perform as well as it does under the subject classification, and is even worse than the content feature alone. [Limitations] The experimental data set is small, and the classifications used to evaluate the clustering results are imperfect. Moreover, the AP clustering algorithm does not scale well to large data sets. [Conclusions] Combining the two features can improve tag clustering results in some cases, and tag clustering should place more emphasis on the tags' content.
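
The abstract does not give the weighting formulas, so the following is only a minimal sketch of how two tag similarities might be combined, not the authors' implementation. It assumes cosine similarity over sparse vectors, a mixing weight `alpha`, and a sigmoid centred at 0.5 with a hypothetical `steepness` of 5; the toy tags and feature values are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def sigmoid(x, steepness=5.0):
    """Assumed sigmoid centred at 0.5: pushes high similarities up, low ones down."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))

def combine_linear(sim_content, sim_user, alpha=0.5):
    """Linear weighting: a plain weighted average of the two similarities."""
    return alpha * sim_content + (1 - alpha) * sim_user

def combine_sigmoid(sim_content, sim_user, alpha=0.5):
    """Sigmoid weighting: average the similarities after the sigmoid transform."""
    return alpha * sigmoid(sim_content) + (1 - alpha) * sigmoid(sim_user)

# Hypothetical toy data: each tag has a content vector (terms from the
# annotated posts) and a user vector (attributes of the tagging users).
content_vectors = {
    "clustering": {"algorithm": 2.0, "data": 1.0},
    "k-means": {"algorithm": 1.0, "data": 1.0},
    "poetry": {"verse": 3.0},
}
user_vectors = {
    "clustering": {"computer_science": 1.0},
    "k-means": {"computer_science": 1.0, "statistics": 1.0},
    "poetry": {"literature": 1.0},
}

def similarity_matrix(combine):
    """Pairwise combined similarity for every ordered pair of tags."""
    tags = sorted(content_vectors)
    return {
        (a, b): combine(
            cosine(content_vectors[a], content_vectors[b]),
            cosine(user_vectors[a], user_vectors[b]),
        )
        for a in tags
        for b in tags
    }

linear = similarity_matrix(combine_linear)
sigmoidal = similarity_matrix(combine_sigmoid)
```

A combined matrix of this kind is what a similarity-based clusterer such as Affinity Propagation would take as input. Note that under the assumed sigmoid, two tags with no overlap still receive a small non-zero score, which is one reason the two weightings can lead to different clusters.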

Received: 29 April 2015      Published: 06 April 2016
:  G250  

Cite this article:

Gu Xiaoxue, Zhang Chengzhi. Combined with Annotated Content and User Attributes for Tag Clustering. New Technology of Library and Information Service, 2015, 31(10): 30-39.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.10.05     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I10/30

[1] Chai Qingfeng, Shi Linyan, Mei Shan, Xiong Haitao, He Huixin. Extracting Knowledge Elements of Sci-Tech Literature Based on Artificial and Machine Features[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 132-144.
[2] Tan Ying, Tang Yifei. Extracting Citation Contents with Coreference Resolution[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 25-33.
[3] Wang Qinjie, Qin Chunxiu, Ma Xubu, Liu Huailiang, Xu Cunzhen. Recommending Scientific Literature Based on Author Preference and Heterogeneous Information Network[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 54-64.
[4] Han Pu, Zhang Zhanpeng, Zhang Mingtao, Gu Liang. Normalizing Chinese Disease Names with Multi-feature Fusion[J]. Data Analysis and Knowledge Discovery, 2021, 5(5): 83-94.
[5] Li He, Liu Jiayu, Li Shiyu, Wu Di, Jin Shuaiqi. Optimizing Automatic Question Answering System Based on Disease Knowledge Graph[J]. Data Analysis and Knowledge Discovery, 2021, 5(5): 115-126.
[6] Li Yueyan, Wang Hao, Deng Sanhong, Wang Wei. Research Trends of Information Retrieval: Case Study of SIGIR Conference Papers[J]. Data Analysis and Knowledge Discovery, 2021, 5(4): 13-24.
[7] Yi Huifang, Liu Xiwen. Analyzing Patent Technology Topics with IPC Context-Enhanced Context-LDA Model[J]. Data Analysis and Knowledge Discovery, 2021, 5(4): 25-36.
[8] Wang Hongbin, Wang Jianxiong, Zhang Yafei, Yang Heng. Topic Recognition of News Reports with Imbalanced Contents[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 109-120.
[9] Chang Zhijun, Qian Li, Xie Jing, Wu Zhenxin, Zhang Hu, Yu Qianqian, Wang Ying, Wang Yongji. Big Data Platform for Sci-Tech Literature Based on Distributed Technology[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 69-77.
[10] Hu Shaohu, Zhang Yingyi, Zhang Chengzhi. Review of Keyword Extraction Studies[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 45-59.
[11] Liu Tong, Liu Chen, Ni Weijian. A semi-supervised Chinese sentiment analysis method based on multi-level data augmentation [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.
[12] Wang Hongbin, Wang Jianxiong, Zhang Yafei, Yang Heng. Topic Recognition Research on Topic Imbalanced News Text Data Set [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.
[13] Sifan Zhang, Zhendong Niu, Hao Lu, Yifan Zhu, Rongrong Wang. Graph Convolution Embedding and Feature Cross Based Literature Citation Prediction Method: Taking the Transportation Field as An Example [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.
[14] Qi Ruihua, Jian Yue, Guo Xu, Guan Jinghua, Yang Mingxi. Sentiment Analysis of Cross-Domain Product Reviews Based on Feature Fusion and Attention Mechanism [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.
[15] Li Jiao, Huang Yongwen, Luo Tingting, Zhao Ruixue, Xian Guojian. Automatic Classification based on Multi-factor Algorithm [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.