Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (6): 36-46    DOI: 10.11925/infotech.2096-3467.2017.06.04
Original Article
Clustering and Recommending Users Based on Tags and Relation Network
Xiong Huixiang, Jiang Wuxuan
School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract  

[Objective] This paper proposes a model that recommends potentially similar users by combining social tags with the users' relation network. [Methods] First, we characterized users' short-term and long-term interests using data from a social tagging system. Then, we built a user-clustering model by applying multidimensional scaling to the tag and relationship data. Finally, we recommended similar users based on the clustering results. We tested the model on Weibo data. [Results] The model effectively combined the characteristics of users' interests and identified potentially similar users. [Limitations] The sample data does not cover every aspect of user interests, so the model's effectiveness was examined only on limited data. [Conclusions] A user recommendation model based on static tags and a dynamic relation network could improve personalized recommendation services.

Key words: Social Tagging; Tag; Relation Network; User-cluster; Multidimensional Scaling Analysis
Received: 07 April 2017      Published: 25 August 2017
ZTFLH:  TP181  

Cite this article:

Xiong Huixiang, Jiang Wuxuan. Clustering and Recommending Users Based on Tags and Relation Network. Data Analysis and Knowledge Discovery, 2017, 1(6): 36-46.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.06.04     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I6/36

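User ID | Nickname | Posts | Following | Followers | Tags | Following list
3694919990 | 各国美食学起来YOU | 102 390 | 118 | 986 725 | news and anecdotes, …Weibo oddities | 1857414070, …
5590998575 | 不懂老兮 | 806 | 41 | 532 314 | looks-obsessed, …horoscopes | 3725773862, …
3323442082 | 视觉酱 | 100 402 | 238 | 2 478 436 | education and employment, …fashion | 3193150774, …
2155768741 | 贵州旅游广播 | 3 667 | 248 | 316 615 | FM972, …happiness | 2760471402, …
3524931687 | 走走客云南旅游 | 271 | 137 | 60 | Yunnan travel, …self-drive travel | 3273935392, …
1990226474 | 昆宣发布 | 28 722 | 1 023 | 621 450 | Spring City art, …Spring City people | 1266286555, …
3175953062 | 萌萌萌熊 | 55 | 9 | 759 | fashion, …astrology and fortune-telling | 1642909335, …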

Tag | Travel | Food | Fashion | Life | News | Film | Music
Frequency | 57 | 48 | 40 | 38 | 34 | 31 | 31 | 29 | 28
Weight w (%) | 1.6239 | 1.3675 | 1.1396 | 1.0826 | 0.9687 | 0.8832 | 0.8832 | 0.8262 | 0.7977
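
Tag | Travel | Food | Fashion | Life | News | Film | Music | Entertainment | Humor
Frequency | 57 | 48 | 40 | 38 | 34 | 31 | 29 | 27 | 26
Weight w (%) | 2.035 | 1.7137 | 1.4281 | 1.3567 | 1.2139 | 1.1067 | 1.0353 | 0.9639 | 0.9282

Tag | Travel | Food | Humor | Music | Fashion | Life | News | Film | Entertainment
Frequency | 80 | 48 | 48 | 42 | 40 | 38 | 34 | 31 | 27
Weight w (%) | 2.8633 | 1.718 | 1.718 | 1.5032 | 1.4316 | 1.3601 | 1.2527 | 1.1095 | 0.9664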
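
The weight rows are consistent with each tag's frequency expressed as a percentage of all tag occurrences, although the formula is not stated here. A minimal sketch under that assumption follows. The total of 2801 is not given in the tables; it is back-solved from the second table (57 / 2.035% ≈ 2801). The function name and dictionary keys are illustrative.

```python
def tag_weights(freqs, total):
    """Return each tag's share of all tag occurrences, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {tag: round(100.0 * count / total, 4) for tag, count in freqs.items()}


# First three columns of the second tag table (travel, food, fashion).
# ASSUMPTION: total=2801 is inferred from the table, not stated in the source.
freqs = {"travel": 57, "food": 48, "fashion": 40}
weights = tag_weights(freqs, total=2801)
print(weights)
```

With these inputs the computed percentages agree with the weight row of the second table to the precision shown there, which supports the reading but does not confirm it.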
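
User | Travel | Food | Humor | Music | Fashion | Life | News | Film | Entertainment
U5107361689 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
U1662055430 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1
U1654603903 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1
U1692712653 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
U1651891204 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
U3524931687 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0
U2040810221 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0
U1215144691 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0
U2684123023 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0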

d_ij | U1 | U80 | U160 | U161 | U240 | U332
U1 | 0 | 0.875 | 0.777778 | 0.818182 | 0.9375 | 0.909091
U2 | 0.9 | 0.9 | 0.666667 | 0.5 | 0.75 | 0.75
U80 | 0.875 | 0 | 0.777778 | 0.818182 | 0.9375 | 0.8
U160 | 0.777778 | 0.777778 | 0 | 0.5 | 0.888889 | 0.75
U161 | 0.818182 | 0.818182 | 0.5 | 0 | 0.8 | 0.833333
U240 | 0.9375 | 0.9375 | 0.888889 | 0.8 | 0 | 0.777778
U332 | 0.909091 | 0.8 | 0.75 | 0.833333 | 0.777778 | 0
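
The distances in this matrix are simple fractions (0.875 = 7/8, 0.777778 = 7/9, 0.818182 = 9/11), which is what a Jaccard distance between two users' tag sets would produce. The measure is not named here, so the sketch below is an assumption rather than the authors' stated method. It computes the Jaccard distance between two rows of the 0/1 user-tag matrix; the function name is illustrative.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)     # tags shared by both users
    either = sum(1 for x, y in zip(a, b) if x or y)    # tags used by at least one
    if either == 0:
        return 0.0  # ASSUMPTION: two users with no tags are treated as identical
    return 1.0 - both / either


# Two rows copied from the user-tag matrix above.
u_5107361689 = [1, 0, 0, 0, 0, 0, 1, 0, 0]
u_1651891204 = [1, 0, 0, 0, 1, 0, 0, 0, 0]
d = jaccard_distance(u_5107361689, u_1651891204)
print(d)
```

The two users share one tag out of three used between them, so the distance is 1 - 1/3. Applying the function to every pair of rows would give a symmetric matrix with zeros on the diagonal.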

User | Following list
U3694919990 | 5186027114, 5182575519…
U3948635268 | 1642630543, 5982981128…
U3323442082 | 5186027114, 3440325930…
U2155768741 | 3766659924, 3752852352…
U3524931687 | 2997829562, 5611200000…
U1990226474 | 5878659096, 5768117490…
U1108476625 | 5991719510, 2781627392…
U3175953062 | 2705706381, 3003417253…
U2912473701 | 5357651574, 2415848337…
U1288915263 | 3937348351, 1289945134…
U2029728883 | 5785953533, 3174322363…
U5177961014 | 5796731205, 1999607273…
U2206498342 | 2703907413, 5465835912…
U3101945993 | 5980283108, 5980023345…
U5721022666 | 5581785513, 2850809427…

User | F5186027114 | F5608272697 | F3756087501 | F2803301701 | F2516014697
U1846588483 | 1 | 0 | 0 | 0 | 0
U2542011901 | 1 | 0 | 0 | 0 | 0
U1692712653 | 1 | 0 | 0 | 0 | 1
U1644572034 | 1 | 0 | 0 | 0 | 0
U1781457455 | 0 | 0 | 0 | 0 | 0
U5107361689 | 0 | 0 | 0 | 0 | 0
U2542011901 | 1 | 0 | 0 | 0 | 0
U2834863492 | 0 | 1 | 1 | 1 | 1
U3524931687 | 0 | 1 | 1 | 1 | 0
U1203156407 | 0 | 0 | 1 | 0 | 0
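
The matrix above has one row per user and one 0/1 column per followed account. One way to obtain such a matrix from per-user following lists is sketched below. The function name is illustrative, and the two sample lists contain only the followees that the matrix above marks with 1 for those users, not their full following lists.

```python
def follow_matrix(follows):
    """Pivot {user: [followee ids]} into (sorted column ids, {user: 0/1 row})."""
    columns = sorted({f for ids in follows.values() for f in ids})
    rows = {}
    for user, ids in follows.items():
        followed = set(ids)
        rows[user] = [1 if c in followed else 0 for c in columns]
    return columns, rows


# Restricted to the five followee columns shown in the matrix above.
follows = {
    "U1692712653": ["5186027114", "2516014697"],
    "U3524931687": ["5608272697", "3756087501", "2803301701"],
}
columns, rows = follow_matrix(follows)
print(columns)
print(rows)
```

Columns are sorted as strings here only to make the output deterministic; the column order in the source table is different and does not affect any distance computed from the rows.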

d_ij | U1 | U80 | U160 | U161 | U240 | U332
U1 | 0 | 0.963350 | 0.988636 | 0.987654 | 0.970149 | 0.991701
U2 | 0.994350 | 0.992753 | 1 | 0.993827 | 1 | 1
U80 | 0.963350 | 0 | 0.994680 | 0.994186 | 0.991525 | 0.997076
U160 | 0.988636 | 0.994680 | 0 | 0.995762 | 0.992187 | 0.987012
U161 | 0.987654 | 0.994186 | 0.995762 | 0 | 0.996491 | 0.989664
U240 | 0.970149 | 0.991525 | 0.992187 | 0.996491 | 0 | 0.992882
U332 | 0.991701 | 0.997076 | 0.987012 | 0.989664 | 0.992882 | 0

User | Tag MDS | Following MDS
U2612101423 | 0.049094493 | -0.034319904
U1846588483 | 0.014763293 | -0.011171253
U1306794125 | 0.055376563 | -0.034743694
U5179732445 | 0.50130544 | -0.036149048
U5761248787 | 0.50130544 | -0.004671656
U1665102492 | 0.04820318 | -0.033469629
U2647197351 | 0.033225349 | -0.046390183
U5961019705 | 0.034749234 | -0.03427661
U1781457455 | 0.043747374 | -0.034271488
U5107361689 | -0.055230674 | 0.114665726
U2542011901 | 0.046136223 | -0.000205833
U2871542364 | 0.058303826 | -0.042518174
U2834863492 | 0.05151389 | 0.004734437
U2624882007 | -0.081583674 | -0.027694683
U1692712653 | -0.08441402 | -0.004928777
U1644572034 | 0.052114494 | 0.095748648
U1651891204 | -0.139576002 | -0.029852541
U2094215167 | 0.050809285 | 0.003524086
U3524931687 | -0.10443334 | -0.023421971
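
The table above lists a single MDS coordinate per user for each data source, which suggests that each distance matrix was reduced to one dimension. The MDS variant is not stated here. The sketch below assumes classical (Torgerson) MDS, implemented with double centering and power iteration so that it needs only the standard library; the function name and the three-point test matrix are illustrative, not data from the paper.

```python
import math


def classical_mds_1d(dist, iters=200):
    """Place n items on a line from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (row means equal column means
    # because the matrix is assumed symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def times_b(v):
        return [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]

    # Power iteration for the eigenvector of largest-magnitude eigenvalue.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = times_b(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, times_b(v)))  # Rayleigh quotient
    if eigenvalue <= 0.0:
        return [0.0] * n  # no usable positive dimension was found
    return [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical check: three points at positions 0, 1 and 3 on a line.
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords = classical_mds_1d(dist)
print(coords)
```

The recovered coordinates are centered on zero and reproduce the input distances up to a possible sign flip. Running the same function once on the tag distance matrix and once on the following distance matrix would give two columns like those in the table.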

Metric | Value
TOT.Withinss (total within-cluster sum of squared distances) | 0.1385733
Betweenss (between-cluster sum of squared distances) | 7.615879
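
These two metrics describe how compact and how well separated the clusters are: a small within-cluster sum and a large between-cluster sum indicate tight, distinct groups. The sketch below shows how both quantities can be computed from points and cluster labels. The function name, the points, and the labels are hypothetical and are not the paper's data.

```python
def cluster_sums_of_squares(points, labels):
    """Return (total within-cluster SS, between-cluster SS) for labeled points."""
    def mean(ps):
        return tuple(sum(c) / len(ps) for c in zip(*ps))

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)

    overall = mean(points)
    tot_withinss = 0.0
    betweenss = 0.0
    for members in groups.values():
        center = mean(members)
        # Squared distances from each member to its own cluster center.
        tot_withinss += sum(sqdist(p, center) for p in members)
        # Squared distance from cluster center to the overall mean, weighted by size.
        betweenss += len(members) * sqdist(center, overall)
    return tot_withinss, betweenss


# Hypothetical 2-D points in two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
within, between = cluster_sums_of_squares(points, labels)
print(within, between)
```

The two values always add up to the total sum of squared distances from every point to the overall mean, so their ratio can serve as a rough quality check on a clustering.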
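
User ID | Nickname | Tags | Following list
2132089917 | 陈秋实和他的朋友们 | quotations, news, American TV series, sports, post-80s, media, writing, Virgo | 1803526210, 1854768217, …

User ID | Nickname | Tags | Following list
2132089917 | 陈秋实和他的朋友们 | quotations, news, American TV series, sports, post-80s, media, writing, Virgo | 1803526210, 1854768217, …
1448466905 | 非要马甲线 | Xiachufang, nutrition, fitness, love, Scorpio, food, travel | 1690832323, 1238296465, …
1592611830 | 演员李健 | Scorpio | 1870958692, 5941080382, …
2307134004 | STAGExx | fashion, food, music, film, travel | 1813787671, 1812640242, …
3173913704 | 葡萄sasa定制店 | travel, fashion | 5646244946, 3944457562, …
1254995044 | 山外有 | computers, homebody, books, documentaries, photography, Southwest Jiaotong University, Sichuan University | 64230524, 3208535250, …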
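
The final step recommends users from the clustering results. A minimal sketch of one way to do this is given below: take the other members of the target user's cluster and order them by Euclidean distance in the two-dimensional (tag MDS, following MDS) space. The ordering rule, the function name, and the cluster labels are assumptions; only the coordinates are copied from the MDS table.

```python
import math


def recommend(target, coords, labels, top_n=3):
    """Return up to top_n users from the target's cluster, nearest first."""
    peers = [u for u in coords if u != target and labels[u] == labels[target]]
    peers.sort(key=lambda u: math.dist(coords[u], coords[target]))
    return peers[:top_n]


# Coordinates (tag MDS, following MDS) copied from the MDS table.
coords = {
    "U2612101423": (0.049094493, -0.034319904),
    "U1846588483": (0.014763293, -0.011171253),
    "U1306794125": (0.055376563, -0.034743694),
    "U5107361689": (-0.055230674, 0.114665726),
}
# ASSUMPTION: hypothetical cluster labels; the source does not list them.
labels = {"U2612101423": 0, "U1846588483": 0, "U1306794125": 0, "U5107361689": 1}
similar = recommend("U2612101423", coords, labels)
print(similar)
```

Restricting candidates to the same cluster keeps the recommendation tied to the clustering step, while the distance ordering only decides which cluster members are shown first.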