Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (5): 89-98     https://doi.org/10.11925/infotech.2096-3467.2021.1068
Research Paper
User Community Partition Based on Multi-layer Information Fusion in E-commerce Heterogeneous Network*
Feng Yong1, Xu Wentao1, Wang Rongbing1 (corresponding author), Xu Hongyan1, Zhang Yonggang2
1 College of Information, Liaoning University, Shenyang 110036, China
2 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
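Abstract (translated from the Chinese original)

[Objective] Most existing user community partition algorithms do not account for the heterogeneity of e-commerce networks, which lowers partition accuracy. This paper therefore proposes a user community partition algorithm based on multi-layer information fusion in e-commerce heterogeneous networks. [Methods] The e-commerce heterogeneous network is split into layers by relation type, and user node embeddings are constructed for each relation type. Representation fusion then merges the user embeddings from the different layers into a fused user embedding for the whole network. An objective function is used to optimize the parameters associated with the user nodes. Finally, an improved K-means algorithm clusters the users to produce the community partition. [Results] Compared with the second-best algorithm among mainstream user community partition algorithms based on DeepWalk, Node2Vec, GCN and others, the proposed algorithm improves NMI and Sim@5 by 6.4% and 1.7%, respectively, and performs well both in representing user nodes and in partitioning user communities accurately. [Limitations] The temporal information contained in e-commerce heterogeneous networks is not considered, and the influence of noise points in the network is ignored. [Conclusions] The proposed algorithm is effective and, in the e-commerce domain, helps improve the performance of core applications such as friend prediction and group recommendation.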

Keywords: Heterogeneous Network; E-commerce; Representation Learning; Community Division
Abstract

[Objective] This paper proposes a user community partition algorithm based on multi-layer information fusion in e-commerce heterogeneous networks, aiming to improve the accuracy of user community partition. [Methods] First, we split the e-commerce heterogeneous network into layers by relationship type and constructed user node embeddings for each relationship type. Then, we merged the user embeddings from the different layers through representation fusion to obtain a fused user embedding for the e-commerce heterogeneous network. Third, we used an objective function to optimize the parameters of the user nodes. Finally, we clustered the users with an improved K-means algorithm to obtain the community partition. [Results] On the NMI and Sim@5 indicators, the proposed algorithm scored 6.4% and 1.7% higher, respectively, than the second-best of the mainstream algorithms compared, which are based on DeepWalk, Node2Vec, GCN and others. The algorithm represented user nodes effectively and partitioned user communities accurately. [Limitations] We did not consider the temporal information in the heterogeneous network or the influence of noise points. [Conclusions] The proposed algorithm is effective and could improve the performance of friend prediction, group recommendation and other core e-commerce applications.

Key words: Heterogeneous Network; E-commerce; Representation Learning; Community Division
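The fusion and clustering steps named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the per-layer embeddings are taken as given, fusion is a weighted sum with hand-picked layer weights (the abstract does not say how the weights are obtained), and clustering is plain Lloyd's K-means rather than the improved K-means of the paper. The names `fuse_embeddings`, `kmeans` and `partition_users`, and all toy values, are hypothetical.

```python
import math
import random


def fuse_embeddings(layer_embeddings, layer_weights):
    """Weighted sum of per-layer user embeddings (an assumed fusion rule).

    layer_embeddings: {layer_name: {user_id: [float, ...]}}
    layer_weights:    {layer_name: float}, normalised here to sum to 1
    A user missing from a layer simply gets no contribution from it.
    """
    total = sum(layer_weights.values())
    fused = {}
    for layer, embeddings in layer_embeddings.items():
        w = layer_weights[layer] / total
        for user, vec in embeddings.items():
            acc = fused.setdefault(user, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += w * x
    return fused


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's K-means; NOT the improved variant used in the paper."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def partition_users(layer_embeddings, layer_weights, k):
    """Fuse the layers, then cluster users into k communities."""
    fused = fuse_embeddings(layer_embeddings, layer_weights)
    users = sorted(fused)
    labels = kmeans([fused[u] for u in users], k)
    return dict(zip(users, labels))


# Hypothetical two-layer toy input with 2-dimensional embeddings.
layer_embeddings = {
    "ULP": {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0], "u4": [0.1, 0.9]},
    "UBP": {"u1": [0.8, 0.2], "u2": [1.0, 0.0], "u3": [0.2, 0.8], "u4": [0.0, 1.0]},
}
layer_weights = {"ULP": 0.5, "UBP": 0.5}

fused = fuse_embeddings(layer_embeddings, layer_weights)
communities = partition_users(layer_embeddings, layer_weights, k=2)
print(communities)
```

On this toy input, u1 and u2 end up in one community and u3 and u4 in the other. The numeric cluster labels themselves are arbitrary.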
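Received: 2021-09-22      Published: 2022-06-21
ZTFLH: TP302; G202
Funding: *This work is one of the research results supported by the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172018K01) and the General Program of the Scientific Research Fund of the Education Department of Liaoning Province (LJKZ0085)
Corresponding author: Wang Rongbing, ORCID: 0000-0003-4129-7093     E-mail: wrb@lnu.edu.cn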
Cite this article:
Feng Yong, Xu Wentao, Wang Rongbing, Xu Hongyan, Zhang Yonggang. User Community Partition Based on Multi-layer Information Fusion in E-commerce Heterogeneous Network[J]. Data Analysis and Knowledge Discovery, 2022, 6(5): 89-98.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1068      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I5/89
Fig.1  Different types of networks
Fig.2  Framework of the UCEH algorithm
Fig.3  Representation fusion
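Dataset   Node type           Nodes     Edge type                       Network layer  Edges
Amazon    User                6,506     User views product              ULP            84,853
          Electronic product  3,660     User buys product               UBP            64,012
Netflix   User                398,556   User evaluates movie            UEM            17,770
          Movie               81,633
Alibaba   User                4,129     Click                           UKI            4,108
          Item                2,034     Add to preferences              UPI            3,853
                                        Add to cart                     UCI            7,153
                                        Conversion between node types   CON            2,751
Yelp      User                34,908    Check-in                        URS            149,439
          Business            20,502    Rating                          UGS            41,843
                                        Tagging                         UMS            38,625
Table 1  Comparison of datasets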
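Table 1 assigns each edge type its own network layer (for example ULP and UBP for Amazon). A minimal sketch of that layering step might look like the following. The edge list is made up, and the rule for linking users inside a layer (two users are connected when they share at least one item) is an assumption for illustration, not the construction used in the paper. The function names `split_into_layers` and `user_projection` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edges: (user, item, layer code). The layer codes follow
# the Amazon rows of Table 1; the users and items are invented.
edges = [
    ("u1", "p1", "ULP"),
    ("u2", "p1", "ULP"),
    ("u1", "p2", "UBP"),
    ("u3", "p2", "UBP"),
    ("u3", "p3", "ULP"),
]


def split_into_layers(edges):
    """Group (user, item) pairs by relation type, one layer per type."""
    layers = defaultdict(list)
    for user, item, relation in edges:
        layers[relation].append((user, item))
    return dict(layers)


def user_projection(layer_edges):
    """Link two users when they share an item in this layer (assumed rule)."""
    users_by_item = defaultdict(set)
    for user, item in layer_edges:
        users_by_item[item].add(user)
    pairs = set()
    for users in users_by_item.values():
        ordered = sorted(users)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs


layers = split_into_layers(edges)
user_graphs = {name: user_projection(e) for name, e in layers.items()}
print(user_graphs)
```

Each entry of `user_graphs` is a single-relation user graph, which could then be passed to any node embedding method to get per-layer user embeddings.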
Dataset        Amazon          Netflix         Alibaba         Yelp
Metric         NMI    Sim@5    NMI    Sim@5    NMI    Sim@5    NMI    Sim@5
DeepWalk       0.083  0.726    0.117  0.490    0.348  0.629    0.311  0.704
Node2Vec       0.074  0.738    0.123  0.487    0.382  0.628    0.309  0.710
MetaPath2Vec   0.086  0.747    0.129  0.492    0.387  0.635    0.317  0.715
DGI            0.007  0.558    0.182  0.578    0.551  0.786    0.641  0.889
GCN            0.287  0.624    0.176  0.565    0.465  0.724    0.671  0.867
GAT            0.301  0.630    0.183  0.550    0.468  0.726    0.668  0.873
HAN            0.029  0.495    0.164  0.561    0.472  0.779    0.658  0.872
UCEH-AP        0.341  0.745    0.189  0.603    0.558  0.776    0.685  0.876
UCEH           0.344  0.753    0.194  0.605    0.563  0.787    0.691  0.898
Table 2  Comparison of results of different algorithms in the two types of experiments
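For readers unfamiliar with NMI (normalized mutual information), the metric reported in Table 2, the sketch below shows one common way to compute it from two label assignments. It assumes arithmetic-mean normalisation, NMI = 2 I(U;V) / (H(U) + H(V)). The page does not state which normalisation the authors used, so values from this sketch are not guaranteed to match the paper's evaluation code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """NMI with arithmetic-mean normalisation (an assumed convention)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    count_t = Counter(true_labels)
    count_p = Counter(pred_labels)
    # Mutual information between the two partitions.
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
    h_t, h_p = entropy(true_labels), entropy(pred_labels)
    if h_t + h_p == 0:
        return 1.0  # both partitions put every node in a single cluster
    return 2 * mi / (h_t + h_p)


# Identical partitions up to label renaming score 1; independent ones score 0.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because NMI compares partitions rather than label values, renaming the cluster ids does not change the score, which is why it suits evaluating K-means output against ground-truth communities.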
Dataset     Amazon                          Netflix
Layer       ULP             UBP             UEM
Metric      NMI    Sim@5    NMI    Sim@5    NMI    Sim@5
E.          0.002  0.395    0.003  0.414    0.145  0.549
E.+R.       0.002  0.399    0.003  0.426    0.150  0.552
E.+I.       0.152  0.512    0.143  0.517    0.193  0.595
E.+I.+J.    0.169  0.544    0.153  0.525    0.194  0.592

Dataset     Alibaba
Layer       UKI             UPI             UCI             CON
Metric      NMI    Sim@5    NMI    Sim@5    NMI    Sim@5    NMI    Sim@5
E.          0.526  0.698    0.651  0.872    0.089  0.495    0.547  0.801
E.+R.       0.525  0.728    0.659  0.874    0.079  0.490    0.564  0.804
E.+I.       0.527  0.708    0.656  0.882    0.143  0.526    0.569  0.802
E.+I.+J.    0.528  0.716    0.662  0.886    0.142  0.527    0.562  0.805

Dataset     Yelp
Layer       URS             UGS             UMS
Metric      NMI    Sim@5    NMI    Sim@5    NMI    Sim@5
E.          0.404  0.740    0.054  0.583    0.038  0.701
E.+R.       0.421  0.741    0.051  0.568    0.020  0.661
E.+I.       0.405  0.741    0.053  0.569    0.401  0.824
E.+I.+J.    0.408  0.742    0.055  0.591    0.407  0.826
Table 3  Ablation experiments of the proposed algorithm in the two types of experiments
Fig.4  NMI values and attention weights of each network layer