Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (9): 88-99     https://doi.org/10.11925/infotech.2096-3467.2018.0342
Research Paper
Classifying Multilayer Social Network Links Based on Transfer Component Analysis*
Wu Jiehua1,2, Shen Jing1, Zhou Bei1
1College of Computer Science and Information Engineering, Guangdong Polytechnic of Industry and Commerce, Guangzhou 510510, China
2School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China
Abstract

[Objective] This paper addresses the inability of multilayer social network link classification algorithms to capture the correlation information among the sub-networks of different layers, with the aim of improving classification performance. [Methods] We define common features, which reflect the correlations among sub-networks, and specific features, which reflect the attributes and structure of each individual sub-network. We then propose a multilayer social network link classification algorithm based on transfer component analysis (TCA). The algorithm extracts components that capture the correlated features across layers, so that sub-networks of different layers can learn from one another. [Results] We compared the proposed model with baseline classification algorithms, feature-selection-based classification algorithms, and baseline transfer classification algorithms on two real multilayer datasets, YouTube and QueryLog. On the AUC and ROC curve evaluation metrics, the proposed algorithm improved on the compared methods by 1.57% to 33.2%. [Limitations] The model has not been applied to very large-scale network data, and the relationship between the dimensionality of the defined features and performance has not been examined in depth. [Conclusions] The proposed method effectively applies transfer learning to multilayer social network link classification and offers a new approach for research on this class of models.

Key words: Multilayer Network; Social Network; Link Classification; Component Analysis; Transfer Learning
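The abstract does not give the algorithm's details, so the following is only a minimal sketch of generic transfer component analysis in the standard formulation of Pan et al. (2011), not the authors' implementation. It assumes a linear kernel, a regularization weight `mu`, and random stand-in feature matrices `Xs` (source sub-network links) and `Xt` (target sub-network links); all of these names and values are illustrative assumptions.

```python
import numpy as np

def tca(Xs, Xt, n_components=2, mu=1.0):
    """Project source and target features onto shared transfer components.

    Generic TCA sketch: linear kernel and mu are assumptions, not values
    taken from the paper.
    """
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = X @ X.T                                  # linear kernel over all samples

    # MMD coefficient matrix: 1/ns^2 (source pairs), 1/nt^2 (target pairs),
    # -1/(ns*nt) for mixed pairs.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix

    # Transfer components: leading eigenvectors of (K L K + mu I)^-1 K H K.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(A)
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real

    Z = K @ W                                    # embedded samples
    return Z[:ns], Z[ns:]

# Hypothetical stand-in data: 30 source links and 20 target links, 5 features,
# with the target distribution shifted relative to the source.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(30, 5))
Xt = rng.normal(loc=0.5, size=(20, 5))
Zs, Zt = tca(Xs, Xt, n_components=2)
print(Zs.shape, Zt.shape)
```

In a transfer setting of this kind, a classifier would typically be trained on `Zs` with the source labels and then applied to `Zt`; which classifier the paper pairs with TCA is not stated on this page.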
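Received: 2018-03-28      Published: 2018-10-25
CLC number (ZTFLH): TP391 G35
Funding: *This work is one of the research outcomes of the Guangdong Provincial Science and Technology Plan project "Research on Theory, Algorithms, and Applications of Link Prediction in Large-Scale Heterogeneous Complex Networks" (No. 2017ZC0348), the Guangdong Higher Education Outstanding Young Teacher Training Program project "Research and Application of Link Prediction Theory for Large-Scale Heterogeneous Complex Networks" (No. YQ2015177), and the Guangdong Universities Major Research Projects and Achievements Cultivation Program project "Research on Key Technologies of Link Prediction in Multi-Source Heterogeneous Networks" (No. 2017GKTSCX009).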
Cite this article:
Wu Jiehua, Shen Jing, Zhou Bei. Classifying Multilayer Social Network Links Based on Transfer Component Analysis[J]. Data Analysis and Knowledge Discovery, 2018, 2(9): 88-99.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0342      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I9/88
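Figure: Multilayer social network structure and algorithm workflow
Figure: Distribution of heterogeneous links in the YouTube dataset
Figure: Common neighbors and link relations in the YouTube dataset
Figure: Degree distribution of the sub-networks
Figure: Clustering coefficient distribution of the sub-networks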
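Network   Nodes  Relation            Links  Avg. degree  Clustering coeff.  Avg. common neighbors
YouTube   1,000  Friends (F)         8,915        36.61               0.56                   6.14
YouTube   1,000  Subscriptions (S)  31,282       129.59               0.71                  21.74
YouTube   1,000  Videos (V)         11,472        47.44               0.52                   9.34
QueryLog    757  Bin1               13,764        36.36               0.53                  10.18
QueryLog    757  Bin2               11,994        31.69               0.49                   8.49
QueryLog    757  Bin3                9,037        22.29               0.44                   6.77
Table: Structural properties of the multilayer networks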
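The page does not define how the structural properties in this table were computed. As a rough sketch, the code below computes three of them for one undirected layer under common definitions, which are assumptions here: average degree as 2 × links / nodes, clustering coefficient as the mean of local clustering coefficients, and average common neighbors taken over linked node pairs. The toy edge list and the function name `layer_stats` are hypothetical.

```python
from itertools import combinations

def layer_stats(n_nodes, edges):
    """Return (avg_degree, avg_clustering, avg_common_neighbors) for one layer.

    Assumes an undirected simple graph with nodes labeled 0..n_nodes-1.
    """
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    linked = [(u, v) for u in adj for v in adj[u] if u < v]
    avg_degree = 2 * len(linked) / n_nodes

    def local_clustering(v):
        nbrs = adj[v]
        k = len(nbrs)
        if k < 2:
            return 0.0
        # Count neighbor pairs that are themselves linked (triangles at v).
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        return 2 * closed / (k * (k - 1))

    avg_clustering = sum(local_clustering(v) for v in adj) / n_nodes
    avg_common = sum(len(adj[u] & adj[v]) for u, v in linked) / len(linked)
    return avg_degree, avg_clustering, avg_common

# Toy layer: a triangle (0, 1, 2) plus a pendant edge (2, 3).
avg_degree, avg_clustering, avg_common = layer_stats(
    4, [(0, 1), (1, 2), (0, 2), (2, 3)]
)
print(avg_degree, round(avg_clustering, 4), avg_common)

# Sanity check of the assumed degree formula against the QueryLog Bin1 row.
bin1_avg_degree = 2 * 13764 / 757
print(round(bin1_avg_degree, 2))
```

Under the assumed degree formula, the QueryLog Bin1 row (757 nodes, 13,764 links) reproduces the tabulated average degree of 36.36; other rows may rely on conventions that this page does not state.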
r = 0.8  S→F             V→F             F→S             V→S             F→V             S→V
LR       0.5168(±0.267)  0.5173(±0.225)  0.5188(±0.326)  0.4872(±0.207)  0.5528(±0.317)  0.5498(±0.382)
SVM      0.5771(±0.571)  0.5747(±0.548)  0.5742(±0.595)  0.5786(±0.601)  0.5949(±0.573)  0.5977(±0.662)
FS-LR    0.5304(±0.389)  0.5337(±0.527)  0.5279(±0.636)  0.5263(±0.676)  0.6091(±0.882)  0.6042(±0.832)
FS-SVM   0.6042(±0.428)  0.6066(±0.444)  0.5848(±0.532)  0.5853(±0.593)  0.6345(±0.712)  0.6336(±0.688)
TLR      0.5207(±0.396)  0.6587(±0.356)  0.5682(±0.428)  0.5526(±0.433)  0.6257(±0.498)  0.5531(±0.524)
TSVM     0.6323(±0.436)  0.6911(±0.424)  0.6359(±0.449)  0.6343(±0.382)  0.6967(±0.507)  0.5699(±0.442)
TCA      0.6889(±0.547)  0.7048(±0.627)  0.7789(±0.673)  0.7548(±0.752)  0.8109(±0.736)  0.7925(±0.627)

r = 0.9  S→F             V→F             F→S             V→S             F→V             S→V
LR       0.4936(±0.243)  0.4939(±0.182)  0.4632(±0.221)  0.4628(±0.234)  0.5364(±0.322)  0.5345(±0.442)
SVM      0.5592(±0.615)  0.5603(±0.627)  0.5586(±0.532)  0.5577(±0.657)  0.5862(±0.581)  0.5862(±0.626)
FS-LR    0.5112(±0.434)  0.5118(±0.425)  0.5029(±0.453)  0.5049(±0.417)  0.5703(±0.633)  0.5853(±0.548)
FS-SVM   0.5855(±0.508)  0.5873(±0.486)  0.5676(±0.597)  0.5774(±0.536)  0.6035(±0.727)  0.6044(±0.793)
TLR      0.4922(±0.354)  0.6411(±0.415)  0.5421(±0.428)  0.5281(±0.519)  0.6011(±0.463)  0.5352(±0.457)
TSVM     0.6068(±0.487)  0.6748(±0.397)  0.6077(±0.458)  0.6081(±0.425)  0.6752(±0.686)  0.5524(±0.417)
TCA      0.6677(±0.679)  0.6854(±0.682)  0.7638(±0.774)  0.7328(±0.746)  0.7924(±0.611)  0.7781(±0.615)
Table: Classification results of each model on the YouTube dataset (standard deviations are in units of 10^-3)
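The page does not say how the improvement percentages in the abstract were derived. One plausible reading, offered here only as an assumption, is the relative gain of TCA over the best competing model in each transfer case. The sketch below applies that reading to the r = 0.9 block of the YouTube table; the helper name `relative_gain` is hypothetical.

```python
# Scores copied from the r = 0.9 block of the YouTube table.
cases = ["S→F", "V→F", "F→S", "V→S", "F→V", "S→V"]
scores = {
    "LR":     [0.4936, 0.4939, 0.4632, 0.4628, 0.5364, 0.5345],
    "SVM":    [0.5592, 0.5603, 0.5586, 0.5577, 0.5862, 0.5862],
    "FS-LR":  [0.5112, 0.5118, 0.5029, 0.5049, 0.5703, 0.5853],
    "FS-SVM": [0.5855, 0.5873, 0.5676, 0.5774, 0.6035, 0.6044],
    "TLR":    [0.4922, 0.6411, 0.5421, 0.5281, 0.6011, 0.5352],
    "TSVM":   [0.6068, 0.6748, 0.6077, 0.6081, 0.6752, 0.5524],
    "TCA":    [0.6677, 0.6854, 0.7638, 0.7328, 0.7924, 0.7781],
}

def relative_gain(scores, cases, target="TCA"):
    """Percent gain of `target` over the best other model, per transfer case."""
    gains = {}
    for i, case in enumerate(cases):
        best_other = max(v[i] for name, v in scores.items() if name != target)
        gains[case] = (scores[target][i] - best_other) / best_other * 100
    return gains

gains = relative_gain(scores, cases)
for case, gain in gains.items():
    print(f"{case}: {gain:.2f}%")
```

Under this reading, the smallest gain in the block is the V→F case (0.6854 against 0.6748 for TSVM), which works out to about 1.57% and is consistent with the lower bound quoted in the abstract.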
r = 0.8  Bin2→Bin1       Bin3→Bin1       Bin1→Bin2       Bin3→Bin2       Bin1→Bin3       Bin2→Bin3
LR       0.5425(±0.284)  0.5476(±0.245)  0.4983(±0.309)  0.5184(±0.228)  0.4947(±0.302)  0.5058(±0.359)
SVM      0.5492(±0.456)  0.5487(±0.472)  0.5624(±0.544)  0.5779(±0.430)  0.5237(±0.573)  0.5277(±0.662)
FS-LR    0.5480(±0.339)  0.5471(±0.315)  0.4957(±0.582)  0.5187(±0.628)  0.4999(±0.857)  0.5132(±0.741)
FS-SVM   0.5523(±0.229)  0.5538(±0.254)  0.5672(±0.381)  0.5798(±0.430)  0.6345(±0.554)  0.6336(±0.513)
TLR      0.5472(±0.297)  0.5477(±0.331)  0.5080(±0.366)  0.5194(±0.327)  0.5008(±0.439)  0.5079(±0.515)
TSVM     0.5508(±0.302)  0.5532(±0.326)  0.5312(±0.385)  0.5866(±0.374)  0.5335(±0.487)  0.5332(±0.468)
TCA      0.8108(±1.782)  0.8176(±1.856)  0.7782(±1.654)  0.7582(±1.788)  0.8024(±1.459)  0.8045(±1.565)

r = 0.9  Bin2→Bin1       Bin3→Bin1       Bin1→Bin2       Bin3→Bin2       Bin1→Bin3       Bin2→Bin3
LR       0.5286(±0.298)  0.5278(±0.122)  0.4747(±0.215)  0.4802(±0.272)  0.4957(±0.303)  0.4981(±0.371)
SVM      0.5428(±0.475)  0.5433(±0.472)  0.5482(±0.528)  0.5492(±0.552)  0.5039(±0.466)  0.5082(±0.549)
FS-LR    0.5288(±0.418)  0.5282(±0.413)  0.4832(±0.339)  0.4884(±0.306)  0.4998(±0.462)  0.5074(±0.505)
FS-SVM   0.5467(±0.505)  0.5472(±0.441)  0.5561(±0.610)  0.5605(±0.633)  0.5089(±0.754)  0.5113(±0.779)
TLR      0.5237(±0.356)  0.5277(±0.301)  0.5073(±0.417)  0.5082(±0.485)  0.5117(±0.527)  0.5098(±0.533)
TSVM     0.5488(±0.536)  0.5392(±0.467)  0.5462(±0.538)  0.5433(±0.561)  0.5291(±0.758)  0.5232(±0.685)
TCA      0.7821(±1.527)  0.7847(±1.538)  0.7154(±1.475)  0.6816(±1.423)  0.7782(±1.644)  0.7866(±1.595)
Table: Classification results of each model on the QueryLog dataset (standard deviations are in units of 10^-3)
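Figure: ROC curves for the YouTube cases
Figure: ROC curves for the QueryLog cases
Figure: Classification performance of each transfer case under source networks of different sizes
Figure: Classification performance of each case under different features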
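The evaluation relies on ROC curves and AUC. For readers unfamiliar with these metrics, here is a minimal sketch of how ROC points and the area under them can be computed from link labels and classifier scores. The labels and scores are made-up toy values, and the sketch assumes all scores are distinct (ties are not handled).

```python
def roc_points(labels, scores):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    Links are visited from the highest score to the lowest, lowering the
    decision threshold one link at a time. Assumes distinct scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_from_points(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Toy example: 1 = link exists, 0 = link absent.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
points = roc_points(labels, scores)
auc = auc_from_points(points)
print(points)
print(auc)
```

In the toy example, three of the four positive-negative pairs are ranked correctly, so the area equals 0.75; a value of 0.5 corresponds to random ranking.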