Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (6): 79-91     https://doi.org/10.11925/infotech.2096-3467.2018.0101
Research Paper
Identifying E-commerce User Types Based on Complex Network Overlapping Community
Qian Xiaodong, Li Min
School of Economics and Management, Lanzhou Jiaotong University, Lanzhou 730070, China
Abstract

[Objective] User characteristics are diverse, so a single user is often a mixture of several roles, yet existing studies rarely address composite user types, which limits how fully e-commerce operators can understand their customers. This paper proposes an algorithm for identifying the composite types of e-commerce users, giving operators a quantitative basis for personalized marketing. [Methods] We first build a node distance matrix from the characteristics of user access sequences. We modify the Jaro-Winkler Distance algorithm by redefining the number of matches, the edit cost, and the edit rules, and use the modified algorithm to compute the user access sequence distance matrix. Based on this matrix, we distinguish central from non-central users and construct a complex network for identifying composite user types. We then improve the CNM algorithm, including its initial modularity increment matrix, to obtain an initial partition of user types, and refine that partition with a fuzzy membership function to obtain the final composite types. [Results] With CONGA as the baseline, on networks generated by the LFR benchmark program the NMI of the proposed algorithm is up to 15.60% higher than the baseline. On real online user data, its overall clustering coefficient is up to 10.87% higher than the baseline. Its time complexity is also lower than that of the baseline. [Limitations] The proposed algorithm requires three parameters to be set subjectively. [Conclusions] The user network conforms to the characteristics of a small-world model and has the typical morphology of a complex network. The algorithm can effectively identify the composite types of e-commerce users.
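
As background for the distance step described in the abstract, the following is a minimal sketch of the classic (unmodified) Jaro and Jaro-Winkler similarity applied to access sequences. It does not implement the paper's redefined match count, edit cost, or edit rules, which are not given on this page. The function names, the page labels, and the defaults `p=0.1` and `max_prefix=4` are illustrative assumptions.

```python
from typing import Hashable, Sequence


def jaro_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Classic Jaro similarity between two sequences (1.0 means identical)."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    # Two items match if they are equal and at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, item in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == item:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched items that line up in a different order count as half a
    # transposition each.
    a_seq = [x for x, m in zip(a, a_matched) if m]
    b_seq = [x for x, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(a: Sequence[Hashable], b: Sequence[Hashable],
                            p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix."""
    dj = jaro_similarity(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return dj + prefix * p * (1.0 - dj)


# Hypothetical access sequences: each element is a visited page category.
user_a = ["home", "search", "item", "cart"]
user_b = ["home", "item", "search", "cart"]
similarity = jaro_winkler_similarity(user_a, user_b)
print(f"similarity={similarity:.3f} distance={1.0 - similarity:.3f}")
```

Computing this value for every pair of users would give a symmetric matrix of the kind the abstract calls the user access sequence distance matrix, here under the classic definition only.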

Key words: User Composite Type; Complex Network; Overlapping Communities; Access Sequence Distance; CNM; Membership Function
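Received: 2018-01-25      Published: 2018-07-11
CLC number (ZTFLH):  TP393
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Clustering and Management Applications of Business Big Data Based on Complex Networks" (Grant No. 71461017).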
Cite this article:
Qian Xiaodong, Li Min. Identifying E-commerce User Types Based on Complex Network Overlapping Community. Data Analysis and Knowledge Discovery, 2018, 2(6): 79-91.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0101      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I6/79
Table: Example calculations with the improved Jaro-Winkler Distance
η     bt    l   p     m'   Ψ   ξ   dj      dw      dsf     dsc
2/5   2/5   2   0.1   10   6   9   1       1       0.820   0.856
3/5   3/5   2   0.2   10   6   9   1       1       0.820   0.892
4/5   4/5   3   0.2   9    5   8   0.967   0.987   0.678   0.871
4/5   4/5   2   0.1   9    5   8   0.967   0.973   0.678   0.742
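
The dw column in the table above appears consistent with the standard Winkler prefix adjustment, dw = dj + l·p·(1 − dj). This is an observation, not something stated on this page. The sketch below checks it, assuming that l is the common-prefix length and p the prefix scaling factor, as in the standard Jaro-Winkler definition. The function name `winkler_adjust` is illustrative, and the paper's modified definition of dj is not reproduced.

```python
def winkler_adjust(dj: float, l: int, p: float) -> float:
    """Standard Winkler prefix adjustment: dw = dj + l * p * (1 - dj)."""
    return dj + l * p * (1.0 - dj)


# (dj, l, p, reported dw) copied from the four rows of the table.
rows = [
    (1.0, 2, 0.1, 1.0),
    (1.0, 2, 0.2, 1.0),
    (0.967, 3, 0.2, 0.987),
    (0.967, 2, 0.1, 0.973),
]

for dj, l, p, dw_reported in rows:
    dw = winkler_adjust(dj, l, p)
    # dj is itself rounded to three decimals, so allow a small tolerance.
    status = "consistent" if abs(dw - dw_reported) < 1e-3 else "differs"
    print(f"dj={dj} l={l} p={p} computed={dw:.4f} reported={dw_reported} {status}")
```

The tolerance is needed because the reported dj values are already rounded. Agreement within rounding supports the assumed reading of l and p but does not confirm it.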
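Figure: User distance calculation results under different demand conditions
Figure: Values of $\delta_{cv}$ for p from 1 to 100
Figure: Values of $\delta_{cv}$ for p from 1 to 9
Figure: Model for identifying composite types of e-commerce users based on overlapping community partitioning of complex networks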
Table: Benchmark networks generated by LFR
Network No.   n        μ     k         Overlapping nodes   Communities per overlapping node (nbc)
1             10 000   0.2   12.5143   20                  4
2             10 000   0.2   12.6180   40                  5
3             10 000   0.2   12.2637   60                  6
4             10 000   0.2   12.5192   80                  7
5             10 000   0.2   12.4001   100                 8
6             10 000   0.2   12.3052   120                 9
7             10 000   0.2   12.4211   140                 10
1'            10 000   0.3   12.5936   20                  4
2'            10 000   0.3   12.4137   40                  5
3'            10 000   0.3   12.3665   60                  6
4'            10 000   0.3   12.7131   80                  7
5'            10 000   0.3   12.2684   100                 8
6'            10 000   0.3   12.0958   120                 9
7'            10 000   0.3   12.2384   140                 10
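
Benchmark networks of this kind come with planted communities, so a detected partition can be scored against the ground truth with NMI, the measure reported in the abstract. The following is a minimal sketch of the standard NMI for disjoint labelings, normalized by the arithmetic mean of the two entropies. The variant the paper uses for overlapping communities is not specified on this page, so this is only an illustration of the basic measure. The function names and the toy labels are assumptions.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(truth: Sequence[Hashable], found: Sequence[Hashable]) -> float:
    """Normalized mutual information between two disjoint labelings."""
    if len(truth) != len(found) or len(truth) == 0:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(truth)
    joint = Counter(zip(truth, found))
    count_t = Counter(truth)
    count_f = Counter(found)
    # Mutual information from the joint and marginal label frequencies.
    mi = sum((c / n) * log(c * n / (count_t[t] * count_f[f]))
             for (t, f), c in joint.items())
    h_t, h_f = entropy(truth), entropy(found)
    if h_t == 0.0 and h_f == 0.0:
        return 1.0  # both labelings put every node in a single community
    return 2.0 * mi / (h_t + h_f)


# Toy example: six nodes, planted communities versus a detected partition.
planted = [0, 0, 0, 1, 1, 1]
detected = ["a", "a", "b", "b", "b", "b"]
print(f"NMI = {nmi(planted, detected):.3f}")
```

The score depends only on how nodes are grouped, not on the label names, so relabeled but identical partitions score 1 and statistically independent ones score 0.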
[Figures not reproduced. Panel parameter settings: η=med(disnode), bt=med(dsf) and η=mod(disnode), bt=mod(dsf), each shown with ε=0.00001, with ε=0.001, and without a stated ε.]