[Objective] This paper presents an algorithm for identifying the composite types of e-commerce users, with the aim of improving the personalized marketing services of e-commerce operators. [Methods] First, we built a node distance matrix based on the characteristics of user access sequences. Second, we modified the Jaro-Winkler distance algorithm by redefining its matching number, editing cost, and rules. Third, we used the improved algorithm to calculate the distance matrix of user access sequences. Based on this matrix, we distinguished central from non-central users and constructed a complex network for identifying composite user types. We then applied an improved CNM algorithm to obtain the initial user types and optimized the user assignments with a fuzzy membership function to obtain their composite types. [Results] Compared with CONGA, the proposed algorithm improved the normalized mutual information (NMI) by 15.60%. When applied to real users' online data, its overall clustering coefficient was 10.87% higher than that of CONGA. The time complexity of the new algorithm was also lower. [Limitations] The proposed algorithm requires three parameters to be set subjectively. [Conclusions] The user network conforms to the characteristics of a small-world model and has the typical morphology of a complex network. The algorithm can effectively identify the composite types of e-commerce users.
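As a point of reference for the Methods, the sketch below computes the standard (unmodified) Jaro-Winkler distance between two access sequences. It does not reproduce the modified matching number, editing cost, or rules described in the paper, which the abstract does not specify. The function names, the page labels, and the default values `prefix_scale=0.1` and `max_prefix=4` are assumptions taken from the conventional definition of the metric.

```python
from typing import Hashable, Sequence


def jaro_similarity(s: Sequence[Hashable], t: Sequence[Hashable]) -> float:
    """Standard Jaro similarity between two sequences (1.0 means identical)."""
    if len(s) == 0 and len(t) == 0:
        return 1.0
    if len(s) == 0 or len(t) == 0:
        return 0.0
    # Two items match if they are equal and at most `window` positions apart.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, item in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == item:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched items that appear in a different order count as half-transpositions.
    s_seq = [x for x, m in zip(s, s_matched) if m]
    t_seq = [x for x, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (
        matches / len(s)
        + matches / len(t)
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_winkler_distance(
    s: Sequence[Hashable],
    t: Sequence[Hashable],
    prefix_scale: float = 0.1,  # conventional default, an assumption here
    max_prefix: int = 4,        # conventional default, an assumption here
) -> float:
    """Standard Jaro-Winkler distance, defined as 1 minus the similarity."""
    jaro = jaro_similarity(s, t)
    # Length of the common prefix, capped at `max_prefix`.
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    similarity = jaro + prefix * prefix_scale * (1.0 - jaro)
    return 1.0 - similarity


# Hypothetical access sequences: each item is a page category visited in order.
user_a = ["home", "search", "item", "cart"]
user_b = ["home", "search", "cart", "item"]
distance_ab = jaro_winkler_distance(user_a, user_b)
```

Because the functions only compare items for equality, they work on lists of page identifiers as well as on plain strings. A pairwise distance matrix over all users could be filled by calling `jaro_winkler_distance` on each pair of sequences.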
Qian Xiaodong, Li Min. Identifying E-commerce User Types Based on Complex Network Overlapping Community. Data Analysis and Knowledge Discovery, 2018, 2(6): 79-91.
(China Internet Network Information Center. The 40th China Statistical Report on Internet Development[EB/OL]. [2017-08-17].)
[3]
Suh E H, Noh K C, Suh C K. Customer List Segmentation Using the Combined Response Model[J]. Expert Systems with Applications, 1999, 17(2): 89-97.
doi: 10.1016/S0957-4174(99)00026-3
[4]
Heilman C M, Bowman D. Segmenting Consumers Using Multiple-category Purchase Data[J]. International Journal of Research in Marketing, 2002, 19(3): 225-252.
doi: 10.1016/S0167-8116(02)00077-0
(Xu Xiangbin, Wang Jiaqiang, Tu Huan, et al. Customer Classification of E-commerce Based on Improved RFM Model[J]. Journal of Computer Applications, 2012, 32(5): 1439-1442.)
doi: 10.3724/SP.J.1087.2012.01439
[6]
Gregory S. An Algorithm to Find Overlapping Community Structure in Networks[C]//Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery. Berlin, Heidelberg: Springer, 2007: 91-102.
doi: 10.1007/978-3-540-74976-9_12
(Liu Gongshen, Meng Kui, Guo Hongyi, et al. Overlapping-communities Recognition Algorithm Based on Contribution Function[J]. Journal of Electronics & Information Technology, 2017, 39(8): 1964-1971.)
(Liu Shichao, Zhu Fuxi, Gan Lin. A Label-Propagation-Probability-Based Algorithm for Overlapping Community Detection[J]. Chinese Journal of Computers, 2016, 39(4): 717-729.)
doi: 10.11897/SP.J.1016.2016.00717
(Jiang Yawen, Jia Caiyan, Yu Jian. Overlapping Community Detection in Complex Networks Based on Cluster Prototypes[J]. Pattern Recognition and Artificial Intelligence, 2013, 26(7): 648-659.)
doi: 10.3969/j.issn.1003-6059.2013.07.007
[10]
Majorek K A, Dunin-Horkawicz S, Steczkiewicz K, et al. The RNase H-like Superfamily: New Members, Comparative Structural Analysis and Evolutionary Classification[J]. Nucleic Acids Research, 2014, 42(7): 4160-4179.
doi: 10.1093/nar/gkt1414
pmid: 24464998
[11]
Paleo B W. An Approximate Gazetteer for GATE Based on Levenshtein Distance[C]//Proceedings of Student Session of the European Summer School of Logic, Language and Information. 2007.
[12]
Boytsov L. Indexing Methods for Approximate Dictionary Searching: Comparative Analysis[J]. Journal of Experimental Algorithmics, 2011, 16: Article No. 1.1.
doi: 10.1145/1963190.1963191
[13]
Winkler W E. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage[C]//Proceedings of the Section on Survey Research. American Statistical Association, 1990: 354-359.
[14]
Jaro M A. Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida[J]. Journal of the American Statistical Association, 1989, 84(406): 414-420.
doi: 10.1080/01621459.1989.10478785
[15]
Rodriguez A, Laio A. Clustering by Fast Search and Find of Density Peaks[J]. Science, 2014, 344(6191): 1492-1496.
doi: 10.1126/science.1242072
[16]
Newman M E J. Fast Algorithm for Detecting Community Structure in Networks[J]. Physical Review E: Statistical Nonlinear & Soft Matter Physics, 2003, 69(6 Pt 2): 066133.
[17]
Nicosia V, Mangioni G, Carchiolo V, et al. Extending the Definition of Modularity to Directed Graphs with Overlapping Communities[J]. Journal of Statistical Mechanics Theory & Experiment, 2009(3): 3166-3168.
doi: 10.1088/1742-5468/2009/03/P03024
[18]
Lancichinetti A, Fortunato S, Radicchi F. Benchmark Graphs for Testing Community Detection Algorithms[J]. Physical Review E, 2008, 78(4): 046110.
doi: 10.1103/PhysRevE.78.046110
pmid: 18999496
[19]
Estévez P A, Tesmer M, Perez C A, et al. Normalized Mutual Information Feature Selection[J]. IEEE Transactions on Neural Networks, 2009, 20(2): 189-201.
doi: 10.1109/TNN.2008.2005601
pmid: 19150792
[20]
Saramäki J, Kivelä M, Onnela J P, et al. Generalizations of the Clustering Coefficient to Weighted Complex Networks[J]. Physical Review E: Statistical Nonlinear & Soft Matter Physics, 2007, 75(2): 027105.
doi: 10.1103/PhysRevE.75.027105
pmid: 17358454
(Qiao Shaojie, Han Nan, Zhang Kaifeng, et al. Algorithm for Detecting Overlapping Communities from Complex Network Big Data[J]. Journal of Software, 2017, 28(3): 631-647.)
doi: 10.13328/j.cnki.jos.005155