Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (2): 74-85    DOI: 10.11925/infotech.2096-3467.2017.0886
Identifying User Interests Based on Browsing Behaviors
Liu Hongwei1, Gao Hongming1, Chen Li2, Zhan Mingjun1, Liang Zhouyang1
1(School of Management, Guangdong University of Technology, Guangzhou 510520, China)
2(Guangdong Youth Vocational College, Guangzhou 510507, China)
[Objective] This paper proposes a model that identifies the interests of online shoppers from their browsing behaviors, with the aim of improving personalized recommendation services. [Methods] First, we ran an experiment to collect clickstream data from Taobao and TMall. Second, we analyzed the collected data with the Bisecting K-means algorithm. Finally, we established a mapping structure relating interests to behaviors. [Results] We found four types of implicit user interest: Attention, Comprehension, Attitude, and Intention. Users of the Attitude and Intention types tended to make purchases, and the characteristics of browsing paths differed among users. [Limitations] This study did not examine unstructured data such as online sales advertisements. [Conclusions] This paper identifies user interests in online shopping, which can help improve the personalized recommendation services of e-commerce platforms.

Key words: Implicit Interest; Clickstream; Bisecting K-means Algorithm
Received: 01 September 2017      Published: 07 March 2018
CLC Number (ZTFLH): TP391.4; F713.8

Liu Hongwei, Gao Hongming, Chen Li, Zhan Mingjun, Liang Zhouyang. Identifying User Interests Based on Browsing Behaviors. Data Analysis and Knowledge Discovery, 2018, 2(2): 74-85.

Field         Meaning
user_Id       User ID
sessionId     Session ID
tabId         Tab record ID
title         Web page title
url           URL visited by the user
visitedTime   Time of the user's visit
goodlist      Product list
Info          Mouse click information
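
As a rough illustration of the record layout above, one logged page view could be modeled as follows. This is a minimal sketch: the field names follow the table, but the `ClickRecord` class name, the Python types, and the sample values are assumptions, since the source does not specify storage formats.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClickRecord:
    """One logged page view; all types are assumed, not taken from the source."""
    user_Id: str        # user ID
    sessionId: str      # session ID
    tabId: str          # browser-tab record ID
    title: str          # web page title
    url: str            # address the user visited
    visitedTime: str    # time of the visit (string format is an assumption)
    goodlist: List[str] = field(default_factory=list)  # product list
    Info: str = ""      # mouse click information


# Hypothetical sample values, for illustration only.
record = ClickRecord(
    user_Id="u001",
    sessionId="s001",
    tabId="t001",
    title="example page",
    url="https://example.com/item",
    visitedTime="2017-01-01 12:00:00",
)
```

Grouping such records by `sessionId` would be one way to derive per-session measures, although the source does not describe how its variables were computed.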
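
Abbreviation    H     A     S     D     F     G     R     B     P     Y     V     T      C      O
Frequency       138   96    7     30    52    170   11    142   17    5     4     588    438    74
Percentage (%)  7.79  5.42  0.40  1.69  2.93  9.59  0.62  8.01  0.96  0.28  0.23  33.18  24.72  4.18
Class names (in source order): home page, account, payment, shopping cart, product, review, brand or flagship store, price, popularity, sales volume, product, catalog, other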
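
The percentage row can be reproduced from the frequency row: each value is the class frequency divided by the total of 1772 page views. The short sketch below checks that arithmetic; the variable names are illustrative and do not come from the source.

```python
# Page-class abbreviations and frequencies, copied from the table above.
abbreviations = ["H", "A", "S", "D", "F", "G", "R", "B", "P", "Y", "V", "T", "C", "O"]
frequencies = [138, 96, 7, 30, 52, 170, 11, 142, 17, 5, 4, 588, 438, 74]

total = sum(frequencies)  # total number of categorized page views

# Share of each class as a percentage of all page views, to two decimals.
percentages = {
    abbr: round(100 * freq / total, 2)
    for abbr, freq in zip(abbreviations, frequencies)
}
```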

Variable                         Mean    Std. dev.  Min    Median  Max
Page duration (s)                12.28   45.32      0.00   3.00    1492.00
Page duration ratio (Timeratio)  0.71    3.42       0.00   0.09    100.00
Page click ratio (%)             27.67   18.76      0.27   26.01   100.00
Session depth (pages)            28.20   25.72      2.00   22.00   102.00

Dynamic interest  Time       Timeratio  Clickratio  Sessiondepth
Cluster 1         5.283270   0.5805210  50.57808    16.92205
Cluster 2         7.042510   0.2328121  19.05581    64.23077
Cluster 3         11.558824  0.6666170  17.15118    12.02801
Cluster 4         155.5405   8.1338870  21.005      19.62162
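
The cluster centers above are attributed to the Bisecting K-means algorithm. The general technique can be sketched as follows: start with all points in one cluster and repeatedly split one cluster in two with ordinary 2-means until k clusters exist. This is a minimal sketch, not the authors' implementation. The source does not state the distance measure, the rule for choosing which cluster to split, or the number of trial splits, so squared Euclidean distance, largest-SSE selection, and `trials=5` are assumptions, and the toy data is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (the distance measure is an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points: List[Point]) -> float:
    """Sum of squared distances from each point to the cluster centroid."""
    center = centroid(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points: List[Point], rng: random.Random, iters: int = 20):
    """Split one cluster into two groups with ordinary 2-means."""
    centers = rng.sample(points, 2)  # two members as initial centers
    groups: List[List[Point]] = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # centers no longer move
        centers = new_centers
    return groups


def bisecting_kmeans(points: List[Point], k: int, trials: int = 5, seed: int = 0):
    """Return up to k clusters obtained by repeated bisection."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumed selection rule: bisect the cluster with the largest SSE.
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(worst)
        best = None
        if len(target) >= 2:
            for _ in range(trials):
                left, right = two_means(target, rng)
                if left and right:
                    cost = sse(left) + sse(right)
                    if best is None or cost < best[0]:
                        best = (cost, left, right)
        if best is None:
            clusters.append(target)  # nothing left to split
            break
        clusters.extend([best[1], best[2]])
    return clusters


# Hypothetical toy data: four well-separated groups of 2-D points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
    (0.0, 10.0), (0.0, 11.0), (1.0, 10.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
clusters = bisecting_kmeans(points, k=4)
```

For data like that in the table, each point would be a four-dimensional vector (Time, Timeratio, Clickratio, Sessiondepth). Because these variables are on very different scales, some form of standardization before clustering would usually matter, though the source does not say whether any was applied. A library implementation such as scikit-learn's `BisectingKMeans` may be preferable to this sketch in practice.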

Four types of dynamic interest  Correlation coefficient
Cluster 1                       /
Cluster 2                       -0.022
Cluster 3                       -0.081
Cluster 4                       0.679**
