Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (4): 108-119    DOI: 10.11925/infotech.2096-3467.2021.0880
Profiling Big Data Users with Qualitative and Quantitative Fusion Methods
Wu Wenhan
Department of Library Information and Archives, Shanghai University, Shanghai 200444, China
Abstract  

[Objective] This study designs a model for profiling big data users that addresses the problem of fusing qualitative and quantitative methods. [Methods] We combined qualitative and quantitative methods to design the model, which includes a user value map grounded in sociological and psychological theories. We then used the Look-alike algorithm to build a label system for the map data, applied the K-Means clustering algorithm to process the data, and interpreted the resulting clusters. [Results] We tested the model on 200 million data points and divided young users into 20 groups. The total amount of data reached 17 million with 606 labels, both exceeding the corresponding figures for the survey data. [Limitations] More research is needed to extract more original data, improve control over the subjectivity of the user value map, and profile heterogeneous data. [Conclusions] The proposed model provides a reference for related studies.

Key words: Big Data; User Portrait; Empirical Research; Fusion Method
Received: 20 August 2021      Published: 31 December 2021
Chinese Library Classification (ZTFLH): G202
Corresponding Author: Wu Wenhan, ORCID: 0000-0003-4432-6478, E-mail: wuwenhan000@163.com

Cite this article:

Wu Wenhan. Profiling Big Data Users with Qualitative and Quantitative Fusion Methods. Data Analysis and Knowledge Discovery, 2022, 6(4): 108-119.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0880     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I4/108

Full View of Fusion Model Design
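Category | Label attribute | Name | Weekday, working hours (8:00-18:00) | Weekday, leisure hours (18:00-24:00) | Weekday, sleeping hours (0:00-8:00) | Holiday, living hours (8:00-24:00) | Holiday, sleeping hours (0:00-8:00)
POI | POI search | Café (POI code: 000001) | act_85-00_0_8_18 | act_85-00_0_18_24 | act_85-00_0_0_8 | act_85-00_1_8_24 | act_85-00_1_0_8
POI | POI arrival | Café | arr_85-00_0_8_18 | arr_85-00_0_18_24 | arr_85-00_0_0_8 | arr_85-00_1_8_24 | arr_85-00_1_0_8
APP | APP usage | Photography (category code: 000002) | app_85-00_0_8_18 | app_85-00_0_18_24 | app_85-00_0_0_8 | app_85-00_1_8_24 | app_85-00_1_0_8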
Basic Information of Data and Labels
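The label identifiers in the table above appear to follow a fixed pattern: an attribute prefix, a code segment, a day-type flag, and a start and end hour. The sketch below splits an identifier into those parts. The field names, the `LabelId` class, and the mapping of `0` and `1` to weekday and holiday are assumptions inferred from the column layout, not definitions given by the source; the meaning of the `85-00` segment is not stated, so it is kept as an opaque string.

```python
from dataclasses import dataclass

# Assumed mappings, inferred from the rows and columns of the label table.
ATTRIBUTES = {"act": "POI search", "arr": "POI arrival", "app": "APP usage"}
DAY_TYPES = {"0": "weekday", "1": "holiday"}


@dataclass(frozen=True)
class LabelId:
    """Hypothetical container for the parts of one label identifier."""
    attribute: str
    code: str        # opaque segment such as "85-00"; meaning not given in the source
    day_type: str
    start_hour: int
    end_hour: int


def parse_label(label: str) -> LabelId:
    """Split an identifier such as 'act_85-00_0_8_18' into its five fields."""
    prefix, code, day, start, end = label.split("_")
    return LabelId(ATTRIBUTES[prefix], code, DAY_TYPES[day], int(start), int(end))


if __name__ == "__main__":
    print(parse_label("arr_85-00_1_8_24"))
```

Under these assumptions, `arr_85-00_1_8_24` reads as a POI arrival observed on a holiday between 8:00 and 24:00.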
User Value Map
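Dimension | Extroverted: label | Extroverted: distinguishing behavior | Introverted: label | Introverted: distinguishing behavior
Group values | Apparel consumption | Buys luxury brands (Prada, LV, Hermes, etc.) | Love of tinkering | Goes to technology exhibitions; buys VR glasses; uses geek-oriented apps for long periods (more than two hours a day)
Group values | Content consumption | Heavy user of photo-beautification and selfie apps | Enjoying things alone | Visits fishing parks more than twice a month; uses food-delivery apps on weekends (more than twice a week); buys instant noodles and instant hot pot online
Group values | Entertainment venues | Frequents specific entertainment venues (bars, equestrian clubs); visits high-end hotels without staying overnight | Self-study and self-entertainment | Stay-at-home heavy user of education apps (Tencent Classroom, NetEase Open Course, etc.)

Dimension | Novelty-seeking: label | Novelty-seeking: distinguishing behavior | Stability-seeking: label | Stability-seeking: distinguishing behavior
Individual values | Niche pursuits | Travels to less-visited countries (Morocco, Czech Republic, etc.) | No overtime | Leaves work at 6 o'clock on weekdays; does not go to the office for overtime on weekends
Individual values | Offline activities | Travels for business often (1~2 times a month); works overtime on weekends; frequently uses shared-office apps (WeWork, etc.) | Offline activities | Single city; trips to tourist cities
Individual values | Information gathering | Heavy user of finance apps (more than two hours a day) | Home-and-work routine | Always spends long periods at home on weekends or passes the time at shopping malls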
Seed Group Core Distinguishing Labels
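Dimension | Positive/negative class | Seed group size | Intersection size | AUC
Group values | Extroverted | 476,843 | 1,403 | 0.95
Group values | Introverted | 460,606 | |
Individual values | Novelty-seeking | 301,826 | 7 | 0.83
Individual values | Stability-seeking | 315,418 | |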
Seed Group Selection and Results Evaluation
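The AUC values reported above summarize how well a scoring model separates the positive seed group from the negative one within each dimension. The source does not describe how the AUC was computed, so the following is only a minimal sketch of the standard rank-based definition: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the toy scores are illustrative and do not come from the study.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the O(n*m) textbook form,
    which is fine for small illustrative inputs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy scores only; not data from the study.
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.95 indicates cleaner separation of the two classes for group values than the 0.83 reported for individual values.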
Classification and Composition of Label System
Overview of 20 Groups of People
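K criterion | Post-80s | Post-85s | Post-90s | Post-95s
K (between-cluster / within-cluster distance) | 14 (0.9514) | 14 (0.9707) | 13 (0.9605) | 10 (0.9039)
K (between-cluster / within-cluster distance) | 15 (0.9642) | 15 (0.9793) | 14 (0.9607) | 11 (0.9196)
K (between-cluster / within-cluster distance) | 16 (0.9748) | 16 (0.9914) | 15 (0.9776) | 12 (0.9389)
K (between-cluster / within-cluster distance) | 17 (0.9905) | 17 (0.9989) | 16 (0.9947) | 13 (0.9706)
K (between-cluster / within-cluster distance) | 18 (1.0000) | 18 (1.0000) | 17 (1.0000) | 14 (1.0000)
K (silhouette distance) | 14 (1.0000) | 14 (0.8219) | 13 (0.7449) | 10 (1.0000)
K (silhouette distance) | 15 (0.7739) | 15 (1.0000) | 14 (0.8477) | 11 (0.8563)
K (silhouette distance) | 16 (0.8023) | 16 (0.9486) | 15 (1.0000) | 12 (0.8448)
K (silhouette distance) | 17 (0.9584) | 17 (0.7123) | 16 (0.9062) | 13 (0.9188)
K (silhouette distance) | 18 (0.7279) | 18 (0.9007) | 17 (0.9955) | 14 (0.9572)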
Results of Different Parameters for 4 Groups
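The table above scores five candidate K values per age cohort under two criteria, and the two criteria do not always favor the same K. The sketch below transcribes the tabulated scores and reads off the best-scoring K per cohort for each criterion. The cohort keys and the `best_k` helper are illustrative names, and the source does not state how the two criteria were combined into the final choice, so this is not a reconstruction of the author's selection rule.

```python
# Scores transcribed from the table: {cohort: {K: score}}.
DISTANCE_RATIO = {
    "post-80s": {14: 0.9514, 15: 0.9642, 16: 0.9748, 17: 0.9905, 18: 1.0000},
    "post-85s": {14: 0.9707, 15: 0.9793, 16: 0.9914, 17: 0.9989, 18: 1.0000},
    "post-90s": {13: 0.9605, 14: 0.9607, 15: 0.9776, 16: 0.9947, 17: 1.0000},
    "post-95s": {10: 0.9039, 11: 0.9196, 12: 0.9389, 13: 0.9706, 14: 1.0000},
}
SILHOUETTE = {
    "post-80s": {14: 1.0000, 15: 0.7739, 16: 0.8023, 17: 0.9584, 18: 0.7279},
    "post-85s": {14: 0.8219, 15: 1.0000, 16: 0.9486, 17: 0.7123, 18: 0.9007},
    "post-90s": {13: 0.7449, 14: 0.8477, 15: 1.0000, 16: 0.9062, 17: 0.9955},
    "post-95s": {10: 1.0000, 11: 0.8563, 12: 0.8448, 13: 0.9188, 14: 0.9572},
}


def best_k(scores: dict[str, dict[int, float]]) -> dict[str, int]:
    """Return, for each cohort, the candidate K with the highest score."""
    return {cohort: max(by_k, key=by_k.get) for cohort, by_k in scores.items()}


if __name__ == "__main__":
    print("distance ratio:", best_k(DISTANCE_RATIO))
    print("silhouette:    ", best_k(SILHOUETTE))
```

The distance-ratio score rises steadily with K and peaks at the largest candidate in every cohort, while the silhouette score peaks at a smaller K, which is why a single criterion is usually not enough to fix K.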
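Key indicator | Big data profiling based on the qualitative-quantitative fusion method | User profiling based on survey data
Total data volume (×10³) | 178,987 (output) | 12
Number of groups | 20 | 18
Number of labels/questions | 617 | 76
Of which: basic information | 11 | 19
Of which: preference information | 606 | 57
Major analysis dimensions | 3 | 3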
Horizontal Comparison of Research Results
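Comparison dimension | Big data profiling based on the qualitative-quantitative fusion method | User profiling based on survey data
Data type | Static data combined with dynamic location data | Static data
Data formation | Datafication of user behavior | Quantification by the researcher
Key techniques | Research process and algorithm design | Questionnaires and interviews
Time cost | Relatively long (can be amortized) | Long, mainly spent finding respondents
Research perspective | Mainly objective user data | Users' subjective answers
Presentation | Mainly dynamic graphics | Mainly tabular data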
Methodological Comparison