Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (11): 2-9    DOI: 10.11925/infotech.2096-3467.2018.0834
Predicting Conversion Rate of APP Advertising with Machine Learning
Zhao Yang, Yuan Xini, Chen Yawen, Wu Liqiang
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper predicts the conversion rate of APP advertisements with machine learning algorithms, aiming to improve the effectiveness of advertising and marketing activities. [Methods] First, we examined the characteristics of APP advertisements. Then, we applied four machine learning algorithms to predict their conversion rate. The proposed RF+LXFV model was built with Random Forest, Gradient Boosting Decision Tree, LightGBM, XGBoost, Vowpal Wabbit and Field-aware Factorization Machine. Finally, we evaluated the validity and accuracy of the new model with Tencent APP advertising data. [Results] The proposed model achieved higher prediction accuracy than any of the single algorithms. [Limitations] We did not examine the impact of conversion delay on prediction. [Conclusions] The proposed RF+LXFV model can predict the conversion rate of APP advertising effectively.

Key words: APP Advertising, Advertising Conversion Rate Prediction, Machine Learning, RF+LXFV
Received: 26 July 2018      Published: 11 December 2018
Chinese Library Classification (ZTFLH): TP391

Cite this article:

Zhao Yang,Yuan Xini,Chen Yawen,Wu Liqiang. Predicting Conversion Rate of APP Advertising with Machine Learning. Data Analysis and Knowledge Discovery, 2018, 2(11): 2-9.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0834     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I11/2

Dataset        Positive samples   Negative samples   Total samples   Positive share
Training set   27,236             732,671            759,907         0.0358
Test set       2,414              82,020             84,434          0.0285
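
The positive shares in the dataset table follow directly from the sample counts, and they show how imbalanced the conversion labels are. A minimal sketch that recomputes them; the dictionary layout and function name are illustrative assumptions, only the counts come from the table:

```python
# Sample counts copied from the dataset table above.
datasets = {
    "training": {"positive": 27_236, "negative": 732_671},
    "test": {"positive": 2_414, "negative": 82_020},
}


def positive_share(counts):
    """Fraction of samples that are conversions (positive samples)."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total


for name, counts in datasets.items():
    total = counts["positive"] + counts["negative"]
    print(f"{name}: total={total}, positive share={positive_share(counts):.4f}")
```

Rounded to four decimals the test-set share comes out slightly above the 0.0285 in the table, which suggests the published figures were truncated rather than rounded.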
Random Forest (RF)            Gradient Boosting Decision Tree (GBDT)
Feature          Weight       Feature          Weight
clickCount       0.120        advertiserID     0.111
positionID       0.098        positionID       0.091
residence        0.086        creativeID       0.088
age              0.067        adID             0.086
adID             0.064        appCategory      0.080
appCount         0.057        clickCount       0.080
positionType     0.048        campaignID       0.070
appID            0.048        connectionType   0.068
connectionType   0.045        age              0.063
creativeID       0.043        appID            0.060
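
The two rankings above can be compared mechanically to see which features both tree models weight highly. The sketch below only restates the published weights; it does not retrain either model, and the variable names are illustrative assumptions:

```python
# Top-10 feature weights copied from the RF and GBDT columns above.
rf_weights = {
    "clickCount": 0.120, "positionID": 0.098, "residence": 0.086,
    "age": 0.067, "adID": 0.064, "appCount": 0.057,
    "positionType": 0.048, "appID": 0.048, "connectionType": 0.045,
    "creativeID": 0.043,
}
gbdt_weights = {
    "advertiserID": 0.111, "positionID": 0.091, "creativeID": 0.088,
    "adID": 0.086, "appCategory": 0.080, "clickCount": 0.080,
    "campaignID": 0.070, "connectionType": 0.068, "age": 0.063,
    "appID": 0.060,
}

# Features that both models place in their top ten, ordered by combined weight.
shared = sorted(
    set(rf_weights) & set(gbdt_weights),
    key=lambda f: rf_weights[f] + gbdt_weights[f],
    reverse=True,
)
rf_only = set(rf_weights) - set(gbdt_weights)
gbdt_only = set(gbdt_weights) - set(rf_weights)

print("shared:", shared)
print("RF only:", sorted(rf_only))
print("GBDT only:", sorted(gbdt_only))
```

Seven of the ten features appear in both lists. The features unique to the RF list are user and context attributes, while those unique to the GBDT list are ad-side and app-side identifiers.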
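Feature type       Feature name                          Variable         Description
Ad features        Advertiser ID                         advertiserID     Uniquely identifies a specific advertiser
                   Campaign ID                           campaignID       Uniquely identifies a specific promotion campaign
                   Ad ID                                 adID             Uniquely identifies a specific ad
                   Creative ID                           creativeID       Uniquely identifies the creative material used in an ad
User features      Age                                   age              User's age, in the range [0, 80]; 0 means unknown
                   Residence                             residence        User's long-term place of residence; the thousands and hundreds digits encode the province, the tens and ones digits encode the city within the province; 0 means unknown
                   Installed APPs of the same category   appCount         Number of same-category APPs the user has downloaded and installed through ad recommendations, computed from the user installation log table
                   Clicks on this APP ad                 clickCount       Number of times the user has clicked the current ad, computed from the user click log table
Product features   APP ID                                appID            ID of the APP promoted by the ad; different ads can point to the same APP
                   APP category                          appCategory      Category label set by the APP developer; two levels encoded in 3 digits, where the hundreds digit is the first-level category and the tens and ones digits are the second-level category
Context features   Ad position ID                        positionID       ID of the specific position where the ad is placed within an APP
                   Ad position type                      positionType     A manually defined classification of ad positions, such as splash-screen positions and banner positions
                   Connection type                       connectionType   Current network connection of the user's mobile device: 2G, 3G, 4G, WiFi, or unknown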
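
The `residence` and `appCategory` variables pack two levels into one integer, so they are usually split before being fed to a model. A minimal sketch of that decoding, following the digit layout described in the feature table; the function names, return structure, and example codes are hypothetical:

```python
def decode_residence(code):
    """Split a residence code into province and city; 0 means unknown."""
    if code == 0:
        return None
    # Thousands and hundreds digits: province. Tens and ones digits: city.
    province, city = divmod(code, 100)
    return {"province": province, "city": city}


def decode_app_category(code):
    """Split a 3-digit appCategory code into first- and second-level categories."""
    # Hundreds digit: first-level category. Tens and ones digits: second level.
    level1, level2 = divmod(code, 100)
    return {"level1": level1, "level2": level2}


# Example codes invented for illustration; they are not taken from the dataset.
print(decode_residence(1203))
print(decode_residence(0))
print(decode_app_category(201))
```

Splitting the codes this way lets a model share information across all cities in a province, or across all second-level categories under one first-level category.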
Model      Log-Loss        AUC
LightGBM   0.11648306368   0.74722242761
XGBoost    0.12105592334   0.77485575487
FFM        0.12200810476   0.69786947007
VW         0.14191362554   0.69648447941
RF         0.11342799887   0.77356316492
RF+LXFV    0.10512076579   0.78637268084
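
The two columns above use the standard binary-classification metrics: Log-Loss (lower is better) and AUC (higher is better). The sketch below implements both in plain Python under their usual definitions. The toy labels and scores are invented, and the base-rate helper assumes the metrics were computed on the test set described in the dataset table, which the page does not state explicitly:

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0 or 1
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


def auc(labels, probs):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def base_rate_log_loss(rate):
    """Log-Loss of always predicting `rate` when `rate` is the true positive share."""
    return -(rate * math.log(rate) + (1 - rate) * math.log(1 - rate))


# Toy labels and scores, invented for illustration only.
labels = [1, 0, 0, 1, 0]
probs = [0.9, 0.1, 0.4, 0.35, 0.2]
print(log_loss(labels, probs), auc(labels, probs))

# Reference point built from the test-set counts (2,414 positives out of 84,434).
print(base_rate_log_loss(2414 / 84434))
```

A constant prediction at the test-set positive share scores a Log-Loss of roughly 0.13, which gives a reference point for reading the Log-Loss column. AUC depends only on how the samples are ranked, so it is unaffected by the class imbalance in the same way.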
Level   Positive samples   Negative samples   Total samples   Cumulative share of positives
1       1,143              7,302              8,445           47.3%
2       304                8,139              8,443           59.9%
3       226                8,218              8,444           69.3%
4       185                8,258              8,443           76.9%
5       164                8,280              8,444           83.7%
6       141                8,302              8,443           89.6%
7       101                8,342              8,443           93.7%
8       88                 8,356              8,444           97.4%
9       45                 8,398              8,443           99.2%
10      17                 8,425              8,442           100%
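
The cumulative column in the level table can be reproduced from the per-level counts, which also gives the lift of each cut-off over random selection. The sketch assumes the ten levels are equal-sized groups of test samples ordered from highest to lowest predicted conversion probability; the page does not state this, so treat it as an interpretation:

```python
import math

# (level, positive samples, negative samples) copied from the table above.
levels = [
    (1, 1143, 7302), (2, 304, 8139), (3, 226, 8218), (4, 185, 8258),
    (5, 164, 8280), (6, 141, 8302), (7, 101, 8342), (8, 88, 8356),
    (9, 45, 8398), (10, 17, 8425),
]

total_positives = sum(pos for _, pos, _ in levels)
total_samples = sum(pos + neg for _, pos, neg in levels)


def cumulative_gains(rows):
    """Yield (level, cumulative share of samples, cumulative share of positives)."""
    seen_pos = seen_all = 0
    for level, pos, neg in rows:
        seen_pos += pos
        seen_all += pos + neg
        yield level, seen_all / total_samples, seen_pos / total_positives


for level, sample_share, positive_share in cumulative_gains(levels):
    # Truncate to one decimal place, which matches how the table reports percentages.
    shown = math.floor(positive_share * 1000) / 10
    lift = positive_share / sample_share
    print(f"level {level}: {shown}% of positives, lift {lift:.2f}")
```

Read this way, the first level holds about a tenth of the test samples but nearly half of all conversions, a lift of roughly 4.7 over picking samples at random.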