Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 80-92    DOI: 10.11925/infotech.2096-3467.2020.0046
Microblog Image Privacy Classification with Deep Transfer Learning
Wang Shuyi, Liu Sai, Ma Zheng
Management School, Tianjin Normal University, Tianjin 300387, China
Abstract

[Objective] This paper proposes a transfer-learning-based privacy classifier for social network images, which warns users before they accidentally upload private information. [Methods] We built a new standard image dataset by collecting and annotating images from the Weibo platform. We then applied deep transfer learning, fine-tuning several pre-trained image models, to automatically classify whether a Weibo image contains private information. [Results] With the same amount of data, transfer learning improved accuracy by at least 30 percentage points over non-transfer learning approaches. Most ResNet architectures achieved more than 88% accuracy with transfer learning. Among them, ResNet50 had the highest recall (94.31%), accuracy (90.80%), and F1 score (91.11%), as well as the shortest testing time (148 s). Based on these metrics, ResNet50 is recommended as the most suitable model structure for the current scenario. [Limitations] The labeled dataset in this study is relatively small and may not cover all types of private information. [Conclusions] This study validates the feasibility and efficiency of deep transfer learning for classifying private Weibo images. The results can be applied to various social media platforms to warn users about the risk of privacy leaks. The annotated image dataset can serve as both a foundation and a benchmark for further research.

Keywords: Privacy Protection; Machine Learning; Deep Transfer Learning
Received: 10 January 2020; Published: 28 July 2020
CLC Number: G203
Corresponding Author: Wang Shuyi, E-mail: nkwshuyi@gmail.com

Cite this article:

Wang Shuyi, Liu Sai, Ma Zheng. Microblog Image Privacy Classification with Deep Transfer Learning. Data Analysis and Knowledge Discovery, 2020, 4(10): 80-92.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0046     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I10/80

Illustration of Deep Transfer Learning
Flow Chart of the Experiment
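Privacy category | Subcategories
Basic personal information | Name, birthday, birthplace, gender, nationality
Personal life | Ethnicity, religion, sexual orientation, marital status, alcohol abuse, record of legal violations
Biometric information | Genetic data, facial features, fingerprints, palm prints, ear shape, iris, tattoos, exposed skin, handwriting
Health information | Medical records, medical history, indicators of physical condition
Certificates and licenses | ID card, driver's license, passport, residence permit, social security card, military officer ID, work ID, student ID, vehicle license plate
Property information | Bank account numbers, transfer and payment records (including fiat and virtual currency), real estate information, loan information, receipts, ticket stubs
Communication information | Phone numbers, email addresses, online system accounts, IP addresses, contact lists (local and online), browsing history
Location information | Precise location, home address, movement trajectory, accommodation information, latitude and longitude
Education/work information | Educational attainment, degree, education history, transcripts, occupation, position, employer, work history, training records, workplace
Relationship information | Family relationships, social circles, professional circles, gatherings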
List of Privacy Information Categories
Examples of Training Data
Dataset | Private | Public | Total
Training set | 685 | 1,134 | 1,819
Validation set | 241 | 358 | 599
Test set | 299 | 299 | 598
Statistics of All Datasets
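The split sizes above imply different class balances: the training and validation sets lean toward public images, while the test set is evenly split. A minimal Python sketch, using only the counts from the table (the dictionary and function names are illustrative and do not come from the paper), makes the proportions explicit:

```python
# Image counts per split as (private, public), copied from the dataset table.
SPLITS = {
    "train": (685, 1134),
    "validation": (241, 358),
    "test": (299, 299),
}


def private_share(private: int, public: int) -> float:
    """Return the fraction of images in a split that are labeled private."""
    return private / (private + public)


if __name__ == "__main__":
    for name, (private, public) in SPLITS.items():
        total = private + public
        print(f"{name}: {total} images, {private_share(private, public):.1%} private")
```

Because the test set is balanced, a classifier that always predicts the same class would score 50% accuracy on it, which is a useful baseline when reading the test results.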
Architecture | Training time (s) | Best epoch | Loss | Accuracy
AlexNet | 4,889 | 11 | 0.29 | 86.24%
ResNet18 | 4,899 | 14 | 0.28 | 89.05%
ResNet34 | 4,889 | 5 | 0.28 | 88.56%
ResNet50 | 4,910 | 8 | 0.26 | 90.38%
ResNet101 | 4,921 | 8 | 0.28 | 89.39%
ResNet152 | 5,288 | 5 | 0.27 | 90.05%
Models Training Results
Confusion Matrix of Classification Results
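The confusion matrix figure is not reproduced here, but accuracy, precision, recall, and F1 all derive from its four cells. The Python sketch below shows the standard definitions, with "private" as the positive class. The example counts are an assumption: they are back-calculated from the ResNet50 recall and precision reported for the balanced 299/299 test set, not read from the paper's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix with 'private' as the positive class."""

    tp: int  # private images predicted private
    fn: int  # private images predicted public
    fp: int  # public images predicted private
    tn: int  # public images predicted public

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)


# Assumed counts, back-calculated from the reported ResNet50 percentages.
resnet50 = ConfusionMatrix(tp=282, fn=17, fp=38, tn=261)

if __name__ == "__main__":
    for name in ("accuracy", "precision", "recall", "f1"):
        print(f"{name}: {getattr(resnet50, name):.2%}")
```

Under these assumed counts, the four metrics agree with the reported ResNet50 test figures to within rounding.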
Architecture | Test time (s) | Accuracy | F1 | Precision | Recall
AlexNet | 157 | 85.28% | 85.85% | 82.66% | 89.30%
ResNet18 | 195 | 89.63% | 89.74% | 88.85% | 90.64%
ResNet34 | 168 | 87.96% | 88.57% | 84.29% | 93.31%
ResNet50 | 148 | 90.80% | 91.11% | 88.12% | 94.31%
ResNet101 | 180 | 88.29% | 88.78% | 85.23% | 92.64%
ResNet152 | 152 | 87.79% | 88.24% | 85.09% | 91.64%
Testing Results of Transfer Learning Models
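The abstract's statement that ResNet50 leads on recall, accuracy, F1, and testing time can be checked directly against the table above. A small Python sketch, with the figures transcribed from the table and illustrative names that are not part of the paper:

```python
# (test time in s, accuracy, F1, precision, recall), transcribed from the
# transfer-learning test results table; percentages are kept as plain numbers.
RESULTS = {
    "AlexNet": (157, 85.28, 85.85, 82.66, 89.30),
    "ResNet18": (195, 89.63, 89.74, 88.85, 90.64),
    "ResNet34": (168, 87.96, 88.57, 84.29, 93.31),
    "ResNet50": (148, 90.80, 91.11, 88.12, 94.31),
    "ResNet101": (180, 88.29, 88.78, 85.23, 92.64),
    "ResNet152": (152, 87.79, 88.24, 85.09, 91.64),
}
FIELDS = ("test_time", "accuracy", "f1", "precision", "recall")


def best_model(field: str) -> str:
    """Return the architecture with the best value in one column.

    Lower is better for test time; higher is better for every other metric.
    """
    idx = FIELDS.index(field)
    pick = min if field == "test_time" else max
    return pick(RESULTS, key=lambda name: RESULTS[name][idx])


if __name__ == "__main__":
    for field in FIELDS:
        print(f"{field}: {best_model(field)}")
```

ResNet50 is best in every column except precision, where ResNet18 is slightly ahead (88.85% versus 88.12%).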
Random Examples of ResNet50 Model Testing Results
Wrong Predictions of Model ResNet50
Architecture | Test time (s) | Accuracy | F1 | Precision | Recall
AlexNet | 149 | 52.17% | 66.90% | 51.15% | 96.66%
ResNet18 | 152 | 53.18% | 25.93% | 62.03% | 16.39%
ResNet34 | 146 | 53.18% | 55.56% | 52.87% | 58.53%
ResNet50 | 155 | 53.68% | 60.82% | 52.70% | 71.91%
ResNet101 | 159 | 55.18% | 51.45% | 56.13% | 47.49%
ResNet152 | 166 | 51.67% | 61.62% | 51.10% | 77.59%
Testing Results of Non-Transfer Learning Models
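Comparing the two test-result tables row by row shows the size of the gain from transfer learning. The Python sketch below, with accuracies transcribed from both tables and illustrative variable names, computes the gap for each architecture in percentage points:

```python
# Test accuracy (%) with and without transfer learning, from the two tables.
TRANSFER = {
    "AlexNet": 85.28, "ResNet18": 89.63, "ResNet34": 87.96,
    "ResNet50": 90.80, "ResNet101": 88.29, "ResNet152": 87.79,
}
NON_TRANSFER = {
    "AlexNet": 52.17, "ResNet18": 53.18, "ResNet34": 53.18,
    "ResNet50": 53.68, "ResNet101": 55.18, "ResNet152": 51.67,
}

# Accuracy gain per architecture, in percentage points.
GAINS = {name: round(TRANSFER[name] - NON_TRANSFER[name], 2) for name in TRANSFER}

if __name__ == "__main__":
    for name, gain in GAINS.items():
        print(f"{name}: +{gain:.2f} points")
    print(f"smallest gain: {min(GAINS.values()):.2f} points")
```

Every architecture gains more than 30 percentage points. The non-transfer models all score between 51% and 56%, close to the 50% that a constant prediction would reach on the balanced test set.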