Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 80-92     https://doi.org/10.11925/infotech.2096-3467.2020.0046
Research Paper
Microblog Image Privacy Classification with Deep Transfer Learning*
Wang Shuyi, Liu Sai, Ma Zheng
Management School, Tianjin Normal University, Tianjin 300387, China
Abstract

[Objective] This paper builds a transfer-learning-based classifier that automatically detects privacy-sensitive images on social networks, so that users can be prompted before they unintentionally upload content containing private information. [Methods] We constructed and annotated a privacy classification dataset of Weibo images, then applied deep transfer learning, fine-tuning several pre-trained image models to classify whether a Sina Weibo image contains private information. [Results] With the same amount of data, transfer learning improved accuracy by at least 30 percentage points over training without transfer learning. With transfer learning, most ResNet architectures reached an accuracy above 88%. ResNet50 achieved the highest recall (94.31%), accuracy (90.80%), and F1 score (91.11%) as well as the shortest test time (148 s), which on balance makes it the most suitable architecture for this scenario. [Limitations] The annotated dataset is relatively small and may not cover every type of private information. [Conclusions] The study confirms that deep transfer learning is feasible for classifying private Weibo images and can be used to warn social media users about privacy exposure. The annotated dataset provides a foundation and a reference benchmark for further research.

Keywords: Privacy Protection; Machine Learning; Deep Transfer Learning
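Received: 2020-01-10      Published: 2020-07-28
Chinese Library Classification (ZTFLH):  G203
Funding: *This work is an outcome of the National Social Science Fund of China Youth Project "Research on Privacy Protection of Social Media Users Based on Dynamic Revelation of Information Price" (15CTQ017).
Corresponding author: Wang Shuyi     E-mail: nkwshuyi@gmail.com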
Cite this article:
Wang Shuyi, Liu Sai, Ma Zheng. Microblog Image Privacy Classification with Deep Transfer Learning[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 80-92.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0046      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I10/80
Fig.1  Example of deep transfer learning
Fig.2  Experimental workflow
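| Privacy category | Subcategories |
| --- | --- |
| Basic personal information | Name, birthday, birthplace, gender, nationality |
| Personal life | Ethnicity, religion, sexual orientation, marital status, alcohol abuse, criminal record |
| Biometric information | Personal genes, facial features, fingerprints, palm prints, ear shape, iris, tattoos, exposed skin, handwriting |
| Health information | Medical records, medical history, indicators of physical condition |
| Certificates and licenses | ID card, driver's license, passport, residence permit, social security card, military officer ID, work ID, student ID, vehicle license plate |
| Property information | Bank account number, transfer and payment records (fiat and virtual currency), real estate information, loan information, receipts, ticket stubs |
| Communication information | Phone number, email address, online system accounts, IP address, contact lists (local and online), browsing history |
| Location information | Precise location, home address, movement trajectory, accommodation information, latitude and longitude |
| Education/work information | Education level, degree, educational history, transcripts, occupation, position, employer, work history, training records, workplace |
| Relationship information | Family relationships, social circle, professional circle, gatherings |
Table 1  Privacy information categories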
Fig.3  Examples of training images
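| Dataset | Private | Public | Total |
| --- | --- | --- | --- |
| Training set | 685 | 1134 | 1819 |
| Validation set | 241 | 358 | 599 |
| Test set | 299 | 299 | 598 |
Table 2  Number of images in each dataset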
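The counts in Table 2 determine how much each split contributes and how balanced the two classes are, which matters when reading the accuracy figures reported later. The sketch below simply recomputes those proportions from the table; the dictionary layout and the function name `summarize` are illustrative choices, not part of the paper.

```python
# Image counts per split, copied from Table 2.
SPLITS = {
    "train": {"private": 685, "public": 1134},
    "validation": {"private": 241, "public": 358},
    "test": {"private": 299, "public": 299},
}


def summarize(splits):
    """Return the total image count and per-split size, share, and class balance."""
    total = sum(sum(counts.values()) for counts in splits.values())
    summary = {}
    for name, counts in splits.items():
        n = sum(counts.values())
        summary[name] = {
            "images": n,
            "share_of_dataset": n / total,
            "private_fraction": counts["private"] / n,
        }
    return total, summary


total, summary = summarize(SPLITS)
for name, row in summary.items():
    print(
        f"{name:<10} {row['images']:>5} images  "
        f"{row['share_of_dataset']:.1%} of all images  "
        f"{row['private_fraction']:.1%} private"
    )
```

Because the test set holds equal numbers of private and public images, a classifier that always predicts one class would score 50% accuracy there, which is a useful baseline for the test results.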
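| Architecture | Training time (s) | Epoch of best model | Loss | Accuracy |
| --- | --- | --- | --- | --- |
| AlexNet | 4889 | 11 | 0.29 | 86.24% |
| ResNet18 | 4899 | 14 | 0.28 | 89.05% |
| ResNet34 | 4889 | 5 | 0.28 | 88.56% |
| ResNet50 | 4910 | 8 | 0.26 | 90.38% |
| ResNet101 | 4921 | 8 | 0.28 | 89.39% |
| ResNet152 | 5288 | 5 | 0.27 | 90.05% |
Table 3  Model training results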
Fig.4  Confusion matrices of the model classification results
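| Architecture | Test time (s) | Accuracy | F1 score | Precision | Recall |
| --- | --- | --- | --- | --- | --- |
| AlexNet | 157 | 85.28% | 85.85% | 82.66% | 89.30% |
| ResNet18 | 195 | 89.63% | 89.74% | 88.85% | 90.64% |
| ResNet34 | 168 | 87.96% | 88.57% | 84.29% | 93.31% |
| ResNet50 | 148 | 90.80% | 91.11% | 88.12% | 94.31% |
| ResNet101 | 180 | 88.29% | 88.78% | 85.23% | 92.64% |
| ResNet152 | 152 | 87.79% | 88.24% | 85.09% | 91.64% |
Table 4  Test results of the transfer learning models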
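The four metrics in Table 4 all follow from a binary confusion matrix. The sketch below shows the standard formulas and checks them against the ResNet50 row. The confusion counts are an assumption: they are not read from Fig. 4 but inferred from the reported recall and precision together with the 299 private and 299 public test images, treating "private" as the positive class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# ASSUMPTION: counts inferred from the reported metrics and the test set
# size (299 private + 299 public), with "private" as the positive class.
# They are not taken from the paper's confusion matrix figure.
RESNET50_COUNTS = {"tp": 282, "fp": 38, "fn": 17, "tn": 261}

# ResNet50 row of Table 4, in percent.
REPORTED = {"accuracy": 90.80, "precision": 88.12, "recall": 94.31, "f1": 91.11}

metrics = binary_metrics(**RESNET50_COUNTS)
for name, value in metrics.items():
    print(f"{name:<9} computed {100 * value:6.2f}%   reported {REPORTED[name]:.2f}%")
```

Under these assumed counts the computed values agree with the reported ones to within rounding, which suggests the table follows the usual definitions with the private class as positive.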
Fig.5  Random sample of ResNet50 predictions
Fig.6  Images misclassified by ResNet50
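| Architecture | Test time (s) | Accuracy | F1 score | Precision | Recall |
| --- | --- | --- | --- | --- | --- |
| AlexNet | 149 | 52.17% | 66.90% | 51.15% | 96.66% |
| ResNet18 | 152 | 53.18% | 25.93% | 62.03% | 16.39% |
| ResNet34 | 146 | 53.18% | 55.56% | 52.87% | 58.53% |
| ResNet50 | 155 | 53.68% | 60.82% | 52.70% | 71.91% |
| ResNet101 | 159 | 55.18% | 51.45% | 56.13% | 47.49% |
| ResNet152 | 166 | 51.67% | 61.62% | 51.10% | 77.59% |
Table 5  Test results of the non-transfer models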
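The abstract states that transfer learning raised accuracy by at least 30 points over the non-transfer models. A small sketch that recomputes the per-architecture gain from the accuracy columns of Table 4 and Table 5 is given here; the variable names are illustrative and the values are copied from the two tables.

```python
# Test accuracy in percent, copied from Table 4 (transfer) and Table 5 (non-transfer).
TRANSFER = {
    "AlexNet": 85.28, "ResNet18": 89.63, "ResNet34": 87.96,
    "ResNet50": 90.80, "ResNet101": 88.29, "ResNet152": 87.79,
}
NON_TRANSFER = {
    "AlexNet": 52.17, "ResNet18": 53.18, "ResNet34": 53.18,
    "ResNet50": 53.68, "ResNet101": 55.18, "ResNet152": 51.67,
}

# Gain in percentage points for each architecture.
gains = {
    arch: round(TRANSFER[arch] - NON_TRANSFER[arch], 2)
    for arch in TRANSFER
}

for arch, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{arch:<10} +{gain:.2f} points")

print(f"smallest gain: {min(gains.values()):.2f} points")
```

Every architecture gains more than 30 percentage points, and the non-transfer accuracies sit only slightly above the 50% that a constant prediction would score on the balanced test set.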