Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (2): 58-63    DOI: 10.11925/infotech.2096-3467.2017.0809
Data Masking Analysis Based on Internet Big Data
Zhou Qianyi, Wang Yamin, Wang Chuang
School of Economics and Management, Xidian University, Xi’an 710126, China
Abstract  

[Objective] This paper aims to improve the classification of records into anonymous groups and thereby obtain a better data masking model and algorithm. [Methods] First, we modified the criteria for judging dimensions, based on k-anonymity. Second, we used a KD tree as the storage structure to construct a new algorithm. Third, we implemented the proposed algorithm in Python. Finally, we examined the feasibility and effectiveness of the new algorithm using the number of anonymous groups and the percentage of normalized certainty penalty (NCP). [Results] The new algorithm maximized the number of anonymous groups generated from the whole dataset, while its NCP percentage was lower than that of similar algorithms. [Limitations] For datasets with a significant degree of dispersion, the loop computation over dimensions was cumbersome. [Conclusions] The proposed algorithm improves the availability of the anonymous groups and reduces data loss.

Key words: Data Masking; k-anonymity; Integer Division
Received: 15 August 2017      Published: 07 March 2018
ZTFLH:  TP391 G35  

Cite this article:

Zhou Qianyi, Wang Yamin, Wang Chuang. Data Masking Analysis Based on Internet Big Data. Data Analysis and Knowledge Discovery, 2018, 2(2): 58-63.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0809     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I2/58

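| Level | Content | Implementation mechanism |
| --- | --- | --- |
| Data management level | Security management, access control, audit tracing | Database management system (DBMS) |
| Data content level: application layer | Use of masked data | Data analysis and mining algorithms |
| Data content level: masking layer | Private-data masking layer | Private-data masking algorithms |
| Data content level: data layer | Databases, knowledge bases, rule bases | Classification and grading of sensitive data |
| Data content level: resource layer | Computing resources, network resources | Source of the masked datasets |
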
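| Category | Attributes |
| --- | --- |
| Explicit identifiers | name, phone, ID, address, etc. |
| Quasi-identifiers | age, workclass, education_num, marital_status, occupation, race, sex, native_country |
| Sensitive information | relationship |
| Non-sensitive information | fnlwgt, education, capital-gain, capital-loss, hours-per-week |
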
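The attribute roles in the table above can be illustrated with a minimal k-anonymity check. This is only a sketch, not the authors' code: the sample rows, the reduced set of quasi-identifiers, and the helper names `drop_explicit_identifiers`, `anonymous_groups`, and `is_k_anonymous` are assumptions made for illustration. Records are grouped by their quasi-identifier values, and the table counts as k-anonymous when every group holds at least k records.

```python
from collections import defaultdict

# Attribute roles taken from the table above. Only a subset of the
# quasi-identifiers is used so the hypothetical sample stays small.
EXPLICIT_IDENTIFIERS = {"name", "phone", "ID", "address"}
QUASI_IDENTIFIERS = ("age", "sex", "race")


def drop_explicit_identifiers(records):
    """Return copies of the records without explicit identifiers."""
    return [
        {key: value for key, value in record.items()
         if key not in EXPLICIT_IDENTIFIERS}
        for record in records
    ]


def anonymous_groups(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in quasi_identifiers)
        groups[key].append(record)
    return dict(groups)


def is_k_anonymous(records, k, quasi_identifiers=QUASI_IDENTIFIERS):
    """True when every quasi-identifier group has at least k records."""
    groups = anonymous_groups(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


# Hypothetical, already generalized sample rows (not data from the paper).
sample = [
    {"name": "A", "age": "30-39", "sex": "Male", "race": "White",
     "relationship": "Husband"},
    {"name": "B", "age": "30-39", "sex": "Male", "race": "White",
     "relationship": "Not-in-family"},
    {"name": "C", "age": "40-49", "sex": "Female", "race": "Black",
     "relationship": "Wife"},
    {"name": "D", "age": "40-49", "sex": "Female", "race": "Black",
     "relationship": "Unmarried"},
]

masked = drop_explicit_identifiers(sample)
groups = anonymous_groups(masked)
print(len(groups), is_k_anonymous(masked, k=2))
```

The sensitive attribute `relationship` is left untouched here; only the explicit identifiers are removed and only the quasi-identifiers decide group membership.
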
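| Group | Field name | Data type | Description |
| --- | --- | --- | --- |
| Rounded-partition notation | $T(d)$ | Relational table | Table $T(d)$ is assumed to have $d$ quasi-identifiers, i.e., it forms a $d$-dimensional space. |
| | $Q_i$ | Attribute value | The $i$-th attribute among the quasi-identifiers. |
| | $P$ | Point set | The set of points drawn from the real-valued domain sequence $\{q(i,1), q(i,2), \cdots, q(i,t_i)\}$ of each $Q_i$. |
| | $\Omega$ | Point set | The smallest multidimensional rectangular region that covers $P$, i.e., the Range in the KD tree. |
| | $q(i,j)$ | Element value | The $j$-th element in the domain of $Q_i$, with $1 \le i \le d$ and $1 \le j \le t_i = \lvert Q_i \rvert$. |
| | $\Pi_i(p)$ | Attribute value | The projection of a point $p$ onto the $i$-th dimension of the $d$-dimensional space. |
| KD-tree construction notation | Node-data | Data vector | The value of an attribute (the split criterion), or the value of a point (at a leaf node). |
| | Range | Space vector | The set of points to be partitioned; the same as $\Omega$ above. |
| | split | Integer | The index of the splitting dimension; the splitting hyperplane is usually perpendicular to a coordinate axis. |
| | left | k-d tree | The left child produced by each split; recursively builds the left partition of the KD tree. |
| | right | k-d tree | The right child produced by each split; recursively builds the right partition of the KD tree. |
| | parent | k-d tree | The parent node. |
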
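The node fields listed above can be sketched as a small Python structure with a recursive split. This is an illustration under stated assumptions, not the paper's algorithm: the median split on the widest dimension, the sample points, and the names `KDNode`, `bounding_range`, `build`, `leaves`, and `ncp_percentage` are hypothetical. The NCP figure uses one common normalization for numeric attributes (the extent of a group's Range divided by the extent of the whole table, averaged over dimensions and records), which may differ from the measure used by the authors.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Range = List[Tuple[float, float]]  # (min, max) per dimension


def bounding_range(points: List[Point]) -> Range:
    """Smallest axis-aligned box covering the points (Range / Omega)."""
    return [(min(col), max(col)) for col in zip(*points)]


@dataclass
class KDNode:
    points: List[Point]                # points covered by this node
    range_: Range                      # Range / Omega of those points
    split: Optional[int] = None        # index of the splitting dimension
    node_data: Optional[float] = None  # split value (internal nodes only)
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None
    parent: Optional["KDNode"] = None


def build(points: List[Point], k: int,
          parent: Optional[KDNode] = None) -> KDNode:
    """Split recursively while both halves can keep at least k points."""
    node = KDNode(points=points, range_=bounding_range(points), parent=parent)
    if len(points) < 2 * k:
        return node  # leaf: one anonymous group
    # Assumption: split on the dimension with the largest extent.
    dim = max(range(len(node.range_)),
              key=lambda i: node.range_[i][1] - node.range_[i][0])
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2  # median position, so each half has >= k points
    node.split = dim
    node.node_data = ordered[mid][dim]
    node.left = build(ordered[:mid], k, node)
    node.right = build(ordered[mid:], k, node)
    return node


def leaves(node: KDNode) -> List[KDNode]:
    """Collect the leaf nodes, i.e., the anonymous groups."""
    if node.left is None and node.right is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def ncp_percentage(root: KDNode) -> float:
    """Record-weighted NCP, averaged over dimensions, as a percentage."""
    dims = len(root.range_)
    total = 0.0
    for leaf in leaves(root):
        penalty = sum(
            (hi - lo) / (g_hi - g_lo) if g_hi > g_lo else 0.0
            for (lo, hi), (g_lo, g_hi) in zip(leaf.range_, root.range_)
        )
        total += penalty * len(leaf.points)
    return 100.0 * total / (dims * len(root.points))


# Hypothetical (age, education_num) points, not data from the paper.
points = [(25, 9), (27, 10), (31, 13), (33, 13),
          (45, 9), (47, 10), (52, 14), (58, 16)]
root = build(points, k=2)
print(len(leaves(root)), round(ncp_percentage(root), 2))
```

Stopping the recursion once a node holds fewer than 2k points guarantees that every leaf keeps at least k records, so a smaller k yields more anonymous groups and a lower NCP, while a single unsplit group gives the worst case of 100%.
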