[Objective] This paper aims to improve the grouping results of anonymous groups and thereby obtain a better data masking model and algorithm. [Methods] First, we modified the dimension judgment criterion used in k-anonymity. Second, we used a KD tree as the storage structure to construct a new algorithm. Third, we implemented the proposed algorithm in Python. Finally, we examined the feasibility and effectiveness of the new algorithm using the number of anonymous groups and the percentage of NCP (Normalized Certainty Penalty). [Results] The new algorithm could maximize the number of anonymous groups generated from the whole dataset, while its NCP percentage was lower than that of similar algorithms. [Limitations] For datasets with a significant degree of dispersion, the loop computation over dimensions was cumbersome. [Conclusions] The proposed algorithm could improve the availability of the anonymous groups and reduce data loss.
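The general idea behind the method, KD-tree partitioning under a k-anonymity constraint evaluated by NCP, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the modified dimension judgment criterion, so the sketch assumes the common rule of cutting the dimension with the widest normalized range at the median. The function names (`spans`, `partition`, `ncp_percent`) and the sample records are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]  # one numeric value per quasi-identifier


def spans(records: Sequence[Record]) -> List[float]:
    """Range (max - min) of each quasi-identifier dimension."""
    dims = len(records[0])
    return [max(r[d] for r in records) - min(r[d] for r in records)
            for d in range(dims)]


def partition(records: List[Record], k: int,
              global_spans: Sequence[float]) -> List[List[Record]]:
    """Split records KD-tree style; every leaf holds at least k records."""
    if len(records) < 2 * k:
        # A further cut would leave a side with fewer than k records,
        # so this node becomes one anonymous group.
        return [records]
    local = spans(records)
    # ASSUMPTION: choose the dimension with the widest normalized range.
    # The paper's modified criterion is not given in the abstract.
    d = max(range(len(local)),
            key=lambda i: local[i] / global_spans[i] if global_spans[i] else 0.0)
    ordered = sorted(records, key=lambda r: r[d])
    mid = len(ordered) // 2  # median cut by position
    return (partition(ordered[:mid], k, global_spans)
            + partition(ordered[mid:], k, global_spans))


def ncp_percent(groups: Sequence[Sequence[Record]],
                global_spans: Sequence[float]) -> float:
    """Average normalized range per generalized cell, as a percentage."""
    n = sum(len(g) for g in groups)
    dims = len(global_spans)
    total = 0.0
    for g in groups:
        local = spans(g)
        # Each record in the group is generalized to the group's ranges.
        total += len(g) * sum(l / s for l, s in zip(local, global_spans) if s)
    return 100.0 * total / (n * dims)


# Hypothetical (age, income) records used only for illustration.
records = [(25, 50000), (27, 52000), (31, 61000), (35, 58000),
           (42, 75000), (44, 72000), (52, 90000), (58, 88000)]
global_spans = spans(records)
groups = partition(records, k=2, global_spans=global_spans)
print(len(groups), "groups; NCP = %.1f%%" % ncp_percent(groups, global_spans))
```

More groups at a fixed k generally mean tighter value ranges per group and a lower NCP, which is why the abstract reports both quantities together.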
Zhou Qianyi, Wang Yamin, Wang Chuang. Data Masking Analysis Based on Internet Big Data[J]. Data Analysis and Knowledge Discovery, 2018, 2(2): 58-63.