[Objective] This paper aims to improve the classification of records into anonymous groups, and thereby obtain a better data masking model and algorithm. [Methods] First, we modified the dimension selection criterion used in k-anonymity. Second, we used a KD tree as the storage structure to construct a new algorithm. Third, we implemented the proposed algorithm in Python. Finally, we evaluated its feasibility and effectiveness using the number of anonymous groups and the percentage of Normalized Certainty Penalty (NCP). [Results] The new algorithm maximized the number of anonymous groups generated from the whole dataset, while its NCP percentage was lower than that of similar algorithms. [Limitations] For highly dispersed datasets, the iterative computation over dimensions was cumbersome. [Conclusions] The proposed algorithm can improve the usability of the anonymous groups and reduce information loss.
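To make the two evaluation measures concrete, the following is a minimal sketch of KD-tree-style partitioning under k-anonymity, together with an NCP percentage. It is not the authors' algorithm: the split rule (median split on the attribute with the widest normalized range), the NCP normalization (group-size-weighted, averaged over attributes), and the names `ranges`, `kd_partition`, and `ncp_percent` are assumptions chosen for illustration, and the sample records are made up.

```python
from typing import List, Optional, Tuple

Record = Tuple[float, ...]


def ranges(rows: List[Record]) -> List[float]:
    """Width (max - min) of every attribute over the given rows."""
    return [max(col) - min(col) for col in zip(*rows)]


def kd_partition(rows: List[Record], k: int,
                 full: Optional[List[float]] = None) -> List[List[Record]]:
    """Split rows KD-tree style until a further split would break k-anonymity.

    Assumed split rule (not taken from the paper): cut at the median position
    along the attribute whose range, normalized by the full-dataset range,
    is widest.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if full is None:
        full = ranges(rows)
    if len(rows) < 2 * k:
        # Splitting would leave one side with fewer than k records.
        return [rows]
    local = ranges(rows)
    norm = [w / f if f > 0 else 0.0 for w, f in zip(local, full)]
    dim = norm.index(max(norm))
    ordered = sorted(rows, key=lambda r: r[dim])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], k, full)
            + kd_partition(ordered[mid:], k, full))


def ncp_percent(groups: List[List[Record]], rows: List[Record]) -> float:
    """Size-weighted NCP over all groups, as a percentage of the worst case."""
    full = ranges(rows)
    total = 0.0
    for group in groups:
        for width, f in zip(ranges(group), full):
            if f > 0:
                total += len(group) * width / f
    return 100.0 * total / (len(rows) * len(full))


# Hypothetical quasi-identifiers: (age, income).
data = [(25, 50000), (27, 52000), (31, 61000), (35, 58000),
        (42, 90000), (47, 88000), (51, 95000), (58, 99000)]
groups = kd_partition(data, k=2)
pct = ncp_percent(groups, data)
print(len(groups), "groups, NCP = %.1f%%" % pct)
```

Every group returned contains at least k records. More groups generally mean narrower value ranges per group and therefore a lower NCP, which is why the two measures are reported together.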
Zhou Qianyi, Wang Yamin, Wang Chuang. Data Masking Analysis Based on Internet Big Data[J]. Data Analysis and Knowledge Discovery, 2018, 2(2): 58-63.