1 Big Data Institute, Central South University, Changsha 410083, China
2 National Engineering Laboratory for Medical Big Data Application Technology, Central South University, Changsha 410083, China
3 Life Science College, Central South University, Changsha 410083, China
[Objective] This paper aims to improve the efficiency of k-anonymity algorithms and the quality of the published data. [Methods] We designed a new k-anonymity algorithm (MFBRR) based on recognition rates and a multi-branch-tree forest structure. The algorithm traverses the data bottom-up according to the properties of the generalization tree and calculates the recognition rates. It then selects target leaf nodes and prunes the tree, which reduces information loss. Finally, we proposed the MFBRR-γ algorithm, which adds parallel computing and multi-thread processing. [Results] We evaluated the algorithms by hierarchical precision and running time on the “Adult” data set. The hierarchical precisions of MFBRR and MFBRR-γ were 0.97 and 0.88, respectively. MFBRR and MFBRR-γ (γ=100) took 1,457 minutes and 12.08 minutes, respectively, to process 30,000 records. On health care data, MFBRR achieved a hierarchical precision of 0.93. [Limitations] We examined the algorithms on only two data sets. [Conclusions] The proposed algorithms can reduce the information loss caused by anonymization and improve the quality of published data.
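For readers unfamiliar with the two ideas the abstract relies on, the following minimal Python sketch illustrates a k-anonymity check and a precision-style information-loss measure. It is not the MFBRR algorithm: the abstract does not define the recognition rate or the pruning rule, so neither is implemented here. The precision formula is an assumption, following the common definition in which each cell is penalized by its generalization level divided by the height of the attribute's generalization tree; the function names, the toy table, and the tree heights are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, k):
    """Return True if every quasi-identifier tuple occurs at least k times."""
    counts = Counter(tuple(r) for r in records)
    return all(c >= k for c in counts.values())


def precision(levels, heights):
    """Assumed precision metric: 1 - mean(level / tree height) over all cells.

    levels[i][j] -- generalization level applied to attribute j of record i
    heights[j]   -- height of the generalization tree of attribute j
    """
    n_cells = len(levels) * len(heights)
    loss = sum(lv / heights[j] for row in levels for j, lv in enumerate(row))
    return 1 - loss / n_cells


# Hypothetical toy table: (age range, postcode prefix) after generalization.
table = [
    ("30-39", "4100*"),
    ("30-39", "4100*"),
    ("40-49", "4100*"),
    ("40-49", "4100*"),
]
levels = [[1, 1] for _ in table]  # every cell generalized by one level
heights = [4, 5]                  # assumed tree heights for age and postcode

print(is_k_anonymous(table, 2))   # each group has two records
print(is_k_anonymous(table, 3))   # no group reaches three records
print(precision(levels, heights))
```

A higher precision value means less generalization was applied, which is the sense in which the reported precisions of 0.97 and 0.88 indicate lower information loss.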
Chen Xianlai, Luo Xiao, Liu Li, Li Zhongmin, An Ying. k-Anonymity Algorithm of Multi-Branch-Tree Forest Based on Recognition Rate[J]. Data Analysis and Knowledge Discovery, 2020, 4(12): 14-25.