k-Anonymity Algorithm of Multi-Branch-Tree Forest Based on Recognition Rate
Chen Xianlai 1,2, Luo Xiao 3, Liu Li 3, Li Zhongmin 3, An Ying 1,2
1 Big Data Institute, Central South University, Changsha 410083, China
2 National Engineering Laboratory for Medical Big Data Application Technology, Central South University, Changsha 410083, China
3 Life Science College, Central South University, Changsha 410083, China
Abstract [Objective] This paper aims to improve the efficiency of the k-anonymity algorithm and the quality of the published data. [Methods] Based on recognition rates and a multi-branch-tree forest structure, we designed a new k-anonymity algorithm (MFBRR). The algorithm traverses the data bottom-up according to the properties of the generalization tree and calculates the recognition rates. It then selects target leaf nodes and prunes the tree, which reduces information loss. Finally, we proposed the MFBRR-γ algorithm, which builds on parallel computing and multi-thread processing. [Results] We evaluated both algorithms on the “Adult” data set in terms of hierarchical precision and running time. The hierarchical precisions of MFBRR and MFBRR-γ were 0.97 and 0.88, respectively. Processing 30,000 records took MFBRR 1,457 minutes and MFBRR-γ (γ=100) 12.08 minutes. On health care data, MFBRR achieved a hierarchical precision of 0.93. [Limitations] We examined our algorithms on only two data sets. [Conclusions] The proposed algorithms can reduce the information loss caused by anonymization and improve the quality of published data.
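To make the terms in the abstract concrete, the following minimal Python sketch shows generalization-based k-anonymity on a single quasi-identifier. It is not the MFBRR algorithm: the three-level age hierarchy, the function names, and the sample ages are all hypothetical, and the precision score follows the general idea of Sweeney's precision metric, which may differ from the hierarchical precision reported above.

```python
from collections import Counter

MAX_LEVEL = 2  # assumed height of the illustrative generalization hierarchy


def generalize_age(age: int, level: int) -> str:
    """Hypothetical hierarchy: 0 = raw value, 1 = 10-year band, 2 = suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


def is_k_anonymous(rows, k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(rows)
    return all(count >= k for count in counts.values())


def min_level_for_k(ages, k: int):
    """Lowest generalization level at which the age column is k-anonymous."""
    for level in range(MAX_LEVEL + 1):
        rows = [(generalize_age(age, level),) for age in ages]
        if is_k_anonymous(rows, k):
            return level
    return None  # not achievable even with full suppression


def precision(level: int) -> float:
    """Precision-style score for one uniformly generalized attribute:
    1 minus the fraction of the hierarchy height that was climbed."""
    return 1 - level / MAX_LEVEL


ages = [23, 27, 25, 31, 38, 34]  # made-up sample values
level = min_level_for_k(ages, k=3)
print(level, precision(level))
```

In this toy setting, a lower generalization level means less information loss, which is the trade-off that the pruning step described in the abstract is meant to improve.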
Received: 27 September 2020
Published: 25 December 2020
Corresponding Authors:
Chen Xianlai
E-mail: chenxianlai@csu.edu.cn