Data Analysis and Knowledge Discovery  0, Vol. Issue (): 1-    DOI: 10.11925/infotech.2096-3467.2020.0952
k-Anonymity Algorithm of multi-branch-tree Forest Based on Recognition Rate
Chen Xianlai,Luo Xiao,Liu Li,Li Zhongmin,An Ying
(Big Data Institute, Central South University, Changsha 410083, China)
(Life Science College, Central South University, Changsha 410083, China)
(National Engineering Laboratory for Medical Big Data Application Technology, Central South University, Changsha 410083, China)
Abstract  

[Objective] To improve the efficiency of k-anonymity algorithms and the quality of the data they publish, and to reduce the information loss caused by anonymization.

[Methods] A k-anonymity algorithm named MFBRR was designed, based on recognition rate and a multi-branch-tree forest. Following the structure of the generalization tree, it traverses the data from the bottom up, calculates the recognition rate, and selects target leaf nodes at which to prune the tree, thereby reducing the information loss of the anonymized data. The algorithm was then extended with parallel computing and multi-threading to give MFBRR-γ, which further improves efficiency. Both algorithms were evaluated experimentally in terms of hierarchical precision and running time.
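The bottom-up pruning idea can be illustrated with a minimal sketch. It is not the MFBRR algorithm itself: the abstract does not define the recognition rate, so this example substitutes plain frequency counts to decide where to prune, handles a single quasi-identifier, and uses a made-up age hierarchy. The names `AGE_PARENT`, `ancestors`, `prune_once`, and `anonymize` are all hypothetical.

```python
from collections import Counter

# Hypothetical generalization tree for one quasi-identifier (age).
# Each value maps to its parent; the root "*" has no parent.
AGE_PARENT = {
    "23": "20-29", "27": "20-29", "31": "30-39", "38": "30-39",
    "20-29": "20-39", "30-39": "20-39", "20-39": "*",
}

def is_k_anonymous(values, k):
    """True if every distinct value occurs at least k times."""
    return all(count >= k for count in Counter(values).values())

def ancestors(value, parent):
    """Yield the proper ancestors of value, nearest first."""
    while value in parent:
        value = parent[value]
        yield value

def prune_once(values, parent, k):
    """Prune the tree at the parents of all values rarer than k.

    Assumption: rarity is measured by raw frequency, standing in for
    the recognition rate, which the abstract does not specify.
    """
    counts = Counter(values)
    targets = {parent[v] for v, c in counts.items() if c < k and v in parent}
    result = []
    for v in values:
        # Collapse v into its highest ancestor that is being pruned, if any.
        hits = [a for a in ancestors(v, parent) if a in targets]
        result.append(hits[-1] if hits else v)
    return result

def anonymize(values, parent, k):
    """Repeat bottom-up pruning until the column is k-anonymous."""
    current = list(values)
    while not is_k_anonymous(current, k):
        pruned = prune_once(current, parent, k)
        if pruned == current:  # only the root is left; cannot generalize further
            break
        current = pruned
    return current

ages = ["23", "23", "27", "27", "31", "38"]
print(anonymize(ages, AGE_PARENT, 2))
print(anonymize(ages, AGE_PARENT, 3))
```

With k = 2 only the subtree containing the rare values 31 and 38 is pruned, so the other records keep their exact ages. Generalizing only where needed, rather than across the whole column, is what keeps information loss low in this kind of scheme.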

[Results] The algorithms were tested on the “adult” data set. The hierarchical precision was 0.973 for MFBRR and 0.905 for MFBRR-γ. When the data set size was 30,000, MFBRR took 1457 minutes and MFBRR-γ took 12.08 minutes (γ = 100). MFBRR was also applied to health care data, where it achieved good results, with a hierarchical precision of 0.943.
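To give a feel for what a precision figure such as 0.973 measures, the sketch below computes a precision score in the style of Sweeney's Prec metric: one minus the average fraction of the generalization hierarchy that each cell has climbed. The abstract does not give its formula for hierarchical precision, so treating it as this metric is an assumption. The hierarchy `AGE_PARENT` and the functions `depth` and `precision` are hypothetical, and only one attribute with a balanced hierarchy is handled.

```python
# Hypothetical generalization tree: child -> parent, root "*" has no parent.
AGE_PARENT = {
    "23": "20-29", "27": "20-29", "31": "30-39", "38": "30-39",
    "20-29": "20-39", "30-39": "20-39", "20-39": "*",
}

def depth(value, parent):
    """Number of steps from value up to the root."""
    steps = 0
    while value in parent:
        value = parent[value]
        steps += 1
    return steps

def precision(column, parent, height):
    """1 - mean(generalization level / hierarchy height) over all cells.

    Assumes a balanced hierarchy of the given height, so a cell's
    generalization level is height - depth(cell).
    """
    loss = sum((height - depth(v, parent)) / height for v in column)
    return 1 - loss / len(column)

HEIGHT = 3  # leaf -> "20-29" -> "20-39" -> "*"
original = ["23", "23", "27", "27", "31", "38"]
released = ["23", "23", "27", "27", "30-39", "30-39"]
print(precision(original, AGE_PARENT, HEIGHT))
print(precision(released, AGE_PARENT, HEIGHT))
```

Under this definition an untouched column scores 1 and a fully suppressed column scores 0, so values close to 1 indicate that few cells were generalized and that they moved only a short way up the tree.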

[Limitations] Only two data sets were used in this study, so the range of data types covered may be incomplete.

[Conclusions] The proposed MFBRR algorithm and its improved variant MFBRR-γ satisfy the k-anonymity requirement while reducing the information loss caused by anonymization, which improves the quality of the published data.

Key words k-anonymity algorithm      Data quality      Data publishing      Multi-branch-tree forest      Recognition rate      
Published: 11 November 2020
ZTFLH:  TP393  

Cite this article:

Chen Xianlai, Luo Xiao, Liu Li, Li Zhongmin, An Ying. k-Anonymity Algorithm of multi-branch-tree Forest Based on Recognition Rate. Data Analysis and Knowledge Discovery, 0, (): 1-.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467. 2020.0952     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y0/V/I/1
