New Technology of Library and Information Service  2007, Vol. 2 Issue (12): 50-56    DOI: 10.11925/infotech.1003-3513.2007.12.11
A Survey of Data Cleaning
Wang Yuefen 1,2; Zhang Chengzhi 1,2,3; Zhang Beibei 1,2; Wu Tingting 1,2
1 (Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094, China)
2 (Laboratory for Enterprise Innovation Service, Nanjing University of Science & Technology, Nanjing 210094, China)
3 (Institute of Scientific & Technical Information of China, Beijing 100038, China)
This paper surveys the data cleaning problem. First, the background of the problem and the current state of research are explained. Then, the definition and objects of data cleaning are given, and its basic principles and several models are presented. Related algorithms and tools are analyzed, and evaluation methods for data cleaning are proposed. Finally, future research topics and applications related to data cleaning are discussed.

Key words: Data cleaning; Data quality; Duplicate record detection; Outlier detection
Received: 17 September 2007      Published: 25 December 2007


Corresponding Author: Wang Yuefen
About authors: Wang Yuefen, Zhang Chengzhi, Zhang Beibei, Wu Tingting

Cite this article:

Wang Yuefen, Zhang Chengzhi, Zhang Beibei, Wu Tingting. A Survey of Data Cleaning. New Technology of Library and Information Service, 2007, 2(12): 50-56.

