An efficient and effective duplication detection method in large database applications

Zhang, Ji (2010) An efficient and effective duplication detection method in large database applications. In: 4th International Conference on Network and System Security (NSS 2010), 1-3 Sep 2010, Melbourne, Australia.

Abstract

In this paper, we develop PC-Filter+ (PC stands for partition comparison), a robust data cleaning technique that builds on its predecessor to detect duplicate records in large databases effectively and efficiently. PC-Filter+ provides more flexible algorithmic options for constructing the Partition Comparison Graph (PCG). In addition, PC-Filter+ can perform duplicate detection under different memory constraints.


Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Refereed: Yes
Item Status: Live Archive
Additional Information: © 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Faculty/School / Institute/Centre: Historic - Faculty of Sciences - Department of Maths and Computing (Up to 30 June 2013)
Date Deposited: 16 Feb 2011 01:53
Last Modified: 22 Feb 2015 23:34
Uncontrolled Keywords: database management; duplicate detection; quality control
Fields of Research: 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
15 Commerce, Management, Tourism and Services > 1503 Business and Management > 150307 Innovation and Technology Management
08 Information and Computing Sciences > 0803 Computer Software > 080309 Software Engineering
Socio-Economic Objective: E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences
Identification Number or DOI: 10.1109/NSS.2010.78
URI: http://eprints.usq.edu.au/id/eprint/18208
