An efficient and effective duplication detection method in large database applications

Zhang, Ji (2010) An efficient and effective duplication detection method in large database applications. In: 4th International Conference on Network and System Security (NSS 2010), 1-3 Sep 2010, Melbourne, Australia.


In this paper, we develop a robust data cleaning technique, called PC-Filter+ (PC stands for Partition Comparison), for effective and efficient duplicate record detection in large databases. Building on its predecessor, PC-Filter, PC-Filter+ provides more flexible algorithmic options for constructing the Partition Comparison Graph (PCG). In addition, PC-Filter+ is able to perform duplicate detection under different memory constraints.
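The abstract does not give PC-Filter+'s algorithm, but the general partition-comparison idea it names can be illustrated. The sketch below is a hypothetical simplification, not the paper's method: records are sorted so likely duplicates cluster together, split into partitions, and compared within each partition and between partitions; the `similar` measure, the partition size, and the adjacent-partition comparison rule (standing in for the Partition Comparison Graph) are all illustrative assumptions.

```python
from itertools import combinations

def similar(r1, r2, threshold=0.75):
    # Stand-in similarity measure: fraction of fields that match exactly.
    # A real system would use an approximate string metric instead.
    matches = sum(1 for a, b in zip(r1, r2) if a == b)
    return matches / max(len(r1), len(r2)) >= threshold

def detect_duplicates(records, partition_size=3):
    # Sort records on a key so likely duplicates land in the same
    # or adjacent partitions, then split into fixed-size partitions.
    ordered = sorted(records)
    partitions = [ordered[i:i + partition_size]
                  for i in range(0, len(ordered), partition_size)]
    duplicates = set()
    # Intra-partition comparison: compare all pairs within each partition.
    for part in partitions:
        for r1, r2 in combinations(part, 2):
            if similar(r1, r2):
                duplicates.add((r1, r2))
    # Inter-partition comparison: PC-Filter+ restricts these comparisons
    # via the Partition Comparison Graph; here, as a crude placeholder,
    # we only compare each partition with the next one.
    for p1, p2 in zip(partitions, partitions[1:]):
        for r1 in p1:
            for r2 in p2:
                if similar(r1, r2):
                    duplicates.add((r1, r2))
    return duplicates
```

The point of the partition structure is that comparisons grow with partition size rather than with the full table, which is what makes such methods viable for large databases.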

Statistics for USQ ePrint 18208
Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Refereed: Yes
Item Status: Live Archive
Additional Information: © 2010 IEEE. Permanent restricted access to published version, due to publisher copyright restrictions.
Faculty / Department / School: Historic - Faculty of Sciences - Department of Maths and Computing
Date Deposited: 16 Feb 2011 01:53
Last Modified: 22 Feb 2015 23:34
Uncontrolled Keywords: database management; duplicate detection; quality control
Fields of Research: 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
15 Commerce, Management, Tourism and Services > 1503 Business and Management > 150307 Innovation and Technology Management
08 Information and Computing Sciences > 0803 Computer Software > 080309 Software Engineering
Socio-Economic Objective: E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences
Identification Number or DOI: 10.1109/NSS.2010.78
