On memory and I/O efficient duplication detection for multiple self-clean data sources

Zhang, Ji and Shu, Yanfeng and Wang, Hua (2010) On memory and I/O efficient duplication detection for multiple self-clean data sources. In: DASFAA 2010: 15th International Conference on Database Systems for Advanced Applications, 1-4 Apr 2010, Tsukuba, Japan.

Full text available as: PDF (Documentation), 279 KB

Official URL: http://www.springerlink.com/content/v33x10q6635244p3/fulltext.pdf

Identification Number or DOI: 10.1007/978-3-642-14589-6_14

Abstract

In this paper, we propose efficient algorithms for duplicate detection across multiple data sources that are themselves duplicate-free. In developing these algorithms, we take full account of the various possible cases determined by the workload of the data sources to be cleaned and the amount of available memory. The algorithms are memory and I/O efficient: they reduce the number of pairwise record comparisons and minimize the total page access cost of the cleaning process. Experimental evaluation demonstrates that the proposed algorithms are efficient and achieve better performance than the Sorted Neighborhood Method (SNM) and random access methods.
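Part of the saving described above follows directly from the sources being duplicate-free: no two records from the same source can be duplicates, so only cross-source pairs need comparing. The Python sketch below illustrates that premise only. It is not the paper's memory- and I/O-aware algorithms, which this record does not describe. The function names and the `is_duplicate` matching predicate are hypothetical, introduced for illustration.

```python
from itertools import combinations


def naive_pair_count(sizes):
    """Comparisons needed if all records are pooled and compared pairwise."""
    n = sum(sizes)
    return n * (n - 1) // 2


def cross_source_pair_count(sizes):
    """Comparisons needed when every source is duplicate-free (self-clean),
    so only pairs of records from different sources must be checked."""
    return sum(a * b for a, b in combinations(sizes, 2))


def cross_source_duplicates(sources, is_duplicate):
    """Yield (i, r, j, s) for each record r in source i that matches a record s
    in a later source j.

    `sources` is a list of in-memory record lists; `is_duplicate` is a
    hypothetical caller-supplied predicate. Intra-source pairs are skipped
    because each source is assumed to be duplicate-free already.
    """
    for (i, a), (j, b) in combinations(enumerate(sources), 2):
        for r in a:
            for s in b:
                if is_duplicate(r, s):
                    yield i, r, j, s
```

For three sources of 1,000 records each, pooling requires 4,498,500 comparisons while the cross-source count is 3,000,000. The difference is exactly the 3 × 499,500 intra-source pairs. The nested loops above assume every source fits in memory. According to the abstract, the paper's algorithms instead take the available memory and the page access cost into account.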

Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Additional Information: Author version not held. Published version unable to be displayed. Series Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): volume 6193
Uncontrolled Keywords: access cost; cleaning process; data source; duplicate detection; duplication detection; efficient algorithm; experimental evaluation; multiple data sources; random access
Fields of Research (FOR2008): 08 Information and Computing Sciences > 0806 Information Systems > 080604 Database Management
08 Information and Computing Sciences > 0803 Computer Software > 080303 Computer System Security
08 Information and Computing Sciences > 0803 Computer Software > 080309 Software Engineering
Subjects: UNSPECIFIED
Socio-Economic Objective (SEO2008): E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences
ID Code: 8485
Deposited By:
Deposited On: 20 Oct 2010 10:24
Last Modified: 17 Feb 2012 14:31