Zhang, Ji and Shu, Yanfeng and Wang, Hua (2010) On memory and I/O efficient duplication detection for multiple self-clean data sources. In: DASFAA 2010: 15th International Conference on Database Systems for Advanced Applications, 1-4 Apr 2010, Tsukuba, Japan.
Official URL: http://www.springerlink.com/content/v33x10q6635244p3/fulltext.pdf
Identification Number or DOI: 10.1007/978-3-642-14589-6_14
Abstract
In this paper, we propose efficient algorithms for detecting duplicates across multiple data sources that are themselves duplicate-free. In developing these algorithms, we take full account of the various possible cases that arise from the workload of the data sources to be cleaned and the available memory. The algorithms are memory and I/O efficient: they reduce the number of pairwise record comparisons and minimize the total page access cost of the cleaning process. Experimental evaluation demonstrates that the proposed algorithms are efficient and outperform the Sorted Neighborhood Method (SNM) and random access methods.
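The premise stated in the abstract, that each source is already duplicate-free, implies that only records from different sources need to be compared. The following is a minimal sketch of that idea only, not the algorithms proposed in the paper: the function names, the sample records, and the exact-match comparison are all hypothetical, and the sketch ignores the memory and page-access considerations the paper addresses.

```python
from itertools import combinations


def cross_source_pairs(sources):
    """Yield candidate pairs whose records come from two different sources.

    Pairs within a single source are skipped because each source is
    assumed to be duplicate-free (self-clean).
    """
    for (i, src_a), (j, src_b) in combinations(enumerate(sources), 2):
        for rec_a in src_a:
            for rec_b in src_b:
                yield (i, rec_a), (j, rec_b)


def count_all_pairs(sources):
    """Number of comparisons if every record were compared with every other."""
    n = sum(len(src) for src in sources)
    return n * (n - 1) // 2


# Hypothetical sample data: three small sources with no internal duplicates.
sources = [
    ["ann lee", "bob roy"],
    ["ann lee", "cy dunn", "di wu"],
    ["bob roy", "ed fox"],
]

candidates = list(cross_source_pairs(sources))

# Exact string equality stands in for a real record-matching function
# (an assumption made purely for illustration).
duplicates = [(p, q) for p, q in candidates if p[1] == q[1]]

print(len(candidates), "cross-source comparisons instead of", count_all_pairs(sources))
print(duplicates)
```

With these sample sources, skipping within-source pairs leaves 16 candidate comparisons rather than the 21 needed to compare all seven records with one another.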
| Item Type: | Conference or Workshop Item (Commonwealth Reporting Category E) (Paper) |
|---|---|
| Additional Information: | Author version not held. Published version unable to be displayed. Series Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): volume 6193 |
| Uncontrolled Keywords: | access cost; cleaning process; data source; duplicate detection; duplication detection; efficient algorithm; experimental evaluation; multiple data sources; random access |
| Fields of Research (FOR2008): | 08 Information and Computing Sciences > 0806 Information Systems > 080604 Database Management; 08 Information and Computing Sciences > 0803 Computer Software > 080303 Computer System Security; 08 Information and Computing Sciences > 0803 Computer Software > 080309 Software Engineering |
| Subjects: | UNSPECIFIED |
| Socio-Economic Objective (SEO2008): | E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences |
| ID Code: | 8485 |
| Deposited On: | 20 Oct 2010 10:24 |
| Last Modified: | 17 Feb 2012 14:31 |
