A novel approach to data deduplication over the engineering-oriented cloud systems

Sun, Zhe and Shen, Jun and Yong, Jianming (2013) A novel approach to data deduplication over the engineering-oriented cloud systems. Integrated Computer Aided Engineering, 20 (1). pp. 45-57. ISSN 1069-2509

Available files:
Text (Submitted Version): DeduplicationForICAE(revised)V4[1].doc (767Kb)
Text (Published Version): Sun_Shjen_Yong_ICAE_v20n1_PV.pdf (512Kb)

Abstract

This paper presents a duplication-less storage system for engineering-oriented cloud computing platforms. Our deduplication storage system, which manages data and duplicate data over the cloud system, consists of two major components: a front-end deduplication application and a back-end mass storage system. The Hadoop Distributed File System (HDFS) is a common distributed file system on the cloud and is used together with the Hadoop database (HBase). We use HDFS to build a mass storage system and HBase to build a fast indexing system. Combined with the deduplication application, these components yield a scalable, parallel, deduplicated cloud storage system. We further use VMware to generate a simulated cloud environment. The simulation results demonstrate that our deduplication storage system is sufficiently accurate and efficient for distributed and cooperative data-intensive engineering applications.
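The abstract describes a split between a front-end deduplication application, an HBase-based index, and HDFS-based bulk storage, but does not give the algorithm itself. The following is a minimal Python sketch of generic fingerprint-based deduplication under assumed choices that do not come from the paper: fixed-size chunks and SHA-256 fingerprints. Plain in-memory dictionaries and a list stand in for the HBase index and the HDFS storage, and the class name `DedupStore` is hypothetical. It illustrates the general technique, not the authors' implementation.

```python
import hashlib


class DedupStore:
    """Hypothetical toy deduplicating store (not the paper's system).

    - `index` stands in for an HBase table: fingerprint -> block id.
    - `blocks` stands in for HDFS block storage: block id -> bytes.
    - `files` records each file's "recipe": the ordered list of block ids.
    Assumes SHA-256 collisions are negligible, so equal fingerprints
    are treated as equal chunks without a byte-by-byte comparison.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # assumed fixed-size chunking
        self.index = {}
        self.blocks = []
        self.files = {}

    def put(self, name, data):
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            block_id = self.index.get(fp)
            if block_id is None:
                # New content: store it once and index its fingerprint.
                block_id = len(self.blocks)
                self.blocks.append(chunk)
                self.index[fp] = block_id
            # Duplicate or not, the file only references the block id.
            recipe.append(block_id)
        self.files[name] = recipe

    def get(self, name):
        # Reassemble the file from its recipe of stored blocks.
        return b"".join(self.blocks[b] for b in self.files[name])


store = DedupStore(chunk_size=4)
store.put("a.txt", b"AAAABBBBCCCC")
store.put("b.txt", b"AAAABBBBDDDD")
print(len(store.blocks))  # → 4 unique blocks stored instead of 6
```

In this sketch the index lookup is the step that decides whether a chunk is written to storage, which is why a fast index (HBase in the paper's design) matters: every incoming chunk triggers one lookup, while only previously unseen chunks reach the bulk store.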

Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: © 2013 IOS Press and the author(s). The published paper is available from Research Online, the open access institutional repository of the University of Wollongong: http://ro.uow.edu.au/infopapers/2543.
Faculty / Department / School: Historic - Faculty of Business and Law - School of Information Systems
Date Deposited: 30 Jul 2013 01:56
Last Modified: 09 Feb 2015 02:08
Uncontrolled Keywords: cloud storage; data deduplication; Hadoop database; Hadoop distributed file system
Fields of Research: 21 History and Archaeology > 2102 Curatorial and Related Studies > 210201 Archival, Repository and Related Studies
08 Information and Computing Sciences > 0806 Information Systems > 080607 Information Engineering and Theory
08 Information and Computing Sciences > 0803 Computer Software > 080309 Software Engineering
Socio-Economic Objective: B Economic Development > 89 Information and Communication Services > 8903 Information Services > 890301 Electronic Information Storage and Retrieval Services
Identification Number or DOI: 10.3233/ICA-120418
URI: http://eprints.usq.edu.au/id/eprint/23556