A deduplication algorithm based on data similarity and delta encoding

  • Bin Song
  • Limin Xiao*
  • Guangjun Qin
  • Li Ruan
  • Shida Qiu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Satellite applications such as remote sensing are overwhelmed with vast quantities of data, yet storage resources on a satellite are so limited that they must be used more efficiently. Remote sensing data are highly similar to one another, but the parts that differ are distributed irregularly. When a traditional deduplication algorithm splits files into chunks, a large number of chunks are highly similar but not identical, which leads to poor deduplication. We propose a deduplication algorithm based on data similarity and delta encoding to reduce storage usage. Data similarity analysis identifies the similar data, and delta encoding reduces the storage that the similar data consume. In experiments on remote sensing application data, we achieved deduplication ratios of up to 30:1 and analyzed how the chunk size affects the results.
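The abstract does not say how similarity is measured or what the delta format looks like, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a Jaccard score over fixed-size sub-blocks as the similarity test and `difflib.SequenceMatcher` opcodes as the delta; `Store`, `SUB_BLOCK`, and `THRESHOLD` are hypothetical names and values chosen for illustration.

```python
import hashlib
from difflib import SequenceMatcher

SUB_BLOCK = 8      # assumed sub-block size used to build similarity features
THRESHOLD = 0.5    # assumed minimum Jaccard score to treat chunks as similar


def features(chunk: bytes) -> set:
    """Set of fixed-size sub-blocks; a stand-in for a real similarity sketch."""
    return {chunk[i:i + SUB_BLOCK] for i in range(0, len(chunk), SUB_BLOCK)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy ranges from base plus literal data."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class Store:
    """Hypothetical chunk store: exact dedup first, then delta against a similar base."""

    def __init__(self):
        self.bases = {}    # fingerprint -> (chunk bytes, feature set)
        self.deltas = {}   # fingerprint -> (base fingerprint, delta ops)

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in self.bases or fp in self.deltas:
            return fp  # identical chunk already stored
        feats = features(chunk)
        best_fp, best_sim = None, 0.0
        for base_fp, (_, base_feats) in self.bases.items():
            sim = jaccard(feats, base_feats)
            if sim > best_sim:
                best_fp, best_sim = base_fp, sim
        if best_fp is not None and best_sim >= THRESHOLD:
            base_chunk = self.bases[best_fp][0]
            self.deltas[fp] = (best_fp, make_delta(base_chunk, chunk))
        else:
            self.bases[fp] = (chunk, feats)
        return fp

    def get(self, fp: str) -> bytes:
        if fp in self.bases:
            return self.bases[fp][0]
        base_fp, ops = self.deltas[fp]
        return apply_delta(self.bases[base_fp][0], ops)
```

In this sketch, a chunk that differs from a stored base by a few bytes is kept as a short list of copy and data operations rather than as a full chunk, which is the case that exact-match deduplication alone misses. The linear scan over stored bases is for clarity only; a real system would need an index over the similarity features.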

Original language: English
Title of host publication: Geo-Spatial Knowledge and Intelligence - 4th International Conference on Geo-Informatics in Resource Management and Sustainable Ecosystem, GRMSE 2016, Revised Selected Papers
Editors: Hanning Yuan, Jing Geng, Fuling Bian
Publisher: Springer Verlag
Pages: 245-253
Number of pages: 9
ISBN (Print): 9789811039683
State: Published - 2017
Event: 4th International Conference on Geo-Informatics in Resource Management and Sustainable Ecosystem, GRMSE 2016 - Kowloon, Hong Kong SAR
Duration: 18 Nov 2016 to 20 Nov 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 699
ISSN (Print): 1865-0929

Conference

Conference: 4th International Conference on Geo-Informatics in Resource Management and Sustainable Ecosystem, GRMSE 2016
Country/Territory: Hong Kong SAR
City: Kowloon
Period: 18/11/16 to 20/11/16

Keywords

  • Deduplication
  • Delta encoding
  • Satellite
  • Similarity
