ERMS: An elastic replication management system for HDFS

  • Zhendong Cheng*
  • Zhongzhi Luan
  • You Meng
  • Yijing Xu
  • Depei Qian
  • Alain Roy
  • Ning Zhang
  • Gang Guan
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Hadoop Distributed File System (HDFS) is a distributed storage system that stores large-scale data sets reliably and streams those data sets to applications at high bandwidth. HDFS provides high performance, reliability and availability by replicating data, typically keeping three copies of each data block. The popularity of data in HDFS changes over time. To achieve better performance and higher disk utilization, the replication policy of HDFS should be elastic and adapt to data popularity. In this paper, we describe ERMS, an elastic replication management system for HDFS. ERMS provides an active/standby storage model for HDFS. It uses a complex event processing engine to distinguish data types in real time, then dynamically adds extra replicas for hot data, removes these extra replicas when the data cools down, and uses erasure codes for cold data. ERMS also introduces a replica placement strategy for the extra replicas of hot data and for erasure coding parities. The experiments show that ERMS effectively improves the reliability and performance of HDFS and reduces storage overhead.
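
The elastic policy described in the abstract can be illustrated with a minimal sketch. This is not the ERMS implementation: the thresholds, the cap on extra replicas, the single-copy setting for cold data, and the names `Policy`, `classify` and `plan` are all hypothetical, chosen only to show how a replication decision might follow data popularity.

```python
from dataclasses import dataclass

DEFAULT_REPLICAS = 3  # the typical HDFS replication factor mentioned in the abstract


@dataclass
class Policy:
    # All values below are illustrative assumptions, not taken from the paper.
    hot_threshold: int = 100      # accesses per window at which data counts as hot
    cold_threshold: int = 5       # accesses per window at or below which data counts as cold
    max_extra_replicas: int = 3   # cap on extra replicas for hot data


def classify(accesses: int, policy: Policy) -> str:
    """Label data as hot, cold or normal from its recent access count."""
    if accesses >= policy.hot_threshold:
        return "hot"
    if accesses <= policy.cold_threshold:
        return "cold"
    return "normal"


def plan(accesses: int, policy: Policy = Policy()) -> dict:
    """Return a replication plan for one data item."""
    kind = classify(accesses, policy)
    if kind == "hot":
        # Add extra replicas in proportion to popularity, up to the cap.
        extra = min(policy.max_extra_replicas, accesses // policy.hot_threshold)
        return {"type": kind, "replicas": DEFAULT_REPLICAS + extra, "erasure_coded": False}
    if kind == "cold":
        # Assumption: cold data keeps one copy and relies on erasure-code parities.
        return {"type": kind, "replicas": 1, "erasure_coded": True}
    # Data that has cooled down falls back to the default replication factor.
    return {"type": kind, "replicas": DEFAULT_REPLICAS, "erasure_coded": False}
```

Calling `plan` again as the access count of a data item falls below the hot threshold returns the default replication factor, which corresponds to cleaning up the extra replicas once the data cools down.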

Original language: English
Title of host publication: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Publisher: IEEE Computer Society
Pages: 32-40
Number of pages: 9
ISBN (Print): 9780768548449
State: Published - 2012
Event: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012 - Beijing, China
Duration: 24 Sep 2012 to 28 Sep 2012

Publication series

Name: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012

Conference

Conference: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Country/Territory: China
City: Beijing
Period: 24/09/12 to 28/09/12

Keywords

  • Elastic
  • HDFS
  • Replication Management
