HyFS: A Highly Available Distributed File System
Jianqiang Luo, Mochan Shrestha, Lihao Xu
Department of Computer Science, Wayne State University


Abstract

Building highly available distributed file systems is crucial to commercial and scientific applications. HyFS is designed to achieve this goal by using erasure codes to implement a distributed file system. HyFS provides a general framework that supports any erasure code. By applying different erasure codes, HyFS can therefore be customized to meet various application requirements.

HyFS Components

File System Interface (FUSE): FUSE is a user-space file system framework and a good tool for quickly developing a file system prototype.

File Operation Lib (fopen, fread, fwrite, fseek, fclose): This library provides fault tolerance for a single file and exposes a POSIX file operation API. It can be used independently of HyFS.

Erasure Codes Lib (encoding, decoding): All encoding and decoding functions related to erasure codes are implemented in this library. It can also be used outside HyFS.

Network File System (NFS): HyFS is a stackable file system. To serve as a distributed file system, HyFS is configured to run on NFS.

HyFS Features

High Flexibility: A general framework is designed to support any erasure code, and applications decide which code to use. It is easy to configure.

POSIX File API: The POSIX file operation API is provided by a library that is independent of HyFS and can be readily used by applications.

File System Level: HyFS is a Linux file system that can be installed on any popular Linux distribution. It has been tested on Ubuntu and SUSE.

Erasure Codes Support: Several erasure codes are supported so far for academic research.
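Because applications decide which erasure code to use, it may help to see how that choice trades storage cost against failure tolerance. The sketch below is an illustration only and is not part of HyFS: it assumes an (n, k) MDS code, which stores k units of data across n nodes and survives the loss of any n - k of them, and it treats r-way replication as an (r, 1) code. The function name `code_profile` is hypothetical.

```python
from fractions import Fraction


def code_profile(n, k):
    """Storage overhead and failure tolerance of an (n, k) MDS code."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return {"overhead": Fraction(n, k), "tolerated_failures": n - k}


# Keeping r full copies of the data behaves like an (r, 1) code.
replication = code_profile(3, 1)
erasure = code_profile(5, 3)

for name, profile in (("3-way replication", replication), ("(5, 3) code", erasure)):
    print(f"{name}: {float(profile['overhead']):.2f}x storage, "
          f"tolerates {profile['tolerated_failures']} node failures")
```

Both configurations survive two node failures, but the (5, 3) code stores 5/3 of the original data size where three-way replication stores 3 times as much.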
Motivation

High Availability Requirement
- Storage devices are not as highly available as we expect.
- In large data centers, hardware failure is common.
- Proper data redundancy is the key to providing high reliability, availability, and survivability.

Existing Solutions
Most current fault-tolerant file systems use replication as the redundancy scheme, which suffers from:
- High cost of hardware purchase and maintenance
- Reduced write performance, because multiple replicas of the same data must be written

Future Work

Performance Test
- Measure the overhead of encoding and decoding
- Examine key factors for HyFS performance

Scalability
- Study HyFS scalability when deployed to a large network

Support More Functions
- Latent error recovery: decoding algorithms that correct latent sector errors are to be included
- Data modification detection: error detection algorithms are to be integrated to check data integrity

[Figure: HyFS and its supported erasure codes]

HyFS Ideas

Erasure Codes and File System
- Novel use of erasure codes in a file system to achieve high availability at an affordable cost
- File data is stored on multiple storage nodes by applying an erasure code (encoding process)
- File data can be reconstructed from a subset of these storage nodes (decoding process)

Erasure Codes

MDS Erasure Codes (n, k): A message has k bits; we store it as n bits by adding (n-k) redundancy bits. To recover the message, we need ANY k of these n bits.

Erasure Codes in HyFS: When saving a file, we store it on n storage nodes. When reading the file, we need ANY k accessible storage nodes to recover it, so we can tolerate up to (n-k) storage node failures.

HyFS Demo

Demo Steps

Data Preparation: A static HTML file and its picture files are copied to a HyFS file system. HyFS is configured to use the EVENODD (5, 3) code as the erasure code and five flash drives as the storage nodes.

Fault Tolerance Test
1: Start up the HyFS file system in a command console.
2: All five flash drives are connected.
   Launch Firefox to check the static HTML file stored in HyFS. It succeeds.
3: Detach one flash drive, and open Firefox to check the HTML file's availability in HyFS. It succeeds.
4: Detach two flash drives, and open Firefox again to check the HTML file's availability in HyFS. It still succeeds.

HyFS Development Tips
- It is being developed in the NISL lab at Wayne State University.
- It was started from scratch.
- It is developed in C for the best performance.
- It has 17,000 lines of code so far and is still under active development.

[Figure: HyFS Architecture, with parts labeled (1), (2), (3), (4)]
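The any-k-of-n recovery that the fault tolerance test exercises can be sketched in a few lines. This is a minimal illustration, not HyFS code: the demo uses EVENODD, whereas the sketch substitutes a Reed-Solomon-style polynomial code over the prime field GF(257) because it is shorter to write down. The names `encode`, `decode`, and `interpolate`, the modulus `P`, and the sample data are all assumptions made for this example.

```python
from itertools import combinations

P = 257  # prime modulus (assumption of this sketch): large enough to hold any byte value


def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through the points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse; it needs Python 3.8 or later.
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Return n symbols, one per node: the k data symbols plus n - k redundancy symbols."""
    points = list(enumerate(data))
    return list(data) + [interpolate(points, x) for x in range(len(data), n)]


def decode(survivors, k):
    """Rebuild the k data symbols from any k surviving {node_index: symbol} entries."""
    if len(survivors) < k:
        raise ValueError("fewer than k nodes are accessible")
    points = list(survivors.items())[:k]
    return [interpolate(points, x) for x in range(k)]


data = [72, 121, 70]     # k = 3 data symbols (the bytes of "HyF")
nodes = encode(data, 5)  # n = 5 storage nodes, as in the demo

# Detach every possible pair of nodes and check that the data is still recoverable.
for failed in combinations(range(5), 2):
    survivors = {i: s for i, s in enumerate(nodes) if i not in failed}
    assert decode(survivors, 3) == data
```

With n = 5 and k = 3, every choice of two detached nodes still leaves three symbols, which is enough to rebuild the data. Detaching a third node makes `decode` raise an error, matching the (n-k) failure bound stated above.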