CMPE 252A: Computer Networks
Chen Qian, Computer Engineering, UCSC Baskin Engineering
Lecture 15

Midterm exam next week. It mainly focuses on Lectures 2, 3, 4, 5, 7, 9, and also covers the very basic material of the other lectures. One single-sided sheet of paper is allowed.

Forward Error Correction using Erasure Codes L. Rizzo, “Effective Erasure Codes for Reliable Computer Communication Protocols,” ACM SIGCOMM Computer Communication Review, April 1997

Erasure Codes
Erasures are missing packets in a stream:
- uncorrectable errors at the link layer
- losses at congested routers
(n, k) code: k blocks of source data are encoded into n blocks of encoded data, such that the source data can be reconstructed from any subset of k encoded blocks. Each block is a data item that can be operated on with arithmetic operations.

[Figure: k fixed-length packets]

Applications of Erasure Codes
Useful when you don't know which packets will be lost.
Particularly good for large-scale multicast of long files or packet flows: different packets are missing at different receivers, yet the same redundant packet(s) can be used by (almost) all receivers with missing packets.

Erasure Codes
Two data items, x and y, with the same length. x⊕y (exclusive or) has the same length as x and y. If you know any two of (x, y, x⊕y), you can recover the third, and hence both x and y.
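As a concrete illustration, here is a minimal Python sketch (written for these notes, not taken from the lecture or the paper; the byte strings are hypothetical placeholders) showing that any one of (x, y, x⊕y) can be rebuilt from the other two:

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equal-length byte strings.
    assert len(a) == len(b)
    return bytes(p ^ q for p, q in zip(a, b))

x = b"ERASURECODE"          # hypothetical data item
y = b"REPLICATION"          # hypothetical data item of the same length
parity = xor_bytes(x, y)    # x XOR y, same length as x and y

recovered_y = xor_bytes(x, parity)   # y lost: rebuild it from x and x XOR y
recovered_x = xor_bytes(y, parity)   # x lost: rebuild it from y and x XOR y
assert recovered_x == x and recovered_y == y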

An Ensemble of Replication and Erasure Codes for CFS Ma, Nandagopal, Puttaswamy, Banerjee

Amazon S3

A single failure: a data center (DC) is completely inaccessible to the outside world. In reality, data stored in a particular DC is internally replicated (on different networks in the same DC) to tolerate equipment failures.

k-resiliency: data remains accessible even in the event of simultaneous failures of k different replicas.

Cloud: Amazon S3 and Google FS provide resiliency against two failures using the AllReplica model: a total of three copies of each data item are stored to tolerate two simultaneous failures.

AllReplica: given any k, store k+1 full copies of each file in the system. One copy is stored in a primary DC and the k other copies are stored in k backup DCs. The primary DC serves read and write requests directly. Upon write operations, the k backup DCs are updated on file-close.

Cloud: to reduce storage cost, cloud file systems (CFSs) are transitioning from replication to erasure codes. Using AllReplica, we need 6 data items (three copies each of x and y) to reach 2-resiliency. How about storing x, x, x⊕y, y, y instead? That is only 5 items, and it still tolerates any two simultaneous failures (checked exhaustively in the sketch below).
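The short Python check below (again a sketch written for these notes, with made-up values for x and y) enumerates every possible pair of simultaneous failures among the five stored items and verifies that both x and y remain recoverable:

from itertools import combinations

def xor_bytes(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

x, y = b"AAAA", b"BBBB"     # hypothetical data items
items = [("x", x), ("x", x), ("xy", xor_bytes(x, y)), ("y", y), ("y", y)]

def recoverable(survivors):
    kinds = {k for k, _ in survivors}
    # x (resp. y) is recoverable if a copy survives, or if x XOR y plus the other item survives.
    return ("x" in kinds or {"xy", "y"} <= kinds) and ("y" in kinds or {"xy", "x"} <= kinds)

for failed in combinations(range(5), 2):
    survivors = [it for i, it in enumerate(items) if i not in failed]
    assert recoverable(survivors), f"cannot recover after failures {failed}"
print("all two-failure patterns are recoverable with only 5 stored items")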

Erasure codes in the cloud: an overview of the paper's approach for cloud file systems. A file is divided into m data chunks and k redundant chunks using an erasure code; in this example, m = 3 and k = 2. Each chunk is stored in a different data center, called a backup data center. A copy of the whole file is cached temporarily in a primary data center upon write requests, in order to serve subsequent read and write requests. Upon a read request, the requested blocks can also be cached to serve subsequent reads of the same blocks.

AllCode: divide an object into multiple data chunks (say m data chunks) and use an encoding scheme to produce k redundant chunks. Both the data chunks and the redundant chunks are stored in different locations. Only m out of the m+k chunks are necessary to reconstruct the object.
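To make the chunking concrete, here is a self-contained Python toy using the simplest possible erasure code, a single XOR parity chunk (i.e., k = 1). It is only an illustration of the m-out-of-(m+k) property; it is not the encoding scheme used in the paper.

def xor_all(chunks):
    # XOR a list of equal-length byte strings together.
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

def encode(data: bytes, m: int):
    # Split data into m equal data chunks and append one XOR parity chunk.
    size = -(-len(data) // m)                    # ceiling division
    data = data.ljust(m * size, b"\0")           # pad (toy example only)
    chunks = [data[i * size:(i + 1) * size] for i in range(m)]
    return chunks + [xor_all(chunks)]            # m data chunks + 1 redundant chunk

def decode(stored, m):
    # Rebuild the object from any m of the m+1 stored chunks (None = lost chunk).
    missing = [i for i, c in enumerate(stored) if c is None]
    assert len(missing) <= 1, "a single-parity code tolerates only one loss"
    if missing and missing[0] < m:               # a data chunk was lost
        stored[missing[0]] = xor_all([c for c in stored if c is not None])
    return b"".join(stored[:m])

m = 3
stored = encode(b"the quick brown fox jumps over", m)   # chunks spread over backup DCs
stored[1] = None                                        # simulate one inaccessible DC
assert decode(stored, m) == b"the quick brown fox jumps over"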

AllCode: upon a read request, if a full copy of the file has already been reconstructed after file-open, the primary DC serves the read directly. Otherwise, the primary DC contacts the backup DCs to get the requested data and then serves the read.

AllCode: for the first write operation after file-open, the primary DC contacts m−1 other backup DCs to reconstruct the whole file while it performs the write on the file. The full copy is kept until file-close, and subsequent reads/writes are served directly from this copy until the file closes. The primary DC updates the modified data chunks and all the redundant chunks on file-close.

AllCode is storage-efficient: instead of storing whole copies, it stores chunks and thus reduces the overall cost of storage. However, it requires reconstructing the entire object and updating all the redundant chunks upon write requests.

AllCode: when erasure codes are applied to CFSs in this way, they lead to significantly higher access latencies and large bandwidth overhead between data centers.

Goal: provide an access latency close to that of the AllReplica scheme, and storage efficiency close to that of the AllCode scheme.

Design Overview: Consistency; Data accesses

Consistency: a key issue that needs to be considered in a CFS is the type of consistency it provides. The paper seeks to achieve the same level of consistency provided in current cloud services: a replication-on-close consistency model, meaning that as soon as a file is closed after a write, the next file-open request will see the updated version of the file. This is very hard to guarantee in the cloud.

Consistency: in a cloud, this requires that the written parts of the file be updated before any new file-open requests are processed, which may cause large delays.

Assumption: the authors assume that all requests for a file are sent to the primary DC responsible for that file. All file operations are essentially serialized via the primary DC, thus ensuring such consistency.

CAROM (Cache A Replica On Modification) uses a block-level cache at each data center. "Cache" here refers to local storage, which can be a hard disk or even DRAM at the local DC. Each DC acts as the primary DC for a set of files in the CFS; accesses to those files are sent to this primary DC. Any block accessed for the first time is immediately cached.

Observation: the same block is often accessed repeatedly within a very short span of time, so caching reduces the access latency of successive operations on the same data block.

Read request: retrieve the requested data blocks from the data chunks and cache them in the primary DC. If a subsequent read request arrives for those cached blocks, serve it from the cached data without retrieving from the data chunks.

Write: when a write is performed, two operations happen in parallel: (1) the written blocks are stored in the cache of the primary DC and the write is acknowledged to the user; (2) the entire file is reconstructed from any m out of the m+k chunks and updated in the cache.

File-close: the redundant chunks are recalculated using the erasure code, and the updated chunks are written across the other DCs. However, the entire file is kept in cache even after the update completes, until the cache is full, because the same file may be accessed again.
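The Python sketch below pulls this CAROM read/write/close behavior into one place. It is a simplified illustration written for these notes: the class and parameter names are hypothetical, the erasure-coding steps are passed in as stub callables, and the parallel whole-file reconstruction on the first write is omitted for brevity.

class CaromPrimaryDC:
    def __init__(self, chunk_store, reconstruct, reencode):
        self.cache = {}                 # (file_id, block_no) -> block data
        self.dirty = set()              # files with writes pending re-encoding
        self.chunk_store = chunk_store  # handle to the chunks in backup DCs
        self.reconstruct = reconstruct  # callable: rebuild one block from m chunks
        self.reencode = reencode        # callable: recompute and push redundant chunks

    def read(self, file_id, block_no):
        key = (file_id, block_no)
        if key not in self.cache:       # miss: fetch from backup DCs, then cache
            self.cache[key] = self.reconstruct(self.chunk_store, file_id, block_no)
        return self.cache[key]          # hit: served locally

    def write(self, file_id, block_no, data):
        # The written block is cached and acknowledged immediately; redundant
        # chunks are not touched until file-close.
        self.cache[(file_id, block_no)] = data
        self.dirty.add(file_id)
        return "ack"

    def close(self, file_id):
        if file_id in self.dirty:
            blocks = {b: d for (f, b), d in self.cache.items() if f == file_id}
            self.reencode(self.chunk_store, file_id, blocks)  # update chunks across DCs
            self.dirty.discard(file_id)
        # Cached blocks are kept after close (until evicted) to serve future accesses.

dc = CaromPrimaryDC(chunk_store={},
                    reconstruct=lambda store, f, b: b"<decoded block>",
                    reencode=lambda store, f, blocks: None)
dc.write("fileA", 0, b"new data")
assert dc.read("fileA", 0) == b"new data"   # served from cache, no remote fetch
dc.close("fileA")                           # redundant chunks recomputed once, at close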

Cache size

Hybrid: store one complete primary copy of the file, and split the backup copies using an (m+k, m) erasure code. The primary replica answers read/write queries from users; the chunks are mainly used to provide resiliency. Upon write requests, all modified data chunks plus the redundant chunks are updated on file-close, as in AllCode.
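For intuition, here is a back-of-the-envelope storage-overhead comparison (a sketch based on the definitions above, using m = 3 and k = 2 as in the earlier example; the Hybrid overhead is only noted in a comment because the slide does not pin down the exact size of its coded backup set):

def allreplica_overhead(k):
    return k + 1            # k+1 full copies to tolerate k simultaneous failures

def allcode_overhead(m, k):
    return (m + k) / m      # m data chunks plus k redundant chunks, each 1/m of the file

m, k = 3, 2
print(f"AllReplica: {allreplica_overhead(k):.2f}x of the original data")   # 3.00x
print(f"AllCode:    {allcode_overhead(m, k):.2f}x of the original data")   # 1.67x
# Hybrid keeps one full copy plus erasure-coded backups, so its overhead sits
# between the two, at the cost of AllCode-style updates on file-close.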

Summary: Storage: saves up to 60% in cost over replication schemes. Bandwidth: saves up to 43% in cost over erasure-code-based schemes.

Trace details: CIFS network traces collected from two large-scale, enterprise-class file servers deployed at NetApp. One file server is deployed in the corporate DC, hosting data used by over 1000 marketing, sales, and finance employees; the other is deployed in the engineering DC, used by over 500 engineering employees.

Table: a total of 3.8 million unique files, representing over 35 TB of data.

Storage-time: for k = 2, CAROM saves more than 50% (53%) in storage-time compared to AllReplica, and more than 40% (42%) compared to Hybrid. The percentage of savings compared to AllReplica is as high as 60% for k = 3.

Storage-time

Bandwidth cost

Bandwidth cost: the bandwidth cost of CAROM is only slightly higher than that of Hybrid, which indicates that most read/write requests are served by cached copies. This demonstrates the effectiveness of caching in CAROM.

Monetary cost: (1) the cost of storing the data for a month, in terms of GBs; (2) the cost of I/O operations; and (3) the cost of bandwidth for transferring data into and out of the data centers.
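A minimal sketch of this cost model in Python; the three unit prices below are hypothetical placeholders, not the prices used in the paper's evaluation.

def monthly_cost(stored_gb, io_ops, transfer_gb,
                 price_per_gb_month=0.03,      # hypothetical $/GB-month of storage
                 price_per_million_ops=0.40,   # hypothetical $ per million I/O operations
                 price_per_gb_transfer=0.09):  # hypothetical $/GB transferred in/out
    storage = stored_gb * price_per_gb_month
    io = (io_ops / 1e6) * price_per_million_ops
    bandwidth = transfer_gb * price_per_gb_transfer
    return storage + io + bandwidth

# Example: 1 TB stored, 5 million I/O operations, 200 GB transferred in one month.
print(f"${monthly_cost(1024, 5e6, 200):.2f}")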

Cost

Cache size & Cost

Cache size & Cost

Access latency: a hit means the data is available on the local disk in the primary DC; a miss means the data has to be fetched from a backup DC. (A, B) denotes the local-disk access latency (A) and the remote-disk access latency (B), in units of milliseconds.
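Under this (A, B) notation, the expected read latency is simply a hit-ratio-weighted average; a tiny sketch with hypothetical numbers:

def expected_latency_ms(hit_ratio, local_ms, remote_ms):
    # hit_ratio of requests are served from the local disk, the rest from a backup DC
    return hit_ratio * local_ms + (1 - hit_ratio) * remote_ms

# Hypothetical values: A = 10 ms local, B = 100 ms remote, 90% cache hit ratio.
print(expected_latency_ms(0.9, 10, 100))    # 19.0 ms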

Access latency: CAROM provides read access latencies that are close to those of AllReplica (local-disk access latencies), rather than those of AllCode (remote-disk access latencies). (m = 5; the cache size in CAROM is 2 GB for Corp and 5 GB for Eng.)

End