Smoke and Mirrors: Shadowing Files at a Geographically Remote Location Without Loss of Performance Hakim Weatherspoon Joint with Lakshmi Ganesh, Tudor.

Slides:



Advertisements
Similar presentations
Martin Suchara, Ryan Witt, Bartek Wydrowski California Institute of Technology Pasadena, U.S.A. TCP MaxNet Implementation and Experiments on the WAN in.
Advertisements

PLATO: Predictive Latency- Aware Total Ordering Mahesh Balakrishnan Ken Birman Amar Phanishayee.
Remus: High Availability via Asynchronous Virtual Machine Replication
Reliable Multicast for Time-Critical Systems Mahesh Balakrishnan Ken Birman Cornell University.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Distributed System Architectures.
Smoke and Mirrors: Shadowing Files at a Geographically Remote Location Without Loss of Performance - and - Message Logging: Pessimistic, Optimistic, Causal,
1 Disk Based Disaster Recovery & Data Replication Solutions Gavin Cole Storage Consultant SEE.
CS162 Section Lecture 9. KeyValue Server Project 3 KVClient (Library) Client Side Program KVClient (Library) Client Side Program KVClient (Library) Client.
Oracle Data Guard Ensuring Disaster Recovery for Enterprise Data
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
1 © Copyright 2010 EMC Corporation. All rights reserved. EMC RecoverPoint/Cluster Enabler for Microsoft Failover Cluster.
Business Continuity and DR, A Practical Implementation Mich Talebzadeh, Consultant, Deutsche Bank
Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat David Andersen, Greg Ganger, Garth Gibson, Brian Mueller* Carnegie Mellon University, *Panasas.
Smoke and Mirrors: Shadowing Files at a Geographically Remote Location Without Loss of Performance August 2008 Hakim Weatherspoon, Lakshmi Ganesh, Tudor.
Overview Distributed vs. decentralized Why distributed databases
Computer Science Storage Systems and Sensor Storage Research Overview.
Module – 12 Remote Replication
1© Copyright 2011 EMC Corporation. All rights reserved. EMC RECOVERPOINT/ CLUSTER ENABLER FOR MICROSOFT FAILOVER CLUSTER.
Module 14: Scalability and High Availability. Overview Key high availability features available in Oracle and SQL Server Key scalability features available.
National Manager Database Services
Frangipani: A Scalable Distributed File System C. A. Thekkath, T. Mann, and E. K. Lee Systems Research Center Digital Equipment Corporation.
Proven Techniques for Maximizing Availability Maximum Availability Architecture Lawrence To, Shari Yamaguchi High Availability Systems Group Systems Technologies.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
Disaster Recovery as a Cloud Service Chao Liu SUNY Buffalo Computer Science.
ETTM: A Scalable, Fault-tolerant Network Manager Colin Dixon Hardeep Uppal UMass Amherst Vjekoslav Brajkovic, Dane Brandon, Thomas Anderson and Arvind.
Computer Measurement Group, India Reliable and Scalable Data Streaming in Multi-Hop Architecture Sudhir Sangra, BMC Software Lalit.
Network Support for Cloud Services Lixin Gao, UMass Amherst.
Database Services for Physics at CERN with Oracle 10g RAC HEPiX - April 4th 2006, Rome Luca Canali, CERN.
CIS 725 Wireless networks. Low bandwidth High error rates.
J.H.Saltzer, D.P.Reed, C.C.Clark End-to-End Arguments in System Design Reading Group 19/11/03 Torsten Ackemann.
N-Tier Client/Server Architectures Chapter 4 Server - RAID Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept RAID – Redundant Array.
Lecture 9 of Advanced Databases Storage and File Structure (Part II) Instructor: Mr.Ahmed Al Astal.
© 2011 Cisco All rights reserved.Cisco Confidential 1 APP server Client library Memory (Managed Cache) Memory (Managed Cache) Queue to disk Disk NIC Replication.
DotHill Systems Data Management Services. Page 2 Agenda Why protect your data?  Causes of data loss  Hardware data protection  DMS data protection.
TCP Throughput Collapse in Cluster-based Storage Systems
IMPROUVEMENT OF COMPUTER NETWORKS SECURITY BY USING FAULT TOLERANT CLUSTERS Prof. S ERB AUREL Ph. D. Prof. PATRICIU VICTOR-VALERIU Ph. D. Military Technical.
Page 1 of John Wong CTO Twin Peaks Software Inc. Mirror File System A Multiple Server File System.
Continuous Access Overview Damian McNamara Consultant.
1 Moshe Shadmon ScaleDB Scaling MySQL in the Cloud.
SQLintersection Session SQL37 SQL Server 2012 Availability Groups Aaron Bertrand
Obile etworking M-TCP : TCP for Mobile Cellular Networks Kevin Brown and Suresh Singh Department of Computer Science Univ. of South Carolina.
Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), Lakshmi Ganesh (UT Austin), and Hakim Weatherspoon.
Durability and Crash Recovery for Distributed In-Memory Storage Ryan Stutsman, Asaf Cidon, Ankita Kejriwal, Ali Mashtizadeh, Aravind Narayanan, Diego Ongaro,
Oracle's Distributed Database Bora Yasa. Definition A Distributed Database is a set of databases stored on multiple computers at different locations and.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
1 Distributed Databases BUAD/American University Distributed Databases.
Net Optics Confidential and Proprietary 1 Bypass Switches Intelligent Access and Monitoring Architecture Solutions.
StarFish: highly-available block storage Eran Gabber Jeff Fellin Michael Flaster Fengrui Gu Bruce Hillyer Wee Teck Ng Banu O¨ zden Elizabeth Shriver 2003.
OSIsoft High Availability PI Replication
1 Seneca: Remote Mirroring Done Write Minwen Ji, Alistair Veitch and John Wilkes HP Labs December 2, 2015.
High Availability in DB2 Nishant Sinha
© Logicalis Group HA options Cross Site Mirroring (XSM) and Orion Jonathan Woods.
Slingshot: Time-Critical Multicast for Clustered Applications Mahesh Balakrishnan Stefan Pleisch Ken Birman Cornell University.
Chapter 11.4 END-TO-END ISSUES. Optical Internet Optical technology Protocol translates availability of gigabit bandwidth in user-perceived QoS.
TCP continued. Discussion – TCP Throughput TCP will most likely generate the saw tooth type of traffic. – A rough estimate is that the congestion window.
Tempest: An Architecture for Scalable Time-Critical Services Mahesh Balakrishnan Amar Phanishayee Tudor Marian Professor Ken Birman.
Database Laboratory Regular Seminar TaeHoon Kim Article.
CubicRing ENABLING ONE-HOP FAILURE DETECTION AND RECOVERY FOR DISTRIBUTED IN- MEMORY STORAGE SYSTEMS Yiming Zhang, Chuanxiong Guo, Dongsheng Li, Rui Chu,
Ivy: A Read/Write Peer-to- Peer File System Authors: Muthitacharoen Athicha, Robert Morris, Thomer M. Gil, and Benjie Chen Presented by Saurabh Jha 1.
OSIsoft High Availability PI Replication Colin Breck, PI Server Team Dave Oda, PI SDK Team.
Determining BC/DR Methods
Operating System Reliability
Operating System Reliability
Real IBM C exam questions and answers
Operating System Reliability
Operating System Reliability
Cisco Prime NAM for WAN Optimization Deployment
Operating System Reliability
Operating System Reliability
Operating System Reliability
Presentation transcript:

Smoke and Mirrors: Shadowing Files at a Geographically Remote Location Without Loss of Performance Hakim Weatherspoon Joint with Lakshmi Ganesh, Tudor Marian, Mahesh Balakrishnan, and Ken Birman TRUST Autumn 2008 Conference November 12 th, 2008

Critical Infrastructure Protection and Compliance  U.S. Department of Treasury Study Financial Sector vulnerable to significant data loss in disaster Need new technical options  Risks are real, technology available, Why is problem not solved?

Bold New (Distributed Systems) World…  High performance applications coordinate over vast distances Systems coordinate to cache, mirror, tolerate disaster, etc. Enabled by cheap storage and cpus, commodity datacenters, and plentiful bandwidth  Scale is amazing, but tradeoffs are old ones

Mirroring and speed of light dilemma…  Want asynchronous performance to local data center  And want synchronous guarantee Primary site Remote mirror async sync Conundrum: there is no middle ground

Mirroring and speed of light dilemma…  Want asynchronous performance to local data center  And want synchronous guarantee Primary site Remote mirror sync Conundrum: there is no middle ground Local-sync Remote-sync

Challenge  How can we increase reliability of local-sync protocols? Given many enterprises use local-sync mirroring anyways  Different levels of local-sync reliability Send update to mirror immediately Delay sending update to mirror – deduplication reduces BW

Talk Outline  Introduction  Enterprise Continuity How data loss occurs How we prevent it A possible solution  Evaluation  Conclusion

How does loss occur? Primary site Remote mirror  Rather, where do failures occur?  Rolling disasters Packet loss Partition Site Failure Power Outage

Enterprise Continuity: Network-sync Local-sync Network-syncRemote-sync Wide-area network Primary site Remote mirror

Enterprise Continuity Middle Ground  Use network level redundancy and exposure reduces probability data lost due to network failure Primary site Remote mirror Data Packet Repair Packet Network-level Ack Storage-level Ack

Enterprise Continuity Middle Ground  Network-sync increases data reliability reduces data loss failure modes, can prevent data loss if At the same time primary site fail network drops packet And ensure data not lost in send buffers and local queues  Data loss can still occur Split second(s) before/after primary site fails… Network partitions Disk controller fails at mirror Power outage at mirror  Existing mirroring solutions can use network-sync

Smoke and Mirrors File System  A file system constructed over network-sync Transparently mirrors files over wide-area Embraces concept: file is in transit (in the WAN link) but with enough recovery data to ensure that loss rates are as low as for the remote disk case! Group mirroring consistency

Mirroring consistency and Log-Structured File System B2B1 append( B1,B2 ) V1R1I2B4B3I1 append( V1.. ) V1 R1 I2 I1 B2 B1 B3B4

Talk Outline  Introduction  Enterprise Continuity  Evaluation  Conclusion

Evaluation  Demonstrate SMFS performance over Maelstrom In the event of disaster, how much data is lost? What is system and app throughput as link loss increases? How much are the primary and mirror sites allowed to diverge?  Emulab setup 1 Gbps, 25ms to 100ms link connects two data centers Eight primary and eight mirror storage nodes 64 testers submit 512kB appends to separate logs  Each tester submits only one append at a time

Data loss as a result of disaster  Local-sync unable to recover data dropped by network  Local-sync+FEC lost data not in transit  Network-sync did not lose any data Represents a new tradeoff in design space Primary site Remote mirror - 50 ms one-way latency - FEC(r,c) = (8,3) Local- sync Network- sync Remote- sync

High throughput at high latencies

Application Throughput  App throughput measures application perceived performance  Network and Local-sync+FEC tput significantly greater than Remote-sync(+FEC)

…There is a tradeoff

Latency Distributions

Future Work  Apply Network-sync technologies to large “DataNets”  High-speed online processing of massive quantities of real-time data: Performance enhancement proxies (PEP) Deep packet inspection (DPI) Sensor network monitoring Malware / worm signature detection Software switch, IPv4 – IPv6 gateway  Current state of the art – solution in the kernel  User process cannot keep up with kernel/NIC

Conclusion  Technology response to critical infrastructure needs  When does the filesystem return to the application? Fast — return after sending to mirror Safe — return after ACK from mirror  SMFS — return to user after sending enough FEC  Network-sync: Lossy Network  Lossless Network  Disk!  Result: Fast, Safe Mirroring independent of link length!

 Questions?