High Speed Total Order for SAN infrastructure Tal Anker, Danny Dolev, Gregory Greenman, Ilya Shnaiderman School of Engineering and Computer Science The.

Slides:



Advertisements
Similar presentations
Ethernet Switch Features Important to EtherNet/IP
Advertisements

Remus: High Availability via Asynchronous Virtual Machine Replication
System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Consistency and Replication Chapter 7 Part II Replica Management & Consistency Protocols.
Oracle Data Guard Ensuring Disaster Recovery for Enterprise Data
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
Evidor: The Evidence Collector Software using for: Software for lawyers, law firms, corporate law and IT security departments, licensed investigators,
Oct 21, 2004CS573: Network Protocols and Standards1 IP: Addressing, ARP, Routing Network Protocols and Standards Autumn
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
Inside the Internet. INTERNET ARCHITECTURE The Internet system consists of a number of interconnected packet networks supporting communication among host.
Chapter 3 Chapter 3: Server Hardware. Chapter 3 Learning Objectives n Describe the base system requirements for Windows NT 4.0 Server n Explain how to.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
Transis 1 Fault Tolerant Video-On-Demand Services Tal Anker, Danny Dolev, Idit Keidar, The Transis Project.
Cambodia-India Entrepreneurship Development Centre - : :.... :-:-
Servers Redundant Array of Inexpensive Disks (RAID) –A group of hard disks is called a disk array FIGURE Server with redundant NICs.
File Systems (2). Readings r Silbershatz et al: 11.8.
Chapter 2 Chapter 2: Planning for Server Hardware.
Chapter 1 An Introduction to Networking
LECTURE 9 CT1303 LAN. LAN DEVICES Network: Nodes: Service units: PC Interface processing Modules: it doesn’t generate data, but just it process it and.
Chapter 4 COB 204. What do you need to know about hardware? 
Network Structure Students should be aware of what is available in order to –create and use an ICT network: communication devices.
How the Internet Works. The Internet and the Web The Web is actually just one of many computer applications that run on the Internet Among others are.
Common Devices Used In Computer Networks
Networking Basics Lesson 1 Introduction to Networks.
A TCP/IP transport layer for the DAQ of the CMS Experiment Miklos Kozlovszky for the CMS TriDAS collaboration CERN European Organization for Nuclear Research.
Slide 1 DESIGN, IMPLEMENTATION, AND PERFORMANCE ANALYSIS OF THE ISCSI PROTOCOL FOR SCSI OVER TCP/IP By Anshul Chadda (Trebia Networks)-Speaker Ashish Palekar.
Improving Network I/O Virtualization for Cloud Computing.
1 Moshe Shadmon ScaleDB Scaling MySQL in the Cloud.
Introduction to Network Basic 1. Agenda – - Internetworking Basic – - OSI Layer – - TCP/IP Model – - IP Addressing – - Subnetting & VLSM – - The Internal.
1 Selecting LAN server (Week 3, Monday 9/8/2003) © Abdou Illia, Fall 2003.
A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum.
Chapter 2 Chapter 2: Planning for Server Hardware.
Local-Area-Network (LAN) Architecture Department of Computer Science Southern Illinois University Edwardsville Fall, 2013 Dr. Hiroshi Fujinoki
Practical Byzantine Fault Tolerance
Operating Systems David Goldschmidt, Ph.D. Computer Science The College of Saint Rose CIS 432.
1 Network Performance Optimisation and Load Balancing Wulf Thannhaeuser.
Infiniband Bart Taylor. What it is InfiniBand™ Architecture defines a new interconnect technology for servers that changes the way data centers will be.
An Architecture and Prototype Implementation for TCP/IP Hardware Support Mirko Benz Dresden University of Technology, Germany TERENA 2001.
Lecture (Mar 23, 2000) H/W Assignment 3 posted on Web –Due Tuesday March 28, 2000 Review of Data packets LANS WANS.
Group 2 Bernard Smith Thomas Laborde Hannah Prather Fault Tolerance Environment Power Topology and Connectivity Servers Hurricane Preparedness Network.
VMware vSphere Configuration and Management v6
SECURING SELF-VIRTUALIZING ETHERNET DEVICES IGOR SMOLYAR, MULI BEN-YEHUDA, AND DAN TSAFRIR PRESENTED BY LUREN WANG.
Virtual Machines Created within the Virtualization layer, such as a hypervisor Shares the physical computer's CPU, hard disk, memory, and network interfaces.
CSE 60641: Operating Systems Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial Fred B. Schneider, ACM Computing Surveys.
Term 2, 2011 Week 2. CONTENTS Communications devices – Modems – Network interface cards (NIC) – Wireless access point – Switches and routers Communications.
1 Bus topology network. 2 Data is sent to all computers, but only the destination computer accepts 02608c
CHAPTER 7 CLUSTERING SERVERS. CLUSTERING TYPES There are 2 types of clustering ; Server clusters Network Load Balancing (NLB) The difference between the.
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
Rehab AlFallaj.  Network:  Nodes: Service units: PC Interface processing Modules: it doesn’t generate data, but just it process it and do specific task.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 1.
Data Communication Network Models
L1/HLT trigger farm Bologna setup 0 By Gianluca Peco INFN Bologna Genève,
Microsoft Advertising 16:9 Template Light Use the slides below to start the design of your presentation. Additional slides layouts (title slides, tile.
SMOOTHWALL FIREWALL By Nitheish Kumarr. INTRODUCTION  Smooth wall Express is a Linux based firewall produced by the Smooth wall Open Source Project Team.
IP: Addressing, ARP, Routing
Chapter 1: Introduction
Information Technology Deanship
VIRTUAL SERVERS Presented By: Ravi Joshi IV Year (IT)
Introduction to Networks
CT1303 LAN Rehab AlFallaj.
Introduction to Networks
Switching Techniques In large networks there might be multiple paths linking sender and receiver. Information may be switched as it travels through various.
Storage Virtualization
AMCOM Digital Archive Design Review - Week 3.
Event Building With Smart NICs
Switching Techniques.
Specialized Cloud Architectures
A simple network connecting two machines
Example of an early computer system. Classification of operating systems. Operating systems can be grouped into the following categories: Supercomputing.
Presentation transcript:

High Speed Total Order for SAN infrastructure Tal Anker, Danny Dolev, Gregory Greenman, Ilya Shnaiderman School of Engineering and Computer Science The Hebrew University of Jerusalem

Storage Area Network (SAN) §SAN is a high-speed special purpose subnetwork of shared storage devices. §SAN makes all storage devices available to all servers on a LAN or WAN.

SAN From Microsoft web site

SAN services §Disk mirroring §Backup and restore §Archival and retrieval of archived data §Data migration §Fault tolerance §Data sharing among different servers

Fault Tolerance by State Machine Replication §An internal server is “cloned” (replicated) §Each replica maintains its own state §The states of replicas are kept consistent by applying transactions to all the replicas in the same order §The order of transactions is established by a total order algorithm

Types of Total Order Algorithms §Symmetric (Lamport 78) l Messages are ordered based on a logical timestamp and sender’s id §Leader-based (Sequencer) l All messages are ordered by a special process called sequencer

Data Sharing and Locking §In order to keep shared files consistent, a lock mechanism is to be implemented taking into account that: l locking is a frequent event in SAN l SAN is expected to provide high throughput l lock requests should be served promptly (low latency)

Hardware Solution §Low latency with high throughput is achievable by means of special hardware §Building a novel hardware, however, is expensive and time-consuming §Is it possible to achieve the goal using off-the-shelf hardware?

Network Message Order §Messages are ordered by the network §The order may be not the same for different replicas due to: l Message losses l Internal loopback §Neutralizing measures: l A flow control mechanism to minimize message losses. l Disabling loopback (but how to discover the order of my own messages without loopback?)

Network Order m1m2 m1 m2 m1 m2 m1m2 m1 m2m1 P1P2 P3

The solution is Virtual Sequencer §Each machine is equipped with TWO Network Interface Cards (NIC1 and NIC2) §Data is sent via NIC1 §Data is received via NIC2 §NIC1 cards are connected to switch A §NIC2 cards are connected to switch B §Switch A and the link from it to switch B serve as Virtual Sequencer

Two-Switch Network m2 m1 m2m1 m2m1 m2 m1 m2 Switch ASwitch B P3 P2 P1

Two-Switch Network

Implementation Issues §Recommended Switch Features l IGMP snooping (highly recommended) l Traffic shape (recommended) l Jumbo frames (optional) §Required OS Feature l Ability to disable Reverse Path Forward (RPF) checking

Test Bed §5 Pentium 500 Mhz machines with 32 bit x 33 Mhz PCI bus §Each machine is running Debian GNU/Linux §Each machine is equipped with 2 Intel Pro1000MT Desktop Adapters §2 Dell 6024 switches

Order Guarantees §Preliminary Order (PO) l PO is guessed on the bases off the network order by each a machine and can be changed later §Uniform Total Order (UTO) l UTO is never changed ( even for faulty machines ) l UTO is established by collecting acknowledgments from all the machines

All-to-All Experiment Number of Machines Throughput (Mb/s) PO Latency (ms) UTO Latency (ms) Attention: PCI bus is a real bottleneck!

Disjoint sets of senders and receivers Receivers Senders Attention: Scalable in number of senders. Less scalable in number of receivers due to increase in ACKs number.

Latency vs. Throughput (All-to-All)

Latency vs. Throughput (disjoint sets of senders and receivers)

Locks and Preliminary Order §Lock requests may be served faster based on preliminary order §Each server may rejected/accepted a lock according to Preliminary Order §Only iff the request order has been guessed correctly by a server, the response from the server is taken into account

Packet Aggregation §In the experiments, results were obtained for MTU-size messages, however §Lock requests are usually small messages §A solution implementing simple aggregation of lock requests will increase the locking mechanism latency §To keep the latency low, a modification of Nagle algorithm was used to perform “smart” packet aggregation.

Packet Aggregation Results

Future Work §Obtaining Patent §Application Development: l Games l Grid Computation §Using NICs with CPU to: l implement zero-copy driver l send and aggregate acknowledgments by NIC §Checking scalability