High-Performance Object Access in OSD Storage Subsystem
Yingping Lu

Outline
– OSD overview
– Problem and common approaches
– Related work
– Initial proposal
– Issues

Design Objectives of OSD
– Scalability (local area, enterprise, global)
– High performance (high throughput, low latency)
– Cross-platform support
– High availability (resilient to device and machine failures)
– Support for permanent, mobile, and even disconnected clients
– Security (authentication, access control, encryption of transmitted and stored data)
– Data sharing
– Manageability (?)

Region
Communication entities:
– Client
– Metadata manager
– OSD device
Communication paths:
– Client to metadata server
– Client to OSD device
– Metadata server to OSD device
– Metadata server to metadata server

Problem
– Network bandwidth keeps increasing (10 Gb/s links are on the way).
– OSD applications require high performance.
– How can object data be delivered efficiently between the OSD device and the client?
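
To make the 10 Gb/s figure concrete, the arithmetic below computes how often a frame arrives at line rate and how many CPU cycles that leaves per frame. It is a rough sketch: it ignores preamble, inter-frame gap and header overhead, and the 2 GHz CPU clock is a hypothetical value. At 10 Gb/s a 1500-byte frame arrives every 1.2 µs, about 2,400 cycles on such a CPU, which is the per-packet budget the end system must meet.

```python
# Sketch: serialization time per frame at a given link rate.
# Assumption: ignores preamble, inter-frame gap and protocol headers,
# so real per-packet budgets differ somewhat from these numbers.

def frame_time_us(frame_bytes: int, link_bps: float) -> float:
    """Microseconds needed to put one frame of frame_bytes on the wire."""
    return frame_bytes * 8 / link_bps * 1e6

def cycles_per_frame(frame_bytes: int, link_bps: float, cpu_hz: float) -> float:
    """CPU cycles available per frame if the host must keep up with line rate."""
    return frame_bytes * 8 / link_bps * cpu_hz

if __name__ == "__main__":
    LINK_BPS = 10e9   # 10 Gb/s, as on the slide
    CPU_HZ = 2e9      # hypothetical 2 GHz processor
    for size in (1500, 9000):
        print(f"{size:5d} B: {frame_time_us(size, LINK_BPS):.1f} us, "
              f"{cycles_per_frame(size, LINK_BPS, CPU_HZ):.0f} cycles")
```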

Potential Measures
Potential performance-improvement measures:
– Locality-based migration (reduces transmission time)
  – Migrate the object to a location closer to the client.
– Replication (reduces transmission time)
  – Replicate a copy within the client's proximity.
  – Either data objects or metadata can be replicated.
– Caching (reduces disk access time and transmission time)
  – Where: client, metadata server, object device, etc.
  – What: data objects, metadata, locks.
  – How long: TTL, lease, renewal.
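
The "how long" question for caching can be answered with leases. Below is a minimal sketch of a client-side cache whose entries are valid only while their lease lasts and can be renewed before expiry. The class name `LeaseCache`, its methods and the injectable clock are hypothetical, chosen for illustration rather than taken from any OSD specification.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple

class LeaseCache:
    """Hypothetical client-side cache: an entry is usable only while its lease lasts."""

    def __init__(self, lease_s: float, clock: Callable[[], float] = time.monotonic):
        self.lease_s = lease_s
        self.clock = clock
        # key -> (cached value, absolute expiry time)
        self._entries: Dict[Hashable, Tuple[bytes, float]] = {}

    def put(self, key: Hashable, value: bytes) -> None:
        self._entries[key] = (value, self.clock() + self.lease_s)

    def get(self, key: Hashable) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self.clock() >= expiry:
            # Lease expired: the client must refetch from the OSD device.
            del self._entries[key]
            return None
        return value

    def renew(self, key: Hashable) -> bool:
        """Extend an unexpired lease; returns False if it has already lapsed."""
        if self.get(key) is None:
            return False
        value, _ = self._entries[key]
        self._entries[key] = (value, self.clock() + self.lease_s)
        return True
```

Passing a fake clock (for example `lambda: now[0]`) makes expiry deterministic in tests; in a real deployment the lease length would be granted by the metadata server or device rather than chosen by the client.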

Performance Improvement Measures (cont.)
Improvement measures:
– Aggregation (device grouping)
  – Improves aggregate I/O throughput and reliability.
  – Works like a RAID system.
– Data-path-based
  – Decouples the control path from the data path.
  – Shortens the critical path at the data-access level.
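
Aggregation "works like a RAID system" when an object is striped across a group of devices. The sketch below shows RAID-0-style striping only (no parity or mirroring, so it covers the throughput side, not reliability): it maps an object byte offset to a device and an offset on that device, and splits a request into per-device pieces that could be issued in parallel. The function names and the stripe-unit parameter are illustrative assumptions.

```python
from typing import List, Tuple

def locate(offset: int, stripe_unit: int, n_devices: int) -> Tuple[int, int]:
    """Map a byte offset in a striped object to (device index, offset on that device)."""
    stripe_no = offset // stripe_unit
    device = stripe_no % n_devices
    dev_offset = (stripe_no // n_devices) * stripe_unit + offset % stripe_unit
    return device, dev_offset

def split_range(offset: int, length: int, stripe_unit: int,
                n_devices: int) -> List[Tuple[int, int, int]]:
    """Break [offset, offset + length) into (device, dev_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        device, dev_offset = locate(offset, stripe_unit, n_devices)
        # Never cross a stripe-unit boundary within one piece.
        nbytes = min(stripe_unit - offset % stripe_unit, end - offset)
        pieces.append((device, dev_offset, nbytes))
        offset += nbytes
    return pieces
```

With a 64 KiB stripe unit over four devices, a 10,000-byte read starting at offset 60,000 becomes two pieces, one on device 0 and one on device 1, which can proceed concurrently.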

Performance Constraints
– Consistency (during updates and reconciliation)
– Locking and serialization
– Security
– Small-size data access
– Crash recovery

Leveraging the Data Access Path
– Streamline the end system
  – Zero copy / RDMA
  – User-level programming / OS bypass
  – TCP offloading
– Improve the transport system
  – Large window size
  – Explicit congestion notification (ECN)
  – Selective acknowledgment (SACK)
  – Connection splitting (mobile)
  – eXplicit Control Protocol (XCP)
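
"Large window size" follows from the bandwidth-delay product. Assuming a hypothetical 1 ms round-trip time, a 10 Gb/s path needs 1.25 MB in flight, while TCP without the window-scale option (RFC 7323) is limited to a 65,535-byte window, which caps throughput at about 524 Mb/s. The sketch below computes both figures and the smallest window-scale shift that would cover the product; the RTT is an assumption, not a measured value.

```python
TCP_MAX_UNSCALED_WINDOW = 65535  # largest window expressible without window scaling

def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return link_bps * rtt_s / 8

def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling when the TCP window, not the link, is the bottleneck."""
    return window_bytes * 8 / rtt_s

def window_scale_shift(needed_bytes: float) -> int:
    """Smallest window-scale shift (RFC 7323 allows 0..14) whose window covers needed_bytes."""
    shift = 0
    while (TCP_MAX_UNSCALED_WINDOW << shift) < needed_bytes:
        shift += 1
        if shift > 14:
            raise ValueError("window too large even with maximum window scaling")
    return shift

if __name__ == "__main__":
    bdp = bdp_bytes(10e9, 0.001)  # 10 Gb/s, hypothetical 1 ms RTT
    print(f"BDP: {bdp:.0f} bytes")
    print(f"unscaled ceiling: {window_limited_bps(TCP_MAX_UNSCALED_WINDOW, 0.001) / 1e6:.2f} Mb/s")
    print(f"window-scale shift needed: {window_scale_shift(bdp)}")
```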

What’s Wrong With the End System
Streamlining end systems
– Problem: the end system cannot deliver the network's potential bandwidth to applications, because of:
  – Memory copies
  – Context switching
  – Interrupt servicing
  – Checksum generation
  – Protocol processing

End System Overhead
Streamlining the end system: sources of overhead
– Per packet
  – Protocol processing (executing code, allocating and releasing buffers)
  – Access control
  – Interrupt service time for each received packet
  – Kernel context switching
– Per byte
  – Checksum generation
  – Memory copy
  – Data transmission
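
The per-packet versus per-byte split suggests a simple linear cost model: total cost = packets × per-packet cost + bytes × per-byte cost. The sketch below uses hypothetical cost constants purely for illustration. For a 1 MiB transfer, moving from 1500-byte to 9000-byte payloads cuts the packet count from 700 to 117, but leaves the per-byte term untouched; that term only shrinks through zero copy and checksum offload.

```python
import math

def packets_needed(total_bytes: int, payload_per_packet: int) -> int:
    """Number of packets needed to carry total_bytes."""
    return math.ceil(total_bytes / payload_per_packet)

def transfer_cost_us(total_bytes: int, payload_per_packet: int,
                     per_packet_us: float, per_byte_us: float) -> float:
    """Linear end-host cost model: per-packet overhead plus per-byte overhead."""
    return (packets_needed(total_bytes, payload_per_packet) * per_packet_us
            + total_bytes * per_byte_us)

if __name__ == "__main__":
    MIB = 1 << 20
    PER_PACKET_US = 5.0   # hypothetical: protocol processing, interrupt, context switch
    PER_BYTE_US = 0.002   # hypothetical: copy and checksum
    for payload in (1500, 9000):
        print(payload, packets_needed(MIB, payload),
              round(transfer_cost_us(MIB, payload, PER_PACKET_US, PER_BYTE_US), 1))
```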

Streamlining the End System
Solutions:
– RDMA (zero copy)
– One system-wide buffer pool
– User-level networking (kernel bypass)
– TCP offloading
– Jumbo packets
– Interrupt coalescing
– Scatter/gather lists
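
Scatter/gather lists let a header and a payload living in separate buffers go out as one message, and be received back into separate buffers, without first concatenating them in user space. The sketch below uses Python's `socket.sendmsg` and `socket.recvmsg_into` (Unix-only) over a local datagram socket pair; the helper names are illustrative. It demonstrates only gather and scatter: the kernel still copies the data, so this is not zero copy.

```python
import socket

def send_gathered(sock: socket.socket, header: bytes, payload: bytes) -> int:
    """Gather two separate buffers into one outgoing message (writev-style)."""
    return sock.sendmsg([header, payload])

def recv_scattered(sock: socket.socket, header_len: int, payload_len: int):
    """Scatter one incoming message into separate header and payload buffers."""
    header = bytearray(header_len)
    payload = bytearray(payload_len)
    nbytes, _ancdata, _flags, _addr = sock.recvmsg_into([header, payload])
    return nbytes, bytes(header), bytes(payload)

if __name__ == "__main__":
    # SOCK_DGRAM keeps each message intact, so one receive gets the whole thing.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        sent = send_gathered(a, b"HDR1", b"object-data")
        print(sent, recv_scattered(b, 4, 11))
```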

Related Work
Previous work:
– IO-Lite
– VI (Myrinet, ServerNet)
– SDP
– InfiniBand
– SRP
– DAT (Direct Access Transport Collaborative)
– DAFS (SNIA)
– NFS/RDMA (SNIA)
– RDMA over TCP/IP

IO-Lite
– Purpose: reduce memory copies.
– Approach: maintain a global buffer pool in the system, so that applications, IPC, the file system, and the network subsystem share one copy of the data.
– Pros:
  – Fewer memory copies
  – Well suited to read-only buffers
– Cons:
  – The system must be rewritten
  – Buffer updates are difficult
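
The shared-buffer idea can be illustrated in a few lines: one immutable buffer, with several subsystems holding zero-copy views of it. This is a minimal sketch of the concept, not IO-Lite's actual interface; `BufferPool` and its methods are hypothetical. Because the underlying `bytes` object is read-only, any attempt to modify it through a view fails, which mirrors the "buffer update is difficult" drawback.

```python
class BufferPool:
    """Hypothetical global pool: each buffer id maps to one immutable shared copy."""

    def __init__(self):
        self._buffers = {}

    def put(self, buf_id: str, data: bytes) -> None:
        # bytes objects are immutable, so this single copy can be shared safely.
        self._buffers[buf_id] = data

    def view(self, buf_id: str, start: int = 0, end=None) -> memoryview:
        # Slicing a memoryview yields another view over the same memory; no copy is made.
        return memoryview(self._buffers[buf_id])[start:end]

if __name__ == "__main__":
    pool = BufferPool()
    pool.put("obj42", b"header|object payload")
    fs_view = pool.view("obj42")       # e.g. the file-system cache
    net_view = pool.view("obj42", 7)   # e.g. the network stack sends only the payload
    print(bytes(net_view), net_view.obj is fs_view.obj, fs_view.readonly)
```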

RDMA
– Extends DMA semantics across machine boundaries.
– Two operations: RDMA Read and RDMA Write.
– Memory registration: memory must be "pinned".
– A descriptor carries the source address, destination address, and length.
– Special hardware (the NIC) handles the RDMA operation.
– Pros:
  – Zero copy
  – Offloads CPU processing
– Cons:
  – Requires special hardware
  – Requires reprogramming applications
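
The RDMA model on this slide (registered memory, a descriptor with source, destination and length, two operations) can be simulated in user space to make the moving parts explicit. Everything below is hypothetical: `SimulatedNic` and `Descriptor` are illustrative names, registration is modeled by holding a `memoryview`, and no real RDMA verbs library is used. The key point shown is that every access is checked against a registered region before data is placed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptor:
    """Hypothetical RDMA descriptor: source region, destination region, length."""
    src_key: int
    src_offset: int
    dst_key: int
    dst_offset: int
    length: int

class SimulatedNic:
    """Toy stand-in for an RDMA-capable NIC on one host."""

    def __init__(self):
        self._regions = {}
        self._next_key = 1

    def register(self, buf: bytearray) -> int:
        # Holding a memoryview prevents the bytearray from being resized,
        # a loose analogy to pinning registered memory.
        key = self._next_key
        self._next_key += 1
        self._regions[key] = memoryview(buf)
        return key

    def region(self, key: int, offset: int, length: int) -> memoryview:
        view = self._regions.get(key)
        if view is None:
            raise PermissionError(f"key {key} is not registered")
        if offset < 0 or length < 0 or offset + length > len(view):
            raise ValueError("access outside the registered region")
        return view[offset:offset + length]

    def rdma_write(self, remote: "SimulatedNic", d: Descriptor) -> None:
        """Copy a local registered region into a remote registered region."""
        remote.region(d.dst_key, d.dst_offset, d.length)[:] = \
            self.region(d.src_key, d.src_offset, d.length)

    def rdma_read(self, remote: "SimulatedNic", d: Descriptor) -> None:
        """Copy a remote registered region (src) into a local registered region (dst)."""
        self.region(d.dst_key, d.dst_offset, d.length)[:] = \
            remote.region(d.src_key, d.src_offset, d.length)
```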

Remote DMA Scenario
[Figure: Host A and Host B, each with a CPU, a buffer (Buffer A, Buffer B), and an RDMA engine (NIC); numbered steps 1 to 3 show the transfer between the hosts.]

Virtual Interface Architecture (VIA)
– Goal: low latency and high throughput through direct access to the NIC and zero copy.
– Programming abstraction: the VI (a queue pair).
– Components: consumer, VI provider (user agent, kernel agent, NIC).
– Operations: RDMA, send/receive.
– Provides a standard for RDMA operations and the VI abstraction.
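
The VI (queue pair) abstraction can be sketched as a send queue, a receive queue of pre-posted buffers, and a completion queue that the consumer polls. The in-memory model below is hypothetical: the class and method names are illustrative, not the VIPL API, and `process()` stands in for the NIC draining the send queue into the peer's posted receive buffers.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Completion:
    kind: str    # "send" or "recv"
    length: int  # bytes transferred

class QueuePair:
    """Hypothetical in-memory VI: send queue, receive queue, completion queue."""

    def __init__(self):
        self.send_q = deque()
        self.recv_q = deque()  # pre-posted receive buffers
        self.cq = deque()
        self.peer = None

    def connect(self, other: "QueuePair") -> None:
        self.peer, other.peer = other, self

    def post_recv(self, buf: bytearray) -> None:
        self.recv_q.append(buf)

    def post_send(self, data: bytes) -> None:
        self.send_q.append(data)

    def process(self) -> None:
        """Stand-in for the NIC: move queued sends into the peer's posted buffers."""
        while self.send_q:
            if not self.peer.recv_q:
                raise RuntimeError("receiver has no posted buffer")
            if len(self.send_q[0]) > len(self.peer.recv_q[0]):
                raise ValueError("posted receive buffer too small")
            data = self.send_q.popleft()
            buf = self.peer.recv_q.popleft()
            buf[:len(data)] = data
            self.cq.append(Completion("send", len(data)))
            self.peer.cq.append(Completion("recv", len(data)))

    def poll_cq(self):
        return self.cq.popleft() if self.cq else None
```

As in VIA, the receiver must post a buffer before the sender's message arrives; here a missing buffer is reported as an error rather than silently dropping data.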

InfiniBand
– An emerging I/O interconnect technology.
– Decouples I/O from the CPU.
– Adopts a serial, switch-based fabric.
– Provides a unified communication mechanism (four layers).
– Provides VI support (verbs, QPs, RDMA, etc.).
– Implements the VI concept in a standard network.

SCSI RDMA Protocol (SRP)
– Goal: provide SCSI access across the IB fabric.
– Exploits IB RDMA to transfer SCSI data.
– Enables a SAN based on IB.
– Targeted specifically at IB; not suitable for IP.
– Provides block-level (SCSI) access (could it be extended to object-level access?).

DAFS and NFS/RDMA
– DAFS
  – Being developed by the DAFS consortium.
  – A lightweight file-sharing protocol for local data sharing.
  – Leverages NFS 4.0.
  – Exploits the RDMA mechanism to transfer file data.
– NFS/RDMA
  – Being developed by the SNIA NFS/RDMA group.
  – Enables NFS to exploit new networking technologies (VIA, IB).
  – Changes RPC/XDR to use RDMA semantics.
  – Targets the local-area environment.

Sockets Direct Protocol (SDP)
– Originated with Microsoft's data-center solution (Winsock Direct, 2000).
– Retains the same sockets programming interface.
– Bypasses TCP/IP processing in the kernel.
– Supports RDMA semantics.
– Not routable; works within a data center or cluster.

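RDMA over TCP/IP
– Developed by the RDMA Consortium.
– Supports RDMA over TCP/IP networks.
– Consists of three components: RDMAP, DDP, and MPA.
  – RDMAP: provides RDMA operations.
  – DDP: direct data placement.
  – MPA: handles framing over TCP.
– SCTP (Stream Control Transmission Protocol) can carry DDP directly, in place of MPA over TCP.
Protocol stack (top to bottom): ULP / RDMAP / DDP / MPA over TCP, or SCTP / IP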
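
MPA's job is to restore message boundaries on top of TCP's byte stream so that DDP can locate each upper-layer PDU. The sketch below follows the basic shape of an MPA FPDU (a 16-bit length, the ULPDU, zero padding to a 4-byte boundary, and a 32-bit CRC) but simplifies it: markers are omitted, and `zlib.crc32` (the IEEE CRC-32) stands in for the CRC32c that MPA actually specifies. The `frame`/`deframe` names are illustrative.

```python
import struct
import zlib

def frame(ulpdu: bytes) -> bytes:
    """Simplified FPDU: 2-byte length + ULPDU + pad to 4 bytes + CRC (CRC-32 as a stand-in)."""
    if len(ulpdu) > 0xFFFF:
        raise ValueError("ULPDU too large for a 16-bit length field")
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += b"\x00" * (-len(body) % 4)
    return body + struct.pack("!I", zlib.crc32(body))

def deframe(stream: bytes) -> list:
    """Recover ULPDUs from a byte stream of back-to-back frames, checking each CRC."""
    pdus, pos = [], 0
    while pos < len(stream):
        if pos + 2 > len(stream):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("!H", stream, pos)
        padded = 2 + length + (-(2 + length) % 4)
        if pos + padded + 4 > len(stream):
            raise ValueError("truncated frame")
        body = stream[pos:pos + padded]
        (crc,) = struct.unpack_from("!I", stream, pos + padded)
        if zlib.crc32(body) != crc:
            raise ValueError("CRC mismatch")
        pdus.append(body[2:2 + length])
        pos += padded + 4
    return pdus
```

Because each frame carries its own length, a receiver can find frame boundaries in the TCP stream and place each ULPDU without buffering the whole stream; the real protocol adds markers so boundaries can be found even when TCP segments are received out of order.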

Summary
– Link-level
  – Carries no routing information.
  – Relies on the underlying link-level switch to forward packets.
  – Restricted to data-center and cluster environments.
  – Examples: VIA, InfiniBand, SRP, SDP, DAFS, NFS/RDMA.
– Transport-level
  – Carries TCP/IP headers.
  – Can traverse IP networks.
  – Handles framing and direct data placement.

OSD Requirements
– Direct delivery from the object device
  – Direct transmission between the initiator and the target device.
  – This is the critical data path.
– Secure delivery
  – No secure channel is assumed, so the transmitted object must be encrypted.
– QoS requirements
  – An object may have specific QoS requirements.
– Mobile clients
  – A client may connect, disconnect, and reconnect.
  – Errors can occur during transmission.

Initial Proposal: OSD/Secure RDMA
– A ULP-based RDMA
  – RDMA is tightly integrated with the OSD protocol.
– Leverages RDMA over TCP/IP
  – Extends communication to IP networks.
– The OSD device initiates the RDMA request.
– Security-enabled RDMA
  – The underlying transport supports security.
– QoS support
  – A virtual-lane-style mechanism provides QoS support.

OSD/Secure RDMA Architecture
[Figure: OSD client (Application, OSD VIPL, Buffers, NIC VI, NIC driver) and OSD device (OSD controller, OSD VIPL, Object Manager, Buffers, Disk Driver, NIC VI, NIC driver), connected through an IP network.]

Protocol Stacks
– OSD/RDMA maps OSD onto RDMA.
– DDP provides direct data placement.
– The underlying transport can be either SCTP or MPA over TCP.
– IPSec is used as the security protocol (object encryption).
[Figure: protocol stack, top to bottom: OSD consumer, OSD protocol, OSD/RDMA, DDP, then SCTP or MPA over TCP, then IP/IPSec; the intelligent NIC and the OSD VIPL consumer interface are also labeled.]

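Data Access Case: Get an Object
Message sequence between the OSD client and the OSD device:
– The client (1*) requests an object, sending the object ID, a credential, and a memory descriptor.
– The device (2*) performs an RDMA write of the data packet into the client's buffer.
– The device signals RDMA write completion (RDMAWrCompl).
1*: The client must first obtain access permission and establish a session, then register memory and post a send request.
2*: The device validates the request, registers a memory buffer, fetches the object from disk or cache into the buffer, and posts an RDMA write request.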
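
The get-object sequence can be walked through in a small simulation. This is a sketch under several assumptions: the credential is modeled as an HMAC over the object ID with a hypothetical key shared by the metadata manager and the device, the client's registered buffer is passed directly in place of a memory descriptor, and the RDMA write is modeled as a copy into that buffer. All names are illustrative.

```python
import hashlib
import hmac
from dataclasses import dataclass

SECRET = b"device-secret"  # hypothetical key shared by metadata manager and device

def issue_credential(obj_id: int) -> bytes:
    """Stand-in for the metadata manager granting access permission (step 1*)."""
    return hmac.new(SECRET, obj_id.to_bytes(8, "big"), hashlib.sha256).digest()

@dataclass
class GetRequest:
    obj_id: int
    credential: bytes
    client_buf: bytearray  # stands in for the client's registered memory descriptor

class OsdDevice:
    def __init__(self, objects: dict):
        self.objects = objects  # obj_id -> bytes held on disk or in cache

    def handle_get(self, req: GetRequest):
        # 2*: validate the request.
        expected = issue_credential(req.obj_id)
        if not hmac.compare_digest(expected, req.credential):
            raise PermissionError("invalid credential")
        # 2*: fetch the object into a device-side buffer.
        staging = bytearray(self.objects[req.obj_id])
        if len(staging) > len(req.client_buf):
            raise ValueError("client buffer too small")
        # 2*: "RDMA write" straight into the client's registered memory.
        req.client_buf[:len(staging)] = staging
        return "RDMAWrCompl", len(staging)

def get_object(device: OsdDevice, obj_id: int, max_size: int) -> bytes:
    cred = issue_credential(obj_id)   # 1*: obtain access permission
    buf = bytearray(max_size)         # 1*: register memory
    status, n = device.handle_get(GetRequest(obj_id, cred, buf))  # 1*: post send request
    if status != "RDMAWrCompl":
        raise RuntimeError("unexpected completion")
    return bytes(buf[:n])
```

Note that the object data never passes back through a client-side receive copy: the device places it directly into memory the client registered before sending the request.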

Issues to Be Solved
– Elaborate the OSD object transfer protocol.
  – Should we simply adopt SCSI/OSD?
  – What are the new requirements, e.g., security?
– Integration of iSCSI over RDMA.
  – Session establishment: OSD session / iSCSI session / RDMA connection / TCP connection.
    – In what sequence?
    – Persistent or transient?
  – Define the OSD/RDMA packet format:
    – Memory descriptor
    – Commands (login, logout, CMD)
    – Flow control
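
One way to start on the packet-format question is a fixed-size header carrying a command code, a flow-control field, a session identifier, and a memory descriptor (steering tag, offset, length). The layout below is entirely hypothetical: the field order, widths and opcode values are assumptions for discussion, not part of any OSD or RDMA standard.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum

class Opcode(IntEnum):  # hypothetical command codes
    LOGIN = 1
    LOGOUT = 2
    CMD = 3

# opcode(1) + reserved(1) + credits(2) + session id(4)
# + steering tag(4) + tagged offset(8) + length(4) = 24 bytes, network byte order
HEADER = struct.Struct("!BxHIIQI")

@dataclass
class OsdRdmaHeader:
    opcode: Opcode
    credits: int     # flow-control credits granted to the peer
    session_id: int
    stag: int        # steering tag naming the registered buffer
    offset: int      # offset within that buffer
    length: int      # number of bytes to place

    def pack(self) -> bytes:
        return HEADER.pack(self.opcode, self.credits, self.session_id,
                           self.stag, self.offset, self.length)

    @classmethod
    def unpack(cls, data: bytes) -> "OsdRdmaHeader":
        op, credits, sid, stag, off, length = HEADER.unpack_from(data)
        return cls(Opcode(op), credits, sid, stag, off, length)
```

A fixed layout keeps parsing cheap enough to do on an intelligent NIC; variable-length fields such as credentials would follow the header.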

Issues (cont.)
– Integration of RDMA with OSD (cont.)
  – Define a standard API set for OSD/RDMA:
    – Create a session
    – Register memory
    – Post a work queue element
    – Query status, etc.
– Integration with security
  – IPSec vs. SSL?
– Handling QoS requirements
  – QoS attributes: how should they be specified in an object?
  – QoS assurance: credit-based flow control?
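
The API list and the credit-based flow-control question fit together naturally: each posted work queue element consumes a credit, and each completion returns one, so a sender can never overrun the receiver's resources. The sketch below is a hypothetical API shape for discussion; `OsdRdmaSession` and its methods are invented names, and `complete_one()` stands in for the peer finishing the oldest outstanding request.

```python
import itertools
from collections import deque
from enum import Enum

class WqeStatus(Enum):
    PENDING = "pending"
    DONE = "done"

class OsdRdmaSession:
    """Hypothetical OSD/RDMA session API with credit-based flow control."""

    def __init__(self, credits: int):  # "create a session"
        self.credits = credits
        self._regions = {}
        self._status = {}
        self._in_flight = deque()
        self._keys = itertools.count(1)
        self._wqe_ids = itertools.count(1)

    def register_memory(self, buf: bytearray) -> int:
        key = next(self._keys)
        self._regions[key] = buf
        return key

    def post_wqe(self, key: int) -> int:
        if key not in self._regions:
            raise KeyError("memory not registered")
        if self.credits == 0:
            raise BlockingIOError("no credits left; wait for a completion")
        self.credits -= 1
        wqe_id = next(self._wqe_ids)
        self._status[wqe_id] = WqeStatus.PENDING
        self._in_flight.append(wqe_id)
        return wqe_id

    def complete_one(self) -> int:
        """Stand-in for the peer finishing the oldest WQE and returning its credit."""
        wqe_id = self._in_flight.popleft()
        self._status[wqe_id] = WqeStatus.DONE
        self.credits += 1
        return wqe_id

    def query_status(self, wqe_id: int) -> WqeStatus:
        return self._status[wqe_id]
```

Tying QoS to credits would let the device grant more credits to sessions whose objects carry stricter QoS attributes; how those attributes are expressed remains the open question on the slide.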