Management Scalability Author: Todd Rimmer Date: April 2014.

Slides:



Advertisements
Similar presentations
Implementing Tableau Server in an Enterprise Environment
Advertisements

SDN Controller Challenges
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Modelling and Analysing of Security Protocol: Lecture 10 Anonymity: Systems.
Lesson 17: Configuring Security Policies
Adding scalability to legacy PHP web applications Overview Mario A. Valdez-Ramirez.
SDN and Openflow.
Exploring Improvement to Verbs Tom Stachura, OFA TAC Co-chair #OFADevWorkshop.
Azure Services Platform Piotr Zierhoffer. Agenda Cloud? What is Azure? Environment Basic glossary Architecture Element description Deployment.
CMPE 150- Introduction to Computer Networks 1 CMPE 150 Fall 2005 Lecture 22 Introduction to Computer Networks.
A Routing Control Platform for Managing IP Networks Jennifer Rexford Princeton University
Hands-On Microsoft Windows Server 2003 Networking Chapter 7 Windows Internet Naming Service.
MIS 431 Chapter 71 Ch. 7: Advanced File Management System MIS 431 Created Spring 2006.
ATS SSL Updates ATS Summit Spring 2015 Susan Hinrichs.
IB ACM InfiniBand Communication Management Assistant (for Scaling) Sean Hefty.
New Direction Proposal: An OpenFabrics Framework for high-performance I/O apps OFA TAC, Key drivers: Sean Hefty, Paul Grun.
Copyright © 2007 InfiniBand ® Trade Association. Other names and brands are properties of their respective owners. IB Cross-Subnet Communication OpenFabrics.
Selecting and Implementing An Embedded Database System Presented by Jeff Webb March 2005 Article written by Michael Olson IEEE Software, 2000.
2012 High Performance Computing Speed, Low Latency, and Parallel Programming in Financial Services 2012 High Performance Computing Speed, Low Latency,
Training Workshop Windows Azure Platform. Presentation Outline (hidden slide): Technical Level: 200 Intended Audience: Developers Objectives (what do.
Crystal And Elliott Edward M. Kwang President. Crystal Version Standard - $145 Professional - $350 Developer - $450.
OFED 1.x Roadmap & Release Process November 06 Jeff Squyres, Woodruff, Robert J, Betsy Zeller, Tziporet Koren,
1 Solving the records management problem A cloud-computing approach to archiving Amanda Kleha Product Marketing, Google May 20, 2008.
An IPSec-based Host Architecture for Secure Internet Multicast R. Canetti, P-C. Cheng, F.Giraud, D. Pendarakis, J.R. Rao, P. Rohatgi, IBM Research D. Saha.
Austin code camp 2010 asp.net apps with azure table storage PRESENTED BY CHANDER SHEKHAR DHALL
Virtual techdays INDIA │ august 2010 SQL Azure – Tips and Tricks Ramaprasanna Chellamuthu │ Developer Evangelist, Microsoft.
The Open Fabrics Verbs Working Group Pavel Shamis and Liran Liss.
OFED 1.2 Lessons, 1.3 Planning and Field Support May 07 Tziporet Koren.
InfiniBand Routing Solution Approach Yaron Haviv, CTO, Voltaire
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Scalable name and address resolution infrastructure -- Ira Weiny/John Fleck #OFADevWorkshop.
Update on Scalable SA Project #OFADevWorkshop Hal Rosenstock Mellanox Technologies.
Cassandra - A Decentralized Structured Storage System
1 Geospatial and Business Intelligence Jean-Sébastien Turcotte Executive VP San Francisco - April 2007 Streamlining web mapping applications.
Scalable RDMA Software Solution Sean Hefty Intel Corporation.
Crystal And Elliott Edward M. Kwang President. Objective A brief demo of Crystal Report to entice you –People spend thousand of dollars to attend Crystal.
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
MongoDB is a database management system designed for web applications and internet infrastructure. The data model and persistence strategies are built.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
Distributed Information Systems. Motivation ● To understand the problems that Web services try to solve it is helpful to understand how distributed information.
TAC Report Tom Stachura & Diego Crupnicoff OFA TAC chairs #OFADevWorkshop.
OFED 1.3 InfiniBand Management Update Hal Rosenstock.
OpenFabrics Interface WG A brief introduction Paul Grun – co chair OFI WG Cray, Inc.
Open MPI OpenFabrics Update April 2008 Jeff Squyres.
OFED 1.2 Management Update Hal Rosenstock.
Creating Indexes on Tables An index provides quick access to data in a table, based on the values in specified columns. A table can have more than one.
Turn Bare Metal Into Silver Lining With SCVMM 2012, Today! Mark Rhodes OBS SESSION CODE: SEC313 (c) 2011 Microsoft. All rights reserved.
OpenFabrics Developers Summit SC06 QoS Update and Implementation RFC Eitan Zahavi, Mellanox Technologies Nov 2006.
InfiniBand Routing in OFA Jason Gunthorpe – Obsidian Sean Hefty – Intel Hal Rosenstock – Voltaire.
Configuring SQL Server for a successful SharePoint Server Deployment Haaron Gonzalez Solution Architect & Consultant Microsoft MVP SharePoint Server
Brian Lauge Pedersen Senior DataCenter Technology Specialist Microsoft Danmark.
Aaron Stanley King. What is SQL Azure? “SQL Azure is a scalable and cost-effective on- demand data storage and query processing service. SQL Azure is.
Presented by: Aaron Stanley King.  Benefits of SQL Azure  Features of SQL Azure  Demos, Demos, Demos!  How to query in SQL Azure  More Demos!  Recent.
SQL IMPLEMENTATION & ADMINISTRATION Indexing & Views.
SUSE Linux Enterprise Server for SAP Applications
Cassandra - A Decentralized Structured Storage System
Zueyong Zhu† and J. William Atwood‡
op5 Monitor - Scalable Monitoring
Informatica PowerCenter Performance Tuning Tips
VirtualGL.
Simple Connectivity Between InfiniBand Subnets
Gregory Kesden, CSE-291 (Storage Systems) Fall 2017
Gregory Kesden, CSE-291 (Cloud Computing) Fall 2016
Module 1–Windows AppFabric Cache
Application taxonomy & characterization
Building and running HPC apps in Windows Azure
Building global and highly-available services using Windows Azure
Database System Architectures
Michael Stephenson Microsoft MVP - Azure
Presentation transcript:

Management Scalability Author: Todd Rimmer Date: April 2014

Agenda Projected HPC Scalability Requirements Key Challenges –Path Record –IPoIB –Mgmt Security –Partitioning –Multicast –Notices –SA interaction Call to Action March 30 – April 2, 2014#OFADevWorkshop 2

Projected HPC Scalability Requirements Perf increase 2x/year Rapidly increasing node counts HPC and Cloud Due to slower pace of interconnect speed growth need multi-rail clusters HCA counts will grow even faster March 30 – April 2, 2014#OFADevWorkshop 3

Key Mgmt Scalability Bottlenecks PathRecord Query IPoIB ARP March 30 – April 2, 2014#OFADevWorkshop 4

PathRecord Query Today User apps –Use rdma CM ibacm optional cache –Use libumad and hand craft –Use UD QPs –Hand build PR (MPIs) –Other (via IPoIB …) Kernel ULPs –Call ib_sa March 30 – April 2, 2014#OFADevWorkshop 5 No single place to put PR optimizations

PathRecord Query Scalability Need to 1 st standardize a user space API –Libfabrics (OFI WG) and RDMA CM are logical choices Use API in all ULPs, benchmarks, demos, tools, diagnostics, etc. –Both kernel and user space –So everyone benefits from scalability improvements Decouple API from IPoIB –Multi rail clusters may not want IPoIB on all rails March 30 – April 2, 2014#OFADevWorkshop 6

PathRecord Query Need a plugin architecture behind the API Need a variety of plugins –Small clusters can do direct PathRecord query –Modest clusters can do PathRecord caching –Large clusters need PathRecord replicas or ibssa –Huge clusters need algorithmic approaches Topology dependent optimizations –Permit research and experimentation Start with direct, ibssa and cached plugin March 30 – April 2, 2014#OFADevWorkshop 7 One size does not fit everyone

IPoIB ARP Scalability Need a multi-tiered approach in IPoIB –Modest clusters can do standard ARP/broadcast Perhaps with long ARP timeouts (hours, days) –Large clusters need pre-loaded ARP tables –Huge clusters need algorithmic approaches Topology dependent Need to 1 st standardize a plug-in API API needs to tie into PathRecord Plug-In Implement std ARP and pre-loaded plugins 1st March 30 – April 2, 2014#OFADevWorkshop 8

Other Mgmt Issues Umad security Partitioning Multicast Notices SA interaction pacing March 30 – April 2, 2014#OFADevWorkshop 9

Mgmt Security Umad security issues –Requires root access by default –Use of umad by applications forces opening security –Umad is too easy a vehicle to attack server or cluster First steps –Rapidly move applications away from using umad –Simplify API, remove apps hand building packets Multicast membership, Notices, etc –Remove need for SM and diagnostics to be root Need ability for secured umad use 10 March 30 – April 2, 2014#OFADevWorkshop

Partitioning Proper Operation will be necessary for HPC Cloud Don’t assume full membership in default partition –Carefully reading of IBTA reveals: –Default partition is just a power on default, not a guarantee –If it was a guarantee, IBTA partitioning would be useless everyone could use 0xffff to talk to anyone –Only guarantee is membership in 0x7fff to permit SA query Fix P_Key assumptions in SA queries, ibacm, tools, etc –Proper use of PathRecord query will solve most of this –Search local P_Key table to decide if 0x7fff or 0xffff present IPoIB react to P_Key table changes during Port Initialize –especially entry 0 PKey indexes can change between boot and port Active 11 March 30 – April 2, 2014#OFADevWorkshop Part A Part B Part C 0x7fff Part

Multicast Multicast in IBTA –Each node can join/leave a group only once –Multicast join/leave are for whole node Multicast use goes beyond just IPoIB –ibacm, MPI collectives, kernel bypass for FSI –RDMA CM has some APIs, needs to coordinate w/kernel Need API w/kernel muxing of multicast membership –IBTA compliant node level interactions with SM/SA –Allow multiple processes, kernel and user to join a group –Automated cleanup when processes die –Also removes another need for umad access by apps 12 March 30 – April 2, 2014#OFADevWorkshop

Notices Use of Notices by applications is scalability issue –Can force O(N) messages from SM on each event –Example: turn off 100 nodes in 10K fabric -> 1M notices –Example: turn off 50K nodes in 100K fabric -> 2.5B notices At host need Notice muxing –Each node register/receive/deregister only once –Need kernel muxing of notice registration –Need kernel muxing of notice delivery/ack –Need cleanup when processes die –Also removes another need to umad access by applications Should we restrict or disable use of notices? 13 March 30 – April 2, 2014#OFADevWorkshop

SA Interaction Scalability Centralization of PR, Multicast and Notices is 1 st step This then permits tuning of SA interactions based on scale SA Response Timeout/Retry Handling –Clients today use fixed timeouts –Timeouts chosen a priori without knowledge of SA nor fabric load Need centralized config of timeouts and retry settings –As opposed to per application constants Retries should perform non-linear backoff SA Busy Response Handling –Present OFA code does immediate retry –Prevents SA from using BUSY to pace its workload –SA forced to discard BUSY should cause client backoff before attempting retry –Non-linear backoff also recommended March 30 – April 2, 2014#OFADevWorkshop 14

Next Steps Lets all collaborate to solve these challenges Your participation in discussion is encouraged 15 March 30 – April 2, 2014#OFADevWorkshop Lets be committed to solving these long standing issues

Summary Cluster sizes will grow year over year OFA has some long standing scalability issues Solutions are possible Lets all commit to making it happen March 30 – April 2, 2014#OFADevWorkshop 16

#OFADevWorkshop Thank You