1 What's Next for the Net? – Grid Computing
Internet2 Member Meeting, Sept. 21, 2005
Debbie Montano
2 Global Grid – Networking
Debbie Montano – Director, R&E Alliances, Force10 Networks
Force10 Networks – GigE / 10 GigE switch/routers
Will our networks be able to provide the high-speed access that Grid users will need and demand?
Grid – Sharing Resources
– Computing Cycles
– Software
– Databases / Storage
– Network Bandwidth…!
3 Global Grid – Vision to Reality
Themes…
Networks WILL keep up (or catch up) with the needs of Grids
Flexible use of bandwidth will become integral to Grids
Ethernet is key
4 Networks will support Grids
If Grids are the driving applications, the network will be there
The need is recognized for
– robust networks
– increased bandwidth
– new network infrastructure
to support vast amounts of data and grid collaborations
Example: SC2005 supercomputing & high performance networking conference
– Over 55 x 10 Gbps of WAN bandwidth is converging on Seattle
– Approx. 40 x 10 GigE of bandwidth for the Bandwidth Challenge
5 TeraGrid – NSF investment
NSF is investing $150M, on top of the initial investment of more than $100M, to ensure access to and use of this Grid resource!
Most TeraGrid nodes use Force10 switch/routers for access to users
Credits: Graphics: N.R. Fuller, National Science Foundation Bottom images (left to right): (1) A. Silvestri, AMANDA Project, University of California, Irvine; (2) B. Minsker, University of Illinois, Urbana-Champaign, using an MT3DMS model developed at the Army Corps of Engineers and modified by C. Zheng, University of Alabama; (3) M. Wheeler, University of Texas, Austin; J. Saltz, Ohio State University; M. Parashar, Rutgers University; (4) P. Coveney, University College London / Pittsburgh Supercomputing Center; (5) A. Chourasia, Visualization Services, San Diego Supercomputer Center and The Southern California Earthquake Center Community Modeling Environment
6 Top 500: Customer Segment
Segment      List 1   List 2
Industry     55.0%    52.8%
Research     22.0%    22.2%
Academic     16.0%    18.6%
Classified    3.0%     3.4%
Vendor        3.8%     2.8%
Others        0.2%     -
Among the Top 500 supercomputers, more than half of the systems are owned by industry
That type of investment will drive efficient use and the necessary supporting infrastructure
Roughly 41% of systems (22.2% research + 18.6% academic) are in research & academic environments
The days of exclusive ownership and control are being replaced by sharing across disciplines, across university systems, research labs, states and even around the world
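The percentages quoted on this slide come from adding rows of the segment table. A minimal Python sketch using the values as transcribed; "list 1" and "list 2" are placeholder labels for the two columns, since the slide does not say which Top 500 lists they come from.

```python
# Segment shares (%) from the Top 500 table, as (list 1, list 2).
# None marks a cell that is empty in the source table.
SHARES = {
    "Industry": (55.0, 52.8),
    "Research": (22.0, 22.2),
    "Academic": (16.0, 18.6),
    "Classified": (3.0, 3.4),
    "Vendor": (3.8, 2.8),
    "Others": (0.2, None),
}

def combined_share(segments, column):
    """Sum the shares of the given segments in one column (0 or 1); empty cells count as 0."""
    return sum(SHARES[s][column] or 0.0 for s in segments)

for col in (0, 1):
    industry = combined_share(["Industry"], col)
    r_and_a = combined_share(["Research", "Academic"], col)
    print(f"list {col + 1}: industry {industry:.1f}%, research + academic {r_and_a:.1f}%")
```

In the second column research and academic together come to 40.8%, which is where the "roughly 41%" figure comes from; industry stays above 50% in both columns.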
7 CERN – International Resource
CERN – International Resource; International Collaboration
Scientific partners around the world
Investing in networking:
– Announced Monday, 9/19/2005: CERN will deploy the TeraScale E-Series family of switch/routers as the foundation of its new 2.4 Terabit per second (Tbps) high performance grid computing farm
– The TeraScale E-Series will connect more than 8,000 processors and storage devices
– Also provides the first intercontinental 10 Gigabit Ethernet WAN links in a production network
8 State & Regional Investment
Networking Investment at all Layers
Regional Optical Networks (RONs) are Growing
– States and universities investing in their own fiber and optical infrastructure to ensure affordable growth and abundant bandwidth
– Southern Light Rail
– I-Light Indiana
– LEARN – Texas
– Louisiana Optical Networking Initiative (LONI)
Additional GigaPOP Layer 2/3 Services
Costs are continuing to go down
– Ethernet port costs, for example, continuing to drop
– Densities for GigE and 10 GigE continuing to improve
– Lower cost technologies being used more
9 Flexibility of Bandwidth
Lots of bandwidth, but "smart" use
High-speed links dedicated to specific grids versus shared, flexible use of bandwidth
Network links as a resource on the grid itself, to be shared, managed and allocated as needed
Need flexible layers above the "dedicated lambdas"
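Treating a network link as a grid resource that is shared, managed and allocated can be sketched as a simple capacity ledger. This is a hypothetical model: the `Link` class and its `allocate`/`release` methods are illustrative names, not the API of any real grid scheduler or network control plane. A dedicated lambda corresponds to one owner holding the whole capacity; flexible use lets several grids hold and return portions of it.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Link:
    """A network link modeled as a shareable grid resource (hypothetical)."""
    name: str
    capacity_gbps: float
    allocations: Dict[str, float] = field(default_factory=dict)  # owner -> Gbps held

    def available(self) -> float:
        """Capacity not currently reserved by any owner."""
        return self.capacity_gbps - sum(self.allocations.values())

    def allocate(self, owner: str, gbps: float) -> bool:
        """Reserve bandwidth for an owner if enough capacity remains."""
        if gbps <= 0 or gbps > self.available():
            return False
        self.allocations[owner] = self.allocations.get(owner, 0.0) + gbps
        return True

    def release(self, owner: str) -> None:
        """Return an owner's whole reservation to the shared pool."""
        self.allocations.pop(owner, None)

lam = Link("10GigE-lambda", 10.0)
print(lam.allocate("grid-A", 6.0))  # True
print(lam.allocate("grid-B", 5.0))  # False: only 4 Gbps left
lam.release("grid-A")
print(lam.allocate("grid-B", 5.0))  # True
```

A real allocator would also need reservation time windows, policy and signaling to the switches and optical gear; this sketch only tracks capacity.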
10 New Architectures: HOPI
(Diagram of a HOPI node with PACKET and OPTICAL sides. Labels: Abilene Network; Abilene core router; Force10 E600 Switch/Router; NLR Optical Terminals; NLR 10 GigE Lambda; Optical Cross Connect; 10 GigE Backbone; Control, Measurement, Support, OOB; Regional Optical Network (RON); GigaPOPs.)
11 Ethernet is Key
Local Area Network (LAN)
Metropolitan Area Network (MAN)
– Metro Ethernet
– Ethernet Aggregation
Wide Area Network (WAN)
– Carriers moving to Ethernet and IP services
– WAN PHY (WAN physical layer interface) playing a role
All the way down to CPU-to-CPU communication in supercomputers
– Ethernet adoption is continuing to grow
12 What Drives Grid / Cluster Topology?
Four Networking Requirements
– Interconnect (node-to-node communication)
– I/O to Storage
– Management
– I/O to Users (campus backbone or WAN)
(Diagram labels: WAN; users; user directory and applications; 5000 Linux "compute" cluster nodes; 2 Gigabit Fiber SAN; FiberConnect 10; 700 Mbytes/sec; 15 TByte.)
13 Grids / Clusters
System Interconnects
– Node-to-node: Inter-processor Communication (IPC)
– Management Network
– I/O to users, outside world (campus, LAN, WAN)
– Storage, servers & storage subsystems
IPC Interconnect Technology – GigE now #1
– Top 500 Supercomputers
– Ethernet: Rapid Growth
– Favored in Clusters
Other System Interconnection
– Major reliance on Ethernet
Interconnect   List 1   List 2
Ethernet       35.2%    42.4%
Myrinet        38.6%    28.2%
SP Switch       9.2%     9.0%
NUMAlink        3.4%     4.2%
Crossbar        4.6%     4.2%
Proprietary     -        3.4%
InfiniBand      2.2%     3.2%
Quadrics        4.0%     2.6%
Other           2.8%     -
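The "GigE now #1" and "rapid growth" claims can be read straight off the interconnect table. A minimal Python sketch; "list 1" and "list 2" are placeholder column labels, and putting the single Proprietary value in list 2 is an assumption (it is the placement under which list 1 sums to exactly 100%).

```python
# Interconnect shares (%) as (list 1, list 2); None marks an empty cell.
# Assumption: the lone "Proprietary" value belongs to list 2.
INTERCONNECTS = {
    "Ethernet": (35.2, 42.4),
    "Myrinet": (38.6, 28.2),
    "SP Switch": (9.2, 9.0),
    "NUMAlink": (3.4, 4.2),
    "Crossbar": (4.6, 4.2),
    "Proprietary": (None, 3.4),
    "InfiniBand": (2.2, 3.2),
    "Quadrics": (4.0, 2.6),
    "Other": (2.8, None),
}

def leader(column: int) -> str:
    """Interconnect with the largest share in the given column (0 or 1)."""
    return max(INTERCONNECTS, key=lambda name: INTERCONNECTS[name][column] or 0.0)

growth = INTERCONNECTS["Ethernet"][1] - INTERCONNECTS["Ethernet"][0]
print(f"list 1 leader: {leader(0)}, list 2 leader: {leader(1)}, "
      f"Ethernet +{growth:.1f} points")
```

Myrinet leads the first column and Ethernet the second, a gain of 7.2 percentage points for Ethernet.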
14 Interconnects – Ethernet NICs
Speedup methods
– Stateless offload (performance improvement without breaking the I/O stack; compatible with off-the-shelf OS TCP/IP)
– TOE: TCP Offload Engine
– OS bypass: eliminates context switching
– RDMA (remote DMA): eliminates payload copying
– iWARP: combination of TOE, OS bypass, and RDMA
Hot 10 GbE NIC vendors:
15 Management I/O
What Makes Sense?
Management network is ALWAYS required
– Out-of-band, in-band, control & management
– CPU & memory utilization per node, system temperature, cooling
Management has to touch each node, so device density is important and helps simplify topology
If the cluster is in trouble, the management network is needed to fix it, so it must be reliable!
With Ethernet, management is FREE: the nodes already have Ethernet ports
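What "management has to touch each node" means in practice can be sketched as a poll over every node's health report. The report fields (`temp_c`, `cpu_util`, `mem_util`), the thresholds and the function name are hypothetical; they stand in for whatever a real monitoring agent collects over the management network.

```python
from typing import Dict, List, Optional

# Hypothetical per-node report collected over the management network:
# metric name -> value, or None when the node did not answer the poll.
NodeReport = Optional[Dict[str, float]]

def nodes_needing_attention(reports: Dict[str, NodeReport],
                            max_temp_c: float = 70.0,
                            max_cpu_util: float = 0.95,
                            max_mem_util: float = 0.90) -> List[str]:
    """Return nodes that are unreachable or over a threshold (first problem found per node)."""
    flagged = []
    for node, report in sorted(reports.items()):
        if report is None:
            flagged.append(f"{node}: unreachable")
        elif report["temp_c"] > max_temp_c:
            flagged.append(f"{node}: temperature {report['temp_c']:.0f} C")
        elif report["cpu_util"] > max_cpu_util:
            flagged.append(f"{node}: CPU {report['cpu_util']:.0%}")
        elif report["mem_util"] > max_mem_util:
            flagged.append(f"{node}: memory {report['mem_util']:.0%}")
    return flagged

reports = {
    "node001": {"temp_c": 45.0, "cpu_util": 0.80, "mem_util": 0.50},
    "node002": None,  # no reply on the management network
    "node003": {"temp_c": 78.0, "cpu_util": 0.50, "mem_util": 0.40},
}
print(nodes_needing_attention(reports))
# ['node002: unreachable', 'node003: temperature 78 C']
```

An unreachable node is exactly the case where an unreliable management network hides the problem, which is why the slide stresses reliability.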
16 User Gateway
What Makes Sense?
Ethernet is ALWAYS the user gateway
– Dominant installed base & knowledge base
– End systems are connected via Ethernet
Ethernet advantages
– No distance limitation
– Fiber propagation delay of about 5 microseconds per km (roughly 8 microseconds per mile)
– Over 7 Gbps across 20 km (541 GB of data in 10 min.)
– Data center or cluster core switch/router extends directly into the LAN
– Fewer devices, simplifying topology
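The distance and throughput figures on this slide can be sanity-checked with a few lines of arithmetic. A minimal Python sketch, assuming a fiber refractive index of 1.47 and decimal units (1 GB = 8 Gb). Under those assumptions light in fiber takes roughly 5 microseconds per kilometer (about 8 per mile), and 7 Gbps sustained for 10 minutes moves 525 GB, so 541 GB corresponds to an average slightly above 7.2 Gbps.

```python
C_VACUUM_M_PER_S = 299_792_458   # speed of light in vacuum
FIBER_INDEX = 1.47               # assumed refractive index of the fiber core
METERS_PER_MILE = 1609.344

def propagation_delay_us(distance_m: float) -> float:
    """One-way propagation delay over fiber, in microseconds."""
    return distance_m * FIBER_INDEX / C_VACUUM_M_PER_S * 1e6

def gigabytes_moved(rate_gbps: float, seconds: float) -> float:
    """Decimal gigabytes moved at a sustained rate (8 gigabits per gigabyte)."""
    return rate_gbps * seconds / 8

def average_rate_gbps(gigabytes: float, seconds: float) -> float:
    """Average rate in Gbps needed to move the given decimal gigabytes."""
    return gigabytes * 8 / seconds

print(f"{propagation_delay_us(1000):.1f} us per km")                   # 4.9 us per km
print(f"{propagation_delay_us(METERS_PER_MILE):.1f} us per mile")      # 7.9 us per mile
print(f"{propagation_delay_us(20_000):.0f} us one way over 20 km")     # 98 us one way
print(f"{gigabytes_moved(7.0, 600):.0f} GB in 10 min at 7 Gbps")       # 525 GB
print(f"{average_rate_gbps(541, 600):.2f} Gbps for 541 GB in 10 min")  # 7.21 Gbps
```

Over 20 km the one-way delay stays around 100 microseconds, small compared with typical end-system overheads, which is the sense in which distance is not a limitation here.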
17 An Example Of Long Distance Sharing: NSF / DoE TeraGrid
(Diagram: sites joined by an Extensible Backplane Network with LA and Chicago hubs over 30 Gb/s and 40 Gb/s links. Sites: Compute-Intensive, 256 nodes; Data-Intensive, 128 nodes; Compute-Intensive, 814 nodes; Visualization, 112 nodes; Data collection and analysis, 55 nodes. Annotations: "Data Sets Stored Here" and "Data Set Moved Here for Computing".)
18 Role of Ethernet – Benefits
Industry Standard (IEEE)
Ubiquitous (Everywhere) and Proven Technology
Standard Communication Technology when the Cluster Talks to the Rest of the World (Grid)
Does Not Suffer From Distance Limitations
Scales to 1,000s and even 10,000s of nodes
Allows for Single-Fabric Design
Easy to Configure, Manage, and Administer for Cluster Environments (competing fabrics require cumbersome multichassis solutions & COMPLEX mapping)
53% yr/yr reduction in price/bit over 15 yrs (ref: Gartner)
Almost All Shipping Servers Include one or more 1000BASE-T NICs w/ TOE
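To get a feel for the size of the Gartner figure, read it as a compound annual decline. That reading is an assumption, since the slide does not say how the figure was computed. Under it, a 53% drop in price per bit every year for 15 years leaves 0.47 raised to the 15th power of the starting price:

```python
def cumulative_price_factor(annual_reduction: float, years: int) -> float:
    """Fraction of the original price per bit left after a compounding annual reduction."""
    return (1.0 - annual_reduction) ** years

factor = cumulative_price_factor(0.53, 15)
print(f"price per bit after 15 years: {factor:.2e} of the original "
      f"(about 1/{1 / factor:,.0f})")
```

The result is on the order of one hundred-thousandth of the original price per bit.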
19 Global Grid – Vision to Reality
Themes…
Networks WILL keep up (or catch up) with the needs of Grids
Flexible use of bandwidth will become integral to Grids
Ethernet is key
20 Thank You