
1 TeraGrid and I-WIRE: Models for the Future? Rick Stevens and Charlie Catlett, Argonne National Laboratory and The University of Chicago

2 TeraGrid Interconnect Objectives
- Traditional: interconnect sites/clusters using a WAN. WAN bandwidth balances cost and utilization; the objective is to keep utilization high to justify the high cost of WAN bandwidth.
- TeraGrid: build a wide area "machine room" network. The TeraGrid WAN objective is to handle peak machine-to-machine traffic. Partnering with Qwest to begin at 40 Gb/s and grow to ≥80 Gb/s within 2 years.
- Long-term TeraGrid objective: build a Petaflops-capable distributed system, requiring Petabytes of storage and a Terabit/second network. The current objective is a step toward this goal. A Terabit/second network will require many lambdas operating at minimum OC-768, and its architecture is not yet clear.
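As a rough illustration of the scale implied by the Terabit/second goal, the sketch below works out how many OC-768 lambdas such a backplane would need. This is a back-of-the-envelope estimate assuming an OC-768 line rate of roughly 39.8 Gb/s; the lambda count is illustrative, not a figure from the slide.

```python
# Rough lambda-count estimate for the long-term Terabit/second objective.
# Assumes an OC-768 line rate of ~39.8 Gb/s; usable payload is somewhat lower.
OC768_GBPS = 39.8
TARGET_GBPS = 1000  # 1 Terabit/second

lambdas_needed = TARGET_GBPS / OC768_GBPS
print(f"~{lambdas_needed:.0f} OC-768 lambdas for a 1 Tb/s backplane")  # ~25

# Near-term plan from the slide: start at 40 Gb/s, grow to >= 80 Gb/s.
print(f"Growth factor within 2 years: {80 / 40:.0f}x")
```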

3 Outline and Major Issues
- Trends in national cyberinfrastructure development
- TeraGrid as a model for advanced grid infrastructure
- I-WIRE as a model for advanced regional fiber infrastructure
- What is needed for these models to succeed
- Recommendations

4 Trends: Cyberinfrastructure
- Advent of regional dark fiber infrastructure: community owned and managed (via 20-year IRUs), typically supported by state or local resources
- Lambda services (IRUs) are viable replacements for bandwidth service contracts; they need to be structured with built-in capability escalation (BRI), and a strong operating capability is needed to exploit them
- Regional (NGO) groups are moving faster (much faster!) than national network providers and agencies
- A viable path to putting bandwidth on a Moore's law curve
- A source of new ideas for national infrastructure architecture

5 Traditional Cluster Network Access
Traditionally, high-performance computers have been islands of capability separated by wide area networks that provide a fraction of a percent of the internal cluster network bandwidth.
A high-performance cluster system interconnect using Myrinet has very high bisection bandwidth (hundreds of GB/s), but the external connection is only n x GbE, where n is a small integer.
[Diagram figures (time to move the entire contents of memory): clusters with 64 GB, 1 TB, and 1024 MB of memory sit behind an OC-48 cloud with GbE and OC-12 access links; at 0.5 GB/s the move takes 2000 s (33 min), and at 78 MB/s it takes 13k s (3.6 h).]
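The memory-move times in the diagram follow from simple division. A minimal sketch, assuming the 1 TB figure is the volume being moved and that 0.5 GB/s and 78 MB/s (roughly an OC-12 payload rate) are the effective link rates:

```python
# Time to move a memory image over a given effective link rate.
# Assumes ~1 TB moved at the slide's effective rates of 0.5 GB/s and 78 MB/s.
def transfer_time_s(volume_gb: float, rate_gb_per_s: float) -> float:
    """Seconds to move volume_gb gigabytes at rate_gb_per_s GB/s."""
    return volume_gb / rate_gb_per_s

volume_gb = 1000                              # ~1 TB memory image
print(transfer_time_s(volume_gb, 0.5))        # 2000 s  (~33 min)
print(transfer_time_s(volume_gb, 0.078))      # ~12800 s (~3.6 h) over OC-12
```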

6 To Build a Distributed Terascale Cluster…
TeraGrid is building a "machine room" network across the country while increasing external cluster bandwidth to many GbE. This requires edge systems that handle n x 10 GbE and hubs that handle a minimum of 10 x 10 GbE.
[Diagram figures (time to move the entire contents of memory and application state on rotating disk): clusters with 4096 GB of memory, 10 TB of disk, and 64 GB nodes joined by a big, fast interconnect; each node has an external GbE link; 5 GB/s = 200 nodes x 25 MB/s (20% of GbE per node); the move takes 200 s (3.3 min) at 5 GB/s.]
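The 5 GB/s aggregate on the slide is just the per-node arithmetic spelled out. A minimal sketch, assuming GbE is treated as 125 MB/s:

```python
# Aggregate wide-area rate from per-node contributions, as on the slide.
NODES = 200
PER_NODE_MB_S = 25          # 25 MB/s per node
GBE_MB_S = 125              # GbE ~ 1 Gb/s ~ 125 MB/s

aggregate_gb_s = NODES * PER_NODE_MB_S / 1000
print(f"Aggregate: {aggregate_gb_s} GB/s")                           # 5.0 GB/s
print(f"Per-node GbE utilization: {PER_NODE_MB_S / GBE_MB_S:.0%}")   # 20%
```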

7 13.6 TF Linux TeraGrid
[Network diagram: the four TeraGrid sites interconnected via Juniper M160/M40 routers and Starlight in Chicago over OC-3/OC-12/OC-48/10 GbE links, with peerings to ESnet, HSCC, MREN/Abilene, vBNS, Calren, and NTON. Site clusters use quad-processor McKinley servers (4 GF, 8-12 GB memory/server), IA-32 and IA-64 nodes, Myrinet Clos spines, Fibre Channel switches, and HPSS/UniTree archives, alongside existing resources such as Chiba City (574p IA-32), a 128p Origin, a 1176p IBM SP Blue Horizon, Sun E10K and Starcat systems, a 1500p Origin, HP X-Class/V2500 systems, and an Extreme Black Diamond switch. Legend: 32x 1 GbE, 64x Myrinet, 32x FibreChannel, 8x FibreChannel.]
Site     Nodes  Compute  Memory   Disk
NCSA     500    8 TF     4 TB     240 TB
SDSC     256    4.1 TF   2 TB     225 TB
Caltech  32     0.5 TF   0.4 TB   86 TB
Argonne  64     1 TF     0.25 TB  25 TB
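As a consistency check, the headline 13.6 TF figure is the sum of the per-site numbers in the diagram. A short sketch using only the values shown on the slide:

```python
# Per-site figures from the slide: (teraflops, memory_tb, disk_tb)
sites = {
    "NCSA":    (8.0, 4.00, 240),
    "SDSC":    (4.1, 2.00, 225),
    "Caltech": (0.5, 0.40,  86),
    "Argonne": (1.0, 0.25,  25),
}

total_tf   = sum(tf for tf, _, _ in sites.values())
total_mem  = sum(mem for _, mem, _ in sites.values())
total_disk = sum(disk for _, _, disk in sites.values())
print(f"{total_tf:.1f} TF, {total_mem:.2f} TB memory, {total_disk:.0f} TB disk")
# 13.6 TF, 6.65 TB memory, 576 TB disk
```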

8 TeraGrid Network Architecture
- Cluster interconnect: a multi-stage switch/router tree with multiple 10 GbE external links
- Separation of cluster aggregation and site border routers is necessary for operational reasons
- Phase 1: four routers or switch/routers, each with three OC-192 or 10 GbE WAN PHY interfaces; MPLS to allow >10 Gb/s between any two sites
- Phase 2: add core routers or switch/routers, each with ten OC-192 or 10 GbE WAN PHY interfaces, ideally expandable with additional 10 Gb/s interfaces
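A hypothetical capacity sketch of the two phases, assuming a per-link rate of roughly 10 Gb/s for both OC-192 and 10 GbE WAN PHY; the totals are illustrative arithmetic, not project configuration:

```python
# Hypothetical capacity arithmetic for the two phases described on the slide.
LINK_GBPS = 10                 # ~10 Gb/s per OC-192 or 10 GbE WAN PHY link

phase1_links_per_router = 3    # Phase 1: four routers, three WAN links each
phase2_links_per_router = 10   # Phase 2: core routers, ten WAN links each

print(f"Phase 1 per-site WAN capacity: {phase1_links_per_router * LINK_GBPS} Gb/s")      # 30 Gb/s
print(f"Phase 2 per-core-router capacity: {phase2_links_per_router * LINK_GBPS} Gb/s")   # 100 Gb/s
# MPLS traffic engineering lets a single site pair be striped across several
# links, which is how inter-site flows can exceed one 10 Gb/s interface.
```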

9 Option 1: Full Mesh with MPLS
[Network diagram: the Caltech, SDSC, NCSA, and ANL clusters each attach through a cluster aggregation switch/router and a site border router or switch/router (other site resources via an IP router) to carrier facilities: One Wilshire (carrier fiber collocation facility) in Los Angeles and the Qwest San Diego POP, plus 455 N. Cityfront Plaza (Qwest fiber collocation facility) and 710 N. Lakeshore (Starlight) in Chicago, roughly 1 mi apart. Links are DWDM OC-192 / 10 GbE over Ciena CoreStream DWDM (one DWDM segment TBD); the cross-country span is about 2200 mi, with site tails of roughly 20, 25, 115, and 140 mi.]

10 Expansion Capability: "Starlights"
[Network diagram: the same Option 1 topology, with the Los Angeles/San Diego and Chicago carrier facilities (One Wilshire, Qwest San Diego POP, 455 N. Cityfront Plaza, 710 N. Lakeshore Starlight) serving as regional fiber aggregation points where additional sites and networks can attach; each aggregation point is an IP router (packets) or a lambda router (circuits), with the same Ciena CoreStream DWDM OC-192 / 10 GbE links and distances as before.]

11 Partnership: Toward Terabit/s Networks
- Aggressive current-generation TeraGrid backplane: 3 x 10 GbE per site today with 40 Gb/s in the core, growing to an 80 Gb/s or higher core within 18-24 months; requires hundreds of Gb/s in core/hub devices
- Architecture evaluation for the next-generation backplane: higher lambda counts, alternative topologies, OC-768 lambdas
- Parallel persistent testbed: use one or more Qwest 10 Gb/s lambdas to keep next-generation technology and architecture testbeds going at all times; partnership with Qwest and local fiber/transport infrastructure to test OC-768 and additional lambdas
- Multiple additional dedicated regional 10 Gb/s lambdas and dark fiber for OC-768 testing can be provided via I-WIRE beginning 2Q 2002
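A rough edge-versus-core comparison, assuming the four sites of the 13.6 TF configuration; the aggregate edge figure is an inference from the slide's per-site numbers, not a stated design value:

```python
# Rough edge-vs-core comparison for the current-generation backplane.
SITES = 4                      # assumed: the four TeraGrid sites
EDGE_LINKS_PER_SITE = 3        # 3 x 10 GbE per site today
LINK_GBPS = 10
CORE_GBPS_TODAY = 40
CORE_GBPS_TARGET = 80          # within 18-24 months

edge_total = SITES * EDGE_LINKS_PER_SITE * LINK_GBPS
print(f"Aggregate site edge capacity: {edge_total} Gb/s")                      # 120 Gb/s
print(f"Core today: {CORE_GBPS_TODAY} Gb/s, target: {CORE_GBPS_TARGET} Gb/s")
# Terminating many such 10 Gb/s links in one chassis is why core/hub devices
# need hundreds of Gb/s of switching capacity.
```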

12 I-WIRE Logical and Transport Topology
[Topology diagram: I-WIRE fiber links UIUC/NCSA, Starlight (NU-Chicago), Argonne, UChicago, IIT, UIC, the Illinois Century Network, the James R. Thompson Center, City Hall, and the State of IL Building through facilities at Level(3) 111 N. Canal, McLeodUSA 151/155 N. Michigan, Doral Plaza, Qwest 455 N. Cityfront, and UC Gleacher Center 450 N. Cityfront; segment labels give fiber/lambda counts of 2 to 18.]
Next steps:
- Fiber to FermiLab and other sites
- Additional fiber to ANL and UIC
- DWDM terminals at Level(3) and McLeodUSA locations
- Experiments with OC-768 and optical switching/routing

13 Gigapops → Terapops (OIX)
[Map: gigapop data from Internet2; Pacific Lightrail; TeraGrid Interconnect.]

14 Leverage Regional/Community Fiber for Experimental Interconnects

15 Recommendations
The ANIR program should support:
- Interconnection of fiber islands via bit-rate-independent or advanced lambdas (BRI λs)
- Hardware to light up community fibers and build out advanced testbeds
- People resources to run these research-community-driven infrastructures
A next-generation connection program will not help advance the state of the art.
Lambda services need to be BRI.

