
1 SoCal Infrastructure: OptIPuter Southern California Network Infrastructure
Philip Papadopoulos, OptIPuter Co-PI, University of California, San Diego
Program Director, Grids and Clusters, San Diego Supercomputer Center
January 2004

2 SoCal Infrastructure: Year 1 Mod-0, UCSD

3 SoCal Infrastructure: Building an Experimental Apparatus
Mod-0 OptIPuter: Ethernet (Packet) Based
– Focused as an immediately usable, high-bandwidth distributed platform
– Multiple sites on campus (a few fiber miles)
– Next-generation, highly scalable optical Chiaro router at the center of the network
Hardware Balancing Act
– Experiments really require large data generators and consumers
– Science drivers require significant bandwidth to storage
– OptIPuter predicated on the price/performance curves of >1 GigE networks
System Issues
– How does one build and manage a reconfigurable distributed instrument?

4 SoCal Infrastructure: Year 2 Mod-0, UCSD

5 SoCal Infrastructure: Southern California Metro Extension, Year 2

6 SoCal Infrastructure: Aggregates
Year 1 (Network Build)
– Chiaro router purchased, installed, working (February)
– 5 sites on campus, each with 4 GigE uplinks to the Chiaro
– Private fiber, UCSD-only
– ~40 individual nodes, most shared with other projects
– Endpoint-resource poor, network rich
Year 2 (Endpoint Enhancements)
– Chiaro router: additional line cards, IPv6, starting 10 GigE deployment
– 8 sites on campus + 3 metro sites
– Multiple virtual routers for connection to campus, CENIC HPR, and others
– >200 nodes, most donated (Sun and IBM), most dedicated to the OptIPuter
– InfiniBand test network on 16 nodes + direct IB-to-GigE switch
– Enough resource to support data-intensive activity; slightly network poor
Year 3+ (Balanced Expansion Driven by Research Requirements)
– Expand 10 GigE deployments
– Bring network, endpoints, and DWDM (Mod-1) forward together
– Aggregate at least a terabit (both network and endpoints) by Year 5 (a quick sizing sketch follows below)
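A back-of-the-envelope sketch of what the terabit target implies (illustrative figures only; the one-GigE-per-node assumption is ours, not from the program plan):

```python
# Aggregate bandwidth is roughly node count times per-node link rate.
year2_nodes = 200          # ">200 nodes" in Year 2
gbps_per_node = 1          # assumed: one GigE uplink per node
print(f"Year 2 aggregate: ~{year2_nodes * gbps_per_node} Gb/s")

# A terabit (1000 Gb/s) aggregate by Year 5 implies on the order of one
# hundred 10 GigE links, with endpoint and network capacity growing together.
target_gbps = 1000
print(f"10 GigE links needed for a terabit: ~{target_gbps // 10}")
```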

7 SoCal Infrastructure: Web Information on the SD OptIPuter
http://web.optiputer.net
We need folks to start using the resources and to feed back how things could work better
The intention is to give full control of resources to experiments
Experiments themselves should be of a defined timeframe
– This is a shared instrument
The resource endpoints are experimental
– NO BACKUPS
– DO NOT EXPECT 24/7 OPERATION (we don't have the staff)
– THINGS WILL BREAK

8 SoCal Infrastructure: High-Level Program Bullets
UCSD will complete deployment of a 150+ node distributed test bed that consists of compute, storage, visualization, and instrument endpoints. Management policies will be put in place so that experiments can be assigned physical hardware and then have specialized (experiment-specific) software loaded on the assigned nodes. This forms the core Southern California OptIPuter test bed.
In Year 2, we won an IBM SURS grant, which allowed us to define a larger storewidth evaluation platform than in the original program plan. This cluster, deployed in January 2004, is 48 nodes with 6 spindles per node. It lets middleware and application developers investigate how LambdaGrids benefit applications.
As part of the SURS grant, a small IB test fabric was purchased. The Topspin switch includes 4 GigE uplinks to allow investigation of IB-to-Ethernet communication.

9 SoCal Infrastructure: Revised Program Plan Bullets
UCSD will complete deployment of a 150+ node distributed test bed that consists of compute, storage, visualization, and instrument endpoints. Management policies will be put in place so that experiments can be assigned physical hardware and then have specialized (experiment-specific) software loaded on the assigned nodes. This forms the core Southern California OptIPuter test bed.
– Account creation, cataloging of resources, MRTG-based network monitoring, and Ganglia resource monitoring are already available as a web page at web.optiputer.net.
The UCSD-deployed test bed will have a best-case bisection (oversubscription) ratio of 5:1 and a worst case of 8:1. Monitoring will be put in place to measure when these links are saturated (a minimal utilization sketch follows below).
UCSD, UCI, and USC/ISI connections of OptIPuter nodes are expected to be made through CENIC CalREN HPR 1 gigabit/s circuits. This will form a Southern California OptIPuter test bed. At UCSD, the connection from the campus border router will be made to the existing Chiaro OptIPuter router.
IPv6 software will be made a standard part of the base OptIPuter software stack by the end of Year 2.
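A minimal sketch of the saturation monitoring, MRTG-style: sample an interface's octet counter twice and convert the delta to average utilization. The 10 GigE capacity and the sample values below are assumptions for illustration:

```python
LINK_CAPACITY_BPS = 10 * 10**9   # assumed: a 10 GigE uplink

def utilization(octets_t0, octets_t1, interval_s):
    """Average utilization from two octet-counter samples an interval apart,
    the same arithmetic MRTG applies to SNMP ifInOctets/ifOutOctets."""
    bits_per_second = (octets_t1 - octets_t0) * 8 / interval_s
    return bits_per_second / LINK_CAPACITY_BPS

# Example: counters 300 s apart showing ~356 GB transferred on the link.
u = utilization(0, 356_250_000_000, 300)
print(f"utilization {u:.0%}" + ("  ** saturated **" if u > 0.90 else ""))
```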

10 SoCal Infrastructure: Revised Program Plan Bullets, Part 2
In Year 2, we won an IBM SURS grant, which allowed us to define a larger storewidth evaluation platform than originally proposed. This cluster, deployed in January 2004, is 48 nodes with 6 spindles per node. It lets middleware and application developers investigate how LambdaGrids benefit applications.
The 48-node storage cluster will employ a 48-port gigabit switch with a single 10 GigE uplink to the Chiaro Enstara router. This achieves roughly a 5:1 external (bisection) ratio for this particular OptIPuter node (the arithmetic is sketched below).
As part of the SURS grant, a small IB test fabric was purchased. The Topspin switch includes 4 GigE uplinks to allow investigation of IB-to-Ethernet communication.
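The 5:1 figure is just the ratio of the cluster's aggregate node bandwidth to its uplink; a one-line check:

```python
# External (bisection) oversubscription for the 48-node storage cluster:
# 48 nodes at 1 Gb/s each, funneled through a single 10 Gb/s uplink.
nodes, node_gbps, uplink_gbps = 48, 1, 10
ratio = nodes * node_gbps / uplink_gbps
print(f"oversubscription ~{ratio:.1f}:1")   # 4.8:1, i.e. the quoted ~5:1
```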

11 SoCal Infrastructure: Some Extracts from the Program Plan
USC, UCI, and SDSU, in addition to UCSD, will install OptIPuter nodes and link them to the CENIC dark-fiber experimental network.
– UCSD, UCI, and USC/ISI connections of OptIPuter nodes are expected to be made through CENIC CalREN HPR 1 gigabit/s circuits. This will form a Southern California OptIPuter test bed. At UCSD, the connection from the campus border router will be made to the existing Chiaro OptIPuter router.
UIC will work with NU and UCSD to measure whether a 4:1 local bisection bandwidth can be achieved.
– UCSD will deploy a test bed where the best bisection (oversubscription) ratio is 5:1 and the worst is 8:1. Monitoring will be put in place to measure when these links are saturated.
UCSD will continue to work with its campus and CENIC to connect all OptIPuter sites in Southern California. UIC will continue to build up the facilities at StarLight to accept several new 10 Gb links, notably from Asia, the UK, and CERN.
– See above.

12 SoCal Infrastructure
IPv6: Tom Hutton (UCSD/SDSC) is being given one month of funding to develop an IPv6 OptIPuter testbed.
– IPv6 software will be made a standard part of the base OptIPuter software stack by the end of Year 2. IPv6 experiments will include moving data from electron microscopes at NCMIR to OptIPuter resources over IPv6 and measuring performance (a minimal transfer sketch follows below).
Extend the Chiaro network to SDSU, UCI, and USC.
Work toward 10 GigE integration (whether point-to-point, switched, or routed needs to be determined).
– The 48-node storage cluster will employ a 48-port gigabit switch with a single 10 GigE uplink to the Chiaro Enstara router. This achieves roughly a 5:1 external (bisection) ratio for this particular OptIPuter node.
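A minimal sketch of such an IPv6 transfer (the hostname, port, and file are hypothetical placeholders; the point is that only address resolution changes, while the sending loop is identical to IPv4):

```python
import socket

HOST, PORT = "storage.optiputer.example", 5000   # hypothetical IPv6 endpoint

# Resolve an IPv6 address explicitly; AF_INET6 restricts the lookup.
family, socktype, proto, _, sockaddr = socket.getaddrinfo(
    HOST, PORT, socket.AF_INET6, socket.SOCK_STREAM)[0]

with socket.socket(family, socktype, proto) as s:
    s.connect(sockaddr)
    with open("microscope_volume.dat", "rb") as f:   # hypothetical NCMIR data
        while chunk := f.read(1 << 20):              # stream in 1 MB chunks
            s.sendall(chunk)
```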

13 SoCal Infrastructure
UCSD (Papadopoulos) is developing a storewidth evaluation platform. Some clusters will have 16 nodes with 4-6 drives each. UCSD is planning on 25% of cluster platforms being high-bandwidth storage nodes.
– Refinement: In Year 2, we won an IBM SURS grant, which allowed us to define a larger storewidth evaluation platform than originally proposed. This cluster, deployed in January 2004, is 48 nodes with 6 spindles per node (a rough storewidth estimate follows below). It lets middleware and application developers investigate how LambdaGrids benefit applications.
UCSD's (Papadopoulos) dynamic credit scheme design will depend on the availability of InfiniBand hardware.
– As part of the SURS grant, a small IB test fabric was purchased. The Topspin switch includes 4 GigE uplinks to allow investigation of IB-to-Ethernet communication. Working with Alan Benner at IBM, initial investigations into the integration of IB- and GigE-connected nodes will be explored.
UCSD will complete deployment of a 150+ node distributed testbed that consists of compute, storage, and visualization endpoints. Management software will be put in place so that experiments can be assigned physical hardware and then have specialized (experiment-specific) software loaded on the assigned nodes. This forms the core Southern California OptIPuter testbed.
– Account creation, cataloging of resources, MRTG-based network monitoring, and Ganglia resource monitoring are already available as a web page at web.optiputer.net.
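A rough storewidth estimate for the SURS cluster (the per-spindle rate is our assumption; circa-2004 commodity disks streamed very roughly 30-50 MB/s):

```python
# Aggregate disk bandwidth vs. the cluster's single 10 Gb/s uplink.
nodes, spindles_per_node = 48, 6
mb_per_s_per_spindle = 40                     # assumed sustained stream rate
disk_mb_s = nodes * spindles_per_node * mb_per_s_per_spindle
print(f"aggregate disk bandwidth: ~{disk_mb_s / 1000:.1f} GB/s "
      f"(~{disk_mb_s * 8 / 1000:.0f} Gb/s)")
# ~11.5 GB/s (~92 Gb/s) of disk behind a 10 Gb/s uplink: the network, not
# the spindles, bounds what leaves the cluster, which is exactly what makes
# the platform useful for storewidth experiments.
```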

14 SoCal Infrastructure: Cluster Hardware
Complete installation of a visualization cluster at UCSD/SIO
Purchase and install four general-purpose IA-32 clusters at key campus sites (compute and storage); clusters will all have PCI-X (100 MHz, 1 Gbps). Note: use of InfiniBand is dependent on emerging company positions/products and on the result of the IBM equipment proposal.
Expand an IA-32 system to 16- or 32-way
Connect one or two campus sites with 32-node clusters at 10 GigE (3:1 campus bisection ratio)
Integrate the SIO/SDSU Panoram systems into the OptIPuter network
Install and configure a variety of OptIPuter management workstations (work has already begun): Dell IA-32 and HP IA-64 workstations, integrated with the Chiaro network and equipped with NSF Middleware Initiative (NMI) software capabilities, OptIPuter accounts, shared storage, revision control, and other general support services

15 SoCal Infrastructure: Other Areas Where the OptIPuter Platform Needs to Support Research Activities
Software Architecture Activities
– USC will begin designing lambda overlay protocols and the interface between the cluster communication fabric and the wide-area optical network. Design of a transparent (MAN/WAN) lambda to OS-bypass (local cluster communication) messaging transport framework will be undertaken.
– UCSD will collaborate with UCI to begin exploring security models using packet-level redundancy and multi-pathing.
– The first-generation OptIPuter system software will be released to partner sites for application development and evaluation.
– UCI will begin design of the Time-Triggered Message-Triggered Object (TMO) programming framework for the OptIPuter.
– TAMU will include network protocols and middleware in its performance analysis work.
– USC (Bannister et al.) will document XCP's design, develop a protocol specification, and publish the XCP header format and other protocol parameters as Internet Drafts.
– Work on Quanta/RBUDP and SABUL will continue.

16 SoCal Infrastructure: System Architecture (Chien)
Prototype Distributed Virtual Computer alternatives on OptIPuter prototypes
Prototype a novel Group Transport Protocol (GTP) and demonstrate it on OptIPuter prototypes
Do detailed studies of resource management in OptIPuter systems using the MicroGrid

17 SoCal Infrastructure: Data Visualization Activities
UIC, UCSD, and SDSU will deploy the data management, visualization, and collaboration software systems developed in Year 1 to OptIPuter application teams, both for testing and for feedback.
UIC will continue to enhance computing and visualization cluster hardware/software to achieve very-large-field real-time volume rendering.
UIC, UCSD, and UCI will integrate the data middleware and high-level data services into an OptIPuter application and begin using the data middleware to develop data-mining algorithms for clustering data streaming over the OptIPuter. In addition, high-level software abstractions for remote data retrieval and for accessing bundled computing resources as high-level computational units or peripherals will be studied.
Multi-site synchronous and near-synchronous visualization will be demonstrated.

18 SoCal Infrastructure: From the Site Review Response
Focused questions that we said we would address in Year 2
– How do we control lambdas, and how do protocols influence their utility?
– How is a LambdaGrid different from a Grid in terms of middleware?
– How can lambdas enhance collaboration?
– How are applications quantitatively helped by LambdaGrids?
What we said would happen at the AHM
– Distribute research focus topics as soon as NSF approves the Year 2 Program Plan
– Have individuals refine Year 2 deliverables (described in the Cooperative Agreement and PPP)
– Have individuals send refined deliverables to team leaders by the end of December
– Have team leaders and their teams review and summarize at the AHM
– Finalize at the AHM and submit to NSF shortly thereafter

19 SoCal Infrastructure: Goals for this Group
Experiments to be run (many overlaps with other teams)
Construction and use plans for the SD OptIPuter infrastructure
Software integration activities
– The Program Plan says we will make software available to all OptIPuter participants
Details and proposed timelines
– Exact dates are not essential
– Dependency ordering of activities IS essential
– Who? For centralized infrastructure, we need to know who to help
– 6-, 12-, and 18-month plans are useful
Shortened timeframe
– The 2005 Program Plan is due at the end of June 2004, so 5 months is the key horizon
– A site review/Program Plan review is likely in July; allow time for vacations in August and September

20 SoCal Infrastructure: Group Comments
Questions for other groups
– What needs to be "persistent" software capability, even when researchers are changing kernels, middleware, and other software pieces?
– How are people going to use storage?
– How do we get data into and out of the OptIPuter?
– Serving out blocks, file systems
– Federated storage (not in 2004)
– The 48-node (300-spindle) storage cluster is available
Root access to nodes needed by experiments (expected)
What about security constrictions (firewalls) and high performance?
– A question for OptIPuter security researchers
Line rate, near line rate: how much variety in the performance of endpoints?
– Are basic benchmarks useful? If so, can we get basic benchmarks to run?
– Listing of capabilities of nodes: storage, memory, speed, connectivity (a minimal listing sketch follows below)
What sort of performance monitoring do groups need?
Experiment priority
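A minimal sketch of such a capability listing (Linux assumed; a real inventory would come from Ganglia or the resource catalog on web.optiputer.net):

```python
import os, platform, shutil, socket

def node_capabilities():
    """Report the basics asked for above: storage, memory, speed, connectivity."""
    _, _, free = shutil.disk_usage("/")
    mem_kb = next(int(line.split()[1]) for line in open("/proc/meminfo")
                  if line.startswith("MemTotal"))
    return {
        "host": socket.gethostname(),    # connectivity: this node's identity
        "arch": platform.machine(),      # speed proxy: CPU architecture
        "cpus": os.cpu_count(),
        "mem_gb": mem_kb // 2**20,       # MemTotal is reported in kB
        "disk_free_gb": free // 2**30,   # storage available on the root volume
    }

print(node_capabilities())
```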

21 SoCal Infrastructure: Experiments
Replica management of very large data sets (NASA requirements)
– 100 million grid points, terabyte-sized chunks: how do we really move this? (a transfer-time sketch follows below)
– How to compute in one place and store on a different OptIPuter node
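Quick arithmetic on what "really moving" a terabyte-sized chunk means at line rate (illustrative only; real transfers rarely sustain line rate end to end):

```python
# Time to move 1 TB at line rate over candidate OptIPuter links.
tb_bits = 8 * 10**12                       # one terabyte, in bits
for link_gbps in (1, 10):
    seconds = tb_bits / (link_gbps * 10**9)
    print(f"{link_gbps} Gb/s: ~{seconds / 3600:.1f} h "
          f"(~{seconds / 60:.0f} min) at line rate")
# ~2.2 h at 1 Gb/s vs. ~13 min at 10 Gb/s -- hence the push for 10 GigE paths.
```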

