National Computational Science
The NSF TeraGrid: A Pre-Production Update
2nd Large Scale Cluster Computing Workshop, FNAL, 21 Oct 2002
Rémy Evard, TeraGrid Site Lead, Argonne National Laboratory

A Pre-Production Introspective
Overview of the TeraGrid
–For more information, see particularly the “TeraGrid primer”.
–Funded by the National Science Foundation
–Participants:
  –NCSA
  –SDSC
  –ANL
  –Caltech
  –PSC, starting in October 2002
Grid Project Pondering
–Issues encountered while trying to build a complex, production grid.

Motivation for TeraGrid
The Changing Face of Science
–Technology Drivers
–Discipline Drivers
–Need for Distributed Infrastructure
The NSF's Cyberinfrastructure
–“provide an integrated, high-end system of computing, data facilities, connectivity, software, services, and sensors that … enables all scientists and engineers to work on advanced research problems that would not otherwise be solvable” – Peter Freeman, NSF
Thus the Terascale program.
A key point for this workshop:
–TeraGrid is meant to be an infrastructure supporting many scientific disciplines and applications.

Historical Context
Terascale funding arrived in FY00. Three competitions so far:
–FY00 – Terascale Computing System: funded PSC's 6 TF EV68 Alpha cluster
–FY01 – Distributed Terascale Facility (DTF): the initial TeraGrid project
–FY02 – Extensible Terascale Facility (ETF): expansion of the TeraGrid
An additional competition is now underway for community participation in ETF.

Distributed Terascale Facility (DTF) TeraGrid
[Diagram: NCSA (compute-intensive, 6 TF IA-64), SDSC (data-intensive, 4 TF IA-64 plus a Sun storage server), Caltech (data collection analysis, 0.4 TF IA-64 with an IA-32 Datawulf and 80 TB storage), and ANL (visualization nodes), connected through LA and Chicago switch/routers with a 40 Gb/s backbone and 30 Gb/s site links.]
Focus on homogeneity: Linux, Globus Toolkit, Itanium2.
TeraGrid Lite up now; network up Dec 02; DTF production mid-03.

Extensible TeraGrid Facility
[Diagram: NCSA (compute-intensive: 10 TF IA-64 including large-memory nodes, 230 TB storage), SDSC (data-intensive: 4 TF IA-64, DB2/Oracle servers, 500 TB disk storage, 6 PB tape storage, 1.1 TF Power4), PSC (compute-intensive: 6 TF EV68 with 71 TB storage, 0.3 TF EV7 shared-memory, 150 TB storage server), ANL (visualization: .5 TF IA-64 visualization nodes, 20 TB storage), and Caltech (data collection analysis: 0.4 TF IA-64, IA-32 Datawulf, 80 TB storage), each connected at 30 Gb/s to an extensible backplane network with 40 Gb/s between the LA and Chicago hubs. PSC integrated Q3 03.]

Extensible TeraGrid Facility
[Diagram: the same ETF configuration, annotated with per-site node counts – dual Pentium 4 nodes, dual and quad Itanium2/Madison nodes, dual Power4 nodes, EV7 nodes, and quad EV68 nodes at the respective sites. PSC integrated Q3 03.]

Argonne ETF Cluster Schematic
[Diagram (credit: Charlie Catlett): 64 IA-64 compute nodes (.5 TF), 96 Pentium 4 visualization nodes (.9 TF, GeForce4 graphics), login, management, and storage nodes, and IBM FAStT Fibre Channel storage, interconnected by a Myrinet 2000 CLOS switch plus Gigabit and 10 Gigabit Ethernet; 3x 10 GbE to the DTF backplane through a Cisco GSR, with a Juniper border router and Cisco 6509 linking to Starlight (GbE), MREN (OC-12), HSCC (OC-48), and ESnet (OC-12); direct video links to display walls, Access Grid nodes, and the CAVE; connections to existing facilities including the 574p IA-32 Chiba City Linux cluster, a 350p Pentium 4 Linux cluster, and a 120 TB HPSS/ADSM archive.]

TeraGrid Objectives
Create a significant enhancement in capability
–Beyond capacity: provide a basis for exploring new application capabilities
Deploy a balanced, distributed system
–Not a “distributed computer” but rather a distributed “system” using Grid technologies
–Computing and data management
–Visualization and scientific application analysis
Define an open and extensible infrastructure
–An “enabling cyberinfrastructure” for scientific research
–Extensible beyond the original four sites

Where We Are
[Timeline chart, Jan '01 through Jan '03, with tracks for IA-64 systems, TeraGrid prototypes, grid services on current systems, networking, applications, and operations. Milestones include: early access to McKinley at Intel; DTF proposal submitted to NSF; early McKinleys at TG sites for testing/benchmarking; core Grid services deployed on current systems (Linux clusters, SDSC SP, NCSA O2K); initial apps on McKinley; TeraGrid prototype at SC2001 (60 Itanium nodes, 10 Gb/s network), with apps deployed on the prototype; ETF approved; “TeraGrid Lite” systems and Grids testbed, with early apps being tested on TG-Lite; advanced Grid services testing; 10 Gigabit Ethernet testing; TeraGrid networking deployment; TeraGrid clusters installed; clusters upgraded to Madison; TeraGrid Operations Center prototype, then day ops, then production; basic Grid services; DTF TeraGrid in production.]

Challenges and Issues
Technology and Infrastructure
–Networking
–Computing and Grids
–Others (not covered in this talk): data, visualization, operation, …
Social Dynamics
To Be Clear…
–While the following slides discuss problems and issues in the spirit of this workshop, the TG project is making appropriate progress and is on target for achieving its milestones.

Networking Goals
Support high bandwidth between sites
–Remote access to large data stores
–Large data transfers
–Inter-cluster communication
Support extensibility to N sites, where 4 <= N <= 20 (?)
Operate in production, but support network experiments.
Isolate the clusters from network faults, and vice versa.

NSFNET 56 Kb/s Site Architecture
[Diagram: a VAX and an NSFNET Fuzzball behind a 56 Kb/s link, moving 1024 MB of data: 4 MB/s across the room takes 256 s (4 min); 1 MB/s takes 1024 s (17 min); .007 MB/s across the country takes 150,000 s (41 hrs). Bandwidth in terms of burst data transfer and user wait time.]

2002 Cluster-WAN Architecture
[Diagram: moving 1 TB from a cluster: across the room over n x GbE (small n) at 0.5 GB/s takes 2000 s (33 min); across the country over OC-12 (622 Mb/s) through an OC-48 (2.4 Gb/s) cloud at 78 MB/s takes 13k s (3.6 h).]

To Build a Distributed Terascale Cluster…
[Diagram: moving 10 TB between two clusters, each attached via n x GbE (40 <= n <= 200), over a big, fast interconnect (“OC-800ish”, 40 Gb/s, 5 GB/s) takes 2000 s (33 min).]
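The wait times quoted on the last three slides are simply data volume divided by effective bandwidth. A minimal sketch reproducing them, with the rates and volumes taken from the slides (the helper function and scenario labels are illustrative):

```python
# Back-of-the-envelope check of the transfer times on the preceding slides:
# time = data volume / effective bandwidth.

def transfer_time_s(data_mb: float, rate_mb_per_s: float) -> float:
    """Seconds to move data_mb megabytes at rate_mb_per_s MB/s."""
    return data_mb / rate_mb_per_s

scenarios = [
    # (label, data in MB, effective rate in MB/s)
    ("NSFNET era, across the room (4 MB/s, 1024 MB)",      1024,        4.0),
    ("NSFNET era, across the country (56 Kb/s, 1024 MB)",  1024,        0.007),
    ("2002 cluster-WAN, across the room (n x GbE, 1 TB)",  1_000_000,   500.0),
    ("2002 cluster-WAN, across the country (OC-12, 1 TB)", 1_000_000,   78.0),
    ("DTF backplane, site to site (40 Gb/s, 10 TB)",       10_000_000,  5000.0),
]

for label, data_mb, rate in scenarios:
    t = transfer_time_s(data_mb, rate)
    print(f"{label}: {t:,.0f} s (~{t / 60:,.1f} min)")
```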

TeraGrid Interconnect: Qwest Partnership
[Diagram: physical light paths (numbers denote lambda counts) from Caltech/JPL (Pasadena), SDSC/UCSD (La Jolla/San Diego), NCSA/UIUC (Urbana), and ANL (Argonne) through LA and Chicago, with the logical light paths layered on top; Phase 0 June 2002, Phase 1 November 2002. Original design: lambda mesh. Extensible design: central hubs.]
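The move from the original lambda mesh to central hubs is what makes the backplane extensible: a full mesh needs a light path per site pair, while a hub design needs roughly one uplink per site. A quick illustrative comparison over the 4-to-20-site range mentioned in the networking goals; the counting here is a simplification for intuition, not the actual provisioning plan:

```python
# Illustrative comparison of wavelength-mesh vs. hub-based interconnect growth.
# A full mesh needs a path per site pair; a hub design needs one uplink per site.

def mesh_links(n_sites: int) -> int:
    return n_sites * (n_sites - 1) // 2

def hub_links(n_sites: int) -> int:
    return n_sites  # one uplink from each site to its hub

for n in (4, 8, 20):  # from the original four sites up to the ~20-site target
    print(f"{n:2d} sites: mesh = {mesh_links(n):3d} paths, hub = {hub_links(n):3d} uplinks")
```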

Extensible TeraGrid Facility
[The Extensible TeraGrid Facility diagram from earlier, shown again for reference.]

TeraGrid Logical Network Diagram
[Diagram: the Caltech and SDSC clusters connect through Los Angeles, and the NCSA and ANL clusters connect through Starlight/Chicago over I-WIRE, with approximate distances of 2200 mi between LA and Chicago, 140 mi (NCSA) and 25 mi (ANL) to Chicago, and 115 mi (SDSC) and 15 mi (Caltech) to LA.]

I-WIRE Geography
[Map: I-WIRE sites in the Chicago area and downstate – UI-Chicago, Illinois Inst. of Tech, Argonne Nat'l Lab (approx 25 miles SW), Northwestern Univ-Chicago (“Starlight”), U of Chicago, the UChicago Gleacher Ctr, a commercial fiber hub, and UIUC/NCSA in Urbana (approx 140 miles south), with routes along I-55, the Dan Ryan Expwy (I-90/94), I-290, and I-294.]
Status:
–Done: ANL, NCSA, Starlight
–Laterals in process: UC, UIC, IIT
–Investigating extensions to Northwestern Evanston, Fermi, O'Hare, Northern Illinois Univ, DePaul, etc.

I-WIRE Fiber
[Map: I-WIRE fiber topology linking UIUC/NCSA, Starlight (NU-Chicago), Argonne, UChicago, IIT, UIC, NU-Evanston, FNAL, the UChicago Gleacher Ctr (450 N. Cityfront), and the State/City Complex (James R. Thompson Ctr, City Hall, State of IL Bldg), via carrier facilities at Level(3) 111 N. Canal, McLeodUSA 151/155 N. Michigan (Doral Plaza), and Qwest 455 N. Cityfront. Numbers indicate fiber count (strands).]
I-WIRE timeline
–1994: Governor interest – schools and networks
–1997: task force formed
–1999: I-WIRE funding approved
–2002: fiber in place and operational
Features
–fiber providers: Qwest, Level(3), McLeodUSA, 360Networks
–10 segments
–190 route miles and 816 fiber miles
–longest segment is 140 miles
–4 strands minimum to each site

I-WIRE Transport
[Diagram: three ONI DWDM systems over the I-WIRE fiber, linking UIUC/NCSA, Starlight (NU-Chicago), Argonne, UChicago, IIT, UIC, the State/City Complex, the UC Gleacher Ctr (450 N. Cityfront), and Qwest (455 N. Cityfront).]
–TeraGrid Linear: 3x OC-192, 1x OC-48; first light 6/02
–Metro Ring: 1x OC-48 per site; first light 8/02
–Starlight Linear: 4x OC-192, 4x OC-48 (8x GbE); operational
Each of these three ONI DWDM systems has a capacity of up to 66 channels, at up to 10 Gb/s per channel.
Protection is available in the Metro Ring on a per-site basis.
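For a rough sense of scale, the provisioned bandwidth on these segments can be estimated from the channel counts, assuming the usual SONET line rates of about 10 Gb/s for OC-192 and 2.5 Gb/s for OC-48 (approximate figures, not I-WIRE engineering numbers):

```python
# Rough capacity arithmetic for the I-WIRE DWDM segments, assuming the usual
# SONET line rates (OC-192 ~ 10 Gb/s, OC-48 ~ 2.5 Gb/s). Figures are approximate.

OC192, OC48 = 10.0, 2.5  # Gb/s

segments = {
    "TeraGrid Linear (3x OC-192 + 1x OC-48)": 3 * OC192 + 1 * OC48,
    "Starlight Linear (4x OC-192 + 4x OC-48)": 4 * OC192 + 4 * OC48,
    "Per-system DWDM ceiling (66 channels x 10 Gb/s)": 66 * 10.0,
}

for name, gbps in segments.items():
    print(f"{name}: ~{gbps:g} Gb/s")
```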

Network Policy Decisions
The TG backplane is a closed network, internal to the TG sites.
–Open question: what is a TG site?
The TG network gear is run by the TG network team.
–I.e., not as individual site resources.

Network Challenges
Basic design and architecture
–We think we've got this right.
Construction
–Proceeding well.
Operation
–We'll see.

Computing and Grid Challenges
Hardware configuration and purchase
–I'm still not 100% sure what we'll be installing.
–The proposal was written in early 2001.
–The hardware is being installed in late 2002.
–The IA-64 line of processors is young.
–Several vendors, all defining new products, are involved.
Recommendations:
–Try to avoid this kind of long-wait, multi-vendor situation.
–Have frequent communication with all vendors about schedule, expectations, configurations, etc.

Computing and Grid Challenges
Understanding application requirements and getting people started before the hardware arrives.
Approach: TG-Lite
–a small PIII testbed
–4 nodes at each site
–Internet/Abilene connectivity
–for early users and sysadmins to test configurations.

Computing and Grid Challenges
Multiple sites, one environment:
–Sites desire different configurations.
–Distributed administration.
–Need a coherent environment for applications.
–Ideal: binary compatibility.
Approach: service definitions.

NSF TeraGrid: 14 TFLOPS, 750 TB
[Diagram: resources across the four sites – NCSA (compute-intensive: 1024p IA-32, IA-64 nodes, 1500p Origin, UniTree archive), SDSC (data-intensive: 1176p IBM SP Blue Horizon, Sun E10K, HPSS), ANL (visualization: 574p IA-32 Chiba City, high-resolution display and VR facilities, Myrinet), and Caltech (data collection analysis: 256p HP X-Class, 128p HP V2500, IA-32 nodes, HPSS).]

Defining and Adopting Standard Services
[Diagram: a set of services layered over the IA-64 Linux TeraGrid cluster – runtime, file-based data service, collection-based data service, volume-render service, interactive collection-analysis service, and interactive development on an IA-64 Linux cluster.]
Finite set of TeraGrid services: applications see standard services rather than particular implementations…
…but sites also provide additional services that can be discovered and exploited.

Strategy: Define Standard Services
Finite number of TeraGrid services
–Defined as specifications, protocols, APIs
–Separate from implementation (magic software optional)
Extending TeraGrid
–Adoption of TeraGrid specifications, protocols, APIs
–What protocols does it speak, what data formats are expected, what features can I expect (how does it behave)?
–Service Level Agreements (SLAs)
–Extension and expansion via:
  –Additional services not initially defined in TeraGrid, e.g. an Alpha Cluster Runtime service
  –Additional instantiations of TeraGrid services, e.g. the IA-64 runtime service implemented on a cluster at a new site
Example: File-based Data Service
–API/Protocol: supports FTP and GridFTP, GSI authentication
–SLA:
  –All TeraGrid users have access to N TB of storage
  –Available 24/7 with M% availability
  –>= R Gb/s read, >= W Gb/s write performance
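To make the “specification, not implementation” idea concrete, a service definition and its SLA could be captured as a structured record that each site's implementation is checked against. The sketch below is hypothetical: the field names, thresholds, and the meets_spec check are illustrative stand-ins for the N/M/R/W placeholders on the slide, not actual TeraGrid definitions.

```python
# Hypothetical sketch of a TeraGrid-style service definition: a published
# specification (protocols, SLA targets) that any site implementation can be
# checked against. Field names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class ServiceSpec:
    name: str
    protocols: tuple          # e.g. ("FTP", "GridFTP"), with GSI authentication assumed
    min_storage_tb: float     # N TB available to every TeraGrid user
    min_availability: float   # M% availability, expressed as a fraction
    min_read_gbps: float      # >= R Gb/s read
    min_write_gbps: float     # >= W Gb/s write

@dataclass
class SiteMeasurement:
    protocols: tuple
    storage_tb: float
    availability: float
    read_gbps: float
    write_gbps: float

def meets_spec(spec: ServiceSpec, m: SiteMeasurement) -> bool:
    """True if a site's measured service satisfies the published spec/SLA."""
    return (set(spec.protocols) <= set(m.protocols)
            and m.storage_tb >= spec.min_storage_tb
            and m.availability >= spec.min_availability
            and m.read_gbps >= spec.min_read_gbps
            and m.write_gbps >= spec.min_write_gbps)

# Example: a file-based data service spec and one (hypothetical) site's numbers.
file_data_service = ServiceSpec(
    name="File-based Data Service",
    protocols=("FTP", "GridFTP"),
    min_storage_tb=20, min_availability=0.95,
    min_read_gbps=1.0, min_write_gbps=1.0)

site = SiteMeasurement(protocols=("FTP", "GridFTP"),
                       storage_tb=25, availability=0.97,
                       read_gbps=2.0, write_gbps=1.5)

print(meets_spec(file_data_service, site))  # True for this example site
```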

Standards → Cyberinfrastructure
[Diagram: compute, data/information, and analysis services – runtime on IA-64 Linux, Alpha, and IA-32 Linux clusters; file-based, collection-based, and relational database data services; visualization services; interactive collection-analysis service; interactive development; plus a TeraGrid certificate authority and grid information services.]
If done openly and well…
–other IA-64 cluster sites would adopt TeraGrid service specifications, increasing users' leverage in writing to the specification
–others would adopt the framework for developing similar services on different architectures

Computing and Grid Challenges
Architecture
–Individual cluster architectures are fairly solid.
–The aggregate architecture is a bigger question; it is being defined in terms of services.
Construction and deployment
–We'll see, starting in December.
Operation
–We'll see. Production by June 2003.

Social Issues: Direction
4 sites tend to have 4 directions.
–NCSA and SDSC have been competitors for over a decade.
–This has created surprising cultural barriers that must be recognized and overcome.
–Including PSC, a 3rd historical competitor, will complicate this.
–ANL and Caltech are smaller sites with fewer resources but specific expertise. And opinions.

Social Issues: Organization
Organization is a big deal.
–Equal/fair participation among sites.
–To the extreme credit of the large sites, this project has been approached as 4 peers, not 2 tiers. This has been extremely beneficial.
–Project directions and decisions affect all sites.
–How best to distribute responsibilities but make coordinated decisions?
–Changing the org chart is a heavyweight operation, best to be avoided…

The ETF Organizational Chart
[Organizational chart: Project Director: Rick Stevens (UC/ANL); Chief Architect: Dan Reed (NCSA); Executive Director / Project Manager: Charlie Catlett (UC/ANL). An Executive Committee (Fran Berman, SDSC, chair; Ian Foster, UC/ANL; Michael Levine, PSC; Paul Messina, CIT; Dan Reed, NCSA; Ralph Roskies, PSC; Rick Stevens, UC/ANL; Charlie Catlett, ANL) covers policy, oversight, objectives/evaluation, and architecture. Advisory bodies: an Institutional Oversight Committee (TBD, UCSD; Richard Herman, UIUC; Mark Kamlet, CMU; Dan Meiron, CIT, chair; Robert Zimmer, UC/ANL); an External Advisory Committee (“Are we enabling new science? Are we pioneering the future?”); a User Advisory Committee (“Are we effectively supporting good science?”, with ties to the Alliance UAC (Sugar, chair), NPACI UAC (Kupperman, chair), PSC UAG, NSF MRE projects, and Internet2 (McRobbie)); a Technical Working Group (“Are we creating an extensible cyberinfrastructure?”); NSF ACIR, the NSF Program Advisory Team, and an NSF Technical Advisory Committee (currently being formed). Implementation: a Site Coordination Committee of site leads (ANL: Evard; CIT: Bartelt; NCSA: Pennington; SDSC: Andrews; PSC: Scott) and a Technical Coordination Committee of project-wide technical area leads – Clusters: Pennington (NCSA); Networking: Winkler (ANL); Grid Software: Kesselman (ISI), Butler (NCSA); Data: Baru (SDSC); Applications: Williams (Caltech); Visualization: Papka (ANL); Performance Evaluation: Brunett (Caltech); Operations: Sherwin (SDSC); User Services: Wilkins-Diehr (SDSC), Towns (NCSA).]

Social Issues: Working Groups
Mixed effectiveness of working groups
–The networking working group has turned into a team.
–The cluster working group is less cohesive.
–Others range from teams to just lists.
Why? Not personality issues, not organizational issues.
What makes the networking group tick:
–Networking people already work together:
  –The individuals have a history of working together on other projects.
  –They see each other at other events.
  –They're expected to travel.
  –They held meetings to decide how to build the network before the proposal was completed.
–The infrastructure is better understood:
  –Networks somewhat like this have been built before.
  –They are building one network, not four clusters.
  –There is no separation between design, administration, and operation.
Lessons:
–Leverage past collaborations that worked.
–Clearly define goals and responsibilities.

Social Issues: Observations
There will nearly always be four opinions on every issue.
–Reaching a common viewpoint takes a lot of communication.
–Not every issue can actually be resolved.
–Making project-wide decisions can be tough.
Thus far in the project, the social issues have been just as complex as the technical ones.
–… but the technology is just starting to arrive…
It's possible we should have focused more on this in the early proposal stage, or allocated more resources to helping with these issues.
–We have just appointed a new “Director of Engineering” to help guide technical decisions and maintain coherency.

Conclusion
Challenges abound! Early ones include:
–Network design and deployment.
–Cluster design and deployment.
–Building the right distributed system architecture into the grid.
–Getting along and having fun.
–Expansion.
The hardware arrives in December; production is in mid-2003.
Check back in a year to see how things are going…