1 Empowering Distributed Science Ian Foster Argonne National Laboratory University of Chicago Globus Alliance eScience [n]: Large-scale science carried out through distributed collaborations—often leveraging access to large-scale data & computing

2 10 Years of DOE Collaboratories Research
- Distributed Collaboratory Experimental Environments (1995-1997)
  - Focus on interpersonal collaboration
- DOE2000 National Collaboratories (1997-2000)
  - Somewhat broader focus
- Next Generation Internet program (1998)
  - A broad set of application & technology projects
- SciDAC Collaboratory program (2000-2005)
  - Technology R&D
  - PPDG, ESG, CMCS, Fusion; DOE Science Grid
See: http://www.doecollaboratory.org/history.html

3 It’s Amazing How Much We Have Achieved in 10 Years
- Applications
  - Production services: Grid3, ESG, Fusion, CMCS (also NEESgrid and many others that use DOE technology)
- Infrastructure
  - Broadly deployed PKI and single sign-on
  - Access Grid at 300+ institutions worldwide
- Leadership and technology
  - Grid concepts & software used worldwide
  - Global Grid Forum: standards & community
  - GridFTP: California -> Illinois at 27 Gbit/s
- Multicast almost works

4 There’s Still Much to Do: Where We Should Be vs. Where We Are
- Goal: any DOE scientist can access any DOE computer, software, data, or instrument
  - ~25,000 scientists* (vs. ~1,000 DOE certificates)
  - ~1,000 instruments** (vs. maybe 10 online?)
  - ~1,000 scientific applications** (vs. 2 Fusion services)
  - ~10 PB of interesting data** (vs. 100 TB on ESG)
  - ~100,000 computers* (vs. ~3,000 on Grid3)
- Not to mention many external partners
That is, we need to scale by 2-3 orders of magnitude to have DOE-wide impact!
* Rough estimate; ** WAG

5 “25,000 Scientists”: The Many Aspects of Scaling
- Data & computational services integrated into the fabric of science communities
  - Used not by a handful but by thousands
  - Part of everyday science workflows
- Scale load on services by factors of 100+
  - 100,000 requests annually to fusion codes
  - 1,000 concurrent users for ESG services
  - 25,000 users to authenticate & authorize
- Manageability as a key new challenge
  - Resource management and provisioning
  - Automation of management functions

6 “25,000 Scientists”: Authentication & Authorization
- User-managed PKI credentials
- Single sign-on & delegation (GSI), illustrated in the sketch below
- DOEGrids CA: 1,250 users
- MyProxy & related tools
- WS-Security & SAML-based authentication/authorization
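
To make the delegation idea concrete, here is a minimal Python sketch, assuming the third-party cryptography package: it mints a short-lived "proxy" certificate signed with the user's long-term key, which is the basic idea behind GSI proxy credentials and MyProxy-style credential stores. Real GSI proxies add RFC 3820 extensions and naming rules omitted here, and the names ("Jane Scientist") are purely illustrative.

    from datetime import datetime, timedelta
    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa

    # The user's long-term key and certificate would normally come from a CA
    # (e.g. DOEGrids); here we self-sign one so the sketch is self-contained.
    user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    user_name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"Jane Scientist")])
    user_cert = (
        x509.CertificateBuilder()
        .subject_name(user_name)
        .issuer_name(user_name)
        .public_key(user_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.utcnow())
        .not_valid_after(datetime.utcnow() + timedelta(days=365))
        .sign(user_key, hashes.SHA256())
    )

    # Delegation: mint a short-lived proxy key pair and certificate signed by the
    # user's key, so services can act on the user's behalf without ever holding
    # the long-term private key.
    proxy_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    proxy_cert = (
        x509.CertificateBuilder()
        .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"Jane Scientist proxy")]))
        .issuer_name(user_cert.subject)
        .public_key(proxy_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.utcnow())
        .not_valid_after(datetime.utcnow() + timedelta(hours=12))  # short lifetime
        .sign(user_key, hashes.SHA256())
    )
    print("proxy valid until", proxy_cert.not_valid_after)

A credential store such as MyProxy plays the same role operationally: the user deposits a long-lived credential once and services retrieve short-lived proxies on demand, giving single sign-on without copying long-term keys around.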

7 Authentication & Authorization: Next Steps
- Integration with campus infrastructures
  - “Authenticate locally, act globally”
  - E.g., KX509, GridLogon, GridShib, etc.
- Enabling access while enhancing security
  - Create secure virtual national laboratories
  - Technical & policy solutions to risk/benefit tradeoffs
- Evolving what we mean by “trust”
  - Colleagues -> collaboration -> community
- Scaling to the ultrascale
  - Data volumes, data rates, transaction rates

8 “1000 Instruments”: The Scale of the Problem
[Map of Office of Science (SC) laboratories and user facilities, plus institutions that use SC facilities; the legend distinguishes physics accelerators, synchrotron light sources, neutron sources, special purpose facilities, and large fusion experiments.]
- Lawrence Berkeley National Lab: Advanced Light Source; National Center for Electron Microscopy; National Energy Research Scientific Computing Facility
- Los Alamos Neutron Science Center
- Univ. of Illinois: Electron Microscopy Center for Materials Research; Center for Microanalysis of Materials
- MIT: Bates Accelerator Center; Plasma Science & Fusion Center
- Fermi National Accelerator Lab: Tevatron
- Stanford Linear Accelerator Center: B-Factory; Stanford Synchrotron Radiation Laboratory
- Princeton Plasma Physics Lab
- General Atomics: DIII-D Tokamak
- Pacific Northwest National Lab: Environmental Molecular Sciences Lab
- Argonne National Lab: Intense Pulsed Neutron Source; Advanced Photon Source; Argonne Tandem Linac Accelerator System
- Brookhaven National Lab: Relativistic Heavy Ion Collider; National Synchrotron Light Source
- Oak Ridge National Lab: High-Flux Isotope Reactor; Surface Modification & Characterization Center; Spallation Neutron Source (under construction)
- Thomas Jefferson National Accelerator Facility: Continuous Electron Beam Accelerator Facility
- Sandia Combustion Research Facility
- James R. MacDonald Laboratory

9 For Example: the NSF Network for Earthquake Engineering Simulation links instruments, data, computers, and people

10 NEESgrid: How It Really Happens (A Simplified View)
- Users work with client applications
- Application services organize VOs & enable access to other services
- Collective services aggregate &/or virtualize resources
- Resources implement standard access & management interfaces
[Diagram components: Web Browser, Data Viewer Tool, Simulation Tool, CHEF Chat Teamlet, Telepresence Monitor, CHEF, MyProxy, Certificate Authority, Globus Index Service, Globus MCS/RLS, OGSA-DAI, Globus GRAM, Compute Servers, Database Services, Camera]
- Component origins: Off the Shelf (9), Globus Toolkit (5), Grid Community (3), Application Developer (2)

11 Scaling to 1000 Instruments: Challenges
- Common teleoperation control interfaces
  - NEESgrid Network Telecontrol Protocol (NTCP) provides a service-oriented interface: a nice start? (See the sketch below.)
- Major social & organizational challenges
  - Operating instruments as shared facilities
  - Data sharing policies and mechanisms
- Basic technological challenges also
  - Provisioning/QoS for multi-modal experiments
  - Hierarchical/latency-tolerant control algorithms
  - Reliability, health, and safety
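
As an illustration of what a common, service-oriented telecontrol interface might look like, here is a minimal Python sketch. It is not the actual NTCP specification: the two-step propose/execute pattern and every name in it (ControlService, ControlAction, SimulatedShakeTable) are assumptions made for this example.

    from abc import ABC, abstractmethod
    from dataclasses import dataclass

    @dataclass
    class ControlAction:
        """A requested change to an instrument, e.g. move an actuator to a target position."""
        actuator: str
        target: float

    class ControlService(ABC):
        """Hypothetical service-oriented control interface that any instrument could expose.

        A two-step propose/execute pattern lets a coordinator check that every site
        can perform a step of a multi-site experiment before any hardware moves.
        """

        @abstractmethod
        def propose(self, transaction_id: str, actions: list[ControlAction]) -> bool:
            """Ask whether the instrument can perform these actions; reserve them if so."""

        @abstractmethod
        def execute(self, transaction_id: str) -> dict:
            """Carry out a previously proposed transaction and return measured results."""

        @abstractmethod
        def cancel(self, transaction_id: str) -> None:
            """Release a proposed transaction without executing it."""

    class SimulatedShakeTable(ControlService):
        """Toy implementation standing in for a real instrument driver."""

        def __init__(self):
            self._pending: dict[str, list[ControlAction]] = {}
            self.positions: dict[str, float] = {}

        def propose(self, transaction_id, actions):
            ok = all(abs(a.target) <= 1.0 for a in actions)  # pretend 1.0 is the travel limit
            if ok:
                self._pending[transaction_id] = actions
            return ok

        def execute(self, transaction_id):
            for a in self._pending.pop(transaction_id):
                self.positions[a.actuator] = a.target
            return dict(self.positions)

        def cancel(self, transaction_id):
            self._pending.pop(transaction_id, None)

    # Example: coordinate one step of an experiment.
    table = SimulatedShakeTable()
    if table.propose("step-1", [ControlAction("x-actuator", 0.25)]):
        print(table.execute("step-1"))

The point of the abstract base class is the social contract: if every facility exposes the same small interface, one workflow engine can coordinate instruments at many sites without per-instrument client code.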

12 “1000 Applications”: Software as Service
- Software is increasingly central to almost every aspect of DOE science
- Service interfaces are needed for broad adoption: “shrink wrap” isn’t the answer
(TransP production service: 1,662 runs in FY03)

13 Software as Service: What If You Have 1000s of Users?
- Service-oriented applications
  - Wrapping applications as Web services (see the sketch below)
  - Composing applications into workflows
- Service-oriented infrastructure
  - Provisioning physical resources to support application workloads
[Diagram labels: Users, Workflows, Composition, Invocation, Application Service, Provisioning]
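
A minimal sketch of the "wrap an application as a service" idea, using only the Python standard library: it exposes a command-line code behind an HTTP POST endpoint. The endpoint, port, and wrapped command (plain "echo" as a stand-in for a real simulation executable) are all assumptions for illustration; a production service would add authentication, input validation, job queuing, and provisioning.

    import json
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Stand-in for a real scientific code; a production wrapper would invoke
    # something like a fusion or TransP-style simulation executable instead.
    APPLICATION = ["echo"]

    class ApplicationService(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the JSON request body describing the run parameters.
            length = int(self.headers.get("Content-Length", 0))
            params = json.loads(self.rfile.read(length) or b"{}")

            # Invoke the wrapped application with the supplied arguments.
            result = subprocess.run(
                APPLICATION + [str(v) for v in params.get("args", [])],
                capture_output=True, text=True,
            )

            # Return the run's status and output as a JSON response.
            body = json.dumps({
                "returncode": result.returncode,
                "stdout": result.stdout,
                "stderr": result.stderr,
            }).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Example invocation (from another shell):
        #   curl -X POST localhost:8080 -d '{"args": ["hello", "world"]}'
        HTTPServer(("localhost", 8080), ApplicationService).serve_forever()

Once an application is behind a network interface like this, composition into workflows and provisioning of the machines that run it become separate, reusable concerns.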

14 “10 PB Data”: Distributed Data Integration
- Major challenges in four dimensions
  - Number & distribution of data sources
  - Volume of data
  - Diversity in data format, quality, semantics
  - Sophistication & scale of data analysis
[Diagram: questions and answers flow between users and facts drawn from experiments & instruments, simulations, literature, and other archives.]

15 Distributed Data Integration: Examples of Where We Are Today
- Earth System Grid: O(100 TB) online data
- STAR: 5 TB transfer (SRM, GridFTP)
- NASA/NVO: mosaics from multiple sources
(Bertram Ludäscher’s examples)

16 Distributed Data Integration: Enabling Automated Analysis
- Data ingest
- Managing many petabytes
- Common schema and ontologies
- How to organize petabytes? Reorganize them?
- Interactive & batch analysis performance
- Universal annotation infrastructure
- Query, analysis, and visualization tools (see the catalog sketch below)
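
To ground a few of these items (ingest, a common schema, and query), here is a minimal sketch of a metadata catalog using SQLite from the Python standard library. The table layout and the sample records are invented for illustration; a real system would also have to handle ontologies, annotations at scale, and replica locations.

    import sqlite3

    # A deliberately tiny "common schema": one table of dataset descriptions.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE datasets (
            name        TEXT PRIMARY KEY,
            source      TEXT,     -- e.g. experiment, simulation, survey
            format      TEXT,     -- e.g. HDF5, FITS, netCDF
            size_bytes  INTEGER,
            start_year  INTEGER,
            annotation  TEXT      -- free-text annotation attached at ingest time
        )
    """)

    # Ingest: register some (invented) dataset records from different sources.
    records = [
        ("climate_run_042", "simulation", "netCDF", 2 * 10**12, 2003, "coupled model control run"),
        ("heavy_ion_2004",  "experiment", "ROOT",   5 * 10**12, 2004, "collider event data"),
        ("sky_stripe_82",   "survey",     "FITS",   3 * 10**11, 2002, "sky survey imaging"),
    ]
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?)", records)

    # Query: find large, recent datasets regardless of where they came from.
    rows = conn.execute("""
        SELECT name, source, size_bytes / 1e12 AS size_tb
        FROM datasets
        WHERE size_bytes > 1e12 AND start_year >= 2003
        ORDER BY size_bytes DESC
    """).fetchall()
    for name, source, size_tb in rows:
        print(f"{name}: {size_tb:.1f} TB ({source})")

The hard parts named on the slide are exactly what this sketch leaves out: agreeing on the schema across communities, keeping it meaningful at petabyte scale, and attaching annotations that outlive any one analysis.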

17 “100,000 Computers”: A Healthy Computing Pyramid
- Today: supercomputer, cluster, desktop
- Tomorrow? A food-pyramid analogy: desktops, 100,000 servings; clusters, 100s of servings; specialized computers, 2-3 servings; supercomputers, use sparingly

18 Grid2003: An Operational Grid
- 28 sites (2,100-2,800 CPUs) & growing
- 400-1,300 concurrent jobs
- 8 substantial applications + CS experiments
- Running since October 2003
- Sites span the US plus Korea
http://www.ivdgl.org/grid2003

19 Example Grid2003 Workflows
- Genome sequence analysis
- Physics data analysis
- Sloan Digital Sky Survey

20 Example Grid2003 Application: NVO Mosaic Construction
- NVO/NASA Montage: a small (1,200-node) workflow (see the sketch below)
- Constructs custom mosaics on demand from multiple data sources
- User specifies projection, coordinates, size, rotation, spatial sampling
- Work by Ewa Deelman et al., USC/ISI and Caltech
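
For a flavor of what such a workflow looks like, here is a minimal sketch of a mosaic-style fan-out/fan-in DAG in plain Python: reproject each input image onto a common projection, then co-add the results. The function names (reproject, coadd) and file names are invented stand-ins rather than Montage's actual components, and a real run would be planned onto grid sites instead of executed in-process.

    from concurrent.futures import ThreadPoolExecutor

    # Invented stand-ins for the per-image and combine steps of a mosaic workflow.
    def reproject(image: str, projection: str) -> str:
        # Reproject one input image onto the common projection; return the output name.
        return f"{image}.{projection}.reproj"

    def coadd(reprojected: list[str]) -> str:
        # Combine all reprojected images into the final mosaic.
        return "mosaic(" + ", ".join(reprojected) + ")"

    # The "user request": a projection plus input images from multiple archives.
    projection = "TAN"
    inputs = [f"survey_tile_{i:03d}.fits" for i in range(8)]

    # Fan-out stage: every reprojection is independent, so they can run in parallel
    # (on a grid, each would be a separate job; here a thread pool stands in).
    with ThreadPoolExecutor(max_workers=4) as pool:
        reprojected = list(pool.map(lambda img: reproject(img, projection), inputs))

    # Fan-in stage: the co-add depends on all reprojections having finished.
    print(coadd(reprojected))

The 1,200-node production workflow has the same shape, just with many more independent branches, which is why planning and scheduling across Grid2003 sites pays off.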

21 Invocation Provenance (see the sketch below)
- Completion status and resource usage
- Attributes of executable transformation
- Attributes of input and output files
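
A minimal sketch of recording this kind of invocation provenance, using only the Python standard library: it runs a command, then captures completion status, wall-clock and CPU usage, the executable's identity, and checksums of the input and output files. The record layout and field names are assumptions for illustration, not the format of any particular production provenance system.

    import hashlib
    import json
    import os
    import resource
    import shutil
    import subprocess
    import time

    def sha256(path: str) -> str:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def run_with_provenance(argv, inputs, outputs):
        """Run a command and return a provenance record for the invocation."""
        start = time.time()
        proc = subprocess.run(argv)
        usage = resource.getrusage(resource.RUSAGE_CHILDREN)
        return {
            # Completion status and resource usage.
            "exit_code": proc.returncode,
            "wall_seconds": round(time.time() - start, 3),
            "cpu_user_seconds": usage.ru_utime,
            "cpu_system_seconds": usage.ru_stime,
            "max_rss_kb": usage.ru_maxrss,
            # Attributes of the executable transformation.
            "executable": shutil.which(argv[0]),
            "arguments": argv[1:],
            # Attributes of input and output files.
            "inputs": {p: {"bytes": os.path.getsize(p), "sha256": sha256(p)} for p in inputs},
            "outputs": {p: {"bytes": os.path.getsize(p), "sha256": sha256(p)} for p in outputs},
        }

    # Example: copy a file and record provenance for the invocation.
    with open("input.dat", "w") as f:
        f.write("example data\n")
    record = run_with_provenance(["cp", "input.dat", "output.dat"], ["input.dat"], ["output.dat"])
    print(json.dumps(record, indent=2))

Capturing these records automatically at every invocation is what lets a workflow system later answer "where did this file come from, and what produced it?" without relying on the scientist's memory.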

22 “100,000 Computers”: Future Challenges
- New modes of working that are driven by (& drive) massive increases in computing
  - Enabling massive data analysis, simulation-driven problem solving, application services
  - These make massively parallel computing essential, not an academic curiosity
- More pesky security & policy challenges
- Technological challenges
  - Reliability, performance, and usability as infrastructure, workflows, data volumes, and user communities scale by 2+ orders of magnitude
  - Manageability again

23 Cross-cutting Challenges
- Institutionalize infrastructure
  - Broad deployment & support at sites
  - Software as infrastructure
  - Legitimate (& challenging) security concerns
- Expand the range of resource-sharing modalities
  - Research aimed at federating not just data & computers, but workflow and semantics
  - Scale data sizes, community sizes, and so on
- Reach new application domains
  - Sustain current collaboratory pilots, and start new ones of similar or greater ambition

24 Summary: It’s Amazing How Much We Have Achieved in 10 Years
- Applications
  - Production services: Grid3, ESG, Fusion, CMCS (also NEESgrid and many others that use DOE technology)
- Infrastructure
  - Broadly deployed PKI and single sign-on
  - Access Grid at 300+ institutions worldwide
- Leadership and technology
  - Grid concepts & software used worldwide
  - Global Grid Forum: standards & community
  - GridFTP: California -> Illinois at 27 Gbit/s
- Multicast almost works

25 But Over Those Same 10 Years: Dramatic Change
- Exponential growth in network speed, data volume, computer speed, collaboration size
  - E.g., 155 Mb/s -> 10 Gb/s (ESnet backbone)
- eScience methods are no longer optional but now vital to scientific competitiveness
- We’ve demonstrated the feasibility of eScience, but we are far from DOE-wide adoption
- We have moved forward, but we’ve also fallen behind

26 The $3.4B Question
- Future science will be dominated by “eScience”
- Europe is investing heavily in eScience
  - EU: ~$70M/yr for “Grid” infrastructure and technology
  - UK: ~$60M/yr for eScience applications and technology
  - German, Italian, Dutch, and other programs
- Asia Pacific is investing heavily in eScience
  - Japan, China, South Korea, Singapore, Australia all have programs
- How does DOE stay competitive?

27 We Have Done Much, But Have Much More to Do
- Any DOE scientist can access any DOE computer, software, data, or instrument
  - ~25,000 scientists* (vs. ~1,000 DOE certificates)
  - ~1,000 instruments** (vs. maybe 10 online?)
  - ~1,000 scientific applications** (vs. 2 Fusion services)
  - ~10 PB of interesting data** (vs. 100 TB on ESG)
  - ~100,000 computers* (vs. ~3,000 on Grid3)
- Not to mention many external partners
We need to scale by 2-3 orders of magnitude to have DOE-wide impact.
* Rough estimate; ** WAG

28 UK e-Science Budget (2001-2006)
[Chart: EPSRC breakdown of staff costs vs. grid resources; computers & network funded separately.]
Total: £213M, plus industrial contributions of £25M and £100M via JISC
Source: Science Budget 2003/4 – 2005/6, DTI(OST)

