Published by Muriel Alexander. Modified over 9 years ago.
1 Service Oriented Science
Ian Foster
Argonne National Laboratory, University of Chicago, Univa Corporation
2 Two Exciting Things That I Won’t Talk About
- Globus Toolkit v4 (release: April 30, 2005)
  - Robustness, performance, usability, testing, documentation, standards compliance
  - E.g., GRAM supports 30,000 active jobs
  - 180+ people on alpha tester list
  - New functionality: data management, security, registry, OGSA-DAI, C hosting, etc.
- Our work with DAGman, Condor-G, Condor
  - > 1 million jobs (we estimate) run over the last year from many application domains
  - Mike Wilde’s talk (yesterday) gave details
3 Instead: Scaling eScience
- Dimensions of scaling
- Service-oriented science
- Separating concerns: hosting eScience communities
eScience [n]: Large-scale science carried out through distributed collaborations, often leveraging access to large-scale data & computing
4 Dimensions of Scaling: For Example, U.S. Dept of Energy
[Map slide. Legend: SC User Facilities; Institutions that Use SC Facilities; SC Laboratories. Facility types: Physics Accelerators; Synchrotron Light Sources; Neutron Sources; Special Purpose Facilities; Large Fusion Experiments.]
Map labels:
- Lawrence Berkeley National Lab: Advanced Light Source; National Center for Electron Microscopy; National Energy Research Scientific Computing Facility
- Los Alamos Neutron Science Center
- Univ. of IL
- Electron Microscopy Center for Materials Research
- Center for Microanalysis of Materials
- MIT: Bates Accelerator Center; Plasma Science & Fusion Center
- Fermi National Accelerator Lab: Tevatron
- Stanford Linear Accelerator Center: B-Factory
- Stanford Synchrotron Radiation Laboratory
- Princeton Plasma Physics Lab
- General Atomics - DIII-D Tokamak
- Pacific Northwest National Lab: Environmental Molecular Sciences Lab
- Argonne National Lab: Intense Pulsed Neutron Source; Advanced Photon Source; Argonne Tandem Linac Accelerator System
- Brookhaven National Lab: Relativistic Heavy Ion Collider; National Synchrotron Light Source
- Oak Ridge National Lab: High-Flux Isotope Reactor; Surface Modification & Characterization Center; Spallation Neutron Source (under construction)
- Thomas Jefferson National Accelerator Facility: Continuous Electron Beam Accelerator Facility
- Sandia Combustion Research Facility
- James R. MacDonald Laboratory
5 Dimensions of Scaling: E.g., U.S. Dept of Energy
- Goal: Any DOE scientist can access any DOE computer, software, data, instrument
  - ~25,000 scientists* (vs. ~1000 DOE certs)
  - ~1000 instruments** (vs. maybe 10 online?)
  - ~1000 scientific applns** (vs. 2 Fusion services)
  - ~10 PB of interesting data** (vs. 100 TB on ESG)
  - ~100,000 computers* (vs. ~3000 on OSG)
- Not to mention many external partners
I.e., we need to scale by 2-3 orders of magnitude to have DOE-wide impact!
* Rough estimate; ** WAG
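The size of each gap on this slide can be checked with a few lines of arithmetic. The sketch below is an illustration, not part of the talk: it takes the slide's rough figures at face value, assumes 1 PB = 1000 TB, and uses dictionary keys that are made up for this example. Under those assumptions the gaps come out at roughly 1.4 to 2.7 orders of magnitude, broadly in line with the "2-3 orders" summary.

```python
import math

# (target, current) pairs copied from the slide's rough estimates.
# Data volumes are expressed in TB, assuming 1 PB = 1000 TB.
gaps = {
    "scientists": (25_000, 1_000),
    "instruments": (1_000, 10),
    "applications": (1_000, 2),
    "data_TB": (10_000, 100),
    "computers": (100_000, 3_000),
}


def orders_of_magnitude(target: float, current: float) -> float:
    """Return log10 of the factor by which `current` must grow to reach `target`."""
    return math.log10(target / current)


for name, (target, current) in gaps.items():
    factor = target / current
    print(f"{name:12s} factor {factor:7.1f}  ~{orders_of_magnitude(target, current):.1f} orders")
```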
6 Scaling eScience
- Dimensions of scaling
- Service-oriented science
- Separating concerns: hosting eScience communities
eScience [n]: Large-scale science carried out through distributed collaborations, often leveraging access to large-scale data & computing
7 Scaling eScience: A Services Approach
- Take the “Grid” moniker seriously
  - Not “discover, deploy, debug, monitor, resubmit, …” but “plug in and tune out”
- For example
  - GriPhyN virtual data service dispatches analysis tasks to campus or national Grid
  - Campus CHARMM service dispatches large jobs to national resources
  - Online biology service serves thousands, uses national resources to preprocess data
- I.e., eScience as “service”
8 For Example: BLASTing for Protein Knowledge
BLASTing the complete NR DB for sequence similarity and function characterization
- Knowledge Base: PUMA enables researchers to find information about a specific protein after it has been analyzed against the complete set of sequenced genomes (NR file: ~2 million sequences).
- Analysis on the Grid: The analysis of protein sequences occurs in the background in the Grid environment. Millions of processes are started, since several tools are run to analyze each sequence, covering protein similarities (BLAST), protein family domain searches (BLOCKS), and structural characteristics of the protein.
9 Service-Oriented Science Requires Grid Technology
- Service-oriented applications
  - Wrap applications as (Web) services
  - Compose applications into workflows
- Service-oriented infrastructure
  - Provision physical resources to support application workloads
[Diagram labels: Users; Workflows; Composition; Invocation; Appln Service; Provisioning]
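The "wrap, then compose" idea can be sketched in a few lines. This is a minimal in-process illustration, not the Globus Toolkit or any Web service API: the `Service` class, `run_workflow`, and the two toy functions are hypothetical names invented here, and the "similarity search" is a stand-in prefix match, not BLAST.

```python
from typing import Callable, List


class Service:
    """Wraps an ordinary function behind a uniform invoke() interface (hypothetical)."""

    def __init__(self, name: str, func: Callable[[dict], dict]):
        self.name = name
        self._func = func

    def invoke(self, request: dict) -> dict:
        return self._func(request)


def run_workflow(steps: List[Service], request: dict) -> dict:
    """Compose services into a linear workflow: each output feeds the next step."""
    for step in steps:
        request = step.invoke(request)
    return request


# Toy stand-ins for real applications (assumptions for illustration only).
def similarity_search(req: dict) -> dict:
    prefix = req["sequence"][:3]
    hits = [s for s in req["database"] if s[:3] == prefix]
    return {**req, "hits": hits}


def summarize(req: dict) -> dict:
    return {**req, "summary": f"{len(req['hits'])} hit(s)"}


workflow = [Service("similarity", similarity_search), Service("summary", summarize)]
result = run_workflow(
    workflow,
    {"sequence": "MKTAYI", "database": ["MKTWWW", "GGGAAA", "MKTQQQ"]},
)
print(result["summary"])
```

In a real deployment each `invoke` would be a remote call, and provisioning the resources behind it would be the infrastructure's job rather than the caller's.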
10 Grid Technology as Service-Oriented Infrastructure
Uniform interfaces, security mechanisms, Web service transport, monitoring
[Diagram labels: User Application (x3); Computers; Storage; Specialized resource; GRAM; GridFTP; Host Env, User Svc (x2); DAIS; Database; Tool; Reliable File Transfer; MyProxy; MDS-Index]
11 Scaling eScience
- Dimensions of scaling
- Service-oriented science
- Separating concerns: hosting eScience communities
eScience [n]: Large-scale science carried out through distributed collaborations, often leveraging access to large-scale data & computing
12 Scaling eScience: A Range of Approaches
- Cookie cutter
  - Standard h/w + s/w
  - E.g., BIRN, PlanetLab, NEES
  - Simple deployment, limited scalability
- Service ecology
  - Standard interfaces, many service providers
  - E.g., NVO, bioinformatics
  - Powerful model, limited service capacity
- General-purpose infrastructure
  - Standard resource provider interfaces
  - E.g., TeraGrid, OSG
  - Need to work out how to host services
13 Scaling eScience: Separating Concerns
- Content
  - Stuff that a community cares about: data, metadata, software, analyses, instruments
  - Community responsibility
- Middleware/function
  - Plumbing needed for community to function: membership, data mgmt, registry, workflow
  - Can often be provided by others
- Resources
  - The physical devices required to support community content, function, computation
  - Need not be the concern of individual users!
14 Scaling eScience: Separating Concerns
[Diagram. Columns: Domain-independent; Domain-dependent. Rows: Content; Function; Resources. Labels: Experimental apparatus; Servers, storage, networks; Metadata catalog; Data archive; Simulation server; Certificate authority; Simulation code; Expt design; Telepresence monitor; Simulation code; Expt output; Electronic notebook; Portal server]
15 Virtualizing Resources (K. Keahey et al.)
- “Virtual workspace” as a core abstraction
  - Computer(s), network(s), configuration(s)
- Multiple implementation technologies
  - Dynamic accounts (e.g., gLite deployment)
  - Virtual machines (current prototyping)
- E.g., “OSG virtual cluster”
  - A collection of virtual machines running standard OSG software (Virtual Data Toolkit)
  - Instantiation by a resource provider makes it immediately accessible as an OSG cluster
  - Load (3 nodes): 1.3 sec; start: 0.7 sec
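One way to picture the "virtual workspace" abstraction is as a small description object that a resource provider turns into running nodes. The sketch below is purely illustrative and is not the workspace service's actual interface: `VirtualWorkspace`, `ResourceProvider`, `instantiate`, and the state names are hypothetical, and no real virtual machines are involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualWorkspace:
    """Hypothetical description of a workspace: nodes plus software configuration."""
    name: str
    node_count: int
    software: List[str] = field(default_factory=list)
    state: str = "defined"  # assumed lifecycle: defined -> loaded -> running


class ResourceProvider:
    """Hypothetical provider that hosts workspaces on a fixed pool of nodes."""

    def __init__(self, free_nodes: int):
        self.free_nodes = free_nodes

    def instantiate(self, ws: VirtualWorkspace) -> VirtualWorkspace:
        if ws.node_count > self.free_nodes:
            raise RuntimeError("not enough free nodes for this workspace")
        self.free_nodes -= ws.node_count
        ws.state = "loaded"   # images staged onto the nodes
        ws.state = "running"  # nodes started; workspace is now usable
        return ws


provider = ResourceProvider(free_nodes=8)
cluster = provider.instantiate(
    VirtualWorkspace("osg-virtual-cluster", node_count=3, software=["Virtual Data Toolkit"])
)
print(cluster.state, provider.free_nodes)
```

The point of the separation is that the community describes what it needs, while the provider decides where and how to run it.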
16 Summary
- Q: How to scale eScience?
- A1: Virtualization: eScience as service
  - AKA “science gateways”
  - Service-oriented infrastructure for management & provisioning
- A2: Separation of concerns
  - Allow providers to host communities by providing resources & function
  - Virtual workspaces as an enabling technology
17 For More Information
- Globus Alliance
  - www.globus.org
- Globus Consortium
  - www.globusconsortium.com
- Global Grid Forum
  - www.ggf.org
- Open Science Grid
  - www.opensciencegrid.org
- Background information
  - www.mcs.anl.gov/~foster
- 2nd Edition
  - www.mkp.com/grid2