Ian Foster Computation Institute Argonne National Lab & University of Chicago Scaling eScience Impact.

Presentation transcript:

Ian Foster Computation Institute Argonne National Lab & University of Chicago Scaling eScience Impact

The Big Questions Nature of the universe Consciousness Life & death Future of the planet

3 How Do We Answer Them? Empirical Data, Theory, Simulation

4 The Same is True of Smaller Questions l Designing new chemical catalysts l Selling advertising l Creating entertainment l Finding parking

5 eScience: Science in an Exponential World l “Large-scale science carried out through distributed collaborations—often leveraging access to large-scale data & computing” [John Taylor, UK EPSRC] l “When brute force doesn’t work anymore” [Alex Szalay] l Science accelerated, decentralized, integrated, & democratized via the Internet

6 Grid: An Enabler of eScience Must we buy (or travel to) a power source? Or can we ship power to where we want to work? Enable on-demand access to, and integration of, diverse resources & services, regardless of location The dubious electrical power grid analogy

7 1st Generation Grids Focus on aggregation of many resources for massively (data-)parallel applications EGEE

8 Second-Generation Grids l Empower many more users by enabling on-demand access to services l Science gateways (TeraGrid) l Service oriented science l Or, “Science 2.0” “Service-Oriented Science”, Science, 2005

9 “Web 2.0” l Software as services u Data- & computation-rich network services l Services as platforms u Easy composition of services to create new capabilities (“mashups”)—that themselves may be made accessible as new services l Enabled by massive infrastructure buildout u Google projected to spend $1.5B on computers, networks, and real estate in 2006 u Many others are spending substantially l Paid for by advertising Declan Butler, Nature

10 Science 2.0: E.g., Virtual Observatories [Figure: users reach data archives, analysis tools, and discovery tools through a gateway; figure: S. G. Djorgovski]

11 Service-Oriented Science People create services (data or functions) … which I discover (& decide whether to use) … & compose to create a new function … & then publish as a new service. l I find "someone else" to host services, so I don't have to become an expert in operating services & computers! l I hope that this "someone else" can manage security, reliability, scalability, … !! "Service-Oriented Science", Science, 2005
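The publish, discover, compose cycle on this slide can be sketched in a few lines of Python. Everything here is invented for illustration (the registry, the service names, and the toy functions); it is a conceptual model, not any real Globus or gateway API.

```python
# Hypothetical sketch of the slide's cycle: publish a service, discover it,
# compose it with others, and republish the composition as a new service.

registry = {}

def publish(name, fn):
    """An author publishes a service (here, just a named function)."""
    registry[name] = fn

def discover(name):
    """A user discovers a service by name and decides whether to use it."""
    return registry[name]

# Two independently published services (toy implementations) ...
publish("extract_spectra", lambda star_ids: [(s, s * 0.1) for s in star_ids])
publish("classify", lambda spectra: ["dwarf" if flux < 1.0 else "giant"
                                     for _, flux in spectra])

# ... which a third party composes and republishes as a new service.
def classify_stars(star_ids):
    spectra = discover("extract_spectra")(star_ids)
    return discover("classify")(spectra)

publish("classify_stars", classify_stars)

print(discover("classify_stars")([3, 42]))  # ['dwarf', 'giant']
```

The point of the sketch is the last step: the composition itself goes back into the registry, so the next user sees one service, not three.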

12 The Importance of "Hosting" and "Management" "Tell me about this star" … "Tell me about these 20K stars" … support 1000s of users. E.g., Sloan Digital Sky Survey, ~10 TB; others much bigger

13 Load Comes from All Over … A few hours of Globus.org access, early one morning …

14 Skyserver Sessions (Thanks to Alex Szalay)

15 The Two Dimensions of Service-Oriented Science l Decompose across the network: clients integrate dynamically u Select & compose services u Select "best of breed" providers u Publish results as new services l Decouple resource & service providers [Figure: function vs. resource dimensions, spanning data archives, analysis tools, discovery tools, and users; fig: S. G. Djorgovski]

16 Hosting & Management: Application Hosting Services [Figure: application providers use preparation tools to deploy application code to a hosting service at a resource provider; an AHS management layer handles authorization (PDP), policy management, application deployment, provisioning, and persistence on behalf of users and admins]

17 Web Services Resource Framework in a Nutshell l Service l State representation u Resource u Resource Property l State identification u Endpoint Reference l State interfaces u GetRP, QueryRPs, GetMultipleRPs, SetRP l Lifetime interfaces u SetTerminationTime u ImmediateDestruction l Notification interfaces u Subscribe u Notify l ServiceGroups [Figure: a service fronting a resource with resource properties (RPs), addressed by an endpoint reference (EPR)]
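As a rough mental model of this slide (not the actual WSRF wire protocol or its Java/C bindings), the pattern is a stateless service fronting stateful resources, each addressed by an EPR and carrying resource properties plus a scheduled termination time. The class and method names below are illustrative stand-ins for the WSRF operations they mimic.

```python
import time

# Toy in-memory model of the WSRF pattern; names echo the WSRF operations
# but none of this is a real Globus/WSRF API.

class ResourceHome:
    def __init__(self):
        self._resources = {}
        self._next = 0

    def create(self, properties, lifetime_s=3600.0):
        epr = f"epr-{self._next}"           # endpoint reference identifies the state
        self._next += 1
        self._resources[epr] = {
            "rps": dict(properties),        # resource properties
            "termination": time.time() + lifetime_s,
        }
        return epr

    def get_rp(self, epr, name):            # cf. GetResourceProperty
        return self._resources[epr]["rps"][name]

    def set_rp(self, epr, name, value):     # cf. SetResourceProperty
        self._resources[epr]["rps"][name] = value

    def set_termination_time(self, epr, when):  # cf. WS-ResourceLifetime
        self._resources[epr]["termination"] = when

    def destroy(self, epr):                 # cf. ImmediateDestruction
        del self._resources[epr]

home = ResourceHome()
job = home.create({"status": "pending"})    # the service stays stateless;
home.set_rp(job, "status", "running")       # all state lives behind the EPR
print(home.get_rp(job, "status"))  # running
```

The key idea the sketch captures: clients never hold the state object itself, only the EPR, so the hosting environment is free to persist, migrate, or reap the resource.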

18 GT4 Web Services Container [Figure: the Globus Toolkit Web services container, built on the Apache Tomcat service container, hosts multiple services and their resources (ResourceHome, resource properties, EPRs) and provides security, persistence, state management, and authorization (PIP/PDP) support, plus a WorkManager, DB connection pool, and JNDI directory] Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005

19 RAVE l Remote Application Virtualization Environment l Builds on Introduce u Define service u Create skeleton u Discover types u Add operations u Configure security l Wrap arbitrary executables [Figure: create a service with Introduce, transfer and deploy a GAR to a container, store in a repository service, advertise in an index service, then discover and invoke to get results] Ravi Madduri et al., Argonne/U.Chicago & Ohio State University

20 Defining Community: Membership and Laws l Identify VO participants and roles u For people and services l Specify and control actions of members u Empower members → delegation u Enforce restrictions → federate policy [Figure: a site's effective access policy combines the access granted by the community to the user with the site's admission-control policies]
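The figure's point, that a member's effective rights at a site are the combination of what the community grants and what the site admits, reduces to a set intersection. The users and policy contents below are invented for illustration.

```python
# Minimal sketch, assuming a VO grant table and a site admission policy
# (both hypothetical): effective access = community grant ∩ site policy.

vo_grant = {                 # access granted by the community to each user
    "alice": {"read", "write"},
    "bob": {"read"},
}
site_admits = {"read"}       # the site's own admission-control policy

def effective_access(user):
    """Rights the user actually has at this site."""
    return vo_grant.get(user, set()) & site_admits

print(sorted(effective_access("alice")))  # ['read']
```

Note that alice's VO-granted "write" disappears at this site: the site's policy always bounds the community's, which is exactly the federation the slide describes.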

21 Courtesy: DOE report, LBNL: Authorization & Attribute Certificates for Widely Distributed Access Control (XACML, SAML)

22 Security Services for VO Policy l Attribute Authority (ATA) u Issues signed attribute assertions (incl. identity, delegation & mapping) l Authorization Authority (AZA) u Decisions based on assertions & policy l Use with message- or transport-level security [Figure: users in VOs A and B hold VO-member attributes; a mapping ATA translates VO-A attributes to VO-B attributes, and a delegation assertion states that User B can use Service A]

23 Globus Authorization Framework [Figure: a GT4 client calls a GT4 server; a PIP gathers attributes from sources such as VOMS, Shibboleth, LDAP, or PERMIS, and a PDP renders the authorization decision]
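The PIP/PDP split in this framework can be illustrated with a toy version: a Policy Information Point collects attributes about the caller, and a Policy Decision Point evaluates them against policy. The attribute sources, users, and rules below are all made up; real deployments would pull attributes from systems like VOMS or Shibboleth.

```python
# Hypothetical sketch of the PIP/PDP pattern, not the GT4 authorization API.

def pip(subject):
    """Policy Information Point: gather attributes from (stand-in) authorities."""
    attribute_sources = {
        "alice":   {"vo": "cabig", "role": "researcher"},
        "mallory": {"vo": "unknown", "role": "guest"},
    }
    return attribute_sources.get(subject, {})

def pdp(attributes, action):
    """Policy Decision Point: evaluate attributes against an (invented) policy."""
    if action == "read":    # anyone in the 'cabig' VO may read
        return attributes.get("vo") == "cabig"
    if action == "write":   # only cabig researchers may write
        return attributes.get("vo") == "cabig" and \
               attributes.get("role") == "researcher"
    return False            # deny anything the policy doesn't mention

print(pdp(pip("alice"), "write"))   # True
print(pdp(pip("mallory"), "read"))  # False
```

Keeping attribute gathering (PIP) separate from decision making (PDP) is what lets the framework plug in VOMS, Shibboleth, LDAP, or PERMIS without changing the policy engine.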

24 Service-Oriented Science & Cancer Biology caBIG: sharing of infrastructure, applications, and data. Data integration!

25 Cancer Bioinformatics Grid [Figure: a grid portal and grid-enabled clients at NCICB and research centers reach grid data services (caArray, gene and protein databases, microarray and image tools) and analytical services over a common grid services infrastructure (metadata, registry, query, invocation, security, etc.)] All Globus-based, by the way …

26 Composing Services: E.g., BPEL Workflow System [Figure: a BPEL engine takes a BPEL workflow document and workflow inputs, invokes a data service (uchicago.edu) and analytic services (osu.edu, duke.edu), and returns the workflow results] caBIG BPEL work: Ravi Madduri et al. See also Kepler & Taverna
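The engine in the figure drives a sequence of service invocations described by a declarative workflow document. A minimal stand-in (not BPEL, whose documents are XML and whose constructs are much richer) treats the "document" as an ordered list of step names and the services as local functions; all names and values here are invented.

```python
# Toy workflow engine: one data step, two analytic steps, chained in the
# order the (stand-in) workflow document specifies.

services = {
    "fetch_data": lambda n: list(range(n)),   # data service (cf. uchicago.edu)
    "analyze_a":  lambda xs: sum(xs),         # analytic service (cf. osu.edu)
    "analyze_b":  lambda total: total * 2,    # analytic service (cf. duke.edu)
}

workflow_doc = ["fetch_data", "analyze_a", "analyze_b"]

def run_workflow(doc, workflow_input):
    value = workflow_input
    for step in doc:                    # the engine invokes each service in turn,
        value = services[step](value)   # feeding one step's result to the next
    return value                        # the workflow result

print(run_workflow(workflow_doc, 5))  # 20
```

What BPEL (and Kepler, Taverna) add over this sketch is exactly what makes composition a service in its own right: fault handling, parallel branches, and the ability to expose the whole workflow behind a single service endpoint.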

27 Provisioning: Astro Portal Stacking Service l Purpose u On-demand "stacks" of random locations within ~10 TB dataset l Challenge u Rapid access to K "random" files u Time-varying load l Solution u Dynamic acquisition of compute & storage [Figure: Sloan data accessed via a Web page or Web service] Joint work with Ioan Raicu & Alex Szalay

28 Dynamic Provisioning: Swift Architecture [Figure: a SwiftScript specification is compiled into an abstract computation recorded in the virtual data catalog; the execution engine (Karajan with the Swift runtime) schedules application tasks onto virtual nodes via runtime callouts and the Falkon resource provisioner (e.g., Amazon EC2), collecting status reports and provenance data] Yong Zhao, Mihael Hategan, Ioan Raicu, Mike Wilde, Ben Clifford

29 Synthetic Benchmark 18 stages, 1000 tasks, CPU seconds, 1260 total time on 32 machines Ioan Raicu & Yong Zhao, U.Chicago

Release after 15 Seconds Idle

Release after 180 Seconds Idle
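The two slides above contrast provisioner policies that release an acquired worker after 15 versus 180 seconds of idleness. The trade-off is re-acquisition churn versus holding time, and a toy simulation makes it concrete; the workload, the one-second task duration, and the single-worker model are all invented for illustration.

```python
# Simulate one worker under an idle-release policy: the worker is released
# once it has sat idle for `idle_timeout` seconds, and must be re-acquired
# when the next task arrives after that.

def simulate(arrival_times, idle_timeout, task_seconds=1.0):
    """Count how many times the worker must be (re)acquired for this workload."""
    acquisitions = 0
    release_at = None            # time at which an idle worker would be released
    for t in arrival_times:
        if release_at is None or t > release_at:
            acquisitions += 1    # worker was released (or never held): acquire it
        release_at = t + task_seconds + idle_timeout
    return acquisitions

bursty = [0, 1, 2, 60, 61, 120]   # three small bursts of tasks (seconds)
print(simulate(bursty, 15))    # 3: short timeout re-acquires once per burst
print(simulate(bursty, 180))   # 1: long timeout holds the worker throughout
```

A short timeout returns resources to the shared pool quickly but pays acquisition latency on every burst; a long timeout does the opposite, which is why the choice is a policy knob rather than a fixed constant.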

Swift Application: B. Berriman, J. Good (Caltech); J. Jacob, D. Katz (JPL)

33 Montage Yong Zhao and Ioan Raicu, U.Chicago

34 Grid-Enabled Business Intelligence (BI) Application [Figure: BI server applications are started and decommissioned by a grid-enabled dispatcher, which provisions new worker processes from a managed pool of shared resources on the grid backend]

35 Computation as a First-Class Entity l Capture information about relationships among u Data (varying locations and representations) u Programs (& inputs, outputs, constraints) u Computations (& execution environments) l Apply this information to: u Discovery of data and programs u Computation management u Provenance u Planning, scheduling, performance optimization [Figure: data, programs, and computations linked by operates-on, execution-of, created-by, and consumed-by relationships] A Virtual Data System for Representing, Querying & Automating Data Derivation [SSDBM02]

36 Example: fMRI Analysis First Provenance Challenge, [CCPE06]

37 Query Examples l Query by procedure signature u Show procedures that have inputs of type subjectImage and output types of warp l Query by actual arguments u Show align_warp calls (including all arguments), with argument model=rigid l Query by annotation u List anonymized subject images for young subjects: find datasets of type subjectImage, annotated with privacy=anonymized and subjectType=young l Basic lineage graph queries u Find all datasets derived from dataset '5a' l Graph pattern matching u Show me all output datasets of softmean calls that were aligned with model=affine
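The basic lineage query above ("find all datasets derived from '5a'") is a reachability query over a derivation graph. A small sketch shows the shape of it; the dataset names other than '5a' and the edges between them are invented for illustration, not from the fMRI example.

```python
from collections import deque

# Derivation edges: each dataset maps to the datasets it was derived from.
derived_from = {
    "warp1": ["5a"],
    "img1":  ["warp1"],
    "atlas": ["img1", "img2"],
    "img2":  ["6b"],
}

def descendants(dataset):
    """All datasets transitively derived from `dataset` (basic lineage query)."""
    result = set()
    frontier = deque(c for c, parents in derived_from.items()
                     if dataset in parents)
    while frontier:                 # breadth-first walk down the derivation graph
        d = frontier.popleft()
        if d in result:
            continue
        result.add(d)
        frontier.extend(c for c, parents in derived_from.items()
                        if d in parents)
    return result

print(sorted(descendants("5a")))  # ['atlas', 'img1', 'warp1']
```

The richer queries on the slide (by signature, arguments, annotation, or graph pattern) layer filters on top of the same graph; the traversal itself stays this simple.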

38 dev.globus: Community Driven Improvement of Globus Software, NSF OCI l Guidelines (Apache) l Infrastructure (CVS, bugzilla, Wiki) l Projects include …

39 Summary: Service-Oriented Science People create services (data or functions) … which I discover (& decide whether to use) … & compose to create a new function … & then publish as a new service. l I find "someone else" to host services, so I don't have to become an expert in operating services & computers! l I hope that this "someone else" can manage security, reliability, scalability, … !! "Service-Oriented Science", Science, 2005

40 Summary: Service-Oriented Science People create services (data or functions) … which I discover (& decide whether to use) … & compose to create a new function … & then publish as a new service. Profoundly revolutionary: l Accelerates the pace of enquiry l Introduces a new notion of "result" l Requires new reward structures, training, infrastructure "Service-Oriented Science", Science, 2005

41 Science 1.0 → Science 2.0: megabytes & gigabytes → terabytes & petabytes; tarballs → services; journals → wikis; individuals → communities; community codes → science gateways; supercomputer centers → campus & national grids …; Makefile → workflow; computational science → science as computation; mostly physical sciences → all sciences (& humanities); 1000s of computationalists → millions of scientists; government funded → government funded

42 Thanks! l DOE Office of Science l NSF Office of Cyberinfrastructure l Colleagues at Argonne, U.Chicago, USC/ISI, and elsewhere l Many members of the German DGrid community

43 Service-Oriented Science Challenges l A need for new technologies, skills, & roles u Creating, publishing, hosting, discovering, composing, archiving, explaining … services l A need for substantial software development u “30-80% of modern astronomy projects is software”—S. G. Djorgovski, Caltech l A need for more & different infrastructure u Computers & networks to host services l And certainly profound research challenges u In every part of the service & science lifecycle For more information: