Grid Enabling Open Science
Ian Foster, Computation Institute, Argonne National Lab & University of Chicago
2 Abstract
Rapid advances in both science and information technology are driving the emergence of “eScience.” Grid technologies play a crucial role in eScience by enabling resource and service federation across organizational boundaries, supporting on-demand access to computing resources, and allowing the formation and operation of distributed, multi-organizational collaborations. eScience and Grid also require new tools, infrastructure, and policies. I will discuss opportunities, achievements, and challenges in these related areas.
3 Thanks!
• DOE Office of Science
• NSF Office of Cyberinfrastructure
• Colleagues at Argonne, U.Chicago, USC/ISI, and elsewhere
• Mary Kratz and my other colleagues involved in US HealthGrid
4 What is the Grid? “The Grid is an international project that looks in detail at a terrorist cell operating on a global level and a team of American and British counter-terrorists who are tasked to stop it” Gareth Neame, BBC's head of drama
5 Well, Not Exactly! “The Grid is an international project that looks in detail at scientific collaborations operating on a global level and a team of computer scientists who are tasked to enable it” At least, that’s where it started …
6 When power transmission became (quasi-)ubiquitous …
7 We No Longer Had to Travel to Power Plants
8 We Invented New Tools
9 We Worked in New Ways
As telecommunications become (quasi-)ubiquitous …
11 We Can Access Computing on Demand
Public: PUMA Knowledge Base. Information about proteins analyzed against ~2 million gene sequences.
Back office: Analysis on Grid. Millions of BLAST, BLOCKS, etc., on OSG and TeraGrid.
Natalia Maltsev et al.
12 We Can Invent New Tools
Bennett Bertenthal et al.
13 We Can Work in New Ways Credit: Jonathan Silverstein, U.Chicago
14 Grid: Unifying Concept & Technology
Enable on-demand access to, and federation of, diverse resources
Computers, storage, data, people, …
Resources can be distributed, heterogeneous
Networks & protocols provide the connectivity
Software provides the “glue”
[Figure labels: Imaging instruments; Computational resources; Large-scale databases; Data acquisition, analysis; Advanced visualization]
15 Technology, Infrastructure, & Standards
[Diagram labels: Applications; User-level Middleware and Tools; System-level Common Infrastructure; Resources; integration; interoperability]
16 Towards Open eScience
Natural disasters, sustainable energy, disease, climate change
17 Emergence of New Problem Solving Methodologies
[Figure labels: Empirical Data Theory Simulation]
eScience: When brute force doesn’t work anymore (Szalay)
18 First Generation Grids: Batch Computing
Focus on aggregation of many resources for massively (data-)parallel applications
EGEE, Globus
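The batch pattern on this slide (split a large data set, farm the pieces out to many resources, aggregate the results) can be illustrated with a minimal sketch. It assumes a local thread pool standing in for remote grid resources; `analyze`, `split`, and the sample sequences are hypothetical and are not part of EGEE or Globus.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(chunk):
    """Hypothetical per-chunk job: count sequences containing a motif."""
    return sum(1 for seq in chunk if "GATTACA" in seq)


def split(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


# Illustrative input only; a real run would read from large archives.
sequences = ["AAGATTACAGG", "CCCC", "GATTACA", "TTTT", "ACGT", "GGGATTACA"]

chunks = split(sequences, 3)

# Each worker thread stands in for one remote compute resource (assumption).
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(analyze, chunks))

# Aggregate the independent partial results into one answer.
total = sum(partial_counts)
```

Because the chunks are independent, adding workers shortens the wall-clock time without changing the result, which is why this style of workload maps well onto aggregated resources.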
19 fMRI: A Broad Picture Globus
20 fMRI Performance Study Yong Zhao and Ioan Raicu, U.Chicago
21 Second Generation Grids: Service-Oriented Science
• Empower many more users by enabling on-demand access to services
• Grids become an enabling technology for service-oriented science (and business)
  ◦ Grid infrastructures host services
  ◦ Grid technologies used to build services
“Service-Oriented Science”, Science, 2005
Science Gateways
22 “Web 2.0”
• Software as services
  ◦ Data- & computation-rich network services
• Services as platforms
  ◦ Easy composition of services to create new capabilities (“mashups”), which may themselves be made accessible as new services
• Enabled by massive infrastructure buildout
  ◦ Google projected to spend $1.5B on computers, networks, and real estate in 2006
  ◦ Many others are spending substantially
• Paid for by advertising
Declan Butler, Nature
23 Service-Oriented Science: E.g., Virtual Observatories
[Figure labels: Data Archives; User; Analysis tools; Gateway; Discovery tools]
Figure: S. G. Djorgovski
24 Service-Oriented Science
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function …
& then publish as a new service.
I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!
I hope that this “someone else” can manage security, reliability, scalability, … !!
“Service-Oriented Science”, Science, 2005
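The create, discover, compose, publish cycle on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions: `ServiceRegistry`, `compose`, and the two example services are hypothetical, everything runs in one process, and no real Grid or Web-service API is used.

```python
from typing import Callable, Dict, List


class ServiceRegistry:
    """Toy in-memory registry (hypothetical stand-in for a hosted registry)."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable] = {}
        self._descriptions: Dict[str, str] = {}

    def publish(self, name: str, func: Callable, description: str) -> None:
        """Make a function available under a name, with a description."""
        self._services[name] = func
        self._descriptions[name] = description

    def discover(self, keyword: str) -> List[str]:
        """Return sorted names of services whose description mentions keyword."""
        kw = keyword.lower()
        return sorted(n for n, d in self._descriptions.items() if kw in d.lower())

    def get(self, name: str) -> Callable:
        return self._services[name]


def compose(*funcs: Callable) -> Callable:
    """Chain services left to right into one new function."""
    def pipeline(value):
        for f in funcs:
            value = f(value)
        return value
    return pipeline


registry = ServiceRegistry()

# "People create services (data or functions)"
registry.publish("normalize", lambda xs: [x / max(xs) for x in xs],
                 "Normalize intensity data to the range 0..1")
registry.publish("threshold", lambda xs: [x for x in xs if x >= 0.5],
                 "Filter data below a 0.5 cutoff")

# "which I discover"
found = registry.discover("data")

# "& compose to create a new function"
bright = compose(registry.get("normalize"), registry.get("threshold"))

# "& then publish as a new service"
registry.publish("bright_sources", bright,
                 "Composite: normalized values at or above the cutoff")

result = registry.get("bright_sources")([2.0, 5.0, 10.0])
```

In a real deployment the registry and the services would be hosted by the “someone else” the slide mentions, along with the security, reliability, and scalability concerns this sketch leaves out.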
25 Skyserver Sessions Source: Alex Szalay
26 Service-Oriented Science & Cancer Biology
caBIG: sharing of infrastructure, applications, and data.
Data Integration!
27 Cancer Bioinformatics Grid
[Diagram labels: NCICB; Research Center (x2); Microarray; caArray; Gene Database; Protein Database; Image; Tool 1, Tool 2, Tool 3, Tool 4; Grid Data Service; Analytical Service; Grid-Enabled Client; Grid Portal; Grid Services Infrastructure (Metadata, Registry, Query, Invocation, Security, etc.)]
Globus
28 MEDICUS: Management of DICOM Images
[Diagram labels: Imaging Modality; DICOM Grid Interface Service (DGIS); Cache; Image Storage; Grid PACS; Meta Catalog (DICOM Image Attributes), OGSA-DAI; Hospital Domain; Grid Node; Imaging Technologist]
Stephan Erberich, Manasee Bhandekar, Ann Chervenak, et al.
Globus
29 Children’s Oncology Grid: A MEDICUS Deployment Globus
30 MEDICUS Under the Covers
• DICOM images
  ◦ Send (publish)
  ◦ Query/Retrieve (discover)
• Grid Archive
  ◦ Fault tolerant
  ◦ Bandwidth
• Security
  ◦ Authentication
  ◦ Authorization
  ◦ Cryptography
• Access
  ◦ Web portal
• Applications
  ◦ Computing
  ◦ Data Mining
Components:
DICOM Grid Interface Service (DGIS) + Meta Catalog Service (OGSA-DAI)
Data Replication Service (DRS)
Grid Web Portal, OGCE / GridSphere
Globus Toolkit Release 4 GRAM, OGSA-DAI
X.509 Certificates + MyProxy Delegation
Globus
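The “fault tolerant” property of a replicated archive can be illustrated with a small sketch: if one site holding an image is unreachable, the client falls back to another replica. This is not the Data Replication Service API; `fetch_with_fallback`, `ReplicaUnavailable`, and the site names are hypothetical, and the two fetch functions simulate an offline and an online node.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica site cannot serve the requested image."""


def fetch_with_fallback(image_id, replicas):
    """Try each (site, fetch) pair in order; return (site, data) on first success."""
    errors = []
    for site, fetch in replicas:
        try:
            return site, fetch(image_id)
        except ReplicaUnavailable as exc:
            errors.append(f"{site}: {exc}")
    # Every replica failed; report all of the individual failures.
    raise ReplicaUnavailable("; ".join(errors))


def offline(image_id):
    """Simulated node that is down."""
    raise ReplicaUnavailable("node offline")


def online(image_id):
    """Simulated node that returns placeholder bytes for the image."""
    return f"pixels-for-{image_id}".encode()


site, data = fetch_with_fallback(
    "study-001",
    [("hospital-a", offline), ("hospital-b", online)],
)
```

Trying replicas in a fixed order keeps the sketch simple; a real system would more plausibly choose by proximity or available bandwidth, the other archive property listed on the slide.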
31 Grids are Communities of People as well as Computers
• Based on (technology-mediated) trust
  ◦ Common goals
  ◦ Processes and policies
  ◦ Reward systems
• That share resources
  ◦ Computers
  ◦ Data
  ◦ Sensor networks
  ◦ Services
• Supported by software, standards, infrastructure
32 Grid Infrastructure Supports Communities
[Diagram labels: Applications; User-level Middleware and Tools; System-level Common Infrastructure; Resources; integration; interoperability]
Global Community
34 Globus Downloads Last 24 Hours Globus
35 Univa
• Provider of commercial support, services, & products around open source Globus
  ◦ Commercial distribution of GT4 & beyond
  ◦ Univa Globus Cluster Edition
  ◦ Univa Globus Enterprise Edition
  ◦ Committed to open source & open standards
• Univa is contributing to Globus open source
  ◦ Big contributions to GT4 development, testing
  ◦ New functionality: install shields, security configurator, GridFTP extensions
  ◦ Additional contributions expected
36 Science 1.0 → Science 2.0
Megabytes & gigabytes → Terabytes & petabytes
Tarballs → Services
Journals → Wikis
Individuals → Communities
Community codes → Science gateways
Supercomputer centers → Campus & national grids …
Makefile → Workflow
Computational science → Science as computation
Mostly physical sciences → All sciences (& humanities)
1000s of computationalists → Millions of researchers
Government funded → Gov & industry funded
37 Summary
• Technology exponentials are transforming the nature of research
  ◦ Data-, compute-, & communication-intensive approaches are increasingly influential
• Grid is a unifying concept & technology
  ◦ Federation & on-demand access to resources
• An enabler of “service-oriented science”
  ◦ Transforms how we conduct research & communicate results
  ◦ Demands new reward structures, training, & infrastructure
For more information: