Virtual Communities and Science in the Large
Dr. Carl Kesselman, ISI Fellow; Director, Center for Grid Technologies, Information Sciences Institute; Research Professor of Computer Science, Viterbi School of Engineering, University of Southern California

2 Acknowledgements
- Ian Foster, with whom I developed many of these slides
- Bill Allcock, Charlie Catlett, Kate Keahey, Jennifer Schopf, Frank Siebenlist, Mike, ANL/UC
- Ann Chervenak, Ewa Deelman, Laura Pearlman, Mike D'Arcy, Gaurang Mehta, USC/ISI
- Karl Czajkowski, Steve, Univa
- Numerous other fine colleagues
- NSF, DOE, and IBM for research support

3 Context: System-Level Science
Problems too large and/or complex to tackle alone …

4 Seismic Hazard Analysis (T. Jordan & SCEC)
[Figure: a seismic hazard model integrating seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure, and rupture dynamics]

5 SCEC Community Model
[Figure: the SCEC community model couples an Earthquake Forecast Model, an Attenuation Relationship, an Anelastic Wave Propagation (AWP) model, a Site Response Model (SRM), a Fault System Model (FSM), and a Rupture Dynamics Model (RDM) around a Unified Structural Representation (faults, motions, stresses, anelastic model), with geology and geodesy data inverted as inputs. Pathways include standardized seismic hazard analysis, ground-motion simulation, physics-based earthquake forecasting, and the ground-motion inverse problem, producing intensity measures and ground motions]

6 Science Takes a Village …
- Teams organized around common goals
  - People, resources, software, data, instruments, …
- With diverse membership & capabilities
  - Expertise in multiple areas required
- And geographic and political distribution
  - No single location or organization possesses all required skills and resources
- Must adapt as a function of the situation
  - Adjust membership, reallocate responsibilities, renegotiate resources

7 Virtual Organizations
- From organizational behavior/management:
  - "a group of people who interact through interdependent tasks guided by common purpose [that] works across space, time, and organizational boundaries with links strengthened by webs of communication technologies" (Lipnack & Stamps, 1997)
- The impact of cyberinfrastructure:
  - People → computational agents & services
  - Communication technologies → IT infrastructure, i.e., the Grid
"The Anatomy of the Grid," Foster, Kesselman, Tuecke, 2001

8 Forming & Operating (Scientific) Communities
- Define VO membership and roles, & enforce laws and community standards
  - I.e., policy
- Build, buy, operate, & share community infrastructure
  - Data, programs, services, computing, storage, instruments
- Define and perform collaborative work
  - Use shared infrastructure, roles, & policy
  - Manage community workflow

9 Forming & Operating (Scientific) Communities
- Define VO membership and roles, & enforce laws and community standards
  - I.e., policy
- Build, buy, operate, & share community infrastructure
  - Data, programs, services, computing, storage, instruments
  - Service-oriented architecture
- Define and perform collaborative work
  - Use shared infrastructure, roles, & policy
  - Manage community workflow

10 Defining Community: Membership and Laws
- Identify VO participants and roles
  - For people and services
- Specify and control actions of members
  - Empower members → delegation
  - Enforce restrictions → federate policy
[Figure: delegation and policy federation between parties A and B]

11 Policy Challenges in VOs
- Restrict VO operations based on characteristics of the requestor
  - VO dynamics create challenges
- Intra-VO
  - VO-specific roles
  - Mechanisms to specify/enforce policy at the VO level
- Inter-VO
  - Entities/roles in one VO are not necessarily defined in another VO
[Figure: a site's effective access policy for a community is the intersection of the access the community grants to the user and the site's admission-control policies]

12 Core Security Mechanisms
- Authentication and digital signature
  - "Identity" of a communicating party
- Attribute assertions
  - C asserts that S has attribute A with value V
- Delegation
  - C asserts that S can perform O on behalf of C
- Namespaces and attribute mapping
  - {A1, A2, …, An}_VO1 → {A′1, A′2, …, A′n}_VO2
- Policy
  - An entity with attributes A asserted by C may perform operation O on resource R
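
To make these mechanisms concrete, here is a minimal sketch in Python of how attribute assertions and a policy check might be represented. All class, field, and function names here are illustrative assumptions, not the Globus Toolkit API; real systems carry these as signed SAML assertions or X.509 attribute certificates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeAssertion:
    """Issuer C asserts that subject S has attribute A with value V."""
    issuer: str     # C: the asserting party (e.g., a VO attribute authority)
    subject: str    # S: the entity the attribute is about
    attribute: str  # A: e.g., "role"
    value: str      # V: e.g., "VO/Doer"

@dataclass(frozen=True)
class DelegationAssertion:
    """Issuer C asserts that delegate S may perform operation O on C's behalf."""
    issuer: str
    delegate: str
    operation: str

def may_perform(subject: str, operation: str, resource: str,
                assertions: list[AttributeAssertion],
                policy: dict[tuple[str, str], set[tuple[str, str]]]) -> bool:
    """Policy maps (operation, resource) to (attribute, value) pairs, any one
    of which authorizes the request."""
    required = policy.get((operation, resource), set())
    held = {(a.attribute, a.value) for a in assertions if a.subject == subject}
    return bool(required & held)

# A VO policy that lets anyone with role VO/Doer run jobs on resource R.
policy = {("run-job", "R"): {("role", "VO/Doer")}}
assertions = [AttributeAssertion("VO-ATA", "alice", "role", "VO/Doer")]
print(may_perform("alice", "run-job", "R", assertions, policy))  # True
```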

13 Security Services for VO Policy
- Attribute Authority (ATA)
  - Issues signed attribute assertions (including identity, delegation & mapping assertions)
- Authorization Authority (AZA)
  - Makes decisions based on assertions & policy
- Used with message/transport-level security
[Figure: VO users in VO A and VO B hold member attributes issued by their VO ATAs; a mapping ATA translates VO-A attributes to VO-B attributes; a delegation assertion states that User B can use Service A; resource administrators and services consult the VO AZA]
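
Building on the toy structures above, here is a sketch of how an authorization authority might combine attribute mapping with a policy decision; the dictionaries stand in for what a mapping ATA and an AZA would exchange as signed SAML/XACML messages, and all names are assumptions.

```python
# Map VO-A attributes into VO-B's namespace before evaluating VO-B policy.
vo_a_to_vo_b = {("role", "VO-A/Member"): ("role", "VO-B/Guest")}

def authorize(subject_attrs: set[tuple[str, str]],
              mapping: dict[tuple[str, str], tuple[str, str]],
              required: set[tuple[str, str]]) -> bool:
    """AZA decision: translate foreign attributes via the mapping ATA,
    then grant if any required attribute is held."""
    mapped = {mapping.get(attr, attr) for attr in subject_attrs}
    return bool(mapped & required)

# User A holds a VO-A attribute; a VO-B service requires a VO-B attribute.
print(authorize({("role", "VO-A/Member")}, vo_a_to_vo_b,
                {("role", "VO-B/Guest")}))  # True: the mapping bridges the VOs
```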

14 Security Services in Practice
[Figure: users obtain credentials via KCA and MyProxy; a CAS or VOMS server issues VO rights as SAML or X.509 attribute certificates; access to a compute center uses SSL/WS-Security with proxy certificates; services running on the user's behalf carry (possibly restricted) rights; sites apply local policy to the VO identity or attribute authority, with authorization callouts via SAML or XACML]

15 Forming & Operating Scientific Communities
- Define VO membership and roles, & enforce laws and community standards
  - I.e., policy
- Build, buy, operate, & share community infrastructure
  - Data, programs, services, computing, storage, instruments
- Define and perform collaborative work
  - Use shared infrastructure, roles, & policy
  - Manage community workflow

16 [Figure: Grid technology decomposes a system-level problem across facilities (computers, storage, networks), services, software, and people, e.g., experimental models at U. Colorado and UIUC coordinated with a computational model at NCSA]

17 Beyond Science Silos: Service-Oriented Architecture
- Decompose across the network
- Clients integrate dynamically
  - Select & compose services
  - Select "best of breed" providers
  - Publish results as new services
- Decouple resource & service providers
[Figure: users, discovery tools, analysis tools, and data archives decomposed by function and resource; S. G. Djorgovski]

18 Decomposition Enables Separation of Concerns & Roles
- User: "Provide access to data D at S1, S2, S3 with performance P"
- Service provider: replica catalog, user-level multicast, …
- Resource provider: "Provide storage with performance P1, network with P2, …"
[Figure: data D replicated across sites S1, S2, S3]
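
To illustrate the service-provider role on this slide, here is a toy replica catalog mapping a logical data name to its physical replicas; this is a simplified stand-in for a replica location service, and its interface is an assumption, not the RLS API.

```python
class ReplicaCatalog:
    """Toy replica catalog: logical file name -> physical replica URLs."""
    def __init__(self) -> None:
        self._replicas: dict[str, set[str]] = {}

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a physical copy of `lfn` exists at `pfn`."""
        self._replicas.setdefault(lfn, set()).add(pfn)

    def lookup(self, lfn: str) -> set[str]:
        """Return all known physical locations of `lfn`."""
        return self._replicas.get(lfn, set())

catalog = ReplicaCatalog()
for site in ("S1", "S2", "S3"):
    catalog.register("D", f"gsiftp://{site}/data/D")
print(catalog.lookup("D"))  # a client picks its "best of breed" replica
```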

19 Providing VO Services: (1) Integration from Other Sources
- Negotiate service-level agreements
- Delegate and deploy capabilities/services
- Provision to deliver the defined capability
- Configure the environment
- Host layered functions
[Figure: communities A … Z built atop provisioned services]

20 Deploying New Services
- Allocate/provision
- Configure
- Initiate activity
- Monitor activity
- Control activity
Current mechanisms include GRAM, Workspaces (Keahey et al.), and HAND (Qi et al.).
[Figure: a client works through an interface to a resource provider's policy, environment, and activity]
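
The lifecycle on this slide can be read as a contract between client and resource provider. The Python sketch below is a hypothetical rendering of that contract; the method names are mine, not those of GRAM or the Workspace Service.

```python
from abc import ABC, abstractmethod

class ServiceDeployer(ABC):
    """Lifecycle a resource provider exposes for hosting a new VO service."""

    @abstractmethod
    def allocate(self, cpus: int, memory_gb: int) -> str:
        """Allocate/provision resources; returns an allocation handle."""

    @abstractmethod
    def configure(self, handle: str, environment: dict[str, str]) -> None:
        """Configure the hosting environment for the activity."""

    @abstractmethod
    def initiate(self, handle: str, activity: str) -> None:
        """Start the activity, e.g., launch a service container."""

    @abstractmethod
    def monitor(self, handle: str) -> str:
        """Report the activity's current state."""

    @abstractmethod
    def control(self, handle: str, action: str) -> None:
        """Control the activity: suspend, resume, or terminate."""
```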

21 Virtualizing Existing Services into a VO
- Establish a service agreement with the service
  - E.g., WS-Agreement, GRAM
- Delegate use to VO users
[Figure: a VO admin establishes agreements with existing services; VO users A and B then use them through the VO]

22 Open Science Grid (jobs, 2004)
- 50 sites (15,000 CPUs) & growing
- 400 to >1000 concurrent jobs
- Many applications + CS experiments; includes long-running production operations
- Up since October 2003; few FTEs for central operations

23 VO-Embedded Resource Management
- The VO admin delegates credentials to be used by downstream VO services and starts the required services.
- VO jobs come in directly from upstream VO users via a client-side VO scheduler.
- Each VO job is forwarded to the appropriate resource (head-node GRAM and cluster resource manager) using the VO credentials.
- A computational job is then started for the VO, with monitoring and control throughout.
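
A minimal sketch of the forwarding step, assuming a toy scheduler that picks the least-loaded resource and submits with the VO's delegated credential rather than the individual user's; `submit` is a hypothetical stand-in for a GRAM client call.

```python
from dataclasses import dataclass

@dataclass
class Job:
    owner: str        # the VO user who submitted the job
    executable: str

def submit(resource: str, job: Job, credential: str) -> str:
    """Hypothetical stand-in for a GRAM submission to a cluster head node."""
    return f"{resource}: {job.executable} submitted under {credential}"

def vo_schedule(job: Job, loads: dict[str, int], vo_credential: str) -> str:
    """Forward an incoming VO job to the least-loaded resource, authenticating
    with the credential the VO admin delegated to this scheduler."""
    target = min(loads, key=loads.get)  # pick the least-loaded cluster
    loads[target] += 1                  # account for the newly placed job
    return submit(target, job, vo_credential)

loads = {"cluster-a": 12, "cluster-b": 3}
print(vo_schedule(Job("alice", "simulate"), loads, "VO-delegated-proxy"))
```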

24 The Condor Brick
- Deploy the brick
- Allocate resources; initiate management services
- Allocate resources; initiate job starters (i.e., glide-ins)
- Execute jobs via Condor-C
[Figure: a VO admin and VO user drive a local Condor environment through GRAM, spanning public and private networks]

25 Policy for Dynamic VO Service Hosting
[Figure: a user's Create and DoIt requests pass through a service container PDP, a per-service PDP, and a VO PDP backed by the VO ATA. Example rules: AddPolicy if Role=VO/Admin; CreateService if Role=HE/ServiceCreator; DoIt if VO_PDP(Attrs)=yes & Role=HE/Doer; at the VO PDP, DoIt if Role=VO/Doer]
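
The chained policy decision points (PDPs) in this figure might look as follows in toy form, assuming each PDP is a predicate and the hosting environment requires every applicable PDP to permit the request (real containers use XACML-style combining algorithms; these functions are illustrative).

```python
from typing import Callable

# A PDP is a predicate over (subject attributes, requested operation).
PDP = Callable[[set[str], str], bool]

def container_pdp(attrs: set[str], op: str) -> bool:
    """Hosting-environment policy: who may create or invoke services."""
    if op == "CreateService":
        return "Role=HE/ServiceCreator" in attrs
    if op == "DoIt":
        return "Role=HE/Doer" in attrs
    return False

def vo_pdp(attrs: set[str], op: str) -> bool:
    """VO-level policy, consulted via callout for operations on VO services."""
    return op != "DoIt" or "Role=VO/Doer" in attrs

def decide(pdps: list[PDP], attrs: set[str], op: str) -> bool:
    """Deny-overrides combination: every PDP in the chain must permit."""
    return all(pdp(attrs, op) for pdp in pdps)

attrs = {"Role=HE/Doer", "Role=VO/Doer"}
print(decide([container_pdp, vo_pdp], attrs, "DoIt"))  # True
```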

26 Providing VO Services: (2) Coordination & Composition
- Take a set of provisioned services …
- … & compose them to synthesize new behaviors
- This is traditional service composition
  - But one must also be concerned with emergent behaviors and autonomous interactions
  - See the work of the agent & PlanetLab communities
"Brain Meets Brawn: Why Grid and Agents Need Each Other," Foster, Jennings, Kesselman, 2004

27 The Globus-Based LIGO Data Grid
- Replicating >1 Terabyte/day to 8 sites
- >120 million replicas so far
- MTBF = 1 month
[Figure: map of LIGO Gravitational Wave Observatory data-grid sites, including Birmingham, Cardiff, and AEI/Golm]

28 Data Replication Service
- Pulls "missing" files to a storage system, starting from a list of required files
- Data location: Replica Location Index, Local Replica Catalogs
- Data movement: GridFTP, Reliable File Transfer Service
"Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System," Chervenak et al., 2005
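
In outline, the service compares the required file list against the local replica catalog, locates missing files via the replica location index, and hands transfers to a reliable transfer service. The sketch below is a simplified rendering of that loop under those assumptions, not the actual DRS code.

```python
def plan_replication(required: list[str],
                     local_catalog: set[str],
                     location_index: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Plan transfers for files that are required locally but not yet present.

    Returns (source_url, logical_name) pairs for a reliable transfer service.
    """
    transfers = []
    for lfn in required:
        if lfn in local_catalog:
            continue                         # already replicated locally
        sources = location_index.get(lfn, [])
        if not sources:
            continue                         # no known replica; skip for now
        transfers.append((sources[0], lfn))  # e.g., fetch via GridFTP
    return transfers

index = {"f1": ["gsiftp://remote/f1"], "f2": ["gsiftp://remote/f2"]}
print(plan_replication(["f1", "f2"], {"f1"}, index))
# [('gsiftp://remote/f2', 'f2')] -- only the missing file is pulled
```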

29 Composing Resources … Composing Services
- Procure hardware → physical machine
- Deploy hypervisor/OS
- Deploy virtual machine (VM)
- Deploy container (e.g., JVM)
- Deploy services (DRS, RLS, GridFTP) as VO services
- Provisioning, management, and monitoring at all levels

30 Community Commons
- What capabilities are available to the VO?
  - Membership changes, state changes
- Requires mechanisms to aggregate and update VO information
[Figure: VO-specific indexes aggregate information from services; "the age of information" trades freshness against quantity (FRESH vs. MORE)]
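
A toy aggregation index illustrating the freshness trade-off: entries carry the time they were reported, and queries can filter out stale ones. The class, its methods, and the staleness threshold are illustrative assumptions, not the MDS interface.

```python
import time

class VOIndex:
    """Aggregates (service -> capability) reports with timestamps."""
    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, float]] = {}

    def register(self, service: str, capability: str) -> None:
        """Record or refresh a service's self-reported capability."""
        self._entries[service] = (capability, time.time())

    def query(self, max_age_s: float = 60.0) -> dict[str, str]:
        """Return only entries fresher than max_age_s: FRESH vs. MORE."""
        now = time.time()
        return {svc: cap for svc, (cap, t) in self._entries.items()
                if now - t <= max_age_s}

index = VOIndex()
index.register("gridftp.site-a", "data-transfer")
print(index.query())  # {'gridftp.site-a': 'data-transfer'}
```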

31 GT4 Monitoring and Discovery Services
- Automated registration in containers (WS-ServiceGroup)
- MDS-Index services in each GT4 container, with registration & WSRF/WSN access
- Adapters (e.g., a GridFTP adapter) bridge custom protocols for non-WSRF entities
- Clients (e.g., WebMDS) query the indexes
[Figure: GT4 containers hosting RFT, GRAM, and MDS-Index services, plus a GridFTP adapter and user clients]

32 Forming & Operating Scientific Communities
- Define VO membership and roles, & enforce laws and community standards
  - I.e., policy
- Build, buy, operate, & share community infrastructure
  - Data, programs, services, computing, storage, instruments
  - Service-oriented architecture
- Define and perform collaborative work
  - Use shared infrastructure, roles, & policy
  - Manage community workflow

33 Collaborative Work
[Figure: a timeline from "What I Did" (executed) through "What I Am Doing" (executing) to "What I Want to Do" (executable, then not yet executable), with query, edit, and schedule operations against the execution environment]

34 Managing Collaborative Work
- Process as "workflow," at different scales, e.g.:
  - Run a 3-stage pipeline
  - Process data flowing from an experiment over a year
  - Engage in interactive analysis
- Need to keep track of:
  - What I want to do (evolves with new knowledge)
  - What I am doing now (evolves with system configuration)
  - What I did (persistent; a source of information)

35 Problem Refinement
- Given: desired result and constraints
  - Desired result (high-level, metadata description)
  - Application components
  - Resources in the Grid (dynamic, distributed)
  - Constraints & preferences on solution quality
- Find: an executable job workflow
  - A configuration that generates the desired result
  - A specification of resources to be used
  - A sequence of operations: create agreement, move data, request operation
- May create the workflow incrementally as information becomes available
"Mapping Abstract Complex Workflows onto Grid Environments," Deelman, Blythe, Gil, Kesselman, Mehta, Vahi, Arbree, Cavanaugh, Blackburn, Lazzarini, Koranda, 2003
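
A highly simplified sketch of such refinement, assuming the workflow is a dependency graph whose abstract steps are bound to concrete sites only once their inputs exist; this conveys the flavor of Pegasus-style planning but is not its actual algorithm or API.

```python
def plan_incrementally(steps: dict[str, list[str]],
                       available: set[str],
                       site_for: dict[str, str]) -> list[tuple[str, str]]:
    """Bind abstract steps (step -> required inputs) to concrete sites as
    inputs become available; each bound step yields an output '<step>.out'."""
    plan, done = [], set()
    progress = True
    while progress:
        progress = False
        for step, inputs in steps.items():
            if step in done or not all(i in available for i in inputs):
                continue
            plan.append((step, site_for[step]))  # concrete binding
            available.add(f"{step}.out")         # this output now exists
            done.add(step)
            progress = True
    return plan

steps = {"extract": ["raw"], "simulate": ["extract.out"], "plot": ["simulate.out"]}
sites = {"extract": "site-a", "simulate": "teragrid", "plot": "site-a"}
print(plan_incrementally(steps, {"raw"}, sites))
# [('extract', 'site-a'), ('simulate', 'teragrid'), ('plot', 'site-a')]
```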

36 Trident: The GriPhyN Virtual Data System
[Figure: a VDL program populates the virtual data catalog; a virtual data workflow generator produces a workflow spec and abstract workflow; a job planner (with job cleanup) creates an execution plan as a statically partitioned DAG; a local planner yields a dynamically planned DAG; DAGMan & Condor-G carry out grid workflow execution]
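
To show the shape of the final executable artifact, here is a toy DAG with topological execution, in the spirit of what DAGMan consumes; the dictionary representation is illustrative, not the DAGMan file format.

```python
from graphlib import TopologicalSorter

# Concrete workflow as a DAG: job -> set of prerequisite jobs.
dag = {
    "stage_in":  set(),
    "simulate":  {"stage_in"},
    "analyze":   {"simulate"},
    "stage_out": {"analyze"},
}

def execute(dag: dict[str, set[str]]) -> None:
    """Run jobs in dependency order, as a DAGMan-like engine would."""
    for job in TopologicalSorter(dag).static_order():
        print(f"submit {job}")  # in reality: submitted via Condor-G to the Grid

execute(dag)  # stage_in, simulate, analyze, stage_out
```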

37 Seismic Hazard Curve
[Figure: annual frequency of exceedance vs. ground motion (peak ground acceleration). Marked levels: ground motion exceeded every year; exceeded once in 10, 100, 1,000, and 10,000 years; ground motion a person can expect to be exceeded during their lifetime; typical design levels for buildings (10% probability of exceedance in 50 years), hospitals, and nuclear power plants; minor and moderate damage annotations, including Carl's house during Northridge]
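
The "10% probability of exceedance in 50 years" annotation and the "exceeded once in N years" levels are linked by the standard Poisson assumption P = 1 − e^(−λT), where λ is the annual frequency of exceedance and T the exposure time. The short computation below verifies that 10% in 50 years corresponds to a return period of roughly 475 years.

```python
import math

def exceedance_probability(annual_rate: float, years: float) -> float:
    """Poisson model: probability of at least one exceedance in `years`."""
    return 1.0 - math.exp(-annual_rate * years)

def annual_rate_for(prob: float, years: float) -> float:
    """Invert the Poisson model to get the annual frequency of exceedance."""
    return -math.log(1.0 - prob) / years

rate = annual_rate_for(0.10, 50.0)
print(f"annual frequency ~ {rate:.5f}/yr; return period ~ {1 / rate:.0f} years")
# annual frequency ~ 0.00211/yr; return period ~ 475 years
```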

38 SCEC CyberShake
- Calculate hazard curves by generating synthetic seismograms from an estimated rupture forecast
[Figure: rupture forecast + strain Green tensor → synthetic seismogram → spectral acceleration → hazard curve → hazard map]

39 CyberShake on the SCEC VO
[Figure: a workflow scheduler/engine and VO scheduler coordinate TeraGrid compute and TeraGrid storage with SCEC storage, consulting the VO service catalog, provenance catalog, and data catalog]

40 Summary (1): Community Services
- Community roll, city hall, permits, licensing & police force
  - Assertions, policy, attribute & authorization services
- Directories, maps
  - Information services
- City services: power, water, sewer
  - Deployed services
- Shops, businesses
  - Composed services
- Day-to-day activities
  - Workflows, visualization
- Tax board, fees, economic considerations
  - Barter, planned economy, eventually markets

41 Summary (2)
- Community-based science will be the norm
  - Requires collaborations across sciences, including computer science
- Many different types of communities
  - They differ in coupling, membership, lifetime, and size
- Must think beyond science stovepipes
  - Increasingly, the community infrastructure will become the scientific observatory
- Scaling requires a separation of concerns
  - Providers of resources, services, content
- Only a small set of fundamental mechanisms is required to build communities

42 For More Information
- Globus Alliance
- NMI and GRIDS Center
- Infrastructure
- Background: The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition