Computer Science Research
Ian Foster
University of Chicago & Argonne National Laboratory
GriPhyN NSF Project Review
January 2003, Chicago
29 Jan 2003 Ian Foster, U.Chicago

Computer Science Research
• Introduction & context (Ian Foster: 30 mins)
  – Vision: virtual data as e-science enabler
  – Organization: structure & interactions
  – Dissemination: targets and mechanisms
  – The nature of future challenges
• Computer science research
  – Virtual data (Mike Wilde: 15)
  – Scheduling, planning (Ewa Deelman: 15)
  – Execution (Mike Franklin: 15)
  – Performance (Valerie Taylor: 15)
• Technology delivery (Miron Livny: 15)
  – Virtual Data Toolkit
• Student presentations (60)
Petascale Virtual Data Grids (1)
[Figure: petascale virtual data grid architecture. Users (production team, individual investigator, research group) work through interactive user tools; virtual data tools; request planning & scheduling tools; request execution & management tools; resource management services; security and policy services; other Grid services; transforms; distributed resources (code, storage, computers, and network); raw data source. Scale annotations: PetaOps, Petabytes, Performance.]
Petascale Virtual Data Grids (2)
Computer Science and GriPhyN
[Figure: links among computer science research, the Virtual Data Toolkit, partner physics projects, and the larger science community (Globus, Condor, NMI, EU DataGrid, PPDG communities). Link labels: techniques & software; requirements; prototyping & experiments; tech transfer; production deployment.]
Other linkages:
  – Work force
  – CS researchers
  – Industry
Computer Science Challenges (1)
• Virtual data
  – Representation, discovery, & manipulation of workflows and associated data & programs
• Planning
  – Mapping workflows in an efficient, policy-aware manner to distributed resources
• Execution
  – Executing workflows, including data movements, reliably and efficiently
• Performance
  – Monitoring aspects of system performance for scheduling & troubleshooting
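The virtual data, planning, and execution challenges above share one core step: given a request for a data product, decide which recorded derivations must run, and in what order, while reusing anything already materialized. The following is a minimal Python sketch of that step, assuming a hypothetical `Derivation` record (a transformation name plus input and output logical file names) and a hypothetical `plan` function. It is not Chimera's or Pegasus's actual data model or API, and it ignores resource mapping, policy, and data movement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record: one invocation of a transformation (program)."""
    name: str
    transformation: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def plan(requested, derivations, materialized):
    """Return the derivations to run, parents first, so `requested` exists.

    Files already in `materialized` are reused rather than recomputed.
    Raises ValueError for a missing file that no derivation produces,
    or for a cyclic dependency.
    """
    producer = {out: d for d in derivations for out in d.outputs}
    order, done, visiting = [], set(), set()

    def need(lfn):
        if lfn in materialized:
            return  # already on storage: nothing to derive
        d = producer.get(lfn)
        if d is None:
            raise ValueError(f"no derivation produces {lfn!r}")
        if d.name in done:
            return  # already scheduled (e.g. for another of its outputs)
        if d.name in visiting:
            raise ValueError(f"cycle detected at {d.name!r}")
        visiting.add(d.name)
        for inp in d.inputs:
            need(inp)  # derive inputs first (depth-first post-order)
        visiting.discard(d.name)
        done.add(d.name)
        order.append(d)

    for lfn in requested:
        need(lfn)
    return order


if __name__ == "__main__":
    # Hypothetical three-step pipeline: simulate -> reconstruct -> histogram.
    dvs = [
        Derivation("sim", "simulate", ("params",), ("events",)),
        Derivation("reco", "reconstruct", ("events",), ("tracks",)),
        Derivation("hist", "histogram", ("tracks",), ("plot",)),
    ]
    print([d.name for d in plan(["plot"], dvs, {"params"})])
    # ['sim', 'reco', 'hist']
    print([d.name for d in plan(["plot"], dvs, {"params", "tracks"})])
    # ['hist']  (tracks already exist, so only the last step reruns)
```

The parents-before-children order produced here is the ordering any DAG executor (DAGMan, for example) must respect; a real planner would additionally choose a site and replicas for each step.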
Computer Science Challenges (2)
• Engage meaningfully with physics groups
• Provide educational opportunities
• Develop, package, deliver, and support quality software
• Achieve outreach to groups outside partner physics experiments
GriPhyN Computer Science Team
• U.Chicago: Dumitrescu, Foster, Iamnitchi, Milligan, Ranganathan, Ripeanu, Voeckler, Wilde
• USC/ISI: Deelman, Kesselman, Mehta, Patil, Singh, Vahi
• NWU → TAMU: Taylor, Yin
• UCB: Franklin, Liu
• UCSD: Marzullo, Moore, Zhang, Jagatheesan
• UW-Madison: Alderman, Arpaci-Dusseau, Arpaci-Dusseau, Bailey, Bent, Kosar, Livny, Roy, Stanley, Thain
• UF: Arbee, George, Jiang, Katageri, Ranka, Rodriguez
• UT Brownsville: Campanelli, Morris, Zamora
• LBNL: Shoshani
(Includes faculty/staff and students/postdocs.)
Computer Science Research: How Do We Work?
• System architecture & Virtual Data Toolkit as two overarching organizational mechanisms
• Project activities all defined in relationship to these organizing principles:
  – Research: explore new techniques to guide evolution of the system architecture and VDT
  – Development: construct VDT software
  – Evaluation: apply and evaluate VDT software and/or new techniques in the context of application challenges
Computer Science Research: How Are We Coordinated?
• The activities of this large, multidisciplinary group are coordinated by frequent communication through many channels:
  – Face-to-face meetings in large & small groups
  – Formal and informal documents defining requirements, challenge problems, testbeds
  – E-mail, phone calls, videoconferences
  – Cooperation on challenge problems and on technology and application demonstrations
  – Cooperation on software releases
GriPhyN Architecture/VDT and CS Research Projects
[Figure: the virtual data, planning, and execution layers of the architecture, realized in the VDT by the Chimera Virtual Data System + Pegasus planner, DAGman workflow, and the Globus Toolkit, Condor, Ganglia, etc., shown alongside the CS research projects below.]
• Partial queries (Liu, Franklin)
• Decentralized scheduling (Ranganathan)
• Fault-tolerant master-worker (Marzullo)
• Scalable replica location service (UC, ISI team)
• Policy-aware scheduling (Dumitrescu)
• Ontologies (Zhao)
• NeST storage management (UW team)
• Virtual data language design (Voeckler, Wilde)
• AI planning (Deelman, Narang)
• Virtual data language applications (Milligan, Zhao)
• DAGman enhancements (UW team)
• Prophesy (Taylor, Yin)
• HP monitoring (George)
GriPhyN Architecture/VDT and CS Research: Degree of Coupling
[Figure: the same architecture layers, VDT components, and research projects as in "GriPhyN Architecture/VDT and CS Research Projects", arranged by each project's degree of coupling with the architecture/VDT. Legend: already underway; pending.]
Examples of Technology Injection: Chimera R&D Timeline
[Figure: timeline pairing successive Chimera versions (TECH) with the applications they supported (APPS), extending to 2004.]
TECH
  – Chimera-0: derivations only; Grid execution environment (prototype); Perl & PostgreSQL
  – Chimera-1: Java code & class model; XML VDL; TR/DV (transformation/derivation) model; compound TRs; general Grid execution environment; optimized DB schema
  – Chimera-2: type model; dataset catalog; metadata; hyperlinks; instance tracking; performance data
  – Chimera-3: knowledge representation; policy-driven planners; VD browsers, composers …
APPS
  – Sloan cluster finding
  – CMS analysis prototype w/ROOT; CMS official event simulation; Sloan cluster-finding science
  – CMS & ATLAS analysis w/ROOT, CLARENS, JAS; LIGO pulsar search; ATLAS events-on-demand; CMS event simulation prototyping
  – Sloan near-earth object; Bio Grid facility …
Dissemination: Targets
• Researchers and educators
  – Facilitate creation of new knowledge
• Computer science research community
  – Contribute to knowledge
  – Engage community in solving our problems
• Open source community
  – Contribute to open Grid technology base
• Industry
  – Contribute to vibrant commercial technology
Dissemination: Mechanisms
• Software
  – VDT: adoption by LHC Computing Grid
  – Globus Toolkit and Condor systems
• Publications and talks
  – XX papers, YY tech reports, ZZ talks
• Workshops and meetings
  – E.g., “Data Derivation & Provenance”, October 2002
• Community activities
  – E.g., advisory committees, GGF standards
Representative Publications
• Annis, J., Zhao, Y., Voeckler, J., Wilde, M., Kent, S., Foster, I. Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey. SC'2002.
• Bent, J., Venkataramani, V., LeRoy, N., Roy, A., Stanley, J., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H., Livny, M. Flexibility, Manageability, and Performance in a Grid Storage Appliance. HPDC-11.
• Deelman, E., Blackburn, K., Ehrens, P., Kesselman, C., Koranda, S., Lazzarini, A., Mehta, G., Meshkat, L., Pearlman, L., Williams, R. GriPhyN and LIGO: Building a Virtual Data Grid for Gravitational Wave Scientists. HPDC-11.
• Foster, I., Voeckler, J., Wilde, M., Zhao, Y. Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation. SSDBM.
• Iamnitchi, A., Ripeanu, M., Foster, I. Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations. 1st Intl. Workshop on Peer-to-Peer Systems.
• Raman, P., George, A., Radlinski, M., Subramaniyan, R. GEMS: Gossip-Enabled Monitoring Service for Heterogeneous Distributed Systems. Technical Report, UF.
• Ranganathan, K., Foster, I. Decoupling Computation and Data Scheduling in Distributed Data Intensive Applications. HPDC-11.
• Ripeanu, M., Foster, I., Iamnitchi, A. Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design. Internet Computing, 6(1).
The Nature of Future Challenges
• GriPhyN R&D is proving very successful
  – In terms of “new ideas”
  – In terms of interest & adoption
• Our major challenges as we move forward are to scale and sustain the effort
  – Research scope: virtual data → knowledge representation (KR); planning, execution → 1000× larger; …; …
  – Software support: we need 10× NMI (NSF Middleware Initiative)!
  – Infrastructure & application support
• See the Atkins cyberinfrastructure report!
Summary
• CS has made significant contributions both to experiments and to knowledge, e.g.:
  – Virtual data concepts and technologies
  – Scheduling in large-scale distributed systems
  – DAGman workflow management & execution
  – Scalable replica location services
• VDT (& underlying Globus Toolkit & Condor systems) is a good technology transfer vehicle
  – Adoption by major science projects
  – Adoption of Grid concepts within industry
• Major challenge: exploiting opportunities