GriPhyN & iVDGL Architectural Issues

GGF5 BOF: Data Intensive Applications – Common Architectural Issues and Drivers
Edinburgh, 23 July 2002

Mike Wilde
Argonne National Laboratory
Grid Physics Network / International Virtual Data Grid Laboratory
Project Summary
• Principal requirements
  – IT research: virtual data and transparent execution
  – Grid building: deploy an international grid laboratory at scale
• Components developed/used
  – Virtual Data Toolkit; Linux deployment platform
  – Virtual Data Catalog, request planner and executor, DAGMan, NeST
• Scale of current testbeds
  – ATLAS Test Grid – 8 sites
  – CMS Test Grid – 5 sites
  – Compute nodes: UW, UofC, UWM, UTB, ANL
  – >50 researchers and grid-builders working on IT research challenge problems and demos
• Future directions (2002 & 2003)
  – Extensive work on virtual data, planning, catalog architecture, and fault tolerance
Chimera Overview
• Concept: tools to support management of transformations and derivations as community resources
• Technology: Chimera virtual data system, including the virtual data catalog and virtual data language; use of the GriPhyN Virtual Data Toolkit for automated data derivation
• Results: successful early applications to CMS and SDSS data generation/analysis
• Future: public release of the prototype, new applications, knowledge representation, planning
“Chimera” Virtual Data Model
• Transformation designers create programmatic abstractions
  – Simple or compound; augment with metadata
• Production managers create bulk derivations
  – Can materialize data products or leave them virtual
• Users track their work through derivations
  – Augment (replace?) the scientist’s log book
• Definitions can be augmented with metadata
  – The key to intelligent data retrieval
  – Issues relating to metadata propagation
CMS Pipeline in VDL-0
[Pipeline stages: pythia_input, pythia.exe, cmsim_input, cmsim.exe, writeHits, writeDigis]

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
Data Dependencies – VDL-1
[Diagram: data dependencies among file1, file2, and file3 via derivations x1 and x2]

TR tr1( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app1";
  argument stdin = ${a1};
  argument stdout = ${a2};
}

TR tr2( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app2";
  argument stdin = ${a1};
  argument stdout = ${a2};
}

DV x1->tr1( …
DV x2->tr2( …
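The way derivations chain into an abstract DAG can be sketched roughly as follows. This is a minimal Python illustration, not Chimera's implementation; the file bindings for x1 and x2 are assumed for the example, since the DV definitions on the slide are truncated.

# Minimal sketch (not Chimera code): derive an abstract DAG from
# derivations by matching each one's inputs to other derivations' outputs.
# The file bindings below are assumed for illustration.

derivations = {
    "x1": {"in": ["file1"], "out": ["file2"]},
    "x2": {"in": ["file2"], "out": ["file3"]},
}

def abstract_dag(derivs):
    """Return edges (producer -> consumer) implied by shared logical files."""
    producer = {f: name for name, d in derivs.items() for f in d["out"]}
    edges = []
    for name, d in derivs.items():
        for f in d["in"]:
            if f in producer:          # file is derived, not a primary input
                edges.append((producer[f], name))
    return edges

print(abstract_dag(derivations))   # [('x1', 'x2')] -- x2 depends on x1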
Executor Example: Condor DAGMan
• Directed Acyclic Graph Manager
• Specify the dependencies between Condor jobs using a DAG data structure
• Manage dependencies automatically
  – (e.g., “Don’t run job B until job A has completed successfully.”)
• Each job is a “node” in the DAG
• Any number of parent or child nodes
• No loops
[Diagram: example DAG with Job A, Job B, Job C, and Job D]
Slide courtesy Miron Livny, U. Wisconsin
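The dependency rule DAGMan enforces can be illustrated with a toy Python sketch of topological job release; this is not DAGMan itself, and the diamond-shaped DAG and job names are assumed for the example.

# Toy sketch of DAGMan-style dependency handling (not DAGMan code):
# a job becomes runnable only when all of its parents have completed.

# Assumed example DAG: A is the parent of B and C; B and C are parents of D.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def run_order(parents):
    """Release jobs in waves; each wave holds jobs whose parents are all done."""
    done, waves = set(), []
    while len(done) < len(parents):
        ready = [j for j in parents
                 if j not in done and all(p in done for p in parents[j])]
        if not ready:
            raise ValueError("cycle detected: DAGs must have no loops")
        waves.append(ready)
        done.update(ready)
    return waves

print(run_order(parents))   # [['A'], ['B', 'C'], ['D']]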
Chimera Application: Sloan Digital Sky Survey Analysis
Joint work with Jim Annis, Steve Kent, FNAL

Question: what is the size distribution of galaxy clusters?
Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs)
→ Galaxy cluster size distribution
Cluster-finding Data Pipeline
[Diagram: multi-stage pipeline (stages numbered 1–5) over field/tsObj and brg inputs, producing core, cluster, and catalog data]
Small SDSS Cluster-Finding DAG
And Even Bigger: 744 Files, 387 Nodes
Vision: Distributed Virtual Data Service
[Diagram: applications at local sites, regional centers, and Tier 1 centers sharing a distributed virtual data service built from VDCs]
Knowledge Management – Strawman Architecture
• Knowledge-based requests are formulated in terms of science data
  – E.g., “give me a specific transform of channels c, p, & t over time range t0-t1”
• A Finder finds the data files
  – Translates the range “t0-t1” into a set of files
• A Coder creates an execution plan and defines derivations from known transformations
  – Can deal with missing files (e.g., file c in the LIGO example)
• The knowledge request is answered in terms of datasets
• The Coder translates datasets into logical files (or objects, queries, tables, …)
• The Planner translates logical entities into physical entities
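A minimal sketch of the Finder step described above, in hypothetical Python: the file-naming scheme, segment length, and channel names are invented for illustration and are not part of the strawman architecture.

# Hypothetical sketch of the "Finder" role: translate a time range into
# the set of logical files that cover it. Fixed-length time segments and
# the file-naming convention are assumptions for illustration.

SEGMENT_SECONDS = 3600   # assume each logical file covers one hour of data

def find_logical_files(channel, t0, t1):
    """Return the logical file names covering [t0, t1) for one channel."""
    files, t = [], (t0 // SEGMENT_SECONDS) * SEGMENT_SECONDS
    while t < t1:
        files.append(f"{channel}.{t}.{SEGMENT_SECONDS}.dat")
        t += SEGMENT_SECONDS
    return files

# E.g., channels c, p, and t over a two-hour range:
for ch in ("c", "p", "t"):
    print(ch, find_logical_files(ch, t0=7200, t1=14400))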
GriPhyN/PPDG Data Grid Architecture
[Architecture diagram: Application → abstract DAG → Planner → concrete DAG → Executor → Compute Resource and Storage Resource, supported by Catalog Services, Info Services, Policy/Security, Monitoring, Replica Management, and a Reliable Transfer Service]
Technologies noted on the diagram: DAGMan, Kangaroo (executor); GRAM (compute resource); GridFTP, GRAM, SRM (storage resource); GSI, CAS (policy/security); MDS (info services, monitoring); MCAT, GriPhyN catalogs (catalog services); GDMP (replica management); Globus.
Common Problem #1: (Evolving) View of the Data Grid Stack
[Stack diagram; components of the data grid stack:]
• Reliable Replication
• Publish-Subscribe Service (GDMP)
• Reliable File Transfer
• Replica Location Service
• Data Transport (GridFTP)
• Local Replica Catalog (flat or hierarchical)
• Storage Element Manager
• Storage Element
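A rough sketch of how the upper layers of such a stack compose the lower ones, in hypothetical Python pseudo-interfaces: the function names and behavior are invented and are not the actual GridFTP, RFT, RLS, or GDMP APIs.

# Hypothetical composition of stack layers (invented interfaces).

def data_transport(src_url, dst_url):
    """Lowest layer: move bytes (e.g., a GridFTP transfer)."""
    print(f"transfer {src_url} -> {dst_url}")

def reliable_file_transfer(src_url, dst_url, retries=3):
    """Retry the transport until it succeeds or retries are exhausted."""
    for _ in range(retries):
        try:
            data_transport(src_url, dst_url)
            return True
        except OSError:
            continue
    return False

replica_location_service = {}   # logical file name -> list of physical replicas

def reliable_replication(lfn, src_url, dst_url):
    """Upper layer: copy a file, then register the new replica."""
    if reliable_file_transfer(src_url, dst_url):
        replica_location_service.setdefault(lfn, []).append(dst_url)

reliable_replication("run42.fz", "gsiftp://siteA/run42.fz", "gsiftp://siteB/run42.fz")
print(replica_location_service)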
Architectural Complexities
Common Problem #2: Request Planning
• Map of grid resources
• Incoming work to plan
  – Queue? With lookahead?
• Status of grid resources
  – State (up/down)
  – Load (current, queued, and anticipated)
  – Reservations
• Policy
  – Allocation (commitment of a resource to a VO or group, based on policy)
• Ability to change decisions dynamically
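The planning inputs listed above can be combined in a very simple scoring scheme. The sketch below is a hypothetical Python illustration; the site names, numbers, and scoring rule are invented, and this is not the GriPhyN planner.

# Hypothetical planner sketch: pick a site for a job by combining
# resource status and policy allocation. All values are invented.

sites = {
    "siteA": {"up": True,  "load": 0.9, "allocation": 0.8},  # share granted to our VO
    "siteB": {"up": True,  "load": 0.2, "allocation": 0.5},
    "siteC": {"up": False, "load": 0.0, "allocation": 1.0},
}

def choose_site(sites):
    """Prefer up sites with low load, weighted by our policy allocation."""
    candidates = {name: s["allocation"] * (1.0 - s["load"])
                  for name, s in sites.items() if s["up"]}
    if not candidates:
        raise RuntimeError("no site available; defer or re-plan")
    return max(candidates, key=candidates.get)

print(choose_site(sites))   # siteB: modest allocation but nearly idle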
Policy
• Focus is on resource allocation (not on security)
• Allocation examples:
  – “CMS should get 80% of the resources at Caltech” (averaged monthly)
  – “The Higgs group has high priority at BNL until 8/1”
• Need to apply fair-share scheduling to the grid
• Need to understand the allocation models dictated by funders and data centers
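One way to read the 80% example above is as a fair-share target checked against recent usage. The sketch below is a hypothetical Python illustration; the accounting window and all numbers are invented.

# Hypothetical fair-share check (invented accounting, for illustration):
# compare each VO's monthly usage at a site against its policy target
# and schedule first the VO that is furthest below its share.

policy_share = {"CMS": 0.80, "ATLAS": 0.20}      # e.g., "CMS gets 80% at Caltech"
cpu_hours_this_month = {"CMS": 5200.0, "ATLAS": 2800.0}

def under_served(policy_share, usage):
    """Return VOs ordered by how far below their allocated share they are."""
    total = sum(usage.values()) or 1.0
    deficit = {vo: policy_share[vo] - usage.get(vo, 0.0) / total
               for vo in policy_share}
    return sorted(deficit, key=deficit.get, reverse=True)

print(under_served(policy_share, cpu_hours_this_month))
# CMS has used 65% against an 80% target, so it is scheduled first.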
Grids as overlays on shared resources
Grid Scheduling Problem
• Given an abstract DAG representing logical work:
  – Where should each compute job be executed?
    > What does site and VO policy say?
    > What does grid “weather” dictate?
  – Where is the required data now?
  – Where should data results be sent?
• Stop and re-schedule computations?
• Suspend or de-prioritize work in progress to let higher-priority work go through?
• Degree of policy control?
• Is a “grid” an entity? An “aggregator” of resources?
• How is data placement coordinated with planning?
• Use of an execution profiler in the planner architecture:
  – Characterize the resource needs of an application over time
  – Parameterize the resource requirements of an application by its parameters
• What happens when things go wrong?
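The execution-profiler idea, parameterizing an application's resource needs by its own parameters, might look roughly like this. The Python sketch is hypothetical; the linear model, coefficients, and the cmsim_profile name are invented for illustration.

# Hypothetical execution-profile sketch: resource requirements of an
# application parameterized by a salient parameter (here, num_events).

def cmsim_profile(num_events):
    """Estimate resources for a simulation job of num_events events."""
    return {
        "cpu_hours":   0.05 * num_events,          # assumed ~3 CPU-minutes per event
        "disk_gb":     0.002 * num_events + 0.5,   # output grows with event count
        "peak_mem_mb": 600,                        # assumed roughly flat
    }

# A planner could use such profiles to estimate where a job fits:
for n in (500, 5000):
    print(n, cmsim_profile(n))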
Policy and the Planner
• The planner considers:
  – Policy (fairly static, from CAS/SAS)
  – Grid status
  – Job (user/group) resource consumption history
  – Job profiles (resources over time) from Prophesy
Open Issues – Planner (1)
• Does the planner have a queue? If so, how does a planner manage its queue?
• How many planners are there? Is it a service?
• How is responsibility partitioned between the planner and the executor (cluster scheduler)?
• How many other entities need to be coordinated?
  – RFT, DAPman, SRM, NeST, …?
  – How to wait on reliable file transfers?
• How does the planner estimate times if it has only partial responsibility for when/where things run?
• How is data placement planning coordinated with request planning?
Open Issues – Planner (2)
• Clearly need incremental planning (e.g., for analysis)
• Stop and re-schedule computations?
• Suspend or de-prioritize work in progress to let higher-priority work go through?
• Degree of policy control?
• Is the “grid” an entity?
• Use of an execution profiler in the planner architecture:
  – Characterize the resource requirements of an application over time
  – Parameterize the resource requirements of an application w.r.t. its (salient) parameters
• What happens when things go wrong?
Issue Summary
• Consolidate the data grid stack
  – Reliable file transfer
  – Reliable replication
  – Replica catalog and virtual data catalog scaled for global use
• Define interfaces and locations of planners
• Unify job workflow representation around DAGs
• Define how to state and manage policy
• Strategies for fault tolerance – similar to re-planning for weather and policy changes?
• Evolution of services to OGSA