J.J. Blaising, April 02, AMS DataGrid-status
DataGrid Status
J.J. Blaising, IN2P3
Grid Status
Demo introduction
Demo
Grid Technology: Introduction & Overview. Ian Foster, Argonne National Laboratory, University of Chicago
Harvey B. Newman, Caltech: Data Analysis for Global HEP Collaborations, LCG Launch Workshop, CERN
l3www.cern.ch/~newman/LHCCMPerspective_hbn.ppt
LHC Computing Model Perspective
From H. Newman:
- Query (task completion time) estimation
- Queueing and co-scheduling strategies
- Load balancing (e.g. Self-Organizing Neural Network)
- Error recovery: fallback and redirection strategies
- Strategy for use of tapes
- Extraction, transport and caching of physicists' object collections; Grid/database integration
- Policy-driven strategies for resource sharing among sites and activities; policy/capability tradeoffs
- Network performance and problem handling
  - Monitoring and response to bottlenecks
  - Configuration and use of new-technology networks, e.g. dynamic wavelength scheduling or switching
- Fault tolerance, performance of the Grid services architecture
- Consistent transaction management, ...
DataTAG project: major 2.5 Gbps circuits between Europe & USA
[Network map: NL SURFnet, UK SuperJANET4, IT GARR-B, GEANT, CERN; New York, Abilene, ESNET, MREN, STAR-TAP, STAR-LIGHT]
DataGrid Goal
Develop middleware to allow WAN-distributed computing and data management.
Build a distributed batch system allowing jobs to be submitted to different sites, with automatic site selection according to resource matching.
Next: interactive use and parallel processing; other OSes (Solaris).
Requirements from HEP, Earth Observation and Biomedical applications.
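The automatic site selection mentioned above can be sketched as a simple matchmaking step: the Workload Management System compares the job's requirements against the resource attributes each site publishes. This is a minimal illustrative sketch, not the actual EDG matchmaker; the site list, attribute names and `match_sites` helper are assumptions.

```python
# Illustrative sketch of WMS-style resource matching; the attribute
# names and site data below are hypothetical examples, not real
# Information System output.

def match_sites(requirements, sites):
    """Return the sites whose published attributes satisfy every requirement."""
    return [s for s in sites
            if all(s.get(attr) == value for attr, value in requirements.items())]

# Resource attributes as Computing Elements might publish them
# via the Information System (invented values).
sites = [
    {"name": "CERN",   "OpSys": "RH 6.2",  "FreeCPUs": 12},
    {"name": "CNAF",   "OpSys": "RH 6.2",  "FreeCPUs": 4},
    {"name": "NIKHEF", "OpSys": "Solaris", "FreeCPUs": 8},
]

# Requirements as they would come from the job's JDL file.
candidates = match_sites({"OpSys": "RH 6.2"}, sites)
print([s["name"] for s in candidates])  # → ['CERN', 'CNAF']
```

In the real system the requirements come from the JDL `Requirements` expression and the published attributes from the Information System; this sketch only shows the matching idea.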
Grid Services
- User ITF node: client; submits the job
- Workload Manager
- Information System
- File Catalog server
- Computing Element: gatekeeper, jobmanager (PBS/LSF/BQS); publishes CPU resources; worker nodes
- Storage Element: gatekeeper; publishes storage resources
- Resource providers: storage, CPU
Middleware status v1.1.2
- Workload Manager (UI+RB+JSS+LB), WP1: still bug fixing + improvements for year 2
- Data management (file catalog, replica manager), WP2: good collaboration with Globus
- Information System, WP3: deployment of uniform FTREE/MDS/R-GMA
- Fabric management, WP4: LCFG, light LCFG for preinstalled systems
- Mass storage management, WP5: Castor, HPSS, ...
Successful EU review on 1 March.
VO Services
Computing and Storage Element services deployed at CERN, CC-IN2P3, CNAF, NIKHEF, RAL, more to come; US sites soon, to test Grid interoperability.
For ALICE, ATLAS, CMS, LHCb, EarthObs and Biomed, deployment of dedicated services:
- LDAP server (certificates)
- File catalog (LFN/PFN mapping)
- GDMP server (automatic data replication)
- More to come: metadata catalog, ...
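The LFN/PFN mapping provided by the file catalog can be pictured as a small lookup table: one logical file name resolves to the physical file names of its replicas at each site. The class below is an illustrative sketch under that assumption; it is not the EDG catalog API, and the PFN strings are invented.

```python
# Minimal sketch of an LFN -> PFN replica catalog; class, method names
# and PFN URLs are hypothetical, not the EDG file catalog interface.

class ReplicaCatalog:
    """Maps one logical file name (LFN) to its physical replicas (PFNs)."""

    def __init__(self):
        self.entries = {}

    def add_replica(self, lfn, pfn):
        # A logical file may have one physical copy per storage element.
        self.entries.setdefault(lfn, []).append(pfn)

    def get_pfns(self, lfn):
        return self.entries.get(lfn, [])

catalog = ReplicaCatalog()
# After GDMP-style replication, the same LFN resolves to copies at several sites.
catalog.add_replica("raw_000123.dat", "sfn://cern.ch/store/raw_000123.dat")
catalog.add_replica("raw_000123.dat", "sfn://cnaf.infn.it/store/raw_000123.dat")
print(catalog.get_pfns("raw_000123.dat"))  # → both replica locations
```

This is the mapping a job consults ("get pfn from Rep Catalog") to decide whether a local copy exists or a transfer from a remote Storage Element is needed.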
Application activities (WP8)
- Middleware evaluation using ALICE, ATLAS, CMS, LHCb, Gen-HEP toolkits
- User requirements collection with ALICE, ATLAS, CMS, LHCb
- Common HEP use cases
- Common application use cases
Towards common use cases (diagram summary)
Each stack has the same layers, top to bottom:
- Specific application layer: ALICE, ATLAS, CMS, LHCb, other apps
- Middleware: MW1, MW2, MW3, MW4, MW5
- Bag of services (GLOBUS team)
- OS & Net services
If we manage to define VO use cases & requirements with a common core use case, or even better common LHC use cases, it will be easier to arrive at common use cases shared by LHC and other apps.
What we want from a GRID
This is the result of our experience on TB0 & TB1.
GRID architecture (diagram layers, top to bottom):
- Specific application layer: ALICE, ATLAS, CMS, LHCb, other apps
- LHC VO common application layer (common use cases)
- High-level GRID middleware
- Basic services (GLOBUS team)
- OS & Net services
Demo introduction
Sites involved: CERN, CNAF, LYON, NIKHEF, RAL.
From the user interface node, dg-job-submit demo.jdl sends the job to the Workload Management System at CERN.
The WMS selects a site according to the resource attributes given in the JDL file and the resources published via the Information System.
The job is sent to one of the sites; it writes a data file, which is copied to the nearest mass storage and replicated to all other sites.
dg-job-get-output is used to retrieve the output files.
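The replication step of the demo can be pictured as a toy simulation: the job writes its data file at the selected site, and a copy then appears at every other participating site. The `run_demo` function and file name below are illustrative stand-ins, not EDG tools.

```python
# Toy sketch of the demo's replication step; run_demo and the file
# name are hypothetical, only the five site names come from the demo.

sites = ["CERN", "CNAF", "LYON", "NIKHEF", "RAL"]

def run_demo(selected_site, sites):
    """Write the data file at the selected site, then replicate it everywhere else."""
    replicas = {selected_site: "raw_000001.dat"}   # file written by the job
    for site in sites:
        if site != selected_site:
            # GDMP-style automatic replication to the remaining sites.
            replicas[site] = replicas[selected_site]
    return replicas

replicas = run_demo("LYON", sites)
print(sorted(replicas))  # every site ends up holding a copy
```

In the real demo the replication is done by the GDMP servers and recorded in the file catalog; this sketch only shows the end state the demo aims for.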
Generic HEP application flowchart
Job arguments:
- Data type: raw/dst
- Run number: xxxxxx
- Number of evts: yyyyyy
- Number of wds/evt: zzzzzz
- Rep Catalog flag: 0/1
- Mass Storage flag: 0/1
Raw branch: generate raw events on local disk; move to SE, MS if requested; add lfn/pfn to Rep Catalog; write logbook raw_xxxxxx_dat.log.
Dst branch: get pfn from Rep Catalog; if the pfn is not local, copy raw data from SE to local disk; read raw events, write dst events; move to SE, MS if requested; add lfn/pfn to Rep Catalog; write logbook dst_xxxxxx_dat.log.
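The flowchart's control flow can be sketched as a single function that follows the raw or dst branch and honours the two flags. This is an illustrative sketch: `run_job`, its arguments and the returned step strings are assumptions mirroring the flowchart boxes, not the actual application code.

```python
# Hypothetical sketch of the generic HEP application flowchart;
# run_job and its step strings mirror the flowchart boxes, nothing more.

def run_job(data_type, run_number, rep_catalog_flag, mass_storage_flag,
            pfn_is_local=False):
    """Follow the raw/dst branches of the generic HEP application flowchart."""
    steps = []
    lfn = f"{data_type}_{run_number:06d}.dat"
    if data_type == "raw":
        steps.append("generate raw events on local disk")
    else:  # dst processing starts from existing raw data
        steps.append("get pfn from Rep Catalog")
        if not pfn_is_local:
            steps.append("copy raw data from SE to local disk")
        steps.append("read raw events, write dst events")
    if mass_storage_flag:
        steps.append("move output to SE, MS")
    if rep_catalog_flag:
        steps.append(f"add lfn/pfn for {lfn} to Rep Catalog")
    steps.append(f"write logbook {data_type}_{run_number:06d}_dat.log")
    return steps

for step in run_job("raw", 123, rep_catalog_flag=True, mass_storage_flag=True):
    print(step)
```

Running the dst branch with `pfn_is_local=True` skips the SE-to-local copy, matching the "pfn local?" decision box in the flowchart.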
demo.jdl:
Executable = "demo.csh";
Arguments = "raw";
StdInput = "none";
StdOutput = "demo.out";
StdError = "demo.err";
InputSandbox = {"demo.csh", "main.exe"};
OutputSandbox = {"demo.out", "demo.err", "demo.log"};
Requirements = other.OpSys == "RH 6.2";

dg-job-submit demo.jdl
dg-job-get-output job-id

[Diagram: the user ITF node sends the input sandbox to the Workload Manager, which uses the Information System and the File Catalog server to dispatch the job to a Computing Element; data is written to Storage Elements and the output sandbox is returned to the user.]