Implementing Metadata Using RLS/LCG James Cunha Werner University of Manchester


Metadata Meeting - Grenoble 2005 James Werner

Babar Experiment

The BaBar experiment studies the differences between matter and antimatter, to throw light on the problem, posed by Sakharov, of how the matter-antimatter symmetric Big Bang can have given rise to today's matter-dominated universe. High energy collisions between electrons and positrons produce other elementary particles, giving tracks and clusters which are recorded by several high granularity detectors and from which the properties of the short-lived particles can be deduced.

Each recorded collision, called an event, comprises a large volume of data, and thousands of millions of events are recorded, giving a total dataset size of hundreds of thousands of gigabytes (hundreds of terabytes).

Sources of Data in Babar

Amount of data

Columns: # Files, Size (TB), Events (Million)

Run1: 6,
Run2: 11, ,925
Run3: 7,
Run4: 16, ,999
Run5 (2xRun4) ???: 32,
Run6 (2xRun5) ???: 64,
Run7 (2xRun6) ???: 128,
SuperBabar!

Systematic errors >>> statistical errors
Same amount of Monte Carlo generated data!

Data Structure

The user interface to the eventstore: event "collection". Each collection represents an ordered series of N events and a user can choose to read the events from the 1st one in the sequence or from any given offset into the sequence.

Data components:
– hdr - event header
– usr - user data
– tag - tag information
– cnd - candidate information
– aod - "analysis object data"
– tru - MC truth data (only in MC data)
– esd - "event summary data"
– sim - "sim" data from BgsApp or MooseApp like GHits/GVertices (only in MC data)
– raw - subset of raw data from xtc persisted in the Kanga eventstore

Data organisation

How data are stored (level of detail):
– micro = hdr + usr + tag + cnd + aod (+ tru)
– mini = micro + esd

Data access:
– collections - these are "logical" names that users use to configure their jobs. They are site-independent, so (assuming the site has imported the data) the same collection name should work at any site.
– logical file names (LFN) - these are site-independent names given to all files in the eventstore. Any references within the event data itself _must_ use LFNs so that they remain valid when the files are moved from site to site.
– physical file names (PFN) - these are file names that vary from site to site. In practice they are usually derived from the LFNs by adding a prefix that encapsulates how the data is accessed at that site.
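The relation between LFNs and PFNs, and between the micro and mini levels of detail, can be sketched in a few lines of shell. This is only an illustration: the function names, the site prefix and the file name are hypothetical and are not part of the BaBar software.

```shell
#!/bin/sh
# Illustrative sketch only: function names, prefix and file names are hypothetical.

# Derive a site-specific PFN from a site-independent LFN by adding a site prefix.
lfn_to_pfn() {
    prefix=$1
    lfn=$2
    # Drop an optional "lfn:" scheme so the prefix can be prepended directly.
    lfn=${lfn#lfn:}
    printf '%s%s\n' "$prefix" "$lfn"
}

# List the event components stored at a given level of detail
# (tru is left out because it exists only in Monte Carlo data).
components_for_level() {
    micro="hdr usr tag cnd aod"
    case $1 in
        micro) echo "$micro" ;;
        mini)  echo "$micro esd" ;;
        *)     echo "unknown level: $1" >&2; return 1 ;;
    esac
}

# Example: lfn_to_pfn /data/babar lfn:/store/run1/file-001.root
```

Because only the prefix changes between sites, the same LFN can be resolved at any site that has imported the data.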

Feeding RLS with metadata

Generation of the basic metadata file with the file selection:

    #!/bin/bash
    # Save the BbkDatasetTcl listing for the local site.
    BbkDatasetTcl --dbsite=local > MetaLista.txt
    # Write one BbkDatasetTcl command per listed name into the helper script geratcl.
    cat MetaLista.txt | awk '// {print "BbkDatasetTcl --site local --nolocal \""$1"\"";}' >> geratcl
    chmod 700 geratcl
    ./geratcl

Feeding RLS with the basic files:

    #!/bin/bash
    # Write one edg-rm copy-and-register (cr) command per .tcl file into alimrls.
    # The LFN is the file name up to the first dot; output goes to <name>.rlstok.
    ls *.tcl | awk '// {split($1,a,"."); print "edg-rm --vo babar cr file:///home/jamwer/PgmCM2/MetaData/"$1 " -l lfn:"a[1] " > " a[1]".rlstok";}' >> alimrls
    chmod 700 alimrls
    ./alimrls

Conformity CE catalogue

Run the evaluation software to establish CE conformity and perform the catalogue update.

    #!/bin/bash
    # Ask the BDII for the computing elements (CEs) that accept the babar VO.
    ldapsearch -x -H ldap://lcgbdii02.gridpp.rl.ac.uk:2170 -b 'Mds-vo-name=local,o=Grid' '(&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=VO:babar))' | grep "GlueCEUniqueID:" > cenames.txt
    # Write one ./catal call per CE into subload.sh and run it.
    cat cenames.txt | awk '// {print "./catal "$2;}' > subload.sh
    chmod 700 subload.sh
    ./subload.sh
    # Append the submission log to the history file named by the first argument.
    cat loadrlssubm >> "$1.histo"
    # Store each job handle (the https line) in <name>.tok.
    cat "$1.histo" | awk ' /Sub/ {FileName=$2} /https/ {HandleName=$2; print "echo " HandleName "> " FileName".tok " }' >> gridtok
    chmod 700 gridtok
    ./gridtok

Conformity validation

Verify that the site follows the experiment standards:

    #!/bin/bash
    echo Hostname `/bin/hostname`
    echo Start time: `/bin/date`
    echo
    local=`pwd`
    echo "Babar initialisation"
    . $VO_BABAR_SW_DIR/babar-grid-setup-env.sh
    echo
    echo "Environment variables"
    printenv
    echo
    cd $local
    echo Available files: $local
    ls
    echo
    echo " "
    echo
    cd $BFDIST/releases/
    srtpath Linux24RH72_i386_gcc2953
    cd $local
    BbkDatasetTcl --dbsite=local > MetaLista.txt
    cat MetaLista.txt | awk '// {print "BbkDatasetTcl --site local \""$1"\"";}' >> geratcl
    chmod 700 geratcl
    ./geratcl
    export CE_NAME=$1
    # Pass the value of CE_NAME to awk; the bare name would be taken as a literal string.
    ls *.tcl | awk -v site="$CE_NAME" '// {split($1,a,"."); print "edg-rm --vo babar addAlias `cat " $1"` lfn:"a[1]"."site ;}' >> alimrls
    chmod 700 alimrls
    ./alimrls
    echo
    echo " "
    echo
    echo End time: `/bin/date`

Analysis Submission to Grid

Single command: ./easygrid dataset_name

– Performs handler management and submission.
– Configurable to meet the user's requirements.
– Software based on a state machine:
  – Verify that skimData is available: if not, run BbkDatasetTcl to generate it. Each file will be a job.
  – Verify whether there are handlers pending:
    – If not, generate the script (gera.c) with edg-job-submit and ClassAds, and execute it. Nest for submission policy and optimisation.
    – If yes, verify the job status. When all jobs have ended, recover the results into the user folder.

(Prototype)
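The state machine described above can be sketched as a single decision function. This is a minimal illustration of the control flow only, not the EasyGrid source; the function name, its three inputs and the action names are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative sketch only: names and inputs are hypothetical, not EasyGrid code.

# next_action SKIMDATA_READY PENDING_HANDLERS UNFINISHED_JOBS
#   SKIMDATA_READY    "yes" if the skimData files already exist
#   PENDING_HANDLERS  number of job handlers still stored from a submission
#   UNFINISHED_JOBS   number of those jobs that have not ended yet
next_action() {
    skim_ready=$1
    pending=$2
    unfinished=$3
    if [ "$skim_ready" != yes ]; then
        # No skimData yet: generate it first (one file per job).
        echo generate_skimdata
    elif [ "$pending" -eq 0 ]; then
        # Nothing submitted yet: generate the scripts and submit.
        echo submit_jobs
    elif [ "$unfinished" -gt 0 ]; then
        # Jobs are still running: only check their status.
        echo check_status
    else
        # Every job has ended: bring the results back to the user folder.
        echo retrieve_results
    fi
}

# Example: next_action yes 26 3
```

Calling the function repeatedly with the current state lets one command be re-run until the results are retrieved, which matches the single-command usage shown on the slide.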

Job Submission system, metadata and data

Metadata/Event files and Computer elements

For each dataset there is a metadata file containing the names of the event files. These physical files are registered with the RLS, with several logical file names in the format datasetname_CEJobQueue assigned to them as aliases, showing the CEs which contain copies of that dataset. Searching all the aliases for a dataset name provides a list of CEs to which jobs can be submitted.
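The alias search can be illustrated with a small filter over a list of alias names. This sketch assumes the aliases have already been listed into a text stream, one per line, in the datasetname_CEJobQueue format; the function name, dataset names and CE names are hypothetical, and no RLS command is invoked.

```shell
#!/bin/sh
# Illustrative sketch only: reads alias names from standard input, one per line.
# Dataset and CE names used with it are hypothetical.

# ces_for_dataset DATASET
# Print the CE job queue part of every alias of the form DATASET_CEJobQueue.
ces_for_dataset() {
    dataset=$1
    awk -v ds="$dataset" '
        {
            name = $1
            sub(/^lfn:/, "", name)          # tolerate an optional lfn: scheme
            prefix = ds "_"
            if (index(name, prefix) == 1)   # alias starts with "DATASET_"
                print substr(name, length(prefix) + 1)
        }'
}

# Example:
#   printf '%s\n' 'lfn:DatasetA_ce01.example.org:2119/jobmanager-pbs-short' |
#       ces_for_dataset DatasetA
```

Matching on the dataset name plus the underscore keeps DatasetA from also matching aliases of a dataset called DatasetAB. A dataset name that itself contains the separator would need a stricter convention.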

Managing large files in Grid

The analysis executable is stored in the SE and its logical file name (LFN) is also catalogued in the RLS, so any WN needs to download it only once. Metadata is used not only for data, but to support other files as well.

Gera

Generation of all necessary information to submit the jobs on the Grid:
– Job Description Language (JDL) files
– the script with all necessary tasks to run the analysis remotely at a WN
– some grid dependent analysis parameters.

The JDL files define the input sandbox with all necessary files to be transferred. The WN load-balancing algorithm matches requirements to perform the task optimally.
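As a rough illustration of what a generated JDL file might contain, the sketch below prints one job description per input file. It is written in shell for this example and is not the gera.c program mentioned in the slides; the function name, the script name, the output file names and the CE name are hypothetical, while the attribute names follow the usual EDG/LCG JDL syntax.

```shell
#!/bin/sh
# Illustrative sketch only: not the output of gera.c. All file names and the
# CE identifier passed to this function are hypothetical.

# make_jdl SCRIPT TCLFILE CE
# Print a JDL description for one job that runs SCRIPT on TCLFILE at CE.
make_jdl() {
    script=$1
    tclfile=$2
    ce=$3
    cat <<EOF
Executable = "$script";
Arguments = "$tclfile";
StdOutput = "job.out";
StdError = "job.err";
InputSandbox = {"$script", "$tclfile"};
OutputSandbox = {"job.out", "job.err"};
Requirements = other.GlueCEUniqueID == "$ce";
EOF
}

# Example: make_jdl run_analysis.sh DatasetA-1.tcl ce01.example.org:2119/jobmanager-pbs-short > job1.jdl
```

The input sandbox lists the files shipped with the job, and the Requirements line pins the job to a CE known to hold the dataset. A real submission policy would normally leave more freedom to the matchmaking step.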

Running analysis programs

When the task is delivered to the WN, scripts start running to initialize the specific Babar environment, and the analysis software is downloaded.

Benchmarks

Behavior of particles in the BaBar Electromagnetic Calorimeter (EMC)

The different behavior of electrons, hadrons, and muons can be distinguished. Performing this analysis takes 7 days using one computer 24 hours a day. Using 10 CPUs in parallel, accessed via the Grid, it took only 8 hours.

Pi+- N Pi0 decays, with N = 1, 2, 3 and 4

Invariant masses of pairs of gammas, as measured by the EMC, from Pi0 decay produce a mass peak at 135 MeV (the peak in the plot). All other combinations are spread randomly over all energies (background). There were 81,700,000 events in the dataset and it took 4 days to run in production, with 26 jobs in parallel; running it on a single computer would take more than 3 months.
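The single-computer estimate can be checked with simple arithmetic. The sketch below assumes, as a simplification, that all 26 jobs ran for the full 4 days on CPUs comparable to the single computer, and it uses a 30-day month; the function names are hypothetical.

```shell
#!/bin/sh
# Illustrative arithmetic only, under the assumptions stated above.

# cpu_days JOBS DAYS: total CPU time if every job runs for the whole period.
cpu_days() {
    echo $(( $1 * $2 ))
}

# days_to_months DAYS: convert days to 30-day months, one decimal place.
days_to_months() {
    awk -v d="$1" 'BEGIN { printf "%.1f\n", d / 30 }'
}

# Example: cpu_days 26 4        prints 104
#          days_to_months 104   prints 3.5
```

Under these assumptions the run corresponds to 104 CPU-days, or roughly three and a half months on one machine, which is consistent with the figure quoted on the slide.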

Summary

– EasyGrid is working and provides the full job submission structure using the LCG grid, RLS and metadata management.
– It provides handler management that is transparent to the user.
– Easy to use!
– Configurable to meet the user's requirements, and possibly usable by other experiments as well.

See homepage for more details. Thanks for the opportunity!