EasyGrid: a job submission system for distributed analysis using grid


EasyGrid: a job submission system for distributed analysis using grid James Cunha Werner jamwer2000@hotmail.com http://www.geocities.com/jamwer2002/

Goal: develop grid software for the BaBar experiment at the University of Manchester. BaBar is a high-energy physics experiment, running since 1999 at Stanford University/SLAC, that aims to shed light on how a matter-antimatter symmetric Big Bang could have given rise to today's matter-dominated universe. BaBar analysis was conventional centralized software (850 packages). The project goal was to study grid performance and develop gridification algorithms; 5 papers were published and 20 international talks given.

Challenge: distributed data analysis…
TauUser data: 18,000 files; each user has thousands of different results; 500,000,000 raw-data events; 800,000,000 simulated Monte Carlo events.
Raw data: 1,000,000 files / 20,000 categories; 4,000,000,000 raw-data events; 4,000,000,000 simulated Monte Carlo events.
Massive computational resources are required. Grid computing is a strong candidate to provide them!

Main issues…
Complex data management: datasets distributed around the world, plus several support databases (conditions, configuration, bookkeeping metadata, and parameters).
Distributed and heterogeneous hardware platforms around the world (standards).
Users do not have grid skills. Their interest was high-energy physics, not the grid.
Reliability and performance should be at least the same as at SLAC. Users have a fixed time to do their research, so they will use the most efficient resource.

LCG Grid Software
Grid middleware developed by CERN (Switzerland) and GridPP (UK): a homogeneous common ground on a heterogeneous platform.
Components: user interface, information system, resource broker, computer elements, worker nodes, storage elements.
Integration can be difficult for outside users!

LCG around the world

EasyGrid: job submission system for the grid. It is an intermediate layer between the grid middleware and the user's software. It integrates data, parameters, software, and grid middleware, handling all submission and management of the many copies of each user's software sent to the grid. It performs DATA and TASK parallelism on the grid. Web page: http://www.hep.man.ac.uk/u/jamwer/ Paper: http://www.geocities.com/jamwer2002/gridgeral.pdf

Gridification process: from conventional to grid computing. Grid-enabled software is user software plus gridification algorithms: a conventional invocation such as "> BetaMiniApp Tau11-Run3.tcl" becomes the grid-enabled "> easygrid BetaMiniApp Tau11-Run3" (file name as argument). [Diagram: user software passes through data gridification and functional gridification into the EasyGrid job submission system, which submits jobs, manages datasets, and recovers results and reports, mediating between the user's computer and the grid resources (workload management, data management, performance analysis).] See http://www.hep.man.ac.uk/u/jamwer/Grid2006.pdf for more information.
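The data-gridification step above can be sketched as follows. This is a minimal illustrative sketch, not the actual EasyGrid code: the JDL template and helper names are assumptions for the example.

```python
# Minimal sketch of data gridification: one grid job description per
# data file. Hypothetical illustration only -- the JDL template and
# function names are assumptions, not the EasyGrid implementation.

def make_jdl(executable, tcl_file):
    """Build a JDL-style job description that runs one copy of the
    user binary on one data file."""
    return "\n".join([
        'Executable = "%s";' % executable,
        'Arguments = "%s";' % tcl_file,
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        'InputSandbox = {"%s", "%s"};' % (executable, tcl_file),
        'OutputSandbox = {"std.out", "std.err", "results.root"};',
    ])

def gridify(executable, dataset_files):
    """Turn a conventional invocation into one job description per file."""
    return [make_jdl(executable, f) for f in dataset_files]

jobs = gridify("BetaMiniApp", ["Tau11-Run3-001.tcl", "Tau11-Run3-002.tcl"])
print(len(jobs))  # 2 -- one job per data file
```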

Job submission block diagram

Execution diagram

Data parallelism in grid. Each data file is read by its own copy of the binary code, in parallel. EasyGrid tasks:
- Copy the binary code to the closest storage elements.
- Set the environment on each worker node.
- Start the binary code.
- Recover results into the user's directory.
- Provide information in case the software fails.
- Tools for data management and replication.
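The per-worker-node steps listed above can be sketched as a small wrapper: set the environment, run the binary on its data file, and report failures so results can be recovered. Illustrative only; the environment variable and return format are assumptions, not EasyGrid's actual interface.

```python
# Sketch of the worker-node side of data parallelism: set the
# environment, start the binary, and report status. Illustrative only;
# BABAR_DATA and the result dictionary are hypothetical.
import os
import subprocess

def run_on_worker_node(binary, data_file, workdir):
    os.makedirs(workdir, exist_ok=True)
    env = dict(os.environ, BABAR_DATA=data_file)  # hypothetical variable
    result = subprocess.run([binary, data_file], cwd=workdir, env=env,
                            capture_output=True, text=True)
    if result.returncode != 0:
        # provide information in case the software fails
        return {"status": "failed", "stderr": result.stderr}
    return {"status": "ok", "stdout": result.stdout}
```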

Data gridification in action

Data gridification benchmarks

Particle identification: energy versus momentum for the Tau 1N dataset, containing 18,700,000 events. [Plots: Monte Carlo simulation and real data, showing pion and kaon bands.] See http://www.hep.man.ac.uk/u/jamwer/index.html#06

Neutral pion decays. BbkDatasetTcl selected 482,303,947 events in dataset Tau11-Run[1,2,3,4]-OnPeak-R14. Using easymoncar, 4,890,000 Monte Carlo events were simulated. The grid platform was used to run every data file selected by BbkDatasetTcl in parallel: Run3 was processed at Manchester and Runs 1, 2, and 4 at RAL. Processing performance was 70,000 events per hour. See http://www.hep.man.ac.uk/u/jamwer/index.html#07

Rho(770) reconstruction from hadronic tau decays. The fitted parameters of the Breit-Wigner mass distribution are: resonant mass 770 MeV, width 160 MeV, and normalisation 4,500,000.
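With the quoted parameters, the line shape can be written down directly. A minimal sketch, assuming the simple non-relativistic Breit-Wigner form was used for the fit (the slide does not say which variant):

```python
# Non-relativistic Breit-Wigner with the fitted parameters quoted above:
# mass 770 MeV, width 160 MeV, normalisation 4,500,000. Sketch of the
# fit model only; the exact line-shape variant used is an assumption.
def breit_wigner(m, m0=770.0, gamma=160.0, norm=4.5e6):
    """Breit-Wigner line shape as a function of invariant mass m (MeV)."""
    half = gamma / 2.0
    return norm * half ** 2 / ((m - m0) ** 2 + half ** 2)

# The distribution peaks at the resonant mass and falls to half the
# maximum at m0 +/- gamma/2:
print(breit_wigner(770.0))  # 4500000.0
```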

Search for anti-deuterons. The first task is to find where the deuteron (and anti-deuteron) stripes lie in the dE/dx versus momentum biparametric plots. The stripes correspond to pions, kaons, protons, and deuterons respectively. The anti-matter plot shows almost no anti-deuteron events. There were 800 jobs, each searching 2 million events. See http://www.hep.man.ac.uk/u/jamwer/index.html#08

NP-hard optimization using genetic algorithms: job-shop scheduling optimization using an always-feasible mapping with a genetic algorithm; 161 test datasets were run with the GA and Monte Carlo.

Some results from HEP users… [Plots courtesy of Dr Marta Tavera and Dr Mitchell Naisbit.]

Task parallelism in grid. One master binary code (or client) requests services and manages the load flow. EasyGrid tasks:
- Set up a task queue.
- Search the information system for services published on the grid.
- Establish sessions on each worker node.
- Start services and initialize the software.
- Send data for processing to each server.
- Manage processing and re-submit in case of failure.
- Manage notification and recover results at the master.
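The master-side loop described above can be sketched as a task queue that dispatches to workers and re-queues failed tasks. A minimal sketch of the pattern, not EasyGrid's implementation; the round-robin dispatch and retry limit are assumptions.

```python
# Sketch of a master managing a task queue over grid workers:
# dispatch, collect results, and re-submit on failure.
# Illustrative only; the scheduling policy is an assumption.
from collections import deque

def run_master(tasks, workers, execute, max_retries=3):
    """execute(worker, task) returns a result or raises on failure."""
    queue = deque(tasks)
    results = {}
    retries = {}
    while queue:
        task = queue.popleft()
        worker = workers[len(results) % len(workers)]  # simple round-robin
        try:
            results[task] = execute(worker, task)
        except Exception:
            retries[task] = retries.get(task, 0) + 1
            if retries[task] <= max_retries:
                queue.append(task)   # re-submit in case of failure
            else:
                results[task] = None  # give up, record the failure
    return results
```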

Task gridification in action

Task gridification benchmark

Neutral pion discrimination. A neutral pion decays into two gammas, detected by BaBar's electromagnetic calorimeter. Two background gammas could have the neutral pion invariant mass just by chance. How can they be discriminated using artificial intelligence?

Discriminate functions. The mathematical model obtained with GP maps the variable hyperspace to a real value through the discriminate function, an algebraic function of kinematic variables. Applying the discriminator to a given pair of gammas: if the discriminate value is greater than zero, the pair of gammas is deemed to come from a pion decay; otherwise, the pair is deemed to come from another (background) source. Paper: http://www.hep.man.ac.uk/u/jamwer/gphep.pdf Poster: http://www.hep.man.ac.uk/u/jamwer/IoP2007.ppt
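The sign-threshold rule above is simple to state in code. A minimal sketch: the particular algebraic expression here is a made-up placeholder, not the GP-evolved function.

```python
# Sketch of applying a discriminate function: an algebraic function of
# kinematic variables whose sign selects pion decays vs background.
# The expression below is a hypothetical placeholder, not the GP result.
def discriminate_function(e1, e2, opening_angle):
    # hypothetical algebraic expression of kinematic variables
    return e1 * e2 * opening_angle - 0.5

def classify(e1, e2, opening_angle):
    """Return True if the gamma pair is deemed to come from a pion decay."""
    return discriminate_function(e1, e2, opening_angle) > 0.0
```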

Discriminate function methodology:
1. Obtain the discriminate function (DF): select real and background events from MC data to form the training data, and run GP.
2. Test the DF's accuracy on independent test data.
3. Select events for superposition from MC data and raw data.

[Toy example: training data with 2 red points (label 0) and 2 green points (label 1); the DF is trained with selection criteria 0 = red, 1 = green, then applied to test data with 3 red (0) and 3 green (1) points.]

Running genetic programming with grid computing. Individuals are encoded in Reverse Polish Notation. The population size is 500 individuals; crossover and mutation probabilities are 60% and 20% respectively. Every generation, the 20 best individuals are copied as they are (without crossover or mutation), and half the population is generated randomly to replace the worst individuals. Algebraic operators are applied to the kinematic data. The service distributed on the grid was fitness evaluation, run in parallel on many worker nodes over 482,303,947 BaBar detector events and 20,489,668 MC events.
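The Reverse Polish Notation encoding mentioned above evaluates with a simple stack machine. A minimal sketch, assuming the individuals are token lists of variables, constants, and the four algebraic operators; the protected division is a common GP convention, not confirmed by the slide.

```python
# Minimal stack evaluator for GP individuals encoded in Reverse Polish
# Notation: tokens are variable names, numeric constants, or algebraic
# operators. Sketch only; the token set is an assumption.
def eval_rpn(tokens, variables):
    stack = []
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b if b != 0 else 0.0,  # protected division
    }
    for tok in tokens:
        if tok in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        elif tok in variables:
            stack.append(variables[tok])
        else:
            stack.append(float(tok))
    return stack.pop()

# e.g. the expression E1*E2 - 0.135 written in RPN:
print(eval_rpn(["E1", "E2", "*", "0.135", "-"], {"E1": 0.5, "E2": 0.4}))
```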

Training GP to obtain the NPDF. Monte Carlo (MC) generators integrate particle-decay models with the detector's system transfer function. MC events contain full information for each track particle and gamma, which allows selection of a high-purity training dataset (96%+). Events with a real neutral pion were selected and labelled "1". Events with no real pion in the MC truth but with an invariant-mass reconstruction in the same region as real neutral pions were also selected and labelled "0".

Energy cuts:
- all gammas, without an energy cut (60,000 real and background records for training, and 60,000 real and 44,527 background for test);
- gammas more energetic than the 30 MeV electronics noise threshold (32,000 real and background records for training and test);
- gammas more energetic than 50 MeV (15,000 real and background records for training and test);
- gammas more energetic than 30 MeV, with lateral moment between 0.0 and 0.8, that hit more than one crystal in the electromagnetic calorimeter, the conventional neutral pion cut (16,000 real and background records for training and test).

NPDF final results:
- α: sensitivity, or efficiency.
- β: specificity, or purity.
- γ: accuracy.
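The three figures of merit can be computed from the classification counts. A minimal sketch, using the standard confusion-matrix definitions matching the labels above:

```python
# Sensitivity (alpha), specificity (beta), and accuracy (gamma) from the
# counts of true/false positives and negatives, matching the slide's
# labelling: signal pairs are "positive", background pairs "negative".
def npdf_metrics(tp, tn, fp, fn):
    alpha = tp / (tp + fn)                    # sensitivity / efficiency
    beta = tn / (tn + fp)                     # specificity / purity
    gamma = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    return alpha, beta, gamma
```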

Neutral pion energy distribution. Cumulative plot of the energy distribution for 1, 2, 3, and 4 neutral pion decays using the all-gammas NPDF. The contamination effect can be seen in the MC energy distribution. The agreement between Monte Carlo and experimental data is conclusive about the method's convergence and accuracy.

Hadronic tau decays results

Summary
Available since GridPP11 (September 2004): http://www.gridpp.ac.uk/gridpp11/babar_main.ppt
Several benchmarks with BaBar experiment data:
- Data gridification:
  - Particle identification: http://www.hep.man.ac.uk/u/jamwer/index.html#06
  - Neutral pion decays: http://www.hep.man.ac.uk/u/jamwer/index.html#07
  - Search for anti-deuterons: http://www.hep.man.ac.uk/u/jamwer/index.html#08
- Functional gridification:
  - Evolutionary neutral pion discriminate function: http://www.hep.man.ac.uk/u/jamwer/index.html#13
Documentation (main web page): http://www.hep.man.ac.uk/u/jamwer/ (109 HTML files and 327 complementary files)
Farms of 60 CPUs (production) and 10 CPUs (development) ran independently without any problem between November 2005 and September 2006.

Dissemination
20 international events: http://www.hep.man.ac.uk/u/jamwer/index.html#10
5 refereed papers at international conferences.
GridPP stand at IoP2006 and IoP2007.
Contributions to the GridPP web pages: http://www.gridpp.ac.uk/posters/

Further development at the LHC: Higgs analysis in the H+0j channel.

Conclusion
EasyGrid is a framework for distributed analysis that works very well, providing both data and functional gridification capabilities. The genetic programming approach obtains a neutral pion discriminate function to discern real neutral pions from background. Background can critically influence systematic errors and constrain qualitative analysis. The hadronic tau decay results analyzed here showed that the genetic programming discriminate function plays an important role in background reduction, improving analysis quality. The use of the NPDF will allow observables to be studied and checked against values obtained from the theoretical Standard Model, using a high-purity sample of events.