LHC Physics Analysis and Databases or: “How to discover the Higgs Boson inside a database” Maaike Limper.

Slides:



Advertisements
Similar presentations
Experimental Particle Physics PHYS6011 Joel Goldstein, RAL 1.Introduction & Accelerators 2.Particle Interactions and Detectors (2) 3.Collider Experiments.
Advertisements

First results from the ATLAS experiment at the LHC
Jörgen Sjölin Stockholm University LHC experimental sensitivity to CP violating gtt couplings November, 2002 Page 1 Why CP in gtt? Standard model contribution.
23/04/2008VLVnT08, Toulon, FR, April 2008, M. Stavrianakou, NESTOR-NOA 1 First thoughts for KM3Net on-shore data storage and distribution Facilities VLV.
Introduction to Analysis Example tutorial D. Greenwood For P. Skubic 8/21-22/20141ASP2014 Grid School.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL March 25, 2003 CHEP 2003 Data Analysis Environment and Visualization.
September 27, 2005FAKT 2005, ViennaSlide 1 B-Tagging and ttH, H → bb Analysis on Fully Simulated Events in the ATLAS Experiment A.H. Wildauer Universität.
A simulation study of the rapidity distributions of leptons from W boson decays at ATLAS Laura Gilbert.
1 Techincolor and Heavy Top/B search Tulika Bose Meenakshi Narain (Brown University) Work done in the context of Les Houches 2007.
Copyright © 2012 Cleversafe, Inc. All rights reserved. 1 Combining the Power of Hadoop with Object-Based Dispersed Storage.
Hadoop Team: Role of Hadoop in the IDEAL Project ●Jose Cadena ●Chengyuan Wen ●Mengsu Chen CS5604 Spring 2015 Instructor: Dr. Edward Fox.
CERN - IT Department CH-1211 Genève 23 Switzerland t The High Performance Archiver for the LHC Experiments Manuel Gonzalez Berges CERN, Geneva.
CMS Masterclass. It’s the dawn of an exciting age of new discovery in particle physics! At CERN, the LHC and its experiments are underway. ATLAS and.
CMS Masterclass It’s a time of exciting new discoveries in particle physics! At CERN, the LHC and its experiments are underway. ATLAS and CMS, the.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
10/2/2015C.Kourkoumelis,UoA1 WP3 Educational and Outreach Portal C. Kourkoumelis, UoA.
1 A Preliminary Model Independent Study of the Reaction pp  qqWW  qq ℓ qq at CMS  Gianluca CERMINARA (SUMMER STUDENT)  MUON group.
2004 Xmas MeetingSarah Allwood WW Scattering at ATLAS.
Datasets on the GRID David Adams PPDG All Hands Meeting Catalogs and Datasets session June 11, 2003 BNL.
How to discover the Higgs Boson in an Oracle database Maaike Limper.
Introduction Advantages/ disadvantages Code examples Speed Summary Running on the AOD Analysis Platforms 1/11/2007 Andrew Mehta.
CMS Masterclass It’s the dawn of an exciting age of new discovery in particle physics! At CERN, the LHC and its experiments are underway. ATLAS.
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
Physics analysis & databases M. Limper Part 1: SQL analytics (live physics analysis demo) Part 2: Chopped-up tables and partition-wise joins 20/03/2014.
Inclusive analysis of supersymmetry in EGRET-point with one-lepton events: pp → 1ℓ + 4j + E Tmiss + Х V.A. Bednyakov, S.N. Karpov, E.V. Khramov and A.F.
EGEE is a project funded by the European Union under contract IST HEP Use Cases for Grid Computing J. A. Templon Undecided (NIKHEF) Grid Tutorial,
Full Dress Rehearsal (FDR1) studies Sarah Allwood-Spiers 11/3/2008.
The PHysics Analysis SERver Project (PHASER) CHEP 2000 Padova, Italy February 7-11, 2000 M. Bowen, G. Landsberg, and R. Partridge* Brown University.
1 ttbar Cross-Section Studies D. Jana*, M. Saleem*, F. Rizatdinova**, P. Gutierrez*, P. Skubic* *University of Oklahoma, **Oklahoma State University.
LCWS14, Belgrade, October M. Pandurović, H→WW at 350 GeV and 1.4 TeV CLIC Measurement of H→WW* in HZ at 350 GeV and WW fusion at 1.4 TeV CLIC.
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
A New Tool For Measuring Detector Performance in ATLAS ● Arno Straessner – TU Dresden Matthias Schott – CERN on behalf of the ATLAS Collaboration Computing.
Integration of the ATLAS Tag Database with Data Management and Analysis Components Caitriana Nicholson University of Glasgow 3 rd September 2007 CHEP,
Big traffic data processing framework for intelligent monitoring and recording systems 學生 : 賴弘偉 教授 : 許毅然 作者 : Yingjie Xia a, JinlongChen a,b,n, XindaiLu.
Database Competence Centre openlab Major Review Meeting nd February 2012 Maaike Limper Zbigniew Baranowski Luigi Gallerani Mariusz Piorkowski Anton.
Physics Analysis inside the Oracle DB Progress report 10 Octobre 2013.
ATLAS Trigger Development
Study of pair-produced doubly charged Higgs bosons with a four muon final state at the CMS detector (CMS NOTE 2006/081, Authors : T.Rommerskirchen and.
Update on WH to 3 lepton Analysis And Electron Trigger Efficiencies with Tag And Probe Nishu 1, Suman B. Beri 1, Guillelmo Gomez Ceballos 2 1 Panjab University,
Fully Hadronic Top Anti-Top Decay (Using TopView) Fully Hadronic Top Anti-Top Decay (Using TopView) Ido Mussche NIPHAD meeting, Februari 9 th :
Computing Issues for the ATLAS SWT2. What is SWT2? SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium UTA is lead institution, along with University.
1 Offline Week, October 28 th 2009 PWG3-Muon: Analysis Status From ESD to AOD:  inclusion of MC branch in the AOD  standard AOD creation for PDC09 files.
A first look at CitusDB & in- database physics analysis M. Limper 19/06/2014.
Search for Higgs portal Dark matter Tohoku Ayumi Yamamoto 11/5 2011/11/51.
TAGS in the Analysis Model Jack Cranshaw, Argonne National Lab September 10, 2009.
ATLAS Distributed Computing perspectives for Run-2 Simone Campana CERN-IT/SDC on behalf of ADC.
The DCS Databases Peter Chochula. 31/05/2005Peter Chochula 2 Outline PVSS basics (boring topic but useful if one wants to understand the DCS data flow)
ATLAS Z-Path Masterclass Masterclass Analysis Intro.
An SQL-based approach to Physics Analysis M. Limper.
Régis Lefèvre (LPC Clermont-Ferrand - France)ATLAS Physics Workshop - Lund - September 2001 In situ jet energy calibration General considerations The different.
La Thuile, March, 15 th, 2003 f Makoto Tomoto ( FNAL ) Prospects for Higgs Searches at DØ Makoto Tomoto Fermi National Accelerator Laboratory (For the.
Z & W Boson Reconstruction with Monte Carlo Simulations and ATLAS DATA Adam Lowery Supervisor: Dr. Jonas Strandberg Wednesday, August 11, 2010.
H Y P A T I A HYbrid Pupil’s Analysis Tool for Interactions in Atlas
CMS Masterclass It’s a time of exciting new discoveries in particle physics! At CERN, the LHC succesfully completed Run I at 8 TeV of collision.
SEARCH FOR DIRECT PRODUCTION OF SUPERSYMMETRIC PAIRS OF TOP QUARKS AT √ S = 8 TEV, WITH ONE LEPTON IN THE FINAL STATE. Juan Pablo Gómez Cardona PhD Candidate.
Search for the Higgs in ttH (H  bb) Chris Collins-Tooth.
ATLAS Physics Analysis Framework James R. Catmore Lancaster University.
 reconstruction and identification in CMS A.Nikitenko, Imperial College. LHC Days in Split 1.
Viktor Veszpremi Purdue University, CDF Collaboration Tev4LHC Workshop, Oct , Fermilab ZH->vvbb results from CDF.
Marcello Barisonzi First results of AOD analysis on t-channel Single Top DAD 30/05/2005 Page 1 AOD on Single Top Marcello Barisonzi.
Introduction to Particle Physics II Sinéad Farrington 19 th February 2015.
ATLAS Distributed Computing Tutorial Tags: What, Why, When, Where and How? Mike Kenyon University of Glasgow.
Eric COGNERAS LPC Clermont-Ferrand Prospects for Top pair resonance searches in ATLAS Workshop on Top Physics october 2007, Grenoble.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL May 19, 2003 BNL Technology Meeting.
HYDRA Framework. Setup of software environment Setup of software environment Using the documentation Using the documentation How to compile a program.
CERN Openlab: my transition from science to business
WP3 Educational and Outreach Portal
Introduction to Analysis Example Tutorial
H Y P A T I A HYbrid Pupil’s Analysis Tool for Interactions in Atlas
Map Reduce, Types, Formats and Features
Presentation transcript:

LHC Physics Analysis and Databases or: “How to discover the Higgs Boson inside a database” Maaike Limper

Introduction to LHC physics analysis The discovery of a “Higgs boson-like” particle! The work of thousands of people! Operations of LHC and its experiments rely on databases for storing conditions data, log files etc. … but the data-points in these plots did not came out of a database ! 2 Plots of the invariant mass of photon-pairs produced at the LHC show a significant bump around 125 GeV

ATLAS reconstruction and analysis 3 generation  simulation  digitization reconstruction analysis interactive physics analysis (thousands of users!) event analysis event analysis raw data event reconstruction event reconstruction analysis objects (extracted per physics topic) data acquisition event data taking event data taking event summary data simulated raw data ntuple1 ntuple2 ntupleN event simulation event simulation Global computing resources to store, distribute and analyse LHC data are provided by the Worldwide LHC Computing Grid (WLCG) which has more than 170 computing centres in 36 countries

Analysis versus reconstruction Z->  candidate, m  =93.4 GeV Event Reconstruction focuses on creating physics objects from the information measured in the detector (detectors hits  particle trajectory) Event Analysis focuses on interpreting information from the reconstructed objects to determine what type of event took place

Data analysis in practice ROOT-ntuples are centrally produced by physics groups from previously reconstructed event summary data Each physics group determines specific content of ntuple Physics objects to include Level of detail to be stored per physics object Event filter and/or pre-analysis steps ROOT-ntuples are centrally produced by physics groups from previously reconstructed event summary data Each physics group determines specific content of ntuple Physics objects to include Level of detail to be stored per physics object Event filter and/or pre-analysis steps 5 event summary data ntuple1 ntuple2 ntupleN Ntuples=column-based storage: data is stored as “TTree” object, with a “TBranch” for each variable Variables for each event in the form of scalar (number of muons), vectors (energy of each muon), vector-of-vectors (position of each detector hit for each muon) LHC Physics Analysis is done with ROOT Dedicated C++ framework developed by the High Energy Physics community, Provides tools for plotting/fitting/statistic analysis etc. LHC Physics Analysis is done with ROOT Dedicated C++ framework developed by the High Energy Physics community, Provides tools for plotting/fitting/statistic analysis etc.

Data analysis in practice 6 analysis interactive physics analysis (thousands of users!) event analysis event analysis analysis objects (extracted per physics topic) ntuple1 ntuple2 ntupleN Small datasets  copy data and run analysis locally Large datasets:  use the LHC Computing Grid Grid computing tools split the analysis job in multiple jobs each running on a subset of the data Each sub-job is sent to Grid site where input files are available Results produced by sub-jobs are summed Bored waiting days for all grid-jobs to finish  Filter data and produce private mini-ntuples Analysis is typically I/O intensive and runs on many files My Openlab Project: Can we replace the ntuple analysis with a model where data is analysed inside a centrally accessible Oracle database?

7 Z+H candidate event observed in ATLAS Higgs decays to two b-quarks, select good b-jets Z boson decays to lepton-pair, select two good muons or two good electrons Require specific Event Filter (EF) triggers to select events Require “good lumi-blocks” from Event Data Require Missing Transverse Energy (MET) less than 50 GeV to exclude top-pair events Higgs decays to two b-quarks, select good b-jets Z boson decays to lepton-pair, select two good muons or two good electrons Require specific Event Filter (EF) triggers to select events Require “good lumi-blocks” from Event Data Require Missing Transverse Energy (MET) less than 50 GeV to exclude top-pair events Physics Analysis Benchmark My performance study for analysis inside the database used as benchmark: “The search for a Higgs production in association with a Z boson” My performance study for analysis inside the database used as benchmark: “The search for a Higgs production in association with a Z boson”

Database design 8 My test-sample “ DATA12_8TEV” : ATLAS experiment data taken in 2012 with collission energy of 8 Tev 7.2 million events ~ 0.5% of all collision events recorded by ATLAS in 2012 Corresponds to 127 ntuple-files My test-sample “ DATA12_8TEV” : ATLAS experiment data taken in 2012 with collission energy of 8 Tev 7.2 million events ~ 0.5% of all collision events recorded by ATLAS in 2012 Corresponds to 127 ntuple-files 1366 variables, divided over 5 different tables Oracle DB has row-based storage: Separate tables for different physics objects, so users have to read only the object-tables relevant for their analysis Oracle DB has row-based storage: Separate tables for different physics objects, so users have to read only the object-tables relevant for their analysis The (simplified) Z+H benchmakr analysis uses 40 of these variables To run the analysis in the DB we need to transform a root-macro into a SQL-query

Physics Analysis C++ 9 vector el_pt; vector el eta; tree->getBranch(“el_pt”,&el_pt); tree->getBranch(“el_eta”,&el_eta); //etc. for ( ievent = 0 ; ievent<nevents ; ievent++){ //find good electrons tree->NextEvent(); for(i=0; i<nelectrons; i++){ if( el_pt[i] > 25. && fabs(el_eta[i])<2.5 etc.) ngoodelectron++; } //etc. for muon, jet, EF selections //select events with 2 selected muons or 2 selected electrons and 2 good b-jets …. //after passing selection cut reconstruct invariant mass, apply combined cuts etc. … // fill histograms } vector el_pt; vector el eta; tree->getBranch(“el_pt”,&el_pt); tree->getBranch(“el_eta”,&el_eta); //etc. for ( ievent = 0 ; ievent<nevents ; ievent++){ //find good electrons tree->NextEvent(); for(i=0; i<nelectrons; i++){ if( el_pt[i] > 25. && fabs(el_eta[i])<2.5 etc.) ngoodelectron++; } //etc. for muon, jet, EF selections //select events with 2 selected muons or 2 selected electrons and 2 good b-jets …. //after passing selection cut reconstruct invariant mass, apply combined cuts etc. … // fill histograms } Root-analysis: Load relevant branches in ntuple-tree, loop over events, apply selection cuts and fill histograms:

Physics Analysis SQL 10 select "EventNo_RunNo","EventNumber","RunNumber","DiMuonMass","DiElectronMass","DiJetMass" from sel_muon_events FULL OUTER JOIN sel_electron_events USING ("EventNo_RunNo") INNER JOIN sel_bjet_events USING ("EventNo_RunNo") INNER JOIN sel_MET_events USING ("EventNo_RunNo") INNER JOIN sel_EF_events USING("EventNo_RunNo") INNER JOIN sel_goodlbn_events USING("EventNo_RunNo") with sel_electron as (select "electron_i","EventNo_RunNo",etc from "electron" where "pt" < 25. and abs("eta") <2.5 …etc. ), sel_muon as (select "muon_i","EventNo_RunNo",etc from "muon" where "pt" < 20. or abs("eta") < etc. ), sel_bjet as ( select "jet_i","EventNo_RunNo”,etc from "jet" where "pt">25. and abs("eta") ), with sel_electron as (select "electron_i","EventNo_RunNo",etc from "electron" where "pt" < 25. and abs("eta") <2.5 …etc. ), sel_muon as (select "muon_i","EventNo_RunNo",etc from "muon" where "pt" < 20. or abs("eta") < etc. ), sel_bjet as ( select "jet_i","EventNo_RunNo”,etc from "jet" where "pt">25. and abs("eta") ), Single SQL-statement to reproduce physics analysis Followed by JOINs to find events with two b-jets and two muons or two electrons in which the invariant-mass of the electron/muon-pair and the b-jet pair is calculated and other combined selections are applied Calculations are done via PL/SQL functions except for one function for the b-jet selection that is called from C++-library Finally a super-join where the calculated quantities, used to fill histograms, are returned: Query starts by applying selection criteria via select-statements on relevant tables:

Ntuples vs DB performance 11 Both DB and ntuple analysis produce (almost) the same plot! From Oracle DatabaseFrom root-ntuples One double event in ntuple-analysis (due to overlapping trigger-streams) Oracle DB does not contain double events due to unique constraint on EventNumber

Ntuples vs DB performance 12 Analysis from Oracle database up to 4.5 times faster than standard ntuple analysis Improvement of query time with parallel execution is limited by I/O wait time Improvement of query time with parallel execution is limited by I/O wait time!

Many Physicists! 13 A real physics analysis database should be able to handle multiple users accessing the same data at the same time, with each user sending a unique analysis query corresponding to the signal they are looking for The effect of parallelism and I/O bandwidth is similar to many users accessing the same data

Hadoop vs Oracle 14 Hadoop is supposed to have fast I/O by using data-locality, I prepared a basic test: Physics-data stored as text-files in hadoop filesystem (hdfs) Reproduce Z+H benchmark analysis with MapReduce-code (java!) Mappers: one mappers per object to select muon, electron etc. Reduce: select events with 2 good leptons and 2 b-jets, calculate invariant mass Hadoop is supposed to have fast I/O by using data-locality, I prepared a basic test: Physics-data stored as text-files in hadoop filesystem (hdfs) Reproduce Z+H benchmark analysis with MapReduce-code (java!) Mappers: one mappers per object to select muon, electron etc. Reduce: select events with 2 good leptons and 2 b-jets, calculate invariant mass Compare the performance of the Oracle DB analysis with Hadoop! Tests used a 5-machine cluster that could run either Hadoop or Oracle RAC Oracle DB: 231 seconds (limited by iowait) Oracle DB: 231 seconds (limited by iowait) Hadoop: 299 seconds (limited by CPU) Hadoop: 299 seconds (limited by CPU)

Conclusion/Outlook 15 A central database running on a cluster of machines could provide a platform for physicists to perform analysis on data stored in the database I/O is a bottleneck, especially with many users accessing the same data Column-based data storage to be explored (some hints by Oracle this will be introduced in future 12c versions) To be continued… A central database running on a cluster of machines could provide a platform for physicists to perform analysis on data stored in the database I/O is a bottleneck, especially with many users accessing the same data Column-based data storage to be explored (some hints by Oracle this will be introduced in future 12c versions) To be continued… LHC data analysis provides a real “big data” challenge An unique benchmark to study the ability of the Oracle database to perform complex tasks on a large set of data