An SQL-based approach to Physics Analysis M. Limper.

Slides:



Advertisements
Similar presentations
Some Introductory Programming 1. Structured Query Language - used for queries. - a standard database product. 2. Visual Basic for Applications - use of.
Advertisements

CERN IT Department CH-1211 Geneva 23 Switzerland t Sequential data access with Oracle and Hadoop: a performance comparison Zbigniew Baranowski.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 7-1 David M. Kroenke’s Chapter Seven: SQL for Database Construction and.
By: Blake Peters.  OODB- Object Oriented Database  An OODB is a database management system in which information is represented in the form of objects.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
Computer Science Standard Level Mastery Aspects. Mastery Item Claimed JustificationWhere Listed Arrays Used to store the student data Lines P.
Announcements Read JDBC Project Step 5, due Monday.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Component 4/Unit 6f Topic VI: Create simple querying statements for the database The SELECT statement Clauses Functions Joins Subqueries Data manipulation.
1 Robert Wijnbelt Health Check your Database A Performance Tuning Methodology.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
March 19981© Dennis Adams Associates Tuning Oracle: Key Considerations Dennis Adams 25 March 1998.
DBSQL 14-1 Copyright © Genetic Computer School 2009 Chapter 14 Microsoft SQL Server.
Chapter 9 Joining Data from Multiple Tables
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 7-1 David M. Kroenke’s Chapter Seven: SQL for Database Construction and.
CSE 3330 Database Concepts Stored Procedures. How to create a user CREATE USER.. GRANT PRIVILEGE.
1 Intro to JOINs SQL INNER JOIN SQL OUTER JOIN SQL FULL JOIN SQL CROSS JOIN Intro to VIEWs Simple VIEWs Considerations about VIEWs VIEWs as filters ALTER.
1 Copyright © 2004, Oracle. All rights reserved. Introduction to PL/SQL.
Relational Databases Database Driven Applications Retrieving Data Changing Data Analysing Data What is a DBMS An application that holds the data manages.
PL/SQLPL/SQL Oracle10g Developer: PL/SQL Programming Chapter 6 Functions.
PL/SQLPL/SQL Oracle11g : PL/SQL Programming Chapter 6 Functions.
Using Special Operators (LIKE and IN)
Triggers and Stored Procedures in DB 1. Objectives Learn what triggers and stored procedures are Learn the benefits of using them Learn how DB2 implements.
Evaluation of the C++ binding to the Oracle Database System Performance Issues and Benchmarks Dirk Geppert and Krzysztof Nienartowicz, IT/DB CERN IT Fellow.
How to discover the Higgs Boson in an Oracle database Maaike Limper.
CS146 References: ORACLE 9i PROGRAMMING A Primer Rajshekhar Sunderraman
Physics analysis & databases M. Limper Part 1: SQL analytics (live physics analysis demo) Part 2: Chopped-up tables and partition-wise joins 20/03/2014.
PL / SQL By Mohammed Baihan. What is PL/SQL? PL/SQL stands for Procedural Language extension of SQL. PL/SQL is a combination of SQL along with the procedural.
Nic Shulver Chris Introduction to databases Introduction Storage Temporary and Permanent Unstructured.
1 Copyright © 2004, Oracle. All rights reserved. Introduction to PL/SQL.
8 1 Chapter 8 Advanced SQL Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Programming in R SQL in R. Running SQL in R In this session I will show you how to: Run basic SQL commands within R.
ROOT I/O for SQL databases Sergey Linev, GSI, Germany.
PL/SQLPL/SQL Oracle10g Developer: PL/SQL Programming Chapter 3 Handling Data in PL/SQL Blocks.
Component 4: Introduction to Information and Computer Science Unit 6a Databases and SQL.
THE WEBMASTERS: SENG + WAVERING.  On account of construction, we will be having class in room 1248 next week.
Indexes and Views Unit 7.
LHC Physics Analysis and Databases or: “How to discover the Higgs Boson inside a database” Maaike Limper.
Chapter 13 Views Oracle 10g: SQL. Oracle 10g: SQL2 Objectives Create a view, using CREATE VIEW command or the CREATE OR REPLACE VIEW command Employ the.
Central Arizona Phoenix LTER Center for Environmental Studies Arizona State University Data Query Peter McCartney RDIFS Training Workshop Sevilleta LTER.
Oracle10g Developer: PL/SQL Programming1 Objectives SQL queries within PL/SQL Host or bind variables The %TYPE attribute Include queries and control structures.
Oracle11g: PL/SQL Programming Chapter 3 Handling Data in PL/SQL Blocks.
(SQL - Structured Query Language)
Physics Analysis inside the Oracle DB Progress report 10 Octobre 2013.
LINQ to DATABASE-2.  Creating the BooksDataContext  The code combines data from the three tables in the Books database and displays the relationships.
A first look at CitusDB & in- database physics analysis M. Limper 19/06/2014.
Oracle9i Developer: PL/SQL Programming Chapter 5 Functions.
Database Systems, 8 th Edition SQL Performance Tuning Evaluated from client perspective –Most current relational DBMSs perform automatic query optimization.
3 Copyright © 2006, Oracle. All rights reserved. Designing and Developing for Performance.
29/04/2008ALICE-FAIR Computing Meeting1 Resulting Figures of Performance Tests on I/O Intensive ALICE Analysis Jobs.
IFS180 Intro. to Data Management Chapter 10 - Unions.
3.5 Databases Relationships.
Analyzing the Database and Query Manager
Introduction to PL/SQL Programing
Efficiently Searching Schema in SQL Server
PHP / MySQL Introduction
EFFICIENT RANGE QUERY PROCESSING ON UNCERTAIN DATA
Chapter 4 Indexes.
CH 4 Indexes.
Database Processing: David M. Kroenke’s Chapter Seven:
CH 4 Indexes.
Oracle Memory Internals
Data Management Innovations 2017 High level overview of DB
Introduction To Structured Query Language (SQL)
Chapter 7 Using SQL in Applications
Spreadsheets, Modelling & Databases
Chapter 8 Advanced SQL.
Database SQL.
© 2014, Mike Murach & Associates, Inc.
Reports Report builder meets the challenge by making it easy to design, publish, and distribute professional, production-quality reports in a variety of.
Presentation transcript:

An SQL-based approach to Physics Analysis M. Limper

Intro The challenge: stress a database engine by forcing it to do physics analysis: Store analysis data in relational database Complicated SQL queries Calls to external C++ libraries Let the database take care of parallelism The challenge: stress a database engine by forcing it to do physics analysis: Store analysis data in relational database Complicated SQL queries Calls to external C++ libraries Let the database take care of parallelism 2

Analysis data in a relational database 3  Test with root-ntuples (D3PDs) produced for ATLAS top-physics group Sub-set of 7.2 million events (27 ntuples) ~4000 variables (“branches”) stored in event-tree  DB design uses different tables for different physics-objects Many columns per table

Physics Analysis in SQL 4 Make temporary tables using the WITH-AS statement: WITH goodmuons AS (SELECT … FROM muon WHERE pt>25.) JOIN statements on the RunNumber,EventNumber put information from the different selections together: SELECT … FROM good_muons INNER JOIN good_bjets USING (RunNumber,EventNumber) WHERE goodmuons.N=2 AND goodbjets.N=2 Simple calculations were written in (PL/)SQL Code from external C++ libraries was used for more complicated calculations

Test setup 5 nodes with 40 cores total 5 disk arrays each with 12 disks max. I/O reads of ~2500 MB/s 5 nodes with 40 cores total 5 disk arrays each with 12 disks max. I/O reads of ~2500 MB/s Compare performance on Oracle RAC with 1-job-per-core ntuple-analysis

Benchmark 1. Simplified Higgs+Z: compare simple root-macro with SQL-query returning same results  In both cases limited by iowait !  I/O reads for root-ntuple analysis 10x less than for DB 6 40 root-jobs: 71 seconds SQL parallel 40: 135 seconds

Benchmark root-jobs: 588 seconds SQL parallel 40: 372 seconds Ttbar cutflow: compare existing ‘root-core’ packages with modified version that constructs SQL-query

Conclusion SQL-based physics analysis using data stored in a relational database could reproduce results from root-ntuple analysis’ Database takes care of parallelism Row-based storage in combination with wide tables limits performance by the I/O read speed of the system 8