Data analysis with R in an experimental physics environment
Andreas Pfeiffer (CERN) and Maria Grazia Pia (INFN Genova)


Maria Grazia Pia, INFN Genova 1
Data analysis with R in an experimental physics environment
Andreas Pfeiffer (CERN) and Maria Grazia Pia (INFN Genova)
IEEE NSS October – 2 November 2013
Seoul, Korea

2 Daily work
© 2013 CERN, for the benefit of the CMS Collaboration (License: CC-BY-SA-3.0)

3 Background
In the old days…
  simulation, histograms, ntuples, analysis
You are free to use whatever you want
Nowadays…
  Text file, AIDA implementation, ROOT, …
  GnuPlot, MATLAB, iAIDA, JAS, OpenScientist, PAIDA, ROOT, …

4 Different conceptual models
Begin of run: create histograms, ntuples
Event loop: fill (accumulate) histograms, ntuples
End of run: store histograms, ntuples
Data analysis
Strong as a producer of analysis objects
Outstanding data analysis capabilities
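
The create / fill / store sequence on this slide can be illustrated with a minimal, library-free sketch. Everything here is an assumption made for illustration: SimpleHistogram1D and runExample are hypothetical names, the plain-text output format is invented, and none of it is part of AIDA or of the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical minimal 1D histogram; a stand-in for illustration, not an AIDA type.
struct SimpleHistogram1D {
    double lo, hi;
    std::vector<unsigned long> counts;

    // Begin of run: the binning is fixed once, when the histogram is created.
    SimpleHistogram1D(std::size_t nBins, double lower, double upper)
        : lo(lower), hi(upper), counts(nBins, 0UL) {
        assert(nBins > 0 && upper > lower);
    }

    // Event loop: accumulate one value; out-of-range values are ignored in this sketch.
    void fill(double x) {
        if (x < lo || x >= hi) return;
        std::size_t bin = static_cast<std::size_t>((x - lo) / (hi - lo) * counts.size());
        if (bin >= counts.size()) bin = counts.size() - 1;  // guard against rounding
        ++counts[bin];
    }

    // Total number of in-range entries accumulated so far.
    unsigned long entries() const {
        unsigned long n = 0;
        for (unsigned long c : counts) n += c;
        return n;
    }

    // End of run: store bin contents as text, one "lowEdge count" pair per line.
    bool store(const std::string& path) const {
        std::ofstream out(path);
        if (!out) return false;
        const double width = (hi - lo) / counts.size();
        for (std::size_t i = 0; i < counts.size(); ++i)
            out << lo + i * width << ' ' << counts[i] << '\n';
        return static_cast<bool>(out);
    }
};

// Runs the create and fill phases over a fixed list of values.
// The caller is expected to call store() at the end of the run.
SimpleHistogram1D runExample(const std::vector<double>& values) {
    SimpleHistogram1D h(4, 0.0, 1.0);   // begin of run: create
    for (double v : values) h.fill(v);  // event loop: fill
    return h;
}
```

The producer side only accumulates and writes; any statistical treatment of the stored contents is left to a separate analysis tool, which is the division of labour the slide describes.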

5 AIDA - Abstract Interfaces for Data Analysis
Started in 1999, defining a full set of interfaces
The goal of the AIDA project is to define abstract interfaces for common physics analysis objects, such as histograms, ntuples, fitters. The adoption of these interfaces makes it easier for developers and users to select and use different tools without having to learn new interfaces or change their code. In addition, it is possible to exchange data (objects) between AIDA-compliant applications through a standard XML format.

6 AIDA objects
Data types
  Histograms: 1D, 2D, 3D as statistical entities; also dynamic/unbinned ones (Clouds)
  Profile histograms
  Ntuple
  DataPoints: vectors of free-form N-dim data with errors
Non-data types
  Annotations: to add statistics/summary and free-form info provided by user (key/value pairs)
  Fitter, Functions, Plotter, Analyzer
Defined XML format for data storage
  .aida files (compressed XML)

7 AIDA implementations
Modular design to maximise flexibility
  Factory pattern to create objects
  Plugin modules for different implementations, e.g. native, ROOT, HBook stores to read/write histograms and tuples
Implementations of interfaces in several languages
  C++: iAIDA, OpenScientist
  Java: JAS (Java Analysis Studio)
  Python: paida
Flexibility through XML data interchange format
  .aida files can be read by all across languages
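
As a rough illustration of the factory-plus-plugins idea on this slide, the sketch below hides two toy store implementations behind one abstract interface. All names (IStore, XmlStore, TextStore, createStore) are hypothetical and chosen for this example only; they are not the AIDA interfaces, and the stores merely keep data in memory instead of writing real files.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical abstract store interface; illustrative only, not the AIDA API.
class IStore {
public:
    virtual ~IStore() = default;
    virtual std::string format() const = 0;
    virtual void write(const std::string& name, const std::vector<double>& bins) = 0;
    virtual std::size_t size() const = 0;
};

// Toy "plugin" 1: pretends to be an XML-backed store (keeps objects in memory).
class XmlStore : public IStore {
public:
    std::string format() const override { return "xml"; }
    void write(const std::string& name, const std::vector<double>& bins) override {
        objects_[name] = bins;
    }
    std::size_t size() const override { return objects_.size(); }
private:
    std::map<std::string, std::vector<double>> objects_;
};

// Toy "plugin" 2: pretends to be a plain-text store (keeps objects in memory).
class TextStore : public IStore {
public:
    std::string format() const override { return "text"; }
    void write(const std::string& name, const std::vector<double>& bins) override {
        objects_[name] = bins;
    }
    std::size_t size() const override { return objects_.size(); }
private:
    std::map<std::string, std::vector<double>> objects_;
};

// Factory: user code asks for a store by type name and only ever sees IStore.
std::unique_ptr<IStore> createStore(const std::string& type) {
    if (type == "xml") return std::unique_ptr<IStore>(new XmlStore());
    if (type == "text") return std::unique_ptr<IStore>(new TextStore());
    throw std::invalid_argument("unknown store type: " + type);
}
```

Because callers depend only on the interface, switching the backend amounts to changing the string passed to the factory, which is the kind of flexibility the slide attributes to the modular design.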

8 R
R is a language and environment for statistical computing and graphics
  Similar to S (can be considered as a different implementation of S)
  GNU project
Provides a wide variety of statistical and graphical tools
It is highly extensible
Used by a huge multi-disciplinary community
Strong at producing well-designed, publication-quality plots
Runs on Windows, Mac OS X, Linux (various distros)
In 1998 John Chambers won the ACM Software System Award for the S language, which the ACM heralded as having "forever altered how people analyze, visualize, and manipulate data"

9 Getting the best of both worlds
A lightweight system for dealing with analysis objects in experimental software scenarios
A powerful, extensible data analysis system
A transparent bridge between the two
Begin of run: create histograms, ntuples
Event loop: fill histograms, ntuples
End of run: store histograms, ntuples
Data analysis

10 aidar - Interfacing AIDA with R
Interface to read AIDA XML files into R
  Exploiting the power of R for analysis
  Using the existing XML package in R
aidar converts AIDA objects from (XML) file into data.frames
  Histograms, Clouds, Profiles, Ntuples
  getFileInfo( ) to get overview
Developer version available from github:
  Easy install via devtools package (see Readme on github)
  Plan to have it as regular CRAN module by end November
Seamless data production and analysis, transparent use in R

11 Initialization (start of run)
    // Create the analysis factory and the tree factory
    …
    // Create a tree mapped to a new XML file
    std::auto_ptr<AIDA::ITree> tree( tf->create( "comptoncs.xml", "xml", readOnly, createFile, "uncompressed" ) );
    // Create a tuple factory and a histogram factory
    …
    // Create a 1D histogram
    AIDA::IHistogram1D* hSigma = hf->createHistogram1D( "10", "Cross section", 100, 0., 1. );
    // Create an ntuple
    AIDA::ITuple* ntuple = tpf->create( "1", "Compton cross section", "float z, e, lib, pen, std" );
Event loop
    // Do your calculations in the event/track loop
    …
    // Fill histogram
    hSigma->fill( sigmaEPDL );
    // Add data row to ntuple
    ntuple->fill( ntuple->findColumn( "z" ), z );
    ntuple->fill( ntuple->findColumn( "e" ), e );
    ntuple->fill( ntuple->findColumn( "lib" ), sigmaEPDL );
    ntuple->fill( ntuple->findColumn( "pen" ), sigmaPenelope );
    ntuple->fill( ntuple->findColumn( "std" ), sigmaStandard );
    ntuple->addRow();
End of run
    // Commit the transaction with the tree
    tree->commit();
    tree->close();

12 R session
Load devtools and aidar packages
    histoFile = "comptoncs.xml"
    t1 = getTuple(histoFile, '1')
    plot(t1$e, t1$lib, …)
t1 (AIDA ntuple) gets converted into an R data.frame

13 Recent Geant4 validation

14 It works!
This conference:
  N29-4, Physics Methods for the Simulation of Photoionization
  N29-5, Validation of Compton Scattering Monte Carlo Simulation Models
  NPO2-141, Validation of Geant4 Electron Pair Production by Photons

15 Conclusions
Bridge between two conceptually different data analysis scenarios
Addresses typical use cases in daily experimental practice
Best of two worlds
Transparent to users
Lightweight, robust analysis system for data production
Powerful system for data elaboration and graphics
Use it!
Feedback from the experimental community is welcome