Software Frameworks for HEP Data Analysis. Vincenzo Innocente, CERN/EP. CHEP Beijing, 9/01.

Presentation transcript:

Slide 1: Software Frameworks for HEP Data Analysis. Vincenzo Innocente, CERN/EP.

Slide 2: Data Analysis Micro-Process
Physics analysis is to a large degree an iterative process of:
- Reducing data samples to more interesting subsets
- Distilling the sample into information at a higher abstraction level
  - By summarising lower-level information
  - By calculating statistical entities from the samples
A large part of the work can be done on very high-level entities in an interactive analysis and presentation tool.
- Hence the focus on tools that work on simple summary information (DSTs, N-tuples, tag databases, ...)
- Additional tools for detector and event visualisation
[Diagram labels: Experiment, Reduce, Distil, Interpret, Physics Paper]
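The reduce and distil steps described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`run`, `n_electrons`, `mass`), the sample values, and the helper names `reduce_sample` and `distil` are hypothetical and do not come from the talk.

```python
from statistics import mean

# Hypothetical summary records (an "N-tuple"): one dict per event.
events = [
    {"run": 1, "n_electrons": 2, "mass": 91.0},
    {"run": 1, "n_electrons": 0, "mass": 12.5},
    {"run": 2, "n_electrons": 2, "mass": 90.4},
    {"run": 2, "n_electrons": 3, "mass": 92.1},
]


def reduce_sample(sample, predicate):
    """Reduce: keep only the events that satisfy the selection."""
    return [event for event in sample if predicate(event)]


def distil(sample, key):
    """Distil: summarise one quantity of the sample as statistical entities."""
    values = [event[key] for event in sample]
    return {"count": len(values), "mean": mean(values)}


# One iteration of the micro-process: select, then summarise.
selected = reduce_sample(events, lambda e: e["n_electrons"] >= 2)
summary = distil(selected, "mass")
```

In an interactive session the selection and the summarised quantity would be changed and the two steps repeated, which is the iteration the slide refers to.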

Slide 3: HEP Experiment-Data Analysis
[Diagram. Labels in extraction order: Detector Control; Online Monitoring; Environmental data store; Request part of event (repeated on several arrows); Simulation store; Data Quality; Calibrations; Group Analysis; User Analysis on demand; Store rec-Obj and calibrations; Quasi-online Reconstruction; Store rec-Obj; Persistent Object Store Manager; Database Management System; Event Filter; Object Formatter; Physics Paper]

Slide 4: Mission
- Get data from a HEP detector
- Publish results (mass and width of a particle decaying into e+e- pairs) before those living on the other side of the Ring/Continent/Ocean
The mission is still the same: new challenges require innovative software solutions.

Slide 5: Offline Architecture: New Requirements
- Bigger experiment, higher rate, more data
- Larger and dispersed user community performing non-trivial queries against a large event store
- Make best use of new IT technologies
- Increased demand for both flexibility and coherence:
  - ability to plug in new algorithms
  - ability to run the same algorithms in multiple environments
  - guarantees of quality and reproducibility
  - high performance
  - user-friendliness

Slide 6: Analysis Environments
- Real-Time Event Filtering and Monitoring
  - Data-driven pipeline
  - High reliability
  - Pre-emptive
- Simulation, Reconstruction and Event Classification
  - Massively parallel batch-sequential process
  - Excellent error recovery and rollback mechanisms
  - Excellent scheduling and bookkeeping systems
- Interactive Statistical Analysis
  - Rapid Application Development environment
  - Excellent visualization and browsing tools
  - Human “readable” navigation

Slide 7: Migration
- Today's Nobel prize becomes tomorrow's trigger (and background the day after)
- Boundaries between running environments are fuzzy:
  - “Physics Analysis” algorithms should migrate up to the online to make the trigger more selective
  - Robust batch systems should be made available for physics analysis of large data samples
  - The results of offline calibrations should be fed back to the online to make the trigger more efficient

Slide 8: Coherent Analysis Environment
[Diagram. Labels in extraction order: File Distributed Data Store; Data Browser; Analysis job wizards; Simulation; Reconstruction; Persistency Services; Network Services; Visualization; Batch Services; Visualization Tools; Analysis Tools; Software Development]

Slide 9: The Challenge
Beyond the interactive analysis tool (user point of view):
- Data analysis & presentation: N-tuples, histograms, fitting, plotting, …
A great range of other activities with fuzzy boundaries (developer point of view):
- Batch
- Interactive: from “pointy-clicky” to Emacs-like power tool to scripting
- Setting up configuration management tools, application frameworks and reconstruction packages
- Data store operations:
  - Replicating entire data stores
  - Copying runs, events, event parts between stores
  - Not just copying but also doing something more complicated: filtering, reconstruction, analysis, …
- Browsing data stores down to object detail level
- 2D and 3D visualisation
- Moving code across final analysis, reconstruction and triggers
Today this involves (too) many tools.

Slide 10: Collaborating Frameworks: The Enabling Technology

Slide 11: What a Framework is
A Framework is a reusable “semi-complete” application that can be specialized to produce custom applications (R. Johnson in JOOP 1988).
Frameworks:
- Provide a default behavior
- Can be customized and extended by means of OO techniques such as inheritance or object composition
Frameworks are specific to a particular area:
- May provide system-level support services
- May encapsulate expertise at some application level
- May encapsulate expertise for a given problem domain

Slide 12: Framework Dynamics
[Diagram labels: Customized Extension (client plug-in); Client API; Framework API; Flow of control; Callbacks]
Framework:
- Controls the flow of execution
- Defines object interaction (implementing design patterns)
- Calls client (plug-in) functions
- May offer a traditional “client API” for integration in more specialized frameworks
Clients specialize framework behavior by:
- Inheriting from framework classes
- Overriding their methods
- Instantiating other framework classes
- Interacting directly with other, more general, frameworks
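The division of labour on this slide (the framework owns the flow of control and calls back into client plug-ins) can be illustrated with a minimal Python sketch. The class names `Module`, `EventLoop` and `EnergySum`, and the `energy` field, are hypothetical and are not taken from any real HEP framework.

```python
class Module:
    """Framework base class: clients inherit from it and override its methods."""

    def begin(self):
        """Default behavior: nothing to do before the first event."""

    def process(self, event):
        raise NotImplementedError("plug-ins must override process()")

    def end(self):
        """Default behavior: nothing to do after the last event."""


class EventLoop:
    """Framework side: controls the flow of execution and calls the plug-ins."""

    def __init__(self):
        self._modules = []

    def register(self, module):
        self._modules.append(module)

    def run(self, events):
        for module in self._modules:
            module.begin()
        for event in events:
            for module in self._modules:
                module.process(event)  # callback into client code
        for module in self._modules:
            module.end()


class EnergySum(Module):
    """Client plug-in: specializes the framework by overriding two methods."""

    def begin(self):
        self.total = 0.0

    def process(self, event):
        self.total += event["energy"]


loop = EventLoop()
plugin = EnergySum()
loop.register(plugin)
loop.run([{"energy": 1.5}, {"energy": 2.5}])
```

The plug-in never calls the loop; the loop calls the plug-in. With a toolkit library the direction is reversed and control stays in user code.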

Slide 13: What Frameworks are not
- Toolkit libraries (C++ std, Posix, Nag-lib, CERNlib)
  - a toolkit is passive: control stays in the user code
- Programs (PowerPoint, PAW)
  - have a well-defined behavior
  - customization by “input parameters”
- Design Patterns
  - Abstract design and architecture knowledge
  - Do not directly yield reusable code
- Languages (XML, Java, Python)
  - New languages come with such a large set of support and application libraries that they are often considered frameworks for rapid application development, integration and/or communication

Slide 14: Framework-based Software
[Diagram. Layer labels: System Libraries; Support Frameworks; Application Frameworks; Problem Domain Framework; Sub-Domain Framework. Component labels: Posix; C++ std; OpenGL; Thread; ODBMS; GUI; Network; XML]

Slide 15: Framework Architecture
- Reuse of application frameworks
  - Common look & feel
  - Uniform data access
- Common problem-domain framework
  - Consistent behavior
  - Reuse of well-established mechanisms
- Reduced maintenance, faster deployment, easier migration

Slide 16: Analysis & Reconstruction Framework
[Diagram. Labels in extraction order: ODBMS; Geant3/4; CLHEP; Paw Replacement; C++ standard library; Extension toolkit; Reconstruction Algorithms; Data Monitoring; Event Filter; Physics Analysis; Calibration Objects; Event Objects; Configuration Objects; Generic Application Framework; Physics modules; Utility Toolkit; Specific Framework; adapters and extensions]

Slide 17: Why Frameworks
- Physicists concentrate on the development of reconstruction and analysis algorithms as plug-in modules
- The framework:
  - orchestrates instances of these modules
  - hides system-related complexities
  - allows sharing of code for common or related tasks
- Changes in the physics reconstruction and analysis logic affect only the plug-ins
- Changes in system services, or migration to new IT technologies, affect only the framework

Slide 18: Questions
- What is the role of an experiment-specific framework?
- How does it integrate with more generic frameworks?
- How can the user have a coherent and consistent view of the analysis process?
- How can new tools (new frameworks) be integrated without disrupting the existing architecture?

Slide 19: Difficult Balance
“The most profoundly elegant framework will never be reused unless the cost of understanding it and then reusing its abstractions is lower than the programmer’s perceived cost of writing them from scratch” (G. Booch, 1994)
- Flexibility (many abstractions)
  - Wide range of applications
  - Great potential for extension and migration
  - Difficult to understand and to use
- Rigidity (few abstractions, many concrete classes)
  - Easy to use
  - Limited range of applications
  - Difficult to migrate and extend

Slide 20: Coherent, Monolithic Solution
- The framework kernel is expanded to cover the whole problem domain
- Users see The Framework
- New tools should be incorporated into the framework
  - Imported classes should be modified to derive from framework base classes to keep coherency
- Persistency is implemented by the framework
- Example: MS

Slide 21: Incoherent Solution
- The experiment kernel deals with just one problem: event processing
- External tools are kept as they are:
  - Communication through I/O converters
  - Persistency is just one (or more) of the external tools
- Users see a different environment for each part of the problem domain

Slide 22: Coherent, Non-invasive Solution
- Users see a standard environment that also acts as integration glue
- The experiment kernel is composed of a hierarchy of application frameworks reusable in various parts of the problem domain
- External frameworks are integrated directly, if they conform to the standard environment, or through wrappers, if not
- Persistency is encapsulated by one of the kernel application frameworks
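Integration "through wrappers" can be sketched as a thin adapter that leaves the external tool untouched. Everything in this example is an assumption made for illustration: `ExternalHistogram` stands in for a third-party tool with its own API, and the `fill()`/`contents()` interface is a hypothetical "standard environment" convention, not one defined in the talk.

```python
class ExternalHistogram:
    """Stand-in for an external tool whose API does not follow our conventions."""

    def __init__(self):
        self._bins = {}

    def accumulate(self, bin_index, weight):
        self._bins[bin_index] = self._bins.get(bin_index, 0.0) + weight

    def dump(self):
        return dict(self._bins)


class HistogramWrapper:
    """Wrapper exposing the hypothetical standard interface: fill() and contents().

    The external class is used as it is: no base class is imposed on it and
    its source is not modified.
    """

    def __init__(self, external, bin_width):
        self._external = external
        self._bin_width = bin_width

    def fill(self, value, weight=1.0):
        # Translate a value into the bin index the external tool expects.
        self._external.accumulate(int(value // self._bin_width), weight)

    def contents(self):
        return self._external.dump()


histogram = HistogramWrapper(ExternalHistogram(), bin_width=1.0)
for value in (0.5, 1.2, 1.9):
    histogram.fill(value)
```

Contrast this with the monolithic approach, where the imported class itself would have to be changed to derive from a framework base class.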

Slide 23: Python
Python is an interpreted, object-oriented language introduced at the beginning of the '90s. It spread quickly, particularly among scientific communities in search of a rapid application development tool able to integrate existing, highly optimized scientific software efficiently.
Python provides:
- Scripting functionality comparable to Perl or Tcl
- Runtime dynamic loading
- A standard OO library for system-level support
- Simple mechanisms for interfacing to C++ objects
- A large body of open-source modules covering a wide spectrum of application domains, scientific in particular

Slide 24: Python as a glue
- Integration in Python is non-intrusive
  - Only the class interface is exported to Python: encapsulation is preserved
  - The original (C++) representation is respected: no translation, no conversion
  - Additional Python-specific extensions do not impact the original design and functionality
- Binding with Python happens at runtime
  - Batch applications need not be Python-aware
  - Interactive applications can be extended (actually constructed) and modified at runtime
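Runtime binding can be illustrated with the standard `importlib` module, which loads a module by name while the program is running. This is a minimal sketch under stated assumptions: the standard-library `math` module stands in for a wrapped, compiled scientific library, and `load_plugin` and the `steps` configuration are hypothetical names.

```python
import importlib


def load_plugin(module_name, attribute):
    """Load a module by name at runtime and return one of its callables."""
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# The processing chain is plain data, so it can be changed interactively
# without rebuilding or restarting the compiled libraries behind it.
steps = [("math", "sqrt"), ("math", "floor")]

value = 17.0
for module_name, attribute in steps:
    value = load_plugin(module_name, attribute)(value)
```

The loaded code does not need to know that it is being driven from Python, which is the sense in which batch applications need not be Python-aware.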

Slide 25: Examples (personal experience)
- Exporting the interface of an application framework such as Objectivity/DB took a few hours
- The CERN/IT physics analysis environment (ANAPHE) provides a complete Python binding (Lizard) which does not affect the core C++ library
- Seamless integration of the CMS framework kernel (COBRA) and the CERN/IT ANAPHE library through their (independent) Python interfaces
- Direct application of other Python modules (regular expressions, string/list manipulation, numerics, etc.) to ANAPHE or COBRA objects

Slide 26
[Screenshot. Annotations: Lizard Qt plotter; ANAPHE histogram extended with pointers to CMS events; Emacs used to edit CMS C++ plugin to create and fill histograms; OpenInventor-based display of selected event; Python shell with Lizard & CMS modules]

Slide 27: Coherent Analysis Environment
[Diagram. Labels in extraction order: File Distributed Data Store; Persistency Services; Network Services; Batch Services; Visualization; Simulation; Reconstruction; Visualization Tools; Data Browser; Analysis job wizards; Analysis Tools; Software Development]

Slide 28: HEP Data
[Diagram labels: Event Collection; Collection Meta-Data; Event; Electrons; Tracker Alignment; Tracks; Ecal calibration; User Tag (N-tuple)]
- Event-Collection Meta-Data (luminosity, selection criteria, …)
- Environmental data
  - Detector and Accelerator status
  - Calibrations, Alignments
- Event Data, User Data
Navigation is essential for an effective physics analysis.
Complexity requires coherent access mechanisms.

Slide 29: Framework for Persistency (DataBase)
- Persistency breaks encapsulation
  - To store and retrieve an object, its concrete type and its complete state must be known
  - End-user developed converters (streamer operators)
    - Reuse of classes that do not give clients access to their full state is impossible
  - Stored schema obtained by source parsing or user description
- Ideally just an extended virtual memory (in time and space)
- In reality much more to manage:
  - Concurrent access
  - Tertiary storage
  - Replication
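The point that a converter needs both the concrete type and the complete state can be made concrete with a small Python sketch. The `Track` class, its fields, the `CONVERTERS` registry and the use of JSON as the stored representation are all assumptions chosen for illustration; they are not the mechanism used by any particular HEP persistency framework.

```python
import json


class Track:
    """Application class whose state is reachable only through accessors."""

    def __init__(self, momentum, charge):
        self._momentum = momentum
        self._charge = charge

    def momentum(self):
        return self._momentum

    def charge(self):
        return self._charge


# User-written converter pair. It works only because Track exposes its
# complete state; a class hiding part of its state could not be stored.
def track_to_record(track):
    return {"type": "Track", "momentum": track.momentum(), "charge": track.charge()}


def track_from_record(record):
    return Track(record["momentum"], record["charge"])


# Registry keyed by concrete type name: the stored type tag selects the reader.
CONVERTERS = {"Track": (track_to_record, track_from_record)}


def store(obj):
    to_record, _ = CONVERTERS[type(obj).__name__]
    return json.dumps(to_record(obj))


def retrieve(text):
    record = json.loads(text)
    _, from_record = CONVERTERS[record["type"]]
    return from_record(record)


restored = retrieve(store(Track(12.5, -1)))
```

Concurrent access, tertiary storage and replication are deliberately left out of this sketch; they are the part that, as the slide notes, makes real persistency much more than an extended virtual memory.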

Slide 30: DataBase Management System
[Diagram. Labels in extraction order: DBMS Server; Distributed, Hierarchical, File Storage System; Application; (Distributed) DBMS Client; Application Representation; Persistent Data Representation; Database internal Representation; Database Storage (Server+Files); Tertiary Storage (Tapes); NETWORK]

Slide 31: Successful DBMS
- Coherent data view at problem-domain level
- Efficient data caching mechanism
- Variety of data & process distribution models
- Transparent and flexible interface to storage (disks and tapes)
This cannot be achieved with a single product; it requires a set of flexible, collaborating frameworks.

Slide 32: Conclusions (Challenges)
- Today's HEP experiments
  - Bigger, higher rate, more data, longer-lasting
  - Larger and dispersed user community
- IT
  - Ubiquitous
  - Develops fast
  - Becomes obsolete even faster
- Traditional HEP analysis software architectures
  - Monolithic
  - Incoherent

Slide 33: Conclusions (Solutions)
- Hierarchy of non-intrusive, loosely connected frameworks
  - Easier maintenance, evolution, migration
- Standard framework acting as “glue”
  - Easier integration
  - Coherent user view
- Powerful, flexible persistency mechanism
  - Uniform, transparent data access