
RooFit – Open issues W. Verkerke

Datasets
Current class structure

Data representation
– RooAbsData (abstract base class)
– RooDataSet (unbinned [weighted] data)
– RooDataHist (binned data)

Data storage
– RooAbsDataStore (abstract base class)
– RooTreeDataStore (TTree-based storage), used by both RooDataSet and RooDataHist
– RooCompositeDataStore, used by RooDataSet when combining external datasets with Link() rather than Import()
– Since there are 2 concrete implementations, most RooFit code is already adapted to the concept that the storage type is not necessarily tree-based (e.g. virtual copy construction through clone functions etc.)

Open issues in datasets - storage

Project: New STL vector-based storage implementation
– May be (much) faster than the TTree-based datastore

Work needed
– Develop new class RooVectorDataStore
– Inherits from RooAbsDataStore and implements the full functionality of RooTreeDataStore (including support for append/merge/rename operations and storing of ‘cache’ columns). Must be persistable and support string and category data types as well
– Workload: 3 days of work
– Once done, need to add a cmdline option to RooDataSet/RooDataHist to use this alternate storage technique [easy]
– Workload: 0.5 days of work
– Add a new stressRooFit test module that exercises this type of storage
– Workload: 0.5 days of work
– Need to validate that RooCompositeDataStore works fine with RooVectorDataStores (should be OK)
– Workload: 0.5 days of work
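The idea of a vector-based store behind an abstract storage interface can be sketched without ROOT. The following is a minimal illustration only: the class names AbsDataStore and VectorDataStore and all of their methods are hypothetical stand-ins, not the RooAbsDataStore interface, and only double-valued columns with fill/get/append are covered (no persistence, categories, strings or cache columns).

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical abstract storage interface (a stand-in, not RooAbsDataStore).
class AbsDataStore {
public:
    virtual ~AbsDataStore() = default;
    virtual std::size_t numEntries() const = 0;
    virtual void fill(const std::map<std::string, double>& row) = 0;
    virtual double get(std::size_t entry, const std::string& column) const = 0;
    virtual void append(const AbsDataStore& other) = 0;
};

// Column-wise storage: one contiguous std::vector<double> per observable.
class VectorDataStore : public AbsDataStore {
public:
    explicit VectorDataStore(const std::vector<std::string>& columnNames) {
        for (const auto& name : columnNames) columns_[name];  // create empty columns
    }

    std::size_t numEntries() const override {
        return columns_.empty() ? 0 : columns_.begin()->second.size();
    }

    void fill(const std::map<std::string, double>& row) override {
        // Check first, so a bad row never leaves columns with unequal lengths.
        for (const auto& kv : columns_)
            if (row.count(kv.first) == 0)
                throw std::invalid_argument("missing column: " + kv.first);
        for (auto& kv : columns_) kv.second.push_back(row.at(kv.first));
    }

    double get(std::size_t entry, const std::string& column) const override {
        return columns_.at(column).at(entry);  // throws std::out_of_range on bad input
    }

    void append(const AbsDataStore& other) override {
        const std::size_t n = other.numEntries();  // fixed up front, safe for self-append
        for (std::size_t i = 0; i < n; ++i) {
            std::map<std::string, double> row;
            for (const auto& kv : columns_) row[kv.first] = other.get(i, kv.first);
            fill(row);
        }
    }

private:
    std::map<std::string, std::vector<double>> columns_;
};
```

Because client code only sees the abstract interface, a dataset class could switch between a tree-based and a vector-based store without further changes, which is the property the slide relies on.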

Open issues in datasets - representation

Request for new mixed binned-unbinned data representation type

Work needed
– Fixed feature is a ‘master category’ variable that indexes the various data subsets.
– Write class RooMixedData to represent this.
– Need to work out the precise functionality and interface of such a class. Several concepts of binned data are not available for unbinned data and vice versa (see next slide)
– Could make a class that only implements the common aspects (as defined in RooAbsData), but in practice it would only be usable as a read-only class. OK?
– Is (typed) access to the component representation needed, i.e. do you need to be able to see subset [i] as a RooDataHist or RooDataSet? This is not handled via the composite storage scheme, but could be added as a separate layer: RooMixedData owns multiple RooDataHist and RooDataSet objects that each own their own storage, then links their storage objects to a RooCompositeDataStore for a unified view.
– Workload: ~1 week (depending on what design/interface issues will appear…)

Functionality of RooDataSet/RooDataHist

Operation              | RooDataHist                                            | RooDataSet
add(RooArgSet)         | Increase weight of corresponding bin                   | Add data point
append(RooAbsData)     | Add all points                                         | Add all points
merge(RooDataSet)      | UNDEFINED                                              | Add columns from imported dataset
addColumn(RooAbsArg)   | UNDEFINED                                              | Add columns with values of given function
set(RooArgSet&,dbl)    | Set weight of given point to given value               | UNDEFINED
binVolume(RooArgSet&)  | Return volume of bin in given (subset) of dimensions   | UNDEFINED
weightError()          | Return error on given weight()                         | UNDEFINED
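One possible shape for a mixed binned-unbinned container indexed by a master category is sketched below. Everything here is an assumption made for illustration: MixedData, UnbinnedSubset and BinnedSubset are hypothetical types, not RooFit classes. The sketch exposes only an operation common to both representations (summing entries) plus typed access to each component.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical component types: one value per event, or one weight per bin.
struct UnbinnedSubset { std::vector<double> points; };
struct BinnedSubset { double lo; double hi; std::vector<double> binWeights; };
using Subset = std::variant<UnbinnedSubset, BinnedSubset>;

// Container keyed by the state label of the master category.
class MixedData {
public:
    void addSubset(const std::string& state, Subset subset) {
        subsets_.insert_or_assign(state, std::move(subset));
    }

    bool isBinned(const std::string& state) const {
        return std::holds_alternative<BinnedSubset>(subsets_.at(state));
    }

    // Common operation: number of events (unbinned) or sum of weights (binned).
    double sumEntries(const std::string& state) const {
        const Subset& s = subsets_.at(state);
        if (const auto* u = std::get_if<UnbinnedSubset>(&s))
            return static_cast<double>(u->points.size());
        const auto& b = std::get<BinnedSubset>(s);
        return std::accumulate(b.binWeights.begin(), b.binWeights.end(), 0.0);
    }

    double sumEntries() const {
        double total = 0.0;
        for (const auto& kv : subsets_) total += sumEntries(kv.first);
        return total;
    }

    // Typed access; throws std::bad_variant_access if the subset has the other type.
    const BinnedSubset& asBinned(const std::string& state) const {
        return std::get<BinnedSubset>(subsets_.at(state));
    }
    const UnbinnedSubset& asUnbinned(const std::string& state) const {
        return std::get<UnbinnedSubset>(subsets_.at(state));
    }

private:
    std::map<std::string, Subset> subsets_;
};
```

Operations that exist for only one representation, such as set() or merge() in the table above, are deliberately absent from the common interface and would have to go through the typed accessors.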

Open issues in datasets - representation

Representation of number-counting data

Now
– Regular PDF: Gauss(x) → RooDataSet(x) with N entries
– Extended PDF: Gauss(x)*Poisson(N) → RooDataSet(x) with N entries
– Number-counting PDF: should be (in analogy) Poisson(N) → RooCountingData( ) with N entries, but we don’t have that.
– Can do: Poisson(N) → RooDataSet(N) with 1 entry, but that doesn’t (automatically) behave in the right way.
– Also requires some thinking on the PDF side…
– Two ways to go

Open issues in datasets - representation

Path #1 (Kyle proposal)
– Need to label (any) pdf explicitly as a ‘number counting’ pdf
– Effect is that generate() fills a dataset with 1 entry representing the event count, rather than N entries of a dummy observable where the dataset size represents the event count
– Possible issue: the special meaning of counting data is only clear in the context of the (labeled) pdf that generated it, unless the data is also labeled itself in some way. [ E.g. when calculating the total event count of a composite dataset, need to know if a RooDataSet with 1 entry counts as 1 or as N; similar issue when asking for the event count of a component dataset ]

Path #2 (My original proposal)
– Make a wrapper class that represents any pdf as a number-counting pdf, e.g. class RooCountingPdf, e.g.
    ws.factory("CountingPdf::Nexp(Poisson(Nobs,mu))") ;
– Net effect of the class is to redirect output of RooAbsPdf::getVal() to RooAbsPdf::expectedEvents()
– Return class of type RooCountingData() when generate is called
– Requires writing a class RooCountingData, which can be extremely lightweight & fast (just contains 1 double)
– Adapt class RooMixedData to be able to also contain RooCountingData
– Data and pdf are both self-labeling in terms of interpretation. Should be straightforward to use this in existing RooFit code [ but need to check if there is code that assumes at least one ‘observable’ ]

Workload: either way 2-3 days
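To make the number-counting case concrete, the sketch below pairs a data object holding a single double with a Poisson likelihood for the expected yield. It is an illustration under stated assumptions: CountingData, poissonProb, countingNll and eventCount are hypothetical names, not RooFit classes or functions, and the formulas assume mu > 0.

```cpp
#include <cmath>

// Hypothetical lightweight counting data: holds just the observed count.
struct CountingData {
    double nObs;
};

// Poisson probability P(n | mu), for n >= 0 and mu > 0 (assumption).
inline double poissonProb(double n, double mu) {
    return std::exp(n * std::log(mu) - mu - std::lgamma(n + 1.0));
}

// Negative log-likelihood of a pure number-counting model, -log P(nObs | mu).
inline double countingNll(const CountingData& data, double mu) {
    return mu - data.nObs * std::log(mu) + std::lgamma(data.nObs + 1.0);
}

// The event count is the stored number, not the number of stored entries.
// This is the labeling question raised for composite datasets above.
inline double eventCount(const CountingData& data) {
    return data.nObs;
}
```

With the count carried by a dedicated type, a caller never has to guess whether a one-entry dataset means 1 event or N events, which is the ambiguity noted under Path #1.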

Conceptual issues with simultaneous pdf / data

Need more flexibility in mixing/matching different pdfs, e.g. sim[ F(x), G(y) | i ]
– Will work technically, but the fundamental issue is that the meaningful observables depend on index i
– Unwanted side effects of the present construction:
  generate() will make a random y variable for generation of F(x), and a random x variable for generation of G(y).
  Datasets will always allocate entries for x and y for both dataset subsets (results in a waste of space, especially if x,y are binned)

Need several items to resolve this
– Composite datasets, where each subset only stores selected observables [ need: a mechanism to specify this ]
– A mechanism in RooSimultaneous::generate() to only generate the “relevant” observables for each state [ need: same mechanism to specify this ]
– Will need to change RooSimultaneous in any case to store output in a composite datastore [ not done now ] to gain the needed flexibility

Conceptual issues with simultaneous pdf / data

Composite datasets will most likely be used only in conjunction with RooSimultaneous, so that p.d.f. is likely the most sensible point to make this interface, e.g.
    ws.factory("SIMUL::model[idx,a=pdfA(x),b=pdfB(y)]") ;
then modify RooSimultaneous::generate() internally to follow the instructions accordingly.

Also need new syntax to construct RooDataSets in this way
    RooDataSet ds("ds","ds",RooArgSet(x,y,i),Index(i),
                  Import(dataA,"a",x), Import(dataB,"b",y)) ;
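The effect of index-dependent observables can be illustrated with a toy generator that, for each index state, fills only the observables declared relevant for that state. This is a sketch only: ObservableMap, CompositeData and generateRelevant are hypothetical names, and every observable is drawn from a unit Gaussian purely as a placeholder shape.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

// One event: observable name -> value. Only relevant observables are stored.
using Row = std::map<std::string, double>;
// For each index state, the observables that are meaningful in that state.
using ObservableMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical composite dataset: one independent subset per index state.
struct CompositeData {
    std::map<std::string, std::vector<Row>> subsets;

    std::size_t numEntries() const {
        std::size_t n = 0;
        for (const auto& kv : subsets) n += kv.second.size();
        return n;
    }
};

// Generate nPerState toy events per state, filling only the relevant observables.
inline CompositeData generateRelevant(const ObservableMap& relevant,
                                      std::size_t nPerState, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> unitGauss(0.0, 1.0);  // placeholder shape
    CompositeData out;
    for (const auto& state : relevant) {
        std::vector<Row>& rows = out.subsets[state.first];
        for (std::size_t i = 0; i < nPerState; ++i) {
            Row row;
            for (const auto& name : state.second) row[name] = unitGauss(rng);
            rows.push_back(std::move(row));
        }
    }
    return out;
}
```

With the map {a: [x], b: [y]}, subset "a" holds no y column and subset "b" holds no x column, so neither random dummy values nor unused storage appear.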

Conceptual issues with simultaneous pdf / data

Once the concept of RooMixedData is implemented, can also think about an interface for binned-vs-unbinned datasets
– Construction ‘by hand’ follows trivially from the ctor
    RooMixedData ds("ds","ds",RooArgSet(x,y,i),Index(i),
                    Import(dataA,"a",x), Import(dataB,"b",y)) ;
– When generating, binned-vs-unbinned is a ‘preference’ (you can always do it either way)
– Either specify at generation time (requires a non-trivial interface), or encode the ‘preference’ inside a RooSimultaneous
  Still requires some creativity to be able to insert this preference spec in the factory
  Otherwise through the class interface
    sim.setGenerateBinned("a",kTRUE) ;

Recap of data and simultaneous issues

Project 1
– Make RooVectorDataStore ~ 1 week. Easily factorized/delegated

Project 2
– Adjust RooDataSet/RooDataHist to accept index-dependent observables [ ~2-3 days ]
– Adjust RooSimultaneous to specify ‘relevant’ observables for each index [ 1 day ]

Project 3
– Make RooCountingData ~ 2-3 days
– Make RooMixedData ~2-4 days [ depending on difficulties ]
– Adjust RooSimultaneous to use these

Other issues

Workspaces
– Ability to rename named sets stored in datasets [ 1 hour ]
– Make EDIT() capable of removing terms in PROD terms [ 1 day ]
– Bug in RooHistPdf persistence [ 1-2 days ]. Time consuming as it requires intervention in the RooAbsArg streamer
– Kyle reported 32/64 issues in persistence [ need example ] [ ?? ]

Pdf interface issues
– Port generateSimGlobal() to generate() interface [ 1 day ]
– Make extendedTerm() return Double_t instead of Int_t to support Asimov datasets [ 0.5 day ]
– Common abstract interface for morphing operator PDF [ ??? ]

Likelihood interface issues
– What normalization set is applied to constraint terms?
– Need a data/pdf combination scheme that allows detaching a dataset that has already died from an NLL → simplifies use of setData() in RooStats [ 1-2 days ]

Addressing RooStats performance issues from RooFit side

Avoid need to (re)create likelihoods
– Modified data/pdf attachment scheme in RooNLLVar that allows detaching datasets after they have been deleted → allows straightforward use of setData() in RooStats [ 2 days ]

Speeding up dataset looping, creation, deletion
– Vector-based datasets [ ~1 week ]

Copy overhead of complex objects
– Complex is defined as having >>100 nodes
– Several optimizations already applied on the RooFit side (hash tables etc. for reconnection lookup). Biggest speed gain most likely in the form of new classes that reduce the number of objects → collapse the construct of a pdf for N channels into a single one. Needs some details on use cases, but good progress is likely possible in O[2-3 days]

Profiling of RooStats TLimit macro essential
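The attachment problem (a likelihood that must survive the deletion of its dataset and accept a replacement) can be illustrated with a non-owning reference that detects expiry. This is a generic C++ sketch and not how RooNLLVar is implemented: GaussNll and its methods are hypothetical, and the likelihood is a Gaussian with fixed width and constant terms dropped.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

using Data = std::vector<double>;

// Hypothetical likelihood object that observes, but does not own, its dataset.
class GaussNll {
public:
    explicit GaussNll(double sigma) : sigma_(sigma) {}

    // Attach a (new) dataset without rebuilding the likelihood object.
    void setData(const std::shared_ptr<const Data>& data) { data_ = data; }

    // False once the attached dataset has been deleted elsewhere.
    bool hasData() const { return !data_.expired(); }

    // Gaussian negative log-likelihood in the mean, up to constant terms.
    double evaluate(double mean) const {
        const std::shared_ptr<const Data> data = data_.lock();
        if (!data) throw std::runtime_error("no dataset attached");
        double nll = 0.0;
        for (double x : *data) {
            const double pull = (x - mean) / sigma_;
            nll += 0.5 * pull * pull;
        }
        return nll;
    }

private:
    double sigma_;
    std::weak_ptr<const Data> data_;  // non-owning: the dataset may die first
};
```

The same likelihood object is reused across datasets, so nothing has to be recreated for each toy dataset; only the data reference changes.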