PyTimber & CO
M. Betz, R. De Maria, M. Fitterer, C. Hernalsteens, T. Levens
Install: $ pip install pytimber
Sources: https://github.com/rdemaria/pytimber



PyTimber: design criteria
Package goal: provide a simple interface to the CALS API for interactive data analysis in Python:
- get data with a few lines of code
- work well in the typical scientific Python ecosystem: numpy (array objects), matplotlib (plotting library), ipython/jupyter (interactive environment), etc.
- use native Python objects for input and output: strings, lists, dictionaries, floating point numbers, arrays
- keep the Python API predictable: expose simple building blocks and provide higher-level functionality on top of them
- keep performance under control
- be easy to use from lxplus, the technical network, swan and users' machines
- reduce prerequisites to the minimum

PyTimber: example

PyTimber: prerequisites
- Python 2.7 or Python 3.x
- jpype: the critical package that connects to a JVM, enabling the whole concept of leveraging Java APIs. It is not in the Python standard library and is community supported. It is a complex package built on top of the Java Native Interface, not straightforward to install and not easy to use in concurrent processes.
- cmnbuild_dep_manager: gets the jars, manages the JVM, resolves class locations
- use from the CERN network

PyTimber: methods
- __init__: choose appid, clientid, source
- search: find variables from a string pattern
- get: variables, time window, fundamental (optional)
    - Variables: string, pattern, list of strings
    - Time window: strings (local time), unix time, datetime objects, ‘last’, ‘next’
- tree: explore variable hierarchies
- getAligned: return aligned data
- getLHCFillData: return beam modes for a fill
- getLHCFillsByTime: return a list of fills
- getIntervalsByLHCModes: return a list of fill numbers and intervals
- getMetaData: get all metadata defined by pattern or list
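The search and get building blocks listed above can be combined in a few lines. The following is only a sketch: `FakeDB` is a hypothetical in-memory stand-in so that the example runs without a JVM or the CERN network, and `summarize` is an illustrative helper, not part of pytimber. It assumes that `get` returns a dictionary mapping each variable name to a (timestamps, values) pair and that `%` acts as a wildcard in search patterns. In real use, `db` would be the pytimber database object (assumed here to be created with `pytimber.LoggingDB()`).

```python
import fnmatch


class FakeDB:
    """Hypothetical in-memory stand-in for a pytimber database object."""

    def __init__(self, series):
        self._series = series  # {name: (timestamps, values)}

    def search(self, pattern):
        # Assumption: '%' is a wildcard, as in SQL LIKE patterns.
        glob = pattern.replace("%", "*")
        return sorted(n for n in self._series if fnmatch.fnmatchcase(n, glob))

    def get(self, variables, t1, t2):
        # Accept a single name or a list of names.
        if isinstance(variables, str):
            variables = [variables]
        out = {}
        for name in variables:
            ts, vs = self._series[name]
            keep = [i for i, t in enumerate(ts) if t1 <= t <= t2]
            out[name] = ([ts[i] for i in keep], [vs[i] for i in keep])
        return out


def summarize(db, pattern, t1, t2):
    """Return {name: (count, min, max)} for every variable matching pattern."""
    data = db.get(db.search(pattern), t1, t2)
    return {
        name: (len(vs), min(vs), max(vs))
        for name, (ts, vs) in data.items()
        if vs  # skip variables with no samples in the window
    }


# Made-up variable names and values, for illustration only.
db = FakeDB({
    "DEMO:INTENSITY": ([10.0, 20.0, 30.0], [1.0, 3.0, 2.0]),
    "DEMO:ENERGY": ([10.0, 25.0], [450.0, 6500.0]),
    "OTHER:VAR": ([15.0], [0.0]),
})
summary = summarize(db, "DEMO:%", 10.0, 25.0)
print(summary)
```

Because `summarize` only relies on `search` and `get`, the same helper could in principle be pointed at any data source exposing those two methods.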

PyTimber: status and wish list
- The API is stable; a few methods could still be added
- Performance regressions since the first releases remain to be investigated
- Better time zone management (choose UTC time as input, specify time zones explicitly)
- Documentation is still sparse and incomplete: tutorial, examples in the source distribution, swan galleries, GitHub wiki, CO wiki
- Decide whether pytimber should evolve towards a large framework (generic methods plus application-specific classes) or stay as it is, using e.g. pytimbertools to provide the additional functionality
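The time zone wish can be illustrated with a small converter that never falls back on the machine's local time. This is a sketch of one possible policy, not pytimber's behaviour: `to_unix_utc` is a hypothetical helper that treats strings as UTC, accepts numbers as Unix seconds, and rejects datetime objects that carry no time zone.

```python
from datetime import datetime, timedelta, timezone


def to_unix_utc(t):
    """Convert a number, a UTC string, or an aware datetime to Unix seconds."""
    if isinstance(t, (int, float)):
        return float(t)
    if isinstance(t, str):
        # Assumption: strings are always read as UTC, never as local time.
        t = datetime.strptime(t, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    if t.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone explicitly")
    return t.timestamp()


# A fixed +02:00 offset, used here as a stand-in for summer time at CERN.
cest = timezone(timedelta(hours=2))

from_string = to_unix_utc("2016-07-01 10:00:00")
from_datetime = to_unix_utc(datetime(2016, 7, 1, 12, 0, 0, tzinfo=cest))
print(from_string, from_datetime)
```

Both inputs describe the same instant, so the two results agree regardless of the time zone configured on the machine running the code.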

Pytimber Tools: pagestore
Provide data persistence for pytimber time series.
- store everything pytimber can emit
- subset of the pytimber API: search and get from unix timestamps
- fast bulk reading performance: data stored contiguously in binary files
- large dataset management (~TB)
- data store (e.g. in EOS) kept separate from the path and data indexes (e.g. local)
- slow writing speed, brittle concurrent writing
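The idea of contiguous binary data plus a separate index can be sketched with the standard library alone. `TinyStore` below is a hypothetical toy, not the pagestore implementation: it appends each series as raw 64-bit floats to one data file and keeps an in-memory index of byte offsets, so a read is a single seek followed by a bulk read.

```python
import os
import tempfile
from array import array
from bisect import bisect_left, bisect_right


class TinyStore:
    """Hypothetical toy store: one binary data file plus an in-memory index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # name -> (byte offset, number of samples)
        open(path, "wb").close()  # create an empty data file

    def store(self, name, timestamps, values):
        ts, vs = array("d", timestamps), array("d", values)
        if len(ts) != len(vs):
            raise ValueError("timestamps and values differ in length")
        offset = os.path.getsize(self.path)
        with open(self.path, "ab") as f:
            ts.tofile(f)  # timestamps first, stored contiguously
            vs.tofile(f)  # values right after them
        # Simplification: storing a name twice replaces its index entry.
        self.index[name] = (offset, len(ts))

    def get(self, name, t1, t2):
        offset, n = self.index[name]
        ts, vs = array("d"), array("d")
        with open(self.path, "rb") as f:
            f.seek(offset)
            ts.fromfile(f, n)
            vs.fromfile(f, n)
        # Timestamps are assumed sorted, so the window is found by bisection.
        lo, hi = bisect_left(ts, t1), bisect_right(ts, t2)
        return list(ts[lo:hi]), list(vs[lo:hi])


with tempfile.TemporaryDirectory() as tmp:
    store = TinyStore(os.path.join(tmp, "data.bin"))
    store.store("DEMO:A", [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
    store.store("DEMO:B", [1.5, 2.5], [0.1, 0.2])
    window = store.get("DEMO:A", 2.0, 3.5)
    other = store.get("DEMO:B", 0.0, 10.0)

print(window, other)
```

The toy also hints at the trade-offs listed above: reads are cheap, while every write touches both the data file and the index, which is hard to make safe when several processes write at once.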

Pytimber Tools: pagestore example

Pytimber Tools: dataquery
Encapsulate a query in an object:
- use pytimber or pagestore as data source
- provide shortcuts to data arrays for interactive manipulation
- provide generic plots (lines, images for 2D arrays)
- provide methods to extend datasets by appending or inserting data, or by adding variables
- interpolate and align all variables
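Interpolating and aligning variables can be sketched as resampling every series onto the timestamps of one chosen variable. The helpers `interp` and `align` below are hypothetical, and linear interpolation with clamping at the edges is an assumption made for illustration; the scheme actually used by dataquery is not stated in the slides.

```python
from bisect import bisect_right


def interp(t, ts, vs):
    """Linearly interpolate the series (ts, vs) at time t, clamping at the ends."""
    if t <= ts[0]:
        return vs[0]
    if t >= ts[-1]:
        return vs[-1]
    i = bisect_right(ts, t)  # now ts[i - 1] <= t < ts[i]
    t0, t1 = ts[i - 1], ts[i]
    w = (t - t0) / (t1 - t0)
    return vs[i - 1] + w * (vs[i] - vs[i - 1])


def align(data, master):
    """Resample every variable in data onto the timestamps of `master`."""
    grid = data[master][0]
    aligned = {
        name: [interp(t, ts, vs) for t in grid]
        for name, (ts, vs) in data.items()
    }
    return grid, aligned


# Made-up series with sorted timestamps, in the {name: (timestamps, values)} layout.
data = {
    "A": ([0.0, 10.0, 20.0], [1.0, 2.0, 3.0]),
    "B": ([0.0, 20.0], [0.0, 100.0]),
}
grid, aligned = align(data, "A")
print(grid, aligned)
```

After alignment all variables share one time axis, which is what makes element-wise arithmetic and 2D plots straightforward.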

Pytimber Tools: dataquery example

Pytimber Tools: BSRT
Class to process LHC BSRT data.

Outlook
- Pytimber has proved to be an easy way to query CALS data.
- The Pytimber API is simple and generic enough to be used by other data services.
- Applications are being built on top of the Pytimber API.
Actions and proposals:
- Formalize the Python API and use it for future products related to machine data.
- Use the pytimber package to provide additional tools that give a more complete experience, collect best practices, avoid reinventing the wheel and reduce development time.
- CO to be more involved in jpype maintenance.
- Profile pytimber and see whether opportunities for speed-up emerge.
- Start using py.test for testing and improve documentation (Sphinx?)

Proposals for sub packages