Python tools and services for LHC data analysis R. De Maria, C. Hernalsteens, T. Levens, L. Mascetti, D. Piparo.

Slides:



Advertisements
Similar presentations
Introductory to database handling Endre Sebestyén.
Advertisements

Usage of the memoQ web service API by LSP – a case study
Chapter 5: Introduction to Information Retrieval
K.Harrison CERN, 23rd October 2002 HOW TO COMMISSION A NEW CENTRE FOR LHCb PRODUCTION - Overview of LHCb distributed production system - Configuration.
Low level CASE: Source Code Management. Source Code Management  Also known as Configuration Management  Source Code Managers are tools that: –Archive.
Hands-On Microsoft Windows Server 2003 Administration Chapter 6 Managing Printers, Publishing, Auditing, and Desk Resources.
Magda – Manager for grid-based data Wensheng Deng Physics Applications Software group Brookhaven National Laboratory.
Presented by Mina Haratiannezhadi 1.  publishing, editing and modifying content  maintenance  central interface  manage workflows 2.
Database Management Systems (DBMS)
E. Hatziangeli – LHC Beam Commissioning meeting - 17th March 2009.
Section 6.1 Explain the development of operating systems Differentiate between operating systems Section 6.2 Demonstrate knowledge of basic GUI components.
STAR Software Basics Introduction to the working environment Lee Barnby - Kent State University.
Linux Operations and Administration
A summary of the report written by W. Alink, R.A.F. Bhoedjang, P.A. Boncz, and A.P. de Vries.
CAA/CFA Review | Andrea Laruelo | ESTEC | May CFA Development Status CAA/CFA Review ESTEC, May 19 th 2011 European Space AgencyAndrea Laruelo.
Information Architecture Linden Daniels. Steps of a Successful Information Architecture Discovery Education Design Migration Monitor.
Data & Storage Services CERN IT Department CH-1211 Genève 23 Switzerland t DSS From data management to storage services to the next challenges.
Component 4: Introduction to Information and Computer Science Unit 4: Application and System Software Lecture 3 This material was developed by Oregon Health.
1 Materi Pendukung Pertemuan > Contoh HTML document Matakuliah: >/ > Tahun: > Versi: >
DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENNTS Leon Guzenda Chief Technology Officer.
CERN-PH-SFT-SPI August Ernesto Rivera Contents Context Automation Results To Do…
Sciamachy features and usage with respect to end-users The typical fate of retrieval people dealing with large datasets… C. Frankenberg, SRON team, IUP.
DDM Monitoring David Cameron Pedro Salgado Ricardo Rocha.
File sharing requirements of remote users G. Bagliesi INFN - Pisa EP Forum on File Sharing 18/6/2001.
A radiologist analyzes an X-ray image, and writes his observations on papers  Image Tagging improves the quality, consistency.  Usefulness of the data.
Module 5: Creating and Configuring Group Policies.
SRM Monitoring 12 th April 2007 Mirco Ciriello INFN-Pisa.
WebFTS File Transfer Web Interface for FTS3 Andrea Manzi On behalf of the FTS team Workshop on Cloud Services for File Synchronisation and Sharing.
Data Science Background and Course Software setup Week 1.
2-Dec Offline Report Matthias Schröder Topics: Scientific Linux Fatmen Monte Carlo Production.
Extraction Tools and Relational Database Schemas for CVS, SVN, and Bazaar Revision Control Systems.
Web based spectrum databases and utilities László Dobos Tamás Budavári István Csabai MAGPOP kick-off meeting, January Cassis.
Andrea Valassi (CERN IT-DB)CHEP 2004 Poster Session (Thursday, 30 September 2004) 1 HARP DATA AND SOFTWARE MIGRATION FROM TO ORACLE Authors: A.Valassi,
Oct. 16 th, 2013 Geant4 hadronic Meeting 1 Hans Wenzel Oct 16 th 2013 Status of physics validation tool.
JRA1 Meeting – 09/02/ Software Configuration Management and Integration EGEE is proposed as a project funded by the European Union under contract.
Unit 1: IBM Tivoli Storage Manager 5.1 Overview. 2 Objectives Upon the completion of this unit, you will be able to: Identify the purpose of IBM Tivoli.
Software tools for digital LLRF system integration at CERN 04/11/2015 LLRF15, Software tools2 Andy Butterworth Tom Levens, Andrey Pashnin, Anthony Rey.
European Organization For Nuclear Research CERN Accelerator Logging Service Overview Focus on Data Extraction for Offline Analysis Ronny Billen & Chris.
 Project Team: Suzana Vaserman David Fleish Moran Zafir Tzvika Stein  Academic adviser: Dr. Mayer Goldberg  Technical adviser: Mr. Guy Wiener.
 TE-MPE-PE Clean code workshop – R.Heil, M.Koza, K.Krol Introduction to the MPE software process Raphaela Heil TE-MPE-PE Clean code workshop - 9 th July.
AIRS Meeting GSFC, February 1, 2002 ECS Data Pool Gregory Leptoukh.
7/8/2016 OAF Jean-Jacques Gras Stephen Jackson Blazej Kolad 1.
AFS Phase Out experience R. De Maria. AFS and LSF Phase Out (1) I have learned about relevant changes in IT services, only while iterating with IT experts.
ROOT as a Service for Web-based Data Analysis, SWAN L. Moneta, X. Valls – CERN EP-SFT The SWAN team E. Tejedor, D. Piparo, P.
CERN IT-Storage Strategy Outlook Alberto Pace, Luca Mascetti, Julien Leduc
Why indexing? For efficient searching of a document
Analysing FCC Data Files in SWAN
PyTimber & CO M. Betz, R. De Maria, M. Fitterer, C. Hernalsteens, T. Levens Install: $ pip install pytimber Sources:
Python Tools for Control System Access
Hadoop and Analytics at CERN IT
Virtualisation for NA49/NA61
Dag Toppe Larsen UiB/CERN CERN,
Future Database Challenges
Dag Toppe Larsen UiB/CERN CERN,
AFS/LSF Phase Out personal experience
Virtualisation for NA49/NA61
CERN-Russia Collaboration in CASTOR Development
CTA: CERN Tape Archive Adding front-ends and back-ends Status report
GFAL 2.0 Devresse Adrien CERN lcgutil team
TYPES OF SERVER. TYPES OF SERVER What is a server.
FileSpot Collaborative File Manager
LCG Monte-Carlo Events Data Base: current status and plans
Computing Infrastructure for DAQ, DM and SC
Diskless network security
Network Visualization
Interoperability of Digital Repositories
Network Media, models and number systems
EOSDIS Approach to Data Services in the Cloud
Production Manager Tools (New Architecture)
Outline Announcements: Version control with CVS HW II due today!
Presentation transcript:

Python tools and services for LHC data analysis R. De Maria, C. Hernalsteens, T. Levens, L. Mascetti, D. Piparo

Existing tools LHC machine data is normally available through: the CERN Accelerator Logging Service that can be queried with the Timber application (provided Java 1.8 and jws are installed);CERN Accelerator Logging Service Timber jws the NFS file system available from the technical network (e.g. ssh cs-ccr-dev1 ) in the directory /user/slops/data/LHC_DATA/OP_DATA ; in the Post Mortem System database accessible through its java API and other dedicated tools.Post Mortem System

Data analysis work flow Steps and challenges: Get data: data in different network, in special databases, deleted after some time Explore data: slow plots, difficult to align data, missing points, complex data conditioning Save data: when large data sets too many files, too large files, data access becomes to slow, local disks do fail Publish results: need finer control on graphics

New tools and services PyTimber: Python API for querying data of the CERN Accelerator Logging Service PyTimber EOS ABP space: tree in the EOS filesystem (eos/cern.ch/project/abpdata) which can be accessed from lxplus, lxplus-cb6, web server. EOS ABP spaceweb server PageStore: a Python library for storing and retrieving measurement data like, but not limited to PyTimber data. PageStore Old LHC data collection: stored in EOS and accessible with PageStore. Old LHC data collection SWAN : Jupyter Python notebook server with access to EOS and PyTimber/PageStore installed. SWAN Jupyter Python notebook server

PyTimber Developed by R. De Maria, T. Levens, C. Hernalsteens Use Java CALS API (BE/CO) through python-jpype and cmmbuild-dep-manager (T. Levens) Simple API to get data, search variable, name explore variable tree: import pytimber db=pytimber.LoggingDB() data=db.get(“LHC.BOFSU:OFSU_ENERGY”,” :00:00.000”, ” :00:00.000”) vars=db.search(“LHC%ENERGY”) print(db.tree.LHC.Instrumentation) Notebook example

PageStore Developed by R. De Maria. Store data in indexed binary pages managed by SQLite database: Can be queried like PyTimber: import pagestore, pytimber, time mydb=pagestore.PageStore(‘mydata.db',‘./mydata') db=pytimber.LoggingDB() now=time.time() data=db.get(“%LUMI_TOT_INST”,now-3600*24,now) mydb.store(data) mydb.get(“ATLAS:LUMI_TOT_INST”,now-3600,now)

EOS ABP Space EOS is fast ~100 PB storage space from CERN. EOS It has been used as file storage by the experiments in the last few years. It is going to replace AFS in the near future as file system. It is being the CERNBOX service. It can be accessed using eos command line tool, xrootd, as filesystem logging into lxplus-cb6. eos command line tool is available in lxplus and as a package in Scientific linux. EOS team (L. Mascetti et al.) has granted 20TB not back-up space for ABP in: eos/cern.ch/project/abpdata (soon to be replaced by eos/cern.ch/project/a/abpdata) Backup will be done on CASTOR: castor/cern.ch/accel/abpdata.

LHC old data repository In the past many LHC high frequency data was filtered out every week (e.g. BBQ raw data, BLM, Power converter current). A subset of filtered data from 2010, 2011, 2012, 2015 has been saved for about few TB. Data is now accessible in EOS through PageStore. Notebook example

SWAN service All tools mentioned before can be installed in local machines or in lxplus. Python code can be developed using command line tools like IPython or web based notebook. SWAN SWAN service has been developed by PH/SFT (D. Piparo et al.) to provide a centrally managed cloud accessible with CERN credential with EOS and Python software installed. The service is still in beta and few ABP users have granted access, but it will be soon open to everyone.

References and documentation LhcDataStorage accessible from cern.ch/abp-computing