ROOT as a Service for Web-based Data Analysis, SWAN https://swan.web.cern.ch L. Moneta, X. Valls – CERN EP-SFT The SWAN team E. Tejedor, D. Piparo, P.


ROOT as a Service for Web-based Data Analysis, SWAN
L. Moneta, X. Valls – CERN EP-SFT
The SWAN team: E. Tejedor, D. Piparo, P. Mato – CERN EP-SFT; L. Mascetti, J. Moscicki, M. Lamanna – CERN IT-ST
GridKa School 2016, August 29 – September 2, 2016, Karlsruhe

A service for analysing data in the cloud with only a web browser, using the CERN software suite and relying on existing CERN services in production.
Promote the CERN software suite and services; propose widely adopted analysis ecosystems.
The SWAN Team: P. Mato, D. Piparo, E. Tejedor – EP-SFT / M. Lamanna, L. Mascetti, J. Moscicki – IT-ST

Prelude: the Notebook
Innovation based on existing CERN services
Examples and Demo
Future plans
We will have a tutorial tomorrow afternoon.

Prelude: The “Notebook”

A web-based interactive computing interface and platform that combines code, equations, text and visualisations.
In a nutshell: an “interactive shell opened within the browser”.
Also called “Jupyter Notebook” or “IPython Notebook”.
Many supported languages: Python, Haskell, Julia, R …
One generally speaks about a “kernel” for a specific language.
No excuses possible when it comes to describing all the steps in an analysis!

A Choice of Kernels (in a browser)
Kernels are processes that run interactive code in a particular programming language and return output to the user. Kernels also respond to tab completion and introspection requests.
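The two kernel duties named above (run code and return its output, answer tab-completion requests) can be illustrated with a toy sketch. This is not the Jupyter messaging protocol or any Jupyter API: `ToyKernel` and its methods are hypothetical names, and only the Python standard library is used.

```python
import contextlib
import io
import rlcompleter


class ToyKernel:
    """Hypothetical, minimal stand-in for a language kernel (not Jupyter's API)."""

    def __init__(self):
        # State shared between successive "cells", as in an interactive session.
        self.namespace = {}

    def execute(self, code):
        """Run a block of Python code and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()

    def complete(self, prefix):
        """Return the known names that start with `prefix`."""
        completer = rlcompleter.Completer(self.namespace)
        matches = []
        state = 0
        while True:
            match = completer.complete(prefix, state)
            if match is None:
                break
            matches.append(match)
            state += 1
        return matches


kernel = ToyKernel()
first_output = kernel.execute("x_value = 6 * 7\nprint(x_value)")
second_output = kernel.execute("print(x_value + 1)")  # state persists between cells
suggestions = kernel.complete("x_va")
```

A real kernel runs as a separate process and exchanges such requests and replies with the notebook front end; the sketch keeps everything in one process to stay short.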

Text and Formulas

Code

Code: this is a notebook in Python

Code

Shell Commands: we can invoke commands in the shell…

Shell Commands: … and capture their output
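In a notebook, IPython offers the `!command` syntax for this, and an assignment such as `files = !ls` captures the output lines. Outside IPython the same idea can be sketched with the standard `subprocess` module. The helper name `run_and_capture` is an assumption made for this example, and the Python interpreter itself is used as a portable stand-in for a shell command.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command (given as a list of arguments) and return its stdout as text."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


# Use the current Python interpreter as a portable stand-in for a shell command.
output = run_and_capture([sys.executable, "-c", "print('hello from a subprocess')"])
lines = output.splitlines()  # one list entry per line of captured output
```

With `check=True`, a failing command raises `subprocess.CalledProcessError` instead of silently returning empty output.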

Images

In a browser: Text and Formulas, Code, Shell Commands, Images

Full integration of ROOT with Jupyter Notebooks
“import ROOT”: the only action required to activate all goodies!
– A C++ kernel
– Inlining of plots as images or JavaScript interactive graphics
– Magics to JIT or compile C++ code for acceleration
– Immediately usable in Python!
– Tab completion for names and methods of classes known to ROOT
– Capturing of output from C++ libraries
“Like before, but better”

Full integration of the ROOT Machine Learning package (TMVA) with Jupyter Notebooks
– Enhanced JSROOT plots
– Interactive training, neural network and decision tree visualization
Work produced by GSoC students
Will be available in the next ROOT release

Notebook-based ROOT online demo
Linked from the ROOT website front page
Anonymous access, no persistent storage
View, Create and Run ROOTbooks!
ROOT Demo available on Binder
2k visits/month since December 2015!

$ root --notebook
Follow some simple instructions at: (basically build ROOT) and…
Provides a ROOT C++ kernel and the rest of the ROOTbook goodies
This command:
1. Starts a local notebook server
2. Connects to it via the browser

A Distributed Service Building on top of CERN Services Portfolio

Platform independent: only with a web browser
– Analyse data via the Notebook web interface
– No need to install and configure software
Calculations, input and results “in the cloud”
Allow easy sharing of scientific results: plots, data, code
– Storage is crucial, both mass and synchronised
Simplify teaching of data processing and programming
– ROOT Summer Student course, ML and statistics trainings
Ease reproducibility of results and documentation
C++, Python and other languages or analysis “ecosystems”
– Also interface to widely adopted scientific libraries

SWAN relies on the CERN ecosystem (a coherent view at CERN):
– Authentication with CERN credentials
– Machines in the OpenStack cloud
– Software distribution: CVMFS
– Storage access: EOS, CERNBox (the “home directory” in SWAN)
– User and experiment data available
External but mainstream technologies:
– JupyterHub
– Docker
Both have large user bases and an active community behind them.

EOS: disk-based, low-latency storage infrastructure for physics users. Main target: physics data analysis. Storage backend for CERNBox.
CVMFS: HTTP-based network file system, optimised to deliver experiment software. Files are aggressively cached and downloaded on demand. Read-only.
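The access pattern described for CVMFS (read-only, download on first access, serve from cache afterwards) can be illustrated with a toy model. This sketch does not use or mimic the CVMFS client: the class `OnDemandReadOnlyStore`, the `fetch` callable and the example path are all hypothetical, and a dictionary stands in for the remote HTTP server.

```python
class OnDemandReadOnlyStore:
    """Toy model of a read-only store that fetches files on first access.

    Hypothetical illustration only; it is not the CVMFS client API.
    """

    def __init__(self, fetch):
        self._fetch = fetch   # callable: path -> bytes (stands in for an HTTP download)
        self._cache = {}      # local cache of files already downloaded
        self.downloads = 0    # number of times the remote side was contacted

    def read(self, path):
        """Return file contents, downloading them only if not cached yet."""
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
            self.downloads += 1
        return self._cache[path]

    def write(self, path, data):
        """Clients cannot modify the repository."""
        raise PermissionError("read-only store")


# A dictionary plays the role of the remote server (hypothetical path and content).
remote_files = {"/software/example/setup.sh": b"export EXAMPLE=1\n"}
store = OnDemandReadOnlyStore(remote_files.__getitem__)

first = store.read("/software/example/setup.sh")   # triggers the download
second = store.read("/software/example/setup.sh")  # served from the local cache
```

The point of the pattern is that a user only pays for the files actually touched, which suits large software releases of which a given analysis uses a small part.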

JupyterHub
Server application: manages login of users and redirection to the notebook
Existing solution
Allows encapsulation: spawn a Docker container at logon

Docker
A “light-weight virtual machine”
Complete isolation of users: many Linux systems sharing the same kernel
Works on OS X and Windows too
– Needs a VM in the background to run the kernel!
OpenStack support
Transparent to the SWAN users!

Strategy to configure the software environment:
– Docker: single thin image, not managed by the user!
– CVMFS: configurable environment via “views”
– CERNBox: custom user environment
[Diagram labels: experiment software, externals/LCG releases, user software]

[Architecture diagram: CERN Auth, Web Portal, Container Scheduler, Notebook Containers in the CERN Cloud, EOS (Data), CERNBox (User Files), CVMFS (Software)]

Launch jobs on the batch farm
Access a notebook running in a container
Inspect the produced data via CERNBox/EOS from the notebook
Create plots and output data
Share and access plots (and output data!) on the web with the CERNBox web interface
Security guaranteed by the usual CERN standards
Added value: remote users cannot open graphical connections to CERN (latency). This problem is automatically solved in the workflow described above!

Examples and Demo

Open a single notebook or a Git repository in SWAN: one click away!

G. Lo Presti, M. Lamanna: “Castor data corruption incident”
Describe incident, data source, analysis and results in a single document

master/examples/LHC%20Page1.ipynb
Read measurements coming from pick-ups in a database
Plot time series
Also needs SciPy and the ability to share the notebooks with colleagues
R. De Maria, BE-ABP-HSS

L. Anderlini: rare B meson decay in LHCb
Read data from EOS
Set up a complex fit
Document and inspect results
Results coming from real data! (published now)

CMSSW Gen-Sim
Running in a container
Writing directly on EOS
Steered from a web browser

Pilot service released at the beginning of June
– available at
All the main components are already there
– EOS, CERNBox: mass and synchronised cloud storage
– ROOT integrated with Jupyter, Python analysis ecosystem, R
– CVMFS to distribute software
In beta testing phase: ~200 users, growing
– If interested, please send us an e-mail to:
– Your feedback is very much welcome!

For the past month, also accessible from outside CERN

Continue to incorporate user feedback
Improve the experience with storage: response time, sharing
Exploit external resources
– Spark clusters, batch, Grid resources
Approach more and more of the analysis community
– Started by providing support for machine learning use cases

Prototype service for web-based analysis available
– ROOT integrated with Jupyter
– CVMFS for software distribution
– EOS mass storage + CERNBox synchronisation
Try on
Open for users with CERN accounts, but they need to register first. Send mail to

SWAN and ROOT Notebooks hands-on tutorial tomorrow at 13:00 (Room 163)
People intending to participate who have CERN accounts are invited to register by sending an e-mail to
Other accounts will be available for non-CERN users

Backup Slides

Free JupyterHub deployments on the web: for instance tmpnb, Binder
– Only temporary resources, e.g. no persistent storage
Some commercial providers of cloud-based analysis models:
– Sage Math Cloud
– Microsoft Machine Learning Cloud
– Google Cloud Datalab
– Wolfram Mathematica Online
– Wakari
– Octave Online
Must be sure of the added value

Large volume of data, complex analysis: need to use many cores
1) Single node: TProcPool, IPython Parallel, heterogeneous/multithreaded code
2) Many nodes: Batch/Grid jobs
Several production-grade, Python-based job submission tools are available:
– Ganga, GridControl, Panda, Crab, …
Opportunity: steer job submission to WLCG or local batch resources from the notebook.
CERN Batch Service being considered in the full picture!
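The single-node case can be sketched as a map-and-merge over chunks of data. The sketch below uses Python's standard `concurrent.futures.ProcessPoolExecutor` as a stand-in for TProcPool or IPython Parallel, which are not shown here. All function names and the simple integer-binned "histogram" are assumptions made for the illustration.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def fill_histogram(values, bin_width=10):
    """Count how many values fall in each bin (bin index = value // bin_width)."""
    return Counter(int(v // bin_width) for v in values)


def merge_histograms(partials):
    """Sum per-chunk histograms into a single histogram."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def split(values, n_chunks):
    """Split a list into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_histogram(values, workers=4):
    """Fill one histogram per chunk in separate processes, then merge them."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(fill_histogram, split(values, workers)))
    return merge_histograms(partials)
```

When run as a script, `parallel_histogram` should be called from inside an `if __name__ == "__main__":` block so that worker processes can be started safely on every platform. The merged result is the same as filling a single histogram serially, which is what makes the work safe to split.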

[Diagram: User Notebook (Python) connected to a Spark Cluster with a Spark Master and Spark Workers (Python)]