6 May 2014 CERN openlab IT Challenges workshop, https://indico.cern.ch/event/297559/https://indico.cern.ch/event/297559/ Kacper Szkudlarek, CERN Manuel.

Slides:



Advertisements
Similar presentations
Distributed Data Processing
Advertisements

A successful public- private partnership Alberto Di Meglio CERN openlab CTO.
Click to edit Master title style European AFS and Kerberos Conference Welcome to CERN CERN, Accelerating Science and Innovation CERN, Accelerating Science.
CERN IT Department CH-1211 Geneva 23 Switzerland t Sequential data access with Oracle and Hadoop: a performance comparison Zbigniew Baranowski.
Overview of Data Management solutions for the Control and Operation of the CERN Accelerators Database Futures Workshop, CERN June 2011 Zory Zaharieva,
Capacity Planning in SharePoint Capacity Planning Process of evaluating a technology … Deciding … Hardware … Variety of Ways Different Services.
Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 1 Preview of Oracle Database 12 c In-Memory Option Thomas Kyte
CERN openlab Open Day 10 June 2015 KL Yong Sergio Ruocco Data Center Technologies Division Speeding-up Large-Scale Storage with Non-Volatile Memory.
Scale-out databases for CERN use cases Strata Hadoop World London 6 th of May,2015 Zbigniew Baranowski, CERN IT-DB.
CERN openlab V preparation, Data Analytics (for research)
13 October 2014 Eric Grancher, head of database services, CERN IT Manuel Martin Marquez, data scientist, CERN openlab.
Word Wide Cache Distributed Caching for the Distributed Enterprise.
Oracle: Database and Data Management Innovations with CERN
Database Services for Physics at CERN with Oracle 10g RAC HEPiX - April 4th 2006, Rome Luca Canali, CERN.
Microsoft ® SQL Server ® 2008 and SQL Server 2008 R2 Infrastructure Planning and Design Published: February 2009 Updated: January 2012.
CERN - IT Department CH-1211 Genève 23 Switzerland t Monitoring the ATLAS Distributed Data Management System Ricardo Rocha (CERN) on behalf.
CERN - IT Department CH-1211 Genève 23 Switzerland t The High Performance Archiver for the LHC Experiments Manuel Gonzalez Berges CERN, Geneva.
Openlab Workshop on Data Analytics 16 th of November 2012 Axel Voitier – CERN EN-ICE.
Storage and data services eIRG Workshop Amsterdam Dr. ir. A. Osseyran Managing director SARA
CERN - IT Department CH-1211 Genève 23 Switzerland t Tier0 database extensions and multi-core/64 bit studies Maria Girone, CERN IT-PSS LCG.
IMDGs An essential part of your architecture. About me
Control System Data Analysis Future Vision Author: Axel Voitier CERN EN-ICE.
DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENNTS Leon Guzenda Chief Technology Officer.
Databases E. Leonardi, P. Valente. Conditions DB Conditions=Dynamic parameters non-event time-varying Conditions database (CondDB) General definition:
CERN openlab V Technical Strategy Fons Rademakers CERN openlab CTO.
Eugenia Hatziangeli Beams Department Controls Group CERN, Accelerators and Technology Sector E.Hatziangeli - CERN-Greece Industry day, Athens 31st March.
CERN-IT Oracle Database Physics Services Maria Girone, IT-DB 13 December 2004.
IT Architectures for Handling Big Data in Official Statistics: the Case of Scanner Data in Istat Gianluca D’Amato, Annunziata Fiore, Domenico Infante,
CERN Database Services for the LHC Computing Grid Maria Girone, CERN.
Hadoop IT Services Hadoop Users Forum CERN October 7 th,2015 CERN IT-D*
CERN openlab technical workshop
A successful public- private partnership Alberto Di Meglio CERN openlab Head.
CERN - IT Department CH-1211 Genève 23 Switzerland t High Availability Databases based on Oracle 10g RAC on Linux WLCG Tier2 Tutorials, CERN,
Database Competence Centre openlab Major Review Meeting nd February 2012 Maaike Limper Zbigniew Baranowski Luigi Gallerani Mariusz Piorkowski Anton.
Predrag Buncic Future IT challenges for ALICE Technical Workshop November 6, 2015.
Oracle for Physics Services and Support Levels Maria Girone, IT-ADC 24 January 2005.
Computing for LHC Physics 7th March 2014 International Women's Day - CERN- GOOGLE Networking Event Maria Alandes Pradillo CERN IT Department.
CERN IT Department CH-1211 Geneva 23 Switzerland t WLCG Operation Coordination Luca Canali (for IT-DB) Oracle Upgrades.
Maria Girone CERN - IT Tier0 plans and security and backup policy proposals Maria Girone, CERN IT-PSS.
Scalable data access with Impala Zbigniew Baranowski Maciej Grzybek Daniel Lanza Garcia Kacper Surdy.
Next Generation of Apache Hadoop MapReduce Owen
Big Data for Big Discoveries How the LHC looks for Needles by Burning Haystacks Alberto Di Meglio CERN openlab Head DOI: /zenodo.45449, CC-BY-SA,
Smart Grid Big Data: Automating Analysis of Distribution Systems Steve Pascoe Manager Business Development E&O - NISC.
IT-DSS Alberto Pace2 ? Detecting particles (experiments) Accelerating particle beams Large-scale computing (Analysis) Discovery We are here The mission.
European Organization For Nuclear Research CERN Accelerator Logging Service Overview Focus on Data Extraction for Offline Analysis Ronny Billen & Chris.
A successful public- private partnership Alberto Di Meglio CERN openlab Head.
Retele de senzori Curs 1 - 1st edition UNIVERSITATEA „ TRANSILVANIA ” DIN BRAŞOV FACULTATEA DE INGINERIE ELECTRICĂ ŞI ŞTIINŢA CALCULATOARELOR.
Peter Idoine Managing Director Oracle New Zealand Limited.
Data Analytics and Hadoop Service in IT-DB Visit of Cloudera - April 19 th, 2016 Luca Canali (CERN) for IT-DB.
A successful public- private partnership Maria Girone CERN openlab CTO.
The PVSS Oracle Archiver FW WG 6 th July Credits Many people involved IT/DES: Eric Grancher, Nilo Segura, Chris Lambert IT/PSS: Luca Canali ALICE:
Database 12.2 and Oracle Enterprise Manager 13c Liana LUPSA.
Data Analytics Challenges Some faults cannot be avoided Decrease the availability for running physics Preventive maintenance is not enough Does not take.
Eric Grancher CERN IT department Overview of Database Technologies Computing and Astroparticle Physics 2 nd ASPERA Workshop /1.
Integration of Oracle and Hadoop: hybrid databases affordable at scale
CERN Data Analytics Use Cases
MPD Data Acquisition System: Architecture and Solutions
Integration of Oracle and Hadoop: hybrid databases affordable at scale
Database Services Katarzyna Dziedziniewicz-Wojcik On behalf of IT-DB.
Connected Maintenance Solution
Future Database Challenges
WinCC-OA Log Analysis SCADA Application Service - Reporting
Future Archiver (librarian) for WinCC OA Control Systems
PVSS Evolution in Relation to Databases
Connected Maintenance Solution
Dagmar Adamova (NPI AS CR Prague/Rez) and Maarten Litmaath (CERN)
Oracle Storage Performance Studies
Grid Canada Testbed using HEP applications
Case studies – Atlas and PVSS Oracle archiver
Ch 4. The Evolution of Analytic Scalability
Presentation transcript:

6 May 2014 CERN openlab IT Challenges workshop, Kacper Szkudlarek, CERN Manuel Martin Marquez, CERN Eric Grancher, CERN

Outline 3 CERN and openlab Analytics challenges Analytics investigation Fault prediction CERN openlab analytics challenges Technology evolutions Analytics as a service

4 Detecting particles (experiments) Accelerating particle beams Large-scale computing (Analysis) Discovery Credit: Alberto Pace

Events at LHC Luminosity : cm -2 s MHz – every 25 ns 20 events overlaying 5

Trigger & Data Acquisition 6

Data Recording 7

8 NetworkCPUsStorageInfrastructureDatabases # of cores: 99,061 # of disks: 80,754 # of processors: 18,633 # of 10GB NICs: 3,218 # of 1GB NICs: 19,334 # of servers: 10,809 Disk space (TiB): 129,455 RAM memory (TiB): 349 Credit: Alberto Pace Distributed analysis in the different sites (« LHC Computing Grid ») CERN

openlab (1/3) 9 Public-private partnership between CERN and leading ICT companies, currently in fourth phase (started in 2003) Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community Innovative ideas aligned between CERN and the partners, for products “you make it, we break it”

openlab (2/3) 10 Many successes (DB competence center/Oracle): RAC on Linux x86 (9.2 PoC, 10.1 production with ASM), Additional required functionality (IEEE numbers, OCCI, instant client, etc.), WinCC OA and RAC scalability, Monitoring with Grid Control, Streams world wide distribution, Active DG, GoldenGate, Analytics for accelerator, experiment and IT, Etc.

openlab (3/3) 11 Publications (web, paper) and presentations of results, visitors Maaike Limper, best poster award at The International Conference on Computing in High Energy and Nuclear Physics 2013

Networked devices 12 Large number of control devices, front-end equipment, etc. Many critical systems: Cryogenics, Vacuums, Machine Protection, etc. (Also called “Internet of Things”) Example: LHC logging service (BE-CO)

Logging service (1/3) 13 Credit: Chris Roderick

Logging service (2/3) 14 Credit: Chris Roderick

Logging service (3/3) 15 Credit: Chris Roderick

Improvements for Archiver of WinCC Open Architecture Report on work by departments: Information Technology Beams Engineering Kacper Szkudlarek CERN openlab & Oracle Analytics Innovation Summit

Use case: Quench Protection System › Critical system for LHC operation  Major upgrade for LHC Run 2 ( ) › High throughput for data storage requirement  Constant load of 150k changes/s from 100k signals › Whole data set is transfered to long-term storage DB  Query + Filter + Insertion › Analysis performed on both DBs 06/05/ Backup LHC Logging (long-term storage) RDB Archive 16Projects Around LHC

Performance Optimizations › Use of Oracle Index Organized Tables › Tuning of DB data queries  Search predicates, time-based partitioning  Alignment of data in RAC cluster › Changes in database schema  Focus on redo log and space reduction › Tuning of database parameters 06/05/ BE-CO EN- ICE IT-DB

Results summary › Nominal conditions  Stable constant load of 150k changes/s  100 MB/s of I/O operations  500 GB of data stored each day › Peak performance  Exceeded 1 million value changes per second  MB/s of I/O operations › All CERN production WinCC OA systems (accelerators, detectors and technical infrastructure, 600 servers) will benefit from these optimizations › Next challenge: ~10x increase  Required for next major upgrade ( ) 06/05/201419

Technology evolutions (1/4) 20 Columnar compression Credit: Maaike Limper

Technology evolutions (2/4) 21 In-memory databases Commodity servers with 6TB memory or more Together with compression, enable to have (for some applications) the active data set fully in memory Flash with low latency storage

Technology evolutions (3/4) 22 Networking 10GbE and more (security…)

Technology innovation (4/4) 23 Scalable database and storage infrastructure Powerful analysis packages like ROOT and R Credit: Luca Canali

24. SVM - Support Vector Machines Credit: Massimo Lamanna, Sebastien Ponce (IT-DSS), Stefano Alberto Russo (ex IT-DB) Anomaly detection

Manuel Martin Marquez, CERN IT-DB URL link

Experiment data placement 26 Intelligent data placement models for the CMS and ATLAS experiments Need to extract further knowledge from the monitoring data in order to implement an effective data placement Correlate file-access monitoring with site status Readiness, queue length, storage and CPU available Classify analysis activities and needed resources Making recommendations Learn from the past trends and patterns

Analytics as a service 27 “Analytics platform” or (Big data) “Analytics-as-a- service” (A 3 S ?): Data fed from multiple sources (live) Stored reliably Data processing with multiple systems Easy access, domain expert natural language (DSL) Visualisation Credit: CERN EN-ICE

Conclusion 28 CERN: very diverse and challenging requirements CERN openlab IV, a lot done on Analytics CERN openlab V, a lot to be done, CERN and others (EMBL-EBI, ESA, etc.), whitepaper) “you make it, we break it” Complex integration challenges of rapidly evolving technologies Some more about Oracle and CERN… DB12c