Data Analytics Challenges Some faults cannot be avoided Decrease the availability for running physics Preventive maintenance is not enough Does not take.

Slides:



Advertisements
Similar presentations
BigData Tools Seyyed mohammad Razavi. Outline  Introduction  Hbase  Cassandra  Spark  Acumulo  Blur  MongoDB  Hive  Giraph  Pig.
Advertisements

Why Spark on Hadoop Matters
Hadoop tutorials. Todays agenda Hadoop Introduction and Architecture Hadoop Distributed File System MapReduce Spark 2.
Hadoop Ecosystem Overview
ACL Solutions for Continuous Auditing and Monitoring John Verver CA, CISA, CMC Vice President, Professional Services & Product Strategy ACL Services Ltd.
SM STRATA PRESENTATION Tim Garnto - SVP Engineering, edo Interactive Rob Rosen – Big Data Field Lead, Pentaho.
SQL on Hadoop. Todays agenda Introduction Hive – the first SQL approach Data ingestion and data formats Impala – MPP SQL.
Hive: A data warehouse on Hadoop Based on Facebook Team’s paperon Facebook Team’s paper 8/18/20151.
HADOOP ADMIN: Session -2
CERN IT Department CH-1211 Genève 23 Switzerland t Integrating Lemon Monitoring and Alarming System with the new CERN Agile Infrastructure.
Scale-out databases for CERN use cases Strata Hadoop World London 6 th of May,2015 Zbigniew Baranowski, CERN IT-DB.
Hadoop Team: Role of Hadoop in the IDEAL Project ●Jose Cadena ●Chengyuan Wen ●Mengsu Chen CS5604 Spring 2015 Instructor: Dr. Edward Fox.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
SSIS Over DTS Sagayaraj Putti (139460). 5 September What is DTS?  Data Transformation Services (DTS)  DTS is a set of objects and utilities that.
material assembled from the web pages at
Hadoop tutorials. Todays agenda Hadoop Introduction and Architecture Hadoop Distributed File System MapReduce Spark Cluster Monitoring 2.
Contents HADOOP INTRODUCTION AND CONCEPTUAL OVERVIEW TERMINOLOGY QUICK TOUR OF CLOUDERA MANAGER.
Data and SQL on Hadoop. Cloudera Image for hands-on Installation instruction – 2.
How Companies are Using Spark And where the Edge in Big Data will be Matei Zaharia.
Hadoop IT Services Hadoop Users Forum CERN October 7 th,2015 CERN IT-D*
Matthew Winter and Ned Shawa
Nov 2006 Google released the paper on BigTable.
Impala. Impala: Goals General-purpose SQL query engine for Hadoop High performance – C++ implementation – runtime code generation (using LLVM) – direct.
Big Data Analytics Platforms. Our Team NameApplication Viborov MichaelApache Spark Bordeynik YanivApache Storm Abu Jabal FerasHPCC Oun JosephGoogle BigQuery.
Spark and Jupyter 1 IT - Analytics Working Group - Luca Menichetti.
Harnessing Big Data with Hadoop Dipti Sangani; Madhu Reddy DBI210.
Data Mining Techniques Applied in Advanced Manufacturing PRESENT BY WEI SUN.
Data Analytics and Hadoop Service in IT-DB Visit of Cloudera - April 19 th, 2016 Luca Canali (CERN) for IT-DB.
Microsoft Partner since 2011
Microsoft Ignite /28/2017 6:07 PM
eBay Marketplaces Ming Ma June 27 th, 2013.
Internal Modern Data Platform Somnath Data Platform Architect.
Hadoop Introduction. Audience Introduction of students – Name – Years of experience – Background – Do you know Java? – Do you know linux? – Any exposure.
Big Data & Test Automation
Integration of Oracle and Hadoop: hybrid databases affordable at scale
OMOP CDM on Hadoop Reference Architecture
Big Data Analytics and HPC Platforms
Connected Infrastructure
PROTECT | OPTIMIZE | TRANSFORM
Integration of Oracle and Hadoop: hybrid databases affordable at scale
Database Services Katarzyna Dziedziniewicz-Wojcik On behalf of IT-DB.
Hadoop.
Why is my Hadoop* job slow?
Data Analytics and CERN IT Hadoop Service
Hadoop and Analytics at CERN IT
Zhangxi Lin Texas Tech University
Chapter 14 Big Data Analytics and NoSQL
Sqoop Mr. Sriram
New Big Data Solutions and Opportunities for DB Workloads
Connected Infrastructure
Data Analytics and CERN IT Hadoop Service
SQOOP.
Data Analytics and CERN IT Hadoop Service
Hadoop Clusters Tess Fulkerson.
APACHE HAWQ 2.X A Hadoop Native SQL Engine
07 | Analyzing Big Data with Excel
Overview of Azure Data Lake Store
Ministry of Higher Education
Data Analytics and CERN IT Hadoop Service
Visual Analytics Sandbox
Data Analytics – Use Cases, Platforms, Services
Introduction to Apache
Renouncing Hotel’s Data Through Queries Using Hadoop
Setup Sqoop.
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
IBM C IBM Big Data Engineer. You want to train yourself to do better in exam or you want to test your preparation in either situation Dumpspedia’s.
Big-Data Analytics with Azure HDInsight
Pig Hive HBase Zookeeper
Presentation transcript:

Data Analytics Challenges Some faults cannot be avoided Decrease the availability for running physics Preventive maintenance is not enough Does not take into account the condition of the equipment LHC upgrades will further increase luminosity Computing resources needs will be higher Data generated will increase drastically 19/04/20161

Post-LHC accelerator projects km 2

Data Analytics Objective 19/04/2016 3

Areas of investigation Predictive maintenance and system optimization Data extraction, transformation and loading (ETL) Data Visualization and Discovery 19/04/20164

Use Case - FCC RAMS studies Reliability, Availability, Maintainability and Safety (RAMS) studies for the Future Circular Collider (FCC) Study and increase the reliability and availability of the LHC Use RAMS findings to assess the feasibility of the needs of FCC 19/04/20165

Hadoop based solution 19/04/20166 JSON CLI & Scripting Interactive browsing & editing Visualization & Discovery Controls and Operations System Configuration Logging and Monitoring Engineering IT Logs Hadoop Cluster

HUE – Hadoop User Experience Hue is an open source suite of web-based applications for analyzing data with any Apache Hadoop It features: SQL Editors for Hive, Impala, MySql, PostGres, Sqlite and Oracle Dynamic search dashboards for Solr Spark Notebooks Browsers for YARN, HDFS, Hive table Metastore, HBase, ZooKeeper Pig Editor, Sqoop2, Oozie workflows Editors and Dashboards Wizards to import data into Hadoop 19/04/20167

HUE – Hadoop User Experience 19/04/20168

9 HUE – Hadoop User Experience

One tool to use multiple Hadoop components Easy to use Compatible with multiple versions, open source Extensible But Requires language knowledge to explore and transform the data Limited for Data Discovery 19/04/201610

Oracle Big Data Discovery Overview Data Exploration & Discovery Interactive catalog of all data Assess attribute statistics, data quality and outliers Quick data exploration or create dashboards and applications (without intervention of IT) Data Transformation with Spark in Hadoop Apply built-in transformations or write your own scripts Data Enrichment Text: Entity extraction, relevant terms, sentiment, language detection Geographical information: address, IP, reverse Preview results, undo, commit and replay transforms Collaborative environment Share and bookmarks Create and share transformed datasets 19/04/201611

Architecture overview 19/04/ Dgraph In-Memory Columnar Database Data Storage 4x Xeon E v2 (15 cores each) 2 TB RAM 4.8 TB Flash + 6 x 1.2 TB 10K HDD CDH nodes, 24 GB ram Intel Xeon 2.27GHz 165 TB HDFS Data Integration Resource Management (YARN) Oracle Big Data Discovery Libraries + Hive table detector

19/04/ Quick Data Exploration

19/04/ Data Transformation UI - ETL

Data visualization and discovery is an important area in data analytics Facilitates users to visualize and explore their data Find correlations, extract insight and useful information Important points Flexible and user-friendly platform Advanced data visualization and exploration Collaborative Application to different domains Controls and Operations IT Infrastructure Monitoring Human Resources 19/04/ Conclusions