Visual Analytics Sandbox

Presentation transcript:

Visual Analytics Sandbox Satya Katragadda January 25, 2018

Agenda
  Why Big Data?
  Goals
  Visual Analytics Sandbox
  Traditional Workflow in a Big Data Environment
  VA Sandbox: Software Stack
  VA Sandbox: Execution Examples

Why Big Data?
  Reports, e.g., track business processes, transactions
  Diagnosis, e.g., why is user engagement dropping? Why is the system slow? Detect spam, worms, viruses, DDoS attacks
  Decisions, e.g., decide what feature to add, decide what ad to show, block worms, viruses, …

Goals
  Low latency (interactive) queries on historical data: enable faster decisions. E.g., identify why a site is slow and fix it.
  Low latency queries on live data (streaming): enable decisions on real-time data. E.g., detect and block worms in real time (a worm may infect 1 million hosts in 1.3 sec).
  Sophisticated data processing: enable “better” decisions. E.g., anomaly detection, trend analysis.
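As an illustration of the "anomaly detection" goal on live data, here is a minimal sketch in plain Python. It is not taken from the presentation: the function name detect_anomalies, the window size, the threshold, and the sample latencies are all assumptions chosen for the example. Each new value is compared against the mean and standard deviation of the preceding window, which is one simple way to flag a sudden slowdown.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of values that deviate from the mean of the
    preceding `window` values by more than `threshold` standard deviations.

    Hypothetical helper for illustration; not part of the VA Sandbox.
    """
    recent = deque(maxlen=window)  # sliding window of the latest values
    anomalies = []
    for i, v in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip the check when the window is constant (sigma == 0).
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                anomalies.append(i)
        recent.append(v)
    return anomalies


# Made-up page latencies in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 450, 101, 99, 100]
print(detect_anomalies(latencies_ms))  # [6]
```

In a streaming setting the same check would run on each record as it arrives; the fixed-size window keeps memory use constant.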

Visual Analytics Sandbox

Big Data Workflow: Data Ingestion, Data Management, Data Processing, Visualization, Resource Management

VA Sandbox: Software Stack

VA Sandbox: Resource Manager

VA Sandbox: Data Ingestion

VA Sandbox: Data Storage

VA Sandbox: Processing and Visualization

VA Sandbox: Stephens Hall. Accessible through the university network.

VA Sandbox: Access

VA Sandbox: Execution

VA Sandbox: Input

VA Sandbox: Spark Script

VA Sandbox: Spark Output

Alternative Execution Environment
  HUE: Hadoop User Experience
  An open-source Web interface that supports Apache Hadoop and its ecosystem

  Component    Applications
  Editor       SQL, Pig, Spark
  Browsers     YARN, Oozie, Impala, HBase, Livy
  Scheduler    Oozie
  Dashboard    Solr, SQL (Impala, Hive...)

HUE: File Browser

HUE: Job Execution

HUE: Output

HUE: Editors

HUE: Schedulers

HUE: Dashboards

Questions? Satya Katragadda, RM 118, Abdalla Hall, satya@Louisiana.edu