Real-Time Big Data Use Cases John Leach CTO, Splice Machine.

Presentation transcript:

Real-Time Big Data Use Cases John Leach CTO, Splice Machine

2 disruptive
  Before / After
  PhDs
  Java Programmers
  Data Expensive to Store
  Distributed Computing Across Commodity Servers
  Data Cheap to Store

3 obstacles
  MapReduce Java programmers are scarce and costly
  Limited use cases because of batch nature of Hadoop

4 Moving Hadoop Beyond Batch Analytics to Power Real-Time Apps
  Hadoop – not just for data scientists anymore
  Distributed File System
  Java MapReduce Programs
  Read-Only Batch Analytics
  Real-Time Datastores
  Distributed RDBMS
  SQL-99 Queries
  Real-Time Updates with ACID Transactions
  Real-Time Apps and Analytics

5 real-time Big Data use cases
  Ad Technology
  Digital Marketing
  Fraud Detection
  Internet of Things
  Cyberthreat Security
  Network Monitoring
  Personalized Medicine

6 case study: Rocket Fuel

7 case study: digital marketing
  Initial Results:
    Powers Unica app and Cognos
    Scale-out with commodity servers
    Made queries 3x-7x faster
    Achieved over 10x price/perf improvement
    Replaced Oracle RAC DB
  Diagram labels: Clients, Consumers, Unica, Real-Time Personalization, Real-Time Actions, Cross-Channel Campaigns, Oracle

8 fraud detection
  Intelligent Fraud Detection
  Benefits:
    Correlate spending patterns based on real-time movements or trips
    Move beyond simple rules
    Prevent false positives
    Catch fraud faster
    Increase customer satisfaction
  Roadtrip to Nevada:
    1. Start in San Francisco
    2. Use credit card for gas in Sacramento
    3. Use credit card in Tahoe for lunch
    4. Credit card denied for gas because you left CA
    5. Spend 15 minutes on phone to get credit card reinstated
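The contrast on this slide, a static "left the home state" rule versus correlating each purchase with the cardholder's recent movements, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Swipe` type, the `DRIVE_HOURS` table, its values, and the Nevada stop ("Reno") are hypothetical and do not come from the presentation or from any Splice Machine API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Swipe:
    city: str
    state: str
    hour: float  # hours since the trip started


# Hypothetical drive times (hours) between consecutive stops; illustrative values only.
DRIVE_HOURS = {
    ("San Francisco", "Sacramento"): 1.5,
    ("Sacramento", "Tahoe"): 2.0,
    ("Tahoe", "Reno"): 1.0,  # "Reno" is an assumed Nevada stop, not named on the slide
}


def simple_rule(home_state, swipe):
    """Static rule: approve only purchases made inside the home state."""
    return swipe.state == home_state


def trip_aware_rule(history, swipe):
    """Approve if the new purchase is reachable by road from the previous one."""
    if not history:
        return True
    last = history[-1]
    if last.city == swipe.city:
        return True
    needed = DRIVE_HOURS.get((last.city, swipe.city))
    if needed is None:
        return False  # no known route between the two cities: flag for review
    return swipe.hour - last.hour >= needed


history = [Swipe("Sacramento", "CA", 2.0), Swipe("Tahoe", "CA", 4.5)]
gas_in_nevada = Swipe("Reno", "NV", 6.0)

print(simple_rule("CA", gas_in_nevada))        # static rule denies the card
print(trip_aware_rule(history, gas_in_nevada))  # trip-aware rule approves it
```

The trip-aware check needs the most recent purchases at decision time, which is why the slide ties this use case to real-time updates rather than batch analysis.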

9 IOT: network monitoring
  Proactive Fault Response
  Benefits:
    Detect and isolate faults by trending real-time events
    Perform remote resets
    Increase customer satisfaction
    Reduce costly calls and “truck rolls”
  Diagram labels: Cable Set-Top Boxes, Telemetry Data, Scale-out RDBMS, Network Monitoring App, Remote Resets
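One simple way to read "trending real-time events" is a sliding-window count of error events per device, with a remote reset triggered once the count crosses a threshold. The sketch below assumes that interpretation; the class name, window length, and threshold are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict, deque


class FaultTrendDetector:
    """Counts recent error events per device and flags devices that trend upward."""

    def __init__(self, window_seconds=60.0, threshold=3):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)  # device_id -> timestamps of recent errors

    def record_error(self, device_id, timestamp):
        """Record one error event; return True if the device should be reset remotely."""
        window = self._events[device_id]
        window.append(timestamp)
        # Discard events that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


detector = FaultTrendDetector(window_seconds=60.0, threshold=3)
for ts in (0.0, 10.0, 20.0):
    needs_reset = detector.record_error("set-top-box-1", ts)
print(needs_reset)  # three errors within 60 seconds trip the threshold
```

A device that reports isolated errors minutes apart never reaches the threshold, so only sustained faults lead to a reset.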

10 IOT: cyberthreat security
  Real-Time Threat Response
  Benefits:
    Correlate millions of events/sec against 3-5 years of firewall history to identify “sleepers” waking up
    Prevent loss of sensitive data such as credit cards
    Reduce embarrassing public exposure
  Diagram labels: Network Firewalls, Network Events, Scale-out RDBMS, Security Monitoring App, Real-Time Responses

11 IOT: personalized medicine
  Personalized Treatment Plans
  Benefits:
    Coordinate care with EMRs
    Identify complications w/ genetic data
    Drive real-time response w/ device data
    Reduce hospital readmissions
    Eliminate lost revenue under ObamaCare
  Diagram labels: Genomic Data, Electronic Medical Records (EMRs), Medical Monitoring Devices, Scale-out RDBMS, Personalized Treatment App, Personalized Treatment Plans, Alerts, Doctors

12 scale up vs. scale-out
  How do I scale?
  Scale Up
    e.g., Exadata
    Very expensive
    Poor price/performance
  Scale Out
    NoSQL
      e.g., MongoDB
      Limited SQL
      No transactions
      May have weak consistency or no joins
    NewSQL (Proprietary)
      e.g., NuoDB
      Unproven scalability
      No Hadoop
    SQL-on-Hadoop (Analytic Engines)
      e.g., Impala
      No transactions
      No real-time updates
      Can’t power a real-time app
    Hadoop RDBMS
      e.g., Splice Machine
      Proven scale-out architecture
      Transactional RDBMS
      Power real-time apps

13 The only Hadoop RDBMS
  Splice Machine
  Standard ANSI SQL
  Horizontal Scale-Out
  Real-Time Updates
  ACID Transactions
  Powers OLAP and OLTP
  Seamless BI Integration

14 proven building blocks
  SQL ≈ Apache Derby
  Scale ≈ [logo not captured in transcript]

15 how we do it

16 distributed query processing
  Parallelized computation across cluster
  Moves computation to the data
  Utilizes HBase co-processors
  No MapReduce
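The idea of moving computation to the data can be illustrated with a partial-aggregation sketch: each data partition computes a small local result in parallel, and only those partial results travel to the coordinator for merging. This is a conceptual sketch only. It does not use the HBase coprocessor API or reflect Splice Machine's implementation; the `REGIONS` data and function names are hypothetical, and threads stand in for separate servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions ("regions"), each imagined to live on a different server.
REGIONS = [
    [("ad_1", 3.0), ("ad_2", 5.0)],
    [("ad_1", 7.0)],
    [("ad_2", 1.0), ("ad_1", 2.0)],
]


def local_aggregate(rows):
    """Runs next to the data: returns {key: (sum, count)} for one region."""
    partial = {}
    for key, value in rows:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def merge(partials):
    """Runs on the coordinator: combines the small per-region results."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            merged_total, merged_count = merged.get(key, (0.0, 0))
            merged[key] = (merged_total + total, merged_count + count)
    return merged


def distributed_average(regions):
    """Computes AVG(value) GROUP BY key without shipping raw rows to the coordinator."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, regions))
    return {key: total / count for key, (total, count) in merge(partials).items()}


print(distributed_average(REGIONS))
```

Averages are carried as (sum, count) pairs because per-region averages cannot be merged correctly on their own.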

17 summary
  Distributed Computing:
    Disruptive technology
    Data now cheap to store
  Real-Time Use Case Types:
    Port existing operational applications experiencing cost or scaling issues
    Develop new applications that can leverage historical data in real-time
  Examples:
    Digital marketing
    Ad Tech
    Fraud Detection
    Internet of Things

18 Questions?
