One Billion Rows Per Second: Analytics for the Digital Media Markets
Strata Summit NYC, September 21, 2011
Michael Driscoll, Co-Founder &

Presentation transcript:

Taming the Inferno of the Online Ad Markets: billions of microtransactions per day; dozens of publisher, advertiser, and audience attributes.

Goal: Fast Dashboards Over Big Data

Goal: Fast Dashboards Over Big Data. Diagram stages: ingestion, database, dashboard. Data crunched in minutes; queries in seconds.

Solution 1: Relational Database. Diagram stages: ingestion, database, dashboard (labels: Hadoop, MPP relational DB). Data crunched in minutes; queries in minutes.

Solution 2: HBase. Diagram stages: ingestion, database, dashboard (label: Hadoop). Data crunched in hours; queries in seconds.

Solution 3: Do It Ourselves: Druid. Diagram stages: ingestion, database, dashboard (labels: Hadoop, Druid). Data crunched in minutes; queries in seconds.

Four Principles of Performance at Scale
SUMMARIZE, DISTRIBUTE, PARALLELIZE, STORE IN-MEMORY
100x smaller vs raw data
100x throughput vs a single node
100x faster vs reading disk
10^6 factor increase
Druid can filter and aggregate over 1 billion rows per second on a 50-core cluster, or 20 million rows per core per second.
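To make the "summarize" principle and the slide's arithmetic concrete, here is a minimal Python sketch. It is not Druid's implementation: the event layout, the field names, and the helpers `summarize` and `filter_and_aggregate` are all hypothetical, chosen only to illustrate rolling raw events up into fewer in-memory rows and then filtering and aggregating over them.

```python
from collections import defaultdict

# Hypothetical raw ad events: (hour, publisher, advertiser, impressions, revenue).
raw_events = [
    ("2011-09-21T10", "pub_a", "adv_x", 1, 0.002),
    ("2011-09-21T10", "pub_a", "adv_x", 1, 0.003),
    ("2011-09-21T10", "pub_b", "adv_x", 1, 0.001),
    ("2011-09-21T11", "pub_a", "adv_y", 1, 0.004),
    ("2011-09-21T11", "pub_a", "adv_y", 1, 0.002),
]


def summarize(events):
    """Roll raw events up to one row per (hour, publisher, advertiser)."""
    rollup = defaultdict(lambda: [0, 0.0])
    for hour, publisher, advertiser, impressions, revenue in events:
        cell = rollup[(hour, publisher, advertiser)]
        cell[0] += impressions
        cell[1] += revenue
    return dict(rollup)


def filter_and_aggregate(rollup, publisher):
    """Scan the summarized rows, keep one publisher, and total its metrics."""
    impressions, revenue = 0, 0.0
    for (_, pub, _), (imp, rev) in rollup.items():
        if pub == publisher:
            impressions += imp
            revenue += rev
    return impressions, revenue


summary = summarize(raw_events)
total_impressions, total_revenue = filter_and_aggregate(summary, "pub_a")

# Arithmetic from the slide: three 100x factors compound to 10^6,
# and 1 billion rows/s spread over 50 cores is 20 million rows/s per core.
combined_factor = 100 * 100 * 100
rows_per_core = 1_000_000_000 // 50
```

In this toy data set five raw events collapse into three summarized rows; the 100x reduction quoted on the slide would depend on how many raw events share the same dimension values in real traffic.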

Consequences of Speed: Data Freshness photo credit: Lars P.

Consequences of Speed: Blue Sky Exploration photo credit: MonkeyAt Large

Consequences of Speed: Interactivity photo credit: tonylanciabeta

One Billion Rows Per Second: Analytics for the Digital Media Markets QUESTIONS? CONTACT ME AT MICHAEL DRISCOLL CO-FOUNDER &