The Evolution of Big Data Platform @ Netflix

Presentation transcript:

The Evolution of Big Data Platform @ Netflix. Eva Tse, July 22, 2015

Our biggest challenge is scale

Netflix Key Business Metrics: 65+ million members; 50 countries; 1000+ devices supported; 10 billion hours / quarter

Global Expansion: 200 countries by end of 2016

Big Data Size: total ~20 PB DW on S3; read ~10% of DW daily; write ~10% of read data daily; ~500 billion events daily; ~350 active users
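
To put these figures in perspective, the short calculation below derives the daily volumes they imply. It is a back-of-the-envelope sketch that takes the approximate slide numbers at face value and assumes decimal units (1 PB = 1000 TB); the variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic from the slide figures (all approximate).
TB_PER_PB = 1000  # assumption: decimal units

total_dw_pb = 20         # ~20 PB data warehouse on S3
read_fraction = 0.10     # ~10% of the DW is read daily
write_fraction = 0.10    # daily writes are ~10% of the data read daily
events_per_day = 500e9   # ~500 billion events daily

daily_read_pb = total_dw_pb * read_fraction
daily_write_tb = daily_read_pb * write_fraction * TB_PER_PB
events_per_second = events_per_day / (24 * 60 * 60)

print(f"Daily read:  ~{daily_read_pb:.0f} PB")
print(f"Daily write: ~{daily_write_tb:.0f} TB")
print(f"Event rate:  ~{events_per_second / 1e6:.1f} million events/second")
```

Under these assumptions the platform reads about 2 PB and writes about 200 TB per day, with events arriving at several million per second on average.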

Our traditional BI stack is our competition

How do we meet the functionality bar and yet make it scale? How do we make big data bite-size again?

Our North Star. Infrastructure: no undifferentiated heavy lifting. Architecture: scalable and sustainable. Self-serve: ecosystem of tools.

Data Pipelines. Event data: cloud apps -> Suro/Kafka -> Ursula -> AWS S3 (every 15 min). Dimension data: Cassandra SSTables -> Aegisthus -> AWS S3 (daily).
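
The event-data path lands data in AWS S3 about every 15 minutes. One way to picture that cadence is as a time-bucketing step. The sketch below rounds an event timestamp down to its 15-minute window and builds an S3 key prefix for that batch. The bucket name, prefix layout, and function names are assumptions made for illustration, not the layout the platform actually uses.

```python
from datetime import datetime, timezone

BATCH_MINUTES = 15  # batch interval taken from the slide

def batch_window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute window."""
    floored_minute = ts.minute - (ts.minute % BATCH_MINUTES)
    return ts.replace(minute=floored_minute, second=0, microsecond=0)

def s3_prefix(event_type: str, ts: datetime) -> str:
    """Build a hypothetical S3 key prefix for the batch that contains ts."""
    start = batch_window_start(ts)
    return (
        f"s3://example-dw/events/{event_type}/"
        f"dateint={start:%Y%m%d}/batch={start:%H%M}/"
    )

ts = datetime(2015, 7, 22, 13, 47, 5, tzinfo=timezone.utc)
print(s3_prefix("playback", ts))
```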

(Diagram) Storage: AWS S3, Parquet FF. Compute: Hadoop clusters. Service: (federated execution service), Metacat (federated metadata service). Tools: Big Data API, Big Data Portal, data movement, data visualization, Pig workflow visualization, job/cluster perf visualization, data lineage, data quality.

Evolving Big Data Processing Needs: analytics; ETL; interactive data exploration; interactive slice & dice; RT analytics & iterative/ML algorithms

Evolving Services/Tools Ecosystem: Big Data API; Big Data Portal; (federated execution service); Metacat (federated metadata service); data movement; data visualization; data lineage; data quality; Pig workflow visualization; job/cluster perf visualization

AWS S3 as our DW Storage: S3 as single source of truth (not HDFS); 11 9's durability and 4 9's availability; separate compute and storage; key enabler for multiple clusters and for easy upgrades via r/b (red/black) deployment
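
The "nines" figures are easier to reason about as concrete numbers. The sketch below converts 4 nines of availability into expected downtime per year and 11 nines of durability into expected object loss. It simplifies by treating durability as an annual per-object loss probability, and the 10 million object count is an arbitrary example.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def failure_fraction(nines: int) -> Fraction:
    """Return the failure fraction for a number of nines (4 -> 1/10,000)."""
    return Fraction(1, 10 ** nines)

# 4 nines of availability: expected unavailability per year, in minutes.
downtime_minutes = float(MINUTES_PER_YEAR * failure_fraction(4))

# 11 nines of durability: expected objects lost per year out of a
# hypothetical 10 million stored objects.
expected_losses = float(10_000_000 * failure_fraction(11))

print(f"4 nines availability -> ~{downtime_minutes:.2f} minutes of downtime per year")
print(f"11 nines durability  -> ~{expected_losses:.4f} objects lost per year per 10M objects")
```

Under these assumptions, 4 nines allows a little under an hour of unavailability per year, while 11 nines corresponds to losing one object out of 10 million roughly once every 10,000 years.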

Evolution of Big Data Processing Systems

Analytics: HiveQL is close to ANSI SQL syntax; the Hive metastore serves as the single source of truth for big data metadata

Better language constructs for ETL; contributions since 0.11; customization; integration with Metacat to Hive Metastore; integration with S3

Interactive data exploration and experimentation. Why do we like Presto? Integration with Hive metastore; easy integration with S3; works at petabyte scale; ANSI SQL for usability; fast

Our contributions: S3 file system; query optimizations; complex types support; Parquet file format integration; working on predicate pushdown

Parquet: columnar file format; supported across Hive, Pig, Presto, Spark; performance benefits across different processing engines; working on vectorized read, lazy load and lazy materialization
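
As a rough illustration of why a columnar layout helps analytic queries, the toy sketch below holds the same records row-wise and column-wise in plain Python and counts how many values a scan would have to read to sum one field. It does not use the Parquet library or file format. The record fields and helper names are made up, and real Parquet adds encoding, compression, and row groups on top of this basic idea.

```python
# Toy comparison of row-oriented vs column-oriented layouts (not real Parquet).
rows = [
    {"member_id": 1, "country": "US", "hours": 2.5},
    {"member_id": 2, "country": "CA", "hours": 1.0},
    {"member_id": 3, "country": "US", "hours": 4.0},
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [r[name] for r in records] for name in records[0]}

columns = to_columns(rows)

# Row layout: a scan reads whole records, so every field counts as scanned.
values_scanned_rows = sum(len(r) for r in rows)
total_from_rows = sum(r["hours"] for r in rows)

# Column layout: only the 'hours' column needs to be read.
values_scanned_columns = len(columns["hours"])
total_from_columns = sum(columns["hours"])

print(total_from_rows, total_from_columns)
print(values_scanned_rows, values_scanned_columns)
```

Both layouts give the same total, but the columnar scan reads a third of the values in this example, and the gap widens as tables gain more columns.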

Interactive dashboard for slicing and dicing: column-based in-memory data store for time series data; serves a specific use case very well

ETL, RT analytics, ML algorithms. Why do we like Spark? Cohesive environment (batch and 'stream' processing); multiple language support (Scala, Python); performance benefits; runs on top of YARN for multi-tenancy; community momentum

Evolution of Services/Tools Ecosystem: Big Data API; Big Data Portal; (federated execution service); Metacat (federated metadata service); data movement; data visualization; data lineage; data quality; Pig workflow visualization; job/cluster perf visualization

Federated execution engine: expose [your fave big data engine] as a service; flexible data model to support future job types; cluster configuration management
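
The idea of exposing an engine as a service can be sketched as a thin routing layer: clients describe a job, and the service decides which cluster runs it. The example below is a minimal illustration under that assumption. `JobRequest`, `ExecutionService`, and the cluster names are hypothetical and do not reflect the real service's API.

```python
from dataclasses import dataclass, field

@dataclass
class JobRequest:
    """Hypothetical job description; 'params' keeps the model open to new job types."""
    engine: str                      # e.g. "hive", "pig", "presto", "spark"
    script: str
    params: dict = field(default_factory=dict)

class ExecutionService:
    """Toy federated execution service that routes jobs to registered clusters."""

    def __init__(self):
        self._clusters = {}          # engine name -> cluster name

    def register_cluster(self, engine: str, cluster: str) -> None:
        # Cluster configuration lives in the service, not in client code.
        self._clusters[engine] = cluster

    def submit(self, job: JobRequest) -> str:
        if job.engine not in self._clusters:
            raise ValueError(f"no cluster registered for engine {job.engine!r}")
        return f"{job.engine} job routed to {self._clusters[job.engine]}"

service = ExecutionService()
service.register_cluster("presto", "presto-adhoc")
service.register_cluster("spark", "spark-etl")
print(service.submit(JobRequest(engine="presto", script="SELECT 1")))
```

Because clients only name an engine, a cluster can be swapped or upgraded by re-registering it, without touching the jobs that use it.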

Metacat: federated metadata catalog for the whole data platform; proxy service to different metadata sources; data metrics, data usage, ownership, categorization and retention policy …; common interface for tools to interact with metadata; to be open sourced in 2015 on Netflix OSS
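
A proxy in front of several metadata sources can be pictured as a catalog that routes each lookup to the store that owns the table and returns the result through one common interface. The sketch below shows that pattern only. The class names, the `source/table` naming scheme, and the sample metadata are assumptions for illustration, not Metacat's actual API.

```python
class MetadataSource:
    """Minimal interface each backing metadata store is assumed to offer."""

    def __init__(self, tables):
        self._tables = tables        # table name -> metadata dict

    def get_table(self, name):
        return self._tables[name]

class FederatedCatalog:
    """Toy proxy that routes 'source/table' lookups to the right metadata source."""

    def __init__(self):
        self._sources = {}

    def register(self, source_name, source):
        self._sources[source_name] = source

    def get_table(self, qualified_name):
        source_name, table = qualified_name.split("/", 1)
        metadata = dict(self._sources[source_name].get_table(table))
        metadata["source"] = source_name   # record where the metadata came from
        return metadata

catalog = FederatedCatalog()
catalog.register("hive", MetadataSource({"playback": {"owner": "team-a", "retention_days": 90}}))
catalog.register("other", MetadataSource({"signups": {"owner": "team-b", "retention_days": 30}}))
print(catalog.get_table("hive/playback"))
```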

(Diagram) Services and tools: Big Data API; Big Data Portal; (federated execution service); Metacat (federated metadata service); data movement; data visualization; data lineage; data quality; Pig workflow visualization; job/cluster perf visualization

Big Data API: integration layer for our ecosystem of tools and services; Python library (called Kragle); building block for our ETL workflow; building block for Big Data Portal

Big Data Portal: one-stop shop for all big data related tools and services; built on top of Big Data API

Open source is an integral part of our strategy to achieve scale

Big Data Processing Systems; Services/Tools Ecosystem

Why use open source? Collaborate with other internet-scale tech companies; uncharted area/scale, so lock-in is not desirable; need the flexibility to achieve scalability. BUT… lots of choices; white-box approach

Why contribute back? Non-IP or trade secret; help shape direction of projects; don't want to fork and diverge; attract top talent

Why contribute our own tool? Share our goodness; set industry standard; community can help evolve the tool

Is open source right for you?

Measuring big data: understanding data by usage. By Charles Smith, Netflix. Tomorrow @ 1:40-2:20pm

Eva Tse, etse@netflix.com, jobs.netflix.com