Real-time Stream Processing Architecture for Comcast IP Video


Strata Conference + Hadoop World 2013
Chris Lintz and Gabriel Commeau

Agenda
Comcast VIPER overview; architecture overview; Q&A.

Comcast Video IP Engineering and Research (VIPER)
Diagram: pipeline stages preparation, delivery, video players, and analysis; components shown are transcoding, packaging, origination, and storage, video players on iOS, Android, Xbox Live, and Samsung, and Storm.

Why Do We Focus on Real-time?
Proactively diagnose issues; form real-time intelligence; help deliver the best possible video experience.
Chart: prime-time viewership.

Video Player Analytics Protocol
Live and on-demand playback; JSON event objects; key metrics: bitrate, frame rate, fragments, and errors.
We collect and use all data in accordance with best consumer privacy practices and applicable laws.
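The slide names the metrics carried by each JSON event but not the wire schema. As a rough sketch, assuming hypothetical field names (`session_id`, `timestamp_ms`, `event_type`, `bitrate_kbps`, `frame_rate`, `fragment_index`, `error_code`), decoding and validating one event could look like this:

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: the slides name the metrics, not the JSON field names.
REQUIRED_FIELDS = ("session_id", "timestamp_ms", "event_type")


@dataclass
class PlayerEvent:
    session_id: str
    timestamp_ms: int
    event_type: str                       # assumed values, e.g. "fragment", "error"
    bitrate_kbps: Optional[int] = None
    frame_rate: Optional[float] = None
    fragment_index: Optional[int] = None
    error_code: Optional[str] = None


def parse_event(raw: str) -> PlayerEvent:
    """Decode one JSON event object; reject it if a required field is missing."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("event must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in obj]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return PlayerEvent(
        session_id=str(obj["session_id"]),
        timestamp_ms=int(obj["timestamp_ms"]),
        event_type=str(obj["event_type"]),
        bitrate_kbps=obj.get("bitrate_kbps"),
        frame_rate=obj.get("frame_rate"),
        fragment_index=obj.get("fragment_index"),
        error_code=obj.get("error_code"),
    )
```

Optional metrics stay `None` when a player omits them, so a heartbeat without a bitrate is still a valid event.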

Player Sessions: Key In Understanding Video Experience
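The slide argues that the session, not the individual event, is the unit that describes viewing experience. A minimal sketch of folding events into a per-session summary, assuming events are dicts with hypothetical field names (`session_id`, `timestamp_ms`, `event_type`, `bitrate_kbps`):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SessionSummary:
    session_id: str
    first_ts_ms: int
    last_ts_ms: int
    fragments: int = 0
    errors: int = 0
    bitrate_sum_kbps: int = 0
    bitrate_samples: int = 0

    def duration_ms(self) -> int:
        return self.last_ts_ms - self.first_ts_ms

    def avg_bitrate_kbps(self) -> Optional[float]:
        if self.bitrate_samples == 0:
            return None
        return self.bitrate_sum_kbps / self.bitrate_samples


def update_session(sessions: Dict[str, SessionSummary], event: dict) -> SessionSummary:
    """Fold one player event (a dict with hypothetical field names) into its session."""
    sid, ts = event["session_id"], event["timestamp_ms"]
    s = sessions.get(sid)
    if s is None:
        s = sessions[sid] = SessionSummary(sid, ts, ts)
    # Events can arrive out of order, so widen the time span in both directions.
    s.first_ts_ms = min(s.first_ts_ms, ts)
    s.last_ts_ms = max(s.last_ts_ms, ts)
    kind = event.get("event_type")
    if kind == "fragment":
        s.fragments += 1
    elif kind == "error":
        s.errors += 1
    if event.get("bitrate_kbps") is not None:
        s.bitrate_sum_kbps += event["bitrate_kbps"]
        s.bitrate_samples += 1
    return s
```

Keeping sums and counts rather than a running average makes the summary cheap to update on every event and easy to persist.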

High-Level Architecture and Data Flow

Flume: Data Collection Tier
Collect, aggregate, and move large amounts of data; distributed, scalable, reliable, and customizable; multi-tier architecture.

Storm: Stream Processing Tier

Player Sessions in Real-time
Sessions in Flume? Technical issues: consistent hashing and exactly-once semantics.
Design goals: separation of concerns; session write-through rate?
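The slide lists consistent hashing as one of the problems with building sessions in Flume: every event of a session must reach the same worker, and adding or removing a worker should move as few sessions as possible. The sketch below is a generic consistent-hash ring with virtual nodes, not the presenters' implementation; node names and the `vnodes` count are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a digest instead.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Map keys (e.g. player session ids) to nodes; vnodes smooth the distribution."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._points: List[int] = []        # sorted hash positions on the ring
        self._owner: Dict[int, str] = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            point = stable_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        for i in range(self._vnodes):
            point = stable_hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        # First ring position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a node leaves, only the sessions it owned move; every other session keeps its worker. Consistent hashing alone does not provide exactly-once semantics, which the slide lists as a separate issue.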

Flume Edge Tier: Video Player Analytics Endpoint
Analytics events over HTTPS; HTTP source; re-batch with an inner sink and source.
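The edge tier re-batches events inside Flume using an inner sink and source. The Python sketch below only illustrates the general size-or-time batching idea behind that step; it is not Flume's API, and `max_events`, `max_delay_s`, and the `flush` callback are hypothetical.

```python
import time
from typing import Callable, List, Optional


class ReBatcher:
    """Collect small incoming batches; emit batches of up to max_events downstream."""

    def __init__(self, flush: Callable[[List[bytes]], None], max_events: int = 500,
                 max_delay_s: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_events = max_events
        self._max_delay_s = max_delay_s
        self._clock = clock
        self._buf: List[bytes] = []
        self._started: Optional[float] = None  # when the current partial batch began

    def add(self, events: List[bytes]) -> None:
        if events and not self._buf:
            self._started = self._clock()
        self._buf.extend(events)
        # Size trigger: emit full batches as soon as they are available.
        while len(self._buf) >= self._max_events:
            batch = self._buf[:self._max_events]
            self._buf = self._buf[self._max_events:]
            self._flush(batch)
            # Any leftovers all arrived in this call, so their wait starts now.
            self._started = self._clock() if self._buf else None

    def tick(self) -> None:
        """Time trigger: call periodically so a partial batch never waits too long."""
        if self._buf and self._clock() - self._started >= self._max_delay_s:
            batch, self._buf, self._started = self._buf, [], None
            self._flush(batch)
```

Many small HTTP posts thus become fewer, larger downstream writes, while the time trigger bounds the extra latency.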

Flume Mid Tier: Processing and Routing Data
Video player event processing: geo-location, asset metadata, validation, to-storm.
Replication channel processor: HDFS sink; Storm sink.
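The mid tier validates and enriches each event, then replicates it to both an HDFS sink and a Storm sink. A minimal sketch of that flow in plain Python, assuming hypothetical field names and in-memory dictionaries standing in for the real geo-location and asset-metadata lookups (this is not Flume's interceptor API):

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Step = Callable[[Event], Optional[Event]]  # a step returns None to drop the event


def validate(event: Event) -> Optional[Event]:
    # Hypothetical rule: an event must identify its session and asset.
    return event if "session_id" in event and "asset_id" in event else None


def make_geo_enricher(ip_to_region: Dict[str, str]) -> Step:
    def enrich(event: Event) -> Event:
        event["region"] = ip_to_region.get(str(event.get("client_ip")), "unknown")
        return event
    return enrich


def make_asset_enricher(titles: Dict[str, str]) -> Step:
    def enrich(event: Event) -> Event:
        event["asset_title"] = titles.get(str(event["asset_id"]), "unknown")
        return event
    return enrich


def process(event: Event, steps: List[Step], sinks: List[Callable[[Event], None]]) -> bool:
    """Run processing steps in order, then replicate the result to every sink."""
    for step in steps:
        result = step(event)
        if result is None:
            return False
        event = result
    for sink in sinks:
        sink(dict(event))  # each sink (e.g. HDFS, Storm) gets its own copy
    return True
```

Validation runs first so the enrichers can assume the required fields exist; dropped events never reach either sink.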

Bridging Flume to Storm: Flume2Storm Connector
Service discovery; distributed, scalable, and reliable; low latency.

Simplified Video Player Storm Topology
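The topology diagram itself did not survive in the transcript. A common way to keep per-session state in a Storm bolt is fields grouping on the session id, which sends all tuples with the same grouping-field values to the same task; whether this topology uses it is an assumption. The simulation below mimics that routing with a stable hash modulo the task count; `SessionBoltTask` and `run_topology` are hypothetical names, not Storm classes.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fields_grouping(key: str, num_tasks: int) -> int:
    # Equal keys always hash to the same task index.
    h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")
    return h % num_tasks


class SessionBoltTask:
    """One task of a session bolt; it only ever sees the sessions routed to it."""

    def __init__(self, task_id: int):
        self.task_id = task_id
        self.event_counts: Dict[str, int] = defaultdict(int)

    def execute(self, tup: dict) -> None:
        self.event_counts[tup["session_id"]] += 1


def run_topology(events: List[dict], num_tasks: int = 4) -> List[SessionBoltTask]:
    tasks = [SessionBoltTask(i) for i in range(num_tasks)]
    for e in events:
        tasks[fields_grouping(e["session_id"], num_tasks)].execute(e)
    return tasks
```

Because each session lives on exactly one task, the bolt can update session state without cross-task coordination.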

Requirements for Reads/Writes from Storm Bolts
Functionality beyond key/value stores; real-time and historic window queries; the speed of in-memory writes with the durability of disk.
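To make "real-time and historic window queries" concrete, here is a minimal in-process sketch of tumbling-window aggregation, assuming one-minute windows and a single numeric metric such as bitrate. In the talk these queries are served by the database, not by code like this; the class and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


class TumblingWindowStore:
    """Keep per-window sums and counts so any range of windows can be queried."""

    def __init__(self, window_ms: int = 60_000):
        self.window_ms = window_ms
        self._sum: Dict[int, float] = defaultdict(float)
        self._count: Dict[int, int] = defaultdict(int)

    def window_start(self, ts_ms: int) -> int:
        return ts_ms - ts_ms % self.window_ms

    def add(self, ts_ms: int, value: float) -> None:
        w = self.window_start(ts_ms)
        self._sum[w] += value
        self._count[w] += 1

    def average(self, start_ms: int, end_ms: int) -> Optional[float]:
        """Average over windows whose start lies in [start_ms, end_ms)."""
        windows = [w for w in self._count if start_ms <= w < end_ms]
        n = sum(self._count[w] for w in windows)
        return sum(self._sum[w] for w in windows) / n if n else None
```

The same query shape answers both "the last minute" (real-time) and "any earlier range" (historic), which is why the requirement calls for more than a key/value store.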

Utilizing MemSQL for Persistence
Distributed in-memory SQL database; ACID, highly available, and fault-tolerant; aggregators route queries to leaves; leaves are auto-sharded; handles our intense reads and writes.
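The aggregator/leaf split can be pictured as follows: a row's shard key picks a partition, each partition lives on one leaf, point queries go to a single leaf, and aggregate queries fan out to all leaves and merge partial results. This toy model only illustrates that routing idea; it is not MemSQL's hashing or placement scheme, and the class names, the `session_id` shard key, and `partitions_per_leaf` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


class Leaf:
    def __init__(self, name: str):
        self.name = name
        self.rows: Dict[int, List[dict]] = defaultdict(list)  # partition -> rows


class Aggregator:
    """Route rows by shard key to partitions, and partitions to leaves."""

    def __init__(self, leaves: List[Leaf], partitions_per_leaf: int = 4):
        self.leaves = leaves
        self.num_partitions = len(leaves) * partitions_per_leaf

    def partition_for(self, shard_key: str) -> int:
        h = int.from_bytes(hashlib.md5(shard_key.encode("utf-8")).digest()[:8], "big")
        return h % self.num_partitions

    def leaf_for(self, partition: int) -> Leaf:
        return self.leaves[partition % len(self.leaves)]

    def insert(self, row: dict) -> None:
        p = self.partition_for(row["session_id"])
        self.leaf_for(p).rows[p].append(row)

    def rows_for_session(self, session_id: str) -> List[dict]:
        # Point query: only the leaf owning the partition is consulted.
        p = self.partition_for(session_id)
        return [r for r in self.leaf_for(p).rows[p] if r["session_id"] == session_id]

    def count_all(self) -> int:
        # Aggregate query: fan out to every leaf and merge the partial counts.
        return sum(sum(len(rs) for rs in leaf.rows.values()) for leaf in self.leaves)
```

Sharding on the session id keeps every row of a session on one leaf, so session lookups never need to touch the whole cluster.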

Isolated Analysts and Ingest Aggregators

Achievements in Utilizing MemSQL
Complex queries in milliseconds; fault-tolerant Storm bolt state; joins now available outside of Storm bolts; foreign key shards; complex data streams; dynamic alters without locks or downtime; JSON type; isolated aggregator groups; sustaining intense write-through rates while

Wrapping Up
Real-time at Comcast scale: millions of video players; horizontal scale everywhere; aggregated metrics across the US and complex analysis; real-time API.
Builds foundation: advanced real-time analytics; a better platform for innovation; alerts on complex objects; supplemental real-time data back to clients; popularity-based CDN.

Thank You
christopher_lintz@cable.comcast.com
gabriel_commea@cable.comcast.com