Real-time Stream Processing Architecture for Comcast IP Video. Strata Conference + Hadoop World 2013. Chris Lintz, Gabriel Commeau
Agenda: Comcast VIPER Overview; Architecture Overview; Q & A
Comcast Video IP Engineering and Research (VIPER). Diagram labels: Preparation, Delivery, Video Players, Analysis; Packaging, Origination, Storage, Transcoding; iOS, Android, Xbox Live, Samsung; Storm
Why Do We Focus on Real-time? Proactively diagnose issues; form real-time intelligence; help deliver best possible video experience. Chart: Prime Time Viewership
Video Player Analytics Protocol: Live and On Demand; JSON event objects; key metrics: bitrate, frame rate, fragments, errors. We collect and use all data in accordance with best consumer privacy practices and applicable laws.
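As an illustration only, a JSON event object carrying the key metrics listed above could be built and round-tripped as in the sketch below. The field names (`session_id`, `stream_type`, `bitrate_kbps`, and so on) are hypothetical; the slides do not give the actual protocol schema.

```python
import json

# Hypothetical event layout: every field name here is an assumption,
# not the real Video Player Analytics Protocol schema.
def make_event(session_id, stream_type, bitrate_kbps, frame_rate, fragments, errors):
    """Build one JSON-serializable player event."""
    return {
        "session_id": session_id,
        "stream_type": stream_type,    # e.g. "live" or "on_demand"
        "bitrate_kbps": bitrate_kbps,
        "frame_rate": frame_rate,
        "fragments": fragments,        # fragments downloaded so far
        "errors": errors,              # list of error codes, possibly empty
    }

def encode_event(event):
    """Serialize an event to a JSON string, as a player might send it."""
    return json.dumps(event, sort_keys=True)

def decode_event(payload):
    """Parse a JSON string back into an event dict on the collection side."""
    return json.loads(payload)
```

A player would send the encoded string; the collection tier would decode it before any further processing.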
Player Sessions: Key In Understanding Video Experience
High Level Architecture And Data Flow
Flume: Data Collection Tier. Collect, aggregate and move large amounts of data; distributed, scalable, reliable, customizable; multi-tier architecture
Storm: Stream Processing Tier
Player Sessions in Real-time. Sessions in Flume? Technical issues: consistent hash and exactly-once semantics. Design goals: separation of concerns. Session write-through rate?
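The consistent-hash concern named above can be sketched as follows: to build a session, every event for a given session ID has to reach the same worker, and adding or removing a worker should move as few sessions as possible. This is a generic hash-ring sketch, not the presenters' implementation; the class name `HashRing`, the `replicas` parameter and the use of MD5 are all assumptions made for illustration.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=64):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # MD5 is used only as a well-distributed hash, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, session_id):
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._keys, self._hash(session_id))
        return self._ring[idx % len(self._ring)][1]
```

With this layout, removing a node only reassigns the sessions that node owned; sessions on the remaining nodes stay where they were.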
Flume Edge Tier: Video Player Analytics End Point. Analytics events over HTTPS; HTTP Source; re-batch with inner sink and source
Flume Mid Tier: Processing and Routing Data. Video player event processing: geo-location, asset metadata, validation, to-storm. Replication channel processor: HDFS sink, Storm sink
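The mid-tier flow described above (validate, enrich with geo-location and asset metadata, then replicate to both an HDFS sink and a Storm sink) can be sketched in plain Python. This does not use the Flume API; the function names, the `client_ip` and `asset_id` fields, and the dictionary lookups are hypothetical stand-ins, and the sinks are modeled as simple lists.

```python
def validate(event):
    """Accept only dict events that carry a session identifier (assumed rule)."""
    return isinstance(event, dict) and "session_id" in event

def enrich(event, geo_lookup, asset_lookup):
    """Return a copy of the event with geo and asset metadata attached."""
    out = dict(event)
    out["geo"] = geo_lookup.get(event.get("client_ip"))
    out["asset"] = asset_lookup.get(event.get("asset_id"))
    return out

def process(events, geo_lookup, asset_lookup, sinks):
    """Validate and enrich each event, then replicate it to every sink.

    Returns the number of events delivered.
    """
    delivered = 0
    for event in events:
        if not validate(event):
            continue  # invalid events are dropped in this sketch
        enriched = enrich(event, geo_lookup, asset_lookup)
        for sink in sinks:
            sink.append(dict(enriched))  # each sink gets its own copy
        delivered += 1
    return delivered
```

Replicating a copy to each sink keeps the storage path and the streaming path independent: a consumer that mutates its copy cannot affect the other.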
Bridging Flume to Storm: Flume2Storm Connector. Service discovery; distributed, scalable and reliable; low latency
Simplified Video Player Storm Topology
Requirements for Read/Writes from Storm Bolts: functionality beyond key/value stores; real-time and historic window queries; speed of in-memory writes and durability of disk
Utilizing MemSQL for Persistence: distributed in-memory SQL database; ACID, highly available, fault tolerant; aggregators route queries to leaves; leaves are auto-sharded; solves our intense read/writes
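To make the idea of a window query against a SQL store concrete, the sketch below writes player events and averages bitrate over a time range. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in; it is not MemSQL, and the table name `player_events` and its columns are assumptions.

```python
import sqlite3

def open_store():
    """Create an in-memory SQLite database standing in for the SQL store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE player_events ("
        "session_id TEXT, ts INTEGER, bitrate_kbps INTEGER)"
    )
    return conn

def write_event(conn, session_id, ts, bitrate_kbps):
    """Insert one event row, as a stream-processing worker might."""
    conn.execute(
        "INSERT INTO player_events VALUES (?, ?, ?)",
        (session_id, ts, bitrate_kbps),
    )

def avg_bitrate(conn, start_ts, end_ts):
    """Average bitrate over the half-open window [start_ts, end_ts).

    Returns None when the window contains no events.
    """
    row = conn.execute(
        "SELECT AVG(bitrate_kbps) FROM player_events WHERE ts >= ? AND ts < ?",
        (start_ts, end_ts),
    ).fetchone()
    return row[0]
```

The same query shape serves both a real-time window (the last few minutes) and a historic one (any earlier range); only the timestamp bounds change.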
Isolated Analysts and Ingest Aggregators
Achievements In Utilizing MemSQL: complex queries in milliseconds; fault-tolerant Storm bolt state; joins now available outside of Storm bolts; foreign key shards; complex data streams; dynamic alters without locks or down time; JSON type; isolated aggregator groups; sustaining intense write-through rates while
Wrapping Up. Real-time at Comcast scale: millions of video players; horizontal scale everywhere; aggregated metrics across US and complex analysis; real-time API. Builds foundation: advanced real-time analytics; better platform for innovation; alerts on complex objects; supplemental real-time data back to clients; popularity-based CDN
Thank You. christopher_lintz@cable.comcast.com, gabriel_commea@cable.comcast.com