Distributed, real-time actionable insights on high-volume data streams

Conflux: Distributed, real-time actionable insights on high-volume data streams
Vinay Eswara, Jai Krishna, Gaurav Srivastava
2nd December 2016

Introduction
A monitoring system based on time-series data was needed for a cloud-scale data center. It had to support multi-tenancy.
Design configuration maximums (from the Configuration Maximums reference): 50,000 VMs / 30,000 powered-on VMs, 85 metrics each, emitted every 20 seconds.
Data volume: (30,000 VMs x 85 metrics x 1 KB packets) / 20 s = approximately 127 MB/s, or 11 TB per day, not accounting for compression.
CONFIDENTIAL
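The data-volume estimate can be checked with a few lines of arithmetic. This is only a sketch: it assumes "1KB" means 1,000 bytes and uses decimal MB and TB, and the constant names are chosen here for illustration.

```python
# Design maximums quoted on the slide.
POWERED_ON_VMS = 30_000
METRICS_PER_VM = 85
PACKET_BYTES = 1_000        # assumption: "1KB" read as 1,000 bytes
EMIT_INTERVAL_S = 20        # each metric is emitted every 20 seconds

# Bytes arriving per second across the whole data center.
bytes_per_second = POWERED_ON_VMS * METRICS_PER_VM * PACKET_BYTES / EMIT_INTERVAL_S

mb_per_second = bytes_per_second / 1e6           # decimal megabytes
tb_per_day = bytes_per_second * 86_400 / 1e12    # decimal terabytes

print(f"{mb_per_second:.1f} MB/s, {tb_per_day:.1f} TB/day")
```

Both figures are before compression, as on the slide.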

Objective
Group streams arbitrarily in real time, e.g. all customer VMs, capacity utilization, accounting workload, etc.
Compute machine-learning-style models fast: M = A x (s1)^a' + B x (s2)^b' + C x (s3)^c', where s1, s2, s3 are streams.
Handle updates to model functions and groups fast, while remaining highly available, horizontally scalable, and easy to deploy using VM templates.

Existing solutions
Twitter: Heron, Kestrel
Google: MillWheel, Photon
Apache: Spark, Storm, Samza

Background: naive solution - Modulo
Three servers (S0, S1, S2) and ten users (u0 to u9): ID % num servers (3) = server number.
Server S2 crashes. Rehashing occurs over S0 and S1: ID % num servers (2) = server number.
Users not related to the crash are shuffled: u3, u4, u9.
Is it possible to redistribute only the users homed on the crashed server?
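The reshuffling on this slide can be reproduced in a few lines. This is a minimal sketch with integer user IDs 0 to 9; the helper name `home_server` is hypothetical.

```python
def home_server(user_id: int, num_servers: int) -> int:
    """Naive placement: the user ID modulo the number of live servers."""
    return user_id % num_servers

users = range(10)
crashed = 2  # server S2

before = {u: home_server(u, 3) for u in users}   # S0, S1, S2 alive
after = {u: home_server(u, 2) for u in users}    # only S0, S1 alive

# Every user whose home changed, and the subset that was never on S2.
moved = [u for u in users if before[u] != after[u]]
moved_unrelated = [u for u in moved if before[u] != crashed]

print("moved:", moved)
print("moved although not homed on S2:", moved_unrelated)
```

Six of the ten users change servers, and three of them (u3, u4, u9) were never on the crashed server.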

Background: consistent hashing primer
[Figure: consistent hash ring showing nodes and users]
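A minimal sketch of a consistent hash ring follows. It is not Conflux's implementation: MD5 as the ring hash, the `HashRing` class, and the node and user names are all assumptions made for illustration. Each key is owned by the first node clockwise from the key's hash, so removing a node re-homes only the keys that node owned.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # One point per node, kept sorted by ring position.
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def remove(self, node):
        self._points = [(h, n) for h, n in self._points if n != node]

    def lookup(self, key):
        # First node clockwise from the key's position, wrapping around.
        hashes = [h for h, _ in self._points]
        i = bisect.bisect_right(hashes, ring_hash(key)) % len(self._points)
        return self._points[i][1]

ring = HashRing(["S0", "S1", "S2"])
users = [f"u{i}" for i in range(10)]

before = {u: ring.lookup(u) for u in users}
ring.remove("S2")                       # S2 crashes
after = {u: ring.lookup(u) for u in users}

moved = [u for u in users if before[u] != after[u]]
print("moved:", moved)
```

Unlike the modulo scheme, every user in `moved` was homed on S2 before the crash.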

Vnodes in Practice, node failure
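Virtual nodes place each physical node at many points on the ring, so that when a node fails its keys are spread over several survivors instead of landing on a single neighbour. The sketch below assumes 64 vnodes per node, MD5 as the ring hash, and made-up node and stream names; none of these come from the slides.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes):
    """Each node appears at `vnodes` pseudo-random positions on the ring."""
    return sorted((ring_hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def lookup(ring, key):
    hashes = [h for h, _ in ring]
    i = bisect.bisect_right(hashes, ring_hash(key)) % len(ring)
    return ring[i][1]

nodes = ["S0", "S1", "S2", "S3"]
keys = [f"stream-{i}" for i in range(1000)]

ring = build_ring(nodes, vnodes=64)
ring_after_failure = [(h, n) for h, n in ring if n != "S2"]   # S2 fails

# Keys that lived on S2, and which survivors pick them up.
orphans = [k for k in keys if lookup(ring, k) == "S2"]
takeover = Counter(lookup(ring_after_failure, k) for k in orphans)

print(len(orphans), "keys re-homed:", dict(takeover))
```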

Definitions
Packet: the atomic unit of input and output in Conflux. It is a set of (ID, Metric, Timestamp, Value) tuples.
Stream: a logically unbounded sequence of tuples bearing the same ID.
Routing: the process of consistent hashing the ID in each packet against the live nodes in the Conflux cluster to decide which node to deliver the packet to.
Metric: an individual, time-stamped, measurable property of a phenomenon being observed.
Note: all timestamps are UTC (client provided).
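One way to picture these definitions in code is shown below. The type and field names (`MetricTuple`, `make_packet`) are hypothetical and are not taken from Conflux. The sketch also assumes that all tuples in one packet share a single ID and a single client-side UTC timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, Optional

@dataclass(frozen=True)
class MetricTuple:
    id: str               # stream ID; also the routing key
    metric: str           # name of the observed property, e.g. "cpu"
    timestamp: datetime   # UTC, provided by the client
    value: float

# A packet is a set of tuples.
Packet = FrozenSet[MetricTuple]

def make_packet(stream_id: str, readings: Dict[str, float],
                ts: Optional[datetime] = None) -> Packet:
    """Bundle one round of readings for a stream into a packet."""
    ts = ts or datetime.now(timezone.utc)
    return frozenset(MetricTuple(stream_id, m, ts, v) for m, v in readings.items())

packet = make_packet("vm-42", {"cpu": 0.81, "mem": 0.47})
for t in sorted(packet, key=lambda t: t.metric):
    print(t.id, t.metric, t.timestamp.isoformat(), t.value)
```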

Method : Consistent hashing in Conflux
Each stream has a unique ID. Consistent hash of that ID = Conflux node.
This shards the universe of streams across the nodes: cache partitioning.
Failure: batch acknowledgements lead to retransmission of the batch.
Failure: Cassandra replication leads to data being available locally again, since the hashes match!

Groups
A set of streams with IDs is a group, e.g. G = A + B + C + D.
Conflux treats a group itself as a stream with ID 'G'. This allows group composition, e.g. GoG = G1 + G2 + G3.
When ingesting a packet with some ID 'X' that belongs to group 'G', Conflux simply retransmits the packet, changing its ID to 'G'. This is called feed-forward.
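Feed-forward, including groups of groups, can be sketched as a walk over a membership map. The `membership` dictionary and the `feed_forward` function are hypothetical names. In Conflux each hop would be a retransmission routed by consistent hashing, which this sketch collapses into a local loop.

```python
# Hypothetical membership cache: member ID -> IDs of the groups it belongs to.
membership = {
    "A": ["G1"],
    "B": ["G1"],
    "C": ["G2"],
    "G1": ["GoG"],   # groups are streams too, so they can be members
    "G2": ["GoG"],
}

def feed_forward(packet_id, membership):
    """Yield every ID under which a packet is delivered, starting with its own."""
    pending = [packet_id]
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:      # guard against cyclic group definitions
            continue
        seen.add(current)
        yield current
        # Retransmit under each group the current ID is a member of.
        pending.extend(membership.get(current, []))

print(list(feed_forward("A", membership)))
print(list(feed_forward("C", membership)))
```

A packet for stream A is thus seen first as A, then as G1, then as GoG.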

Merging streams based on groups
[Figure: consistent hash ring with streams A, B, C, D and group G]
Streams are hashed to different nodes.
Membership is cached at each node: A=>G, B=>G, C=>G, D=>G.
Data is retransmitted with group ID 'G': feed-forward.

Models / Formulae
[Figure: consistent hash ring with streams A, B, C, D and group G]
Streams are hashed to different nodes.
Membership and the first stage of computation are cached at each node: X x Ax => G, Y x By => G, Z x Cz => G, W x Dw => G.
Data is retransmitted with group ID 'G': feed-forward.
G = (X x Ax) + (Y x By) ….
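The two-stage evaluation can be sketched as follows. The sketch reads X, Y, Z, W as per-stream coefficients applied on each stream's home node, and G as the sum of the latest forwarded terms. That reading, the coefficient values, and the names `first_stage` and `GroupNode` are all assumptions.

```python
# Hypothetical coefficients X, Y, Z, W, cached on the home node of each stream.
coefficients = {"A": 2.0, "B": 0.5, "C": 1.0, "D": 3.0}

def first_stage(stream_id, value):
    """Runs on the stream's home node: scale the value and relabel it for G."""
    return ("G", stream_id, coefficients[stream_id] * value)

class GroupNode:
    """Runs on the node that owns ID 'G': keeps the latest term per member."""
    def __init__(self):
        self.latest_terms = {}

    def ingest(self, forwarded):
        _group_id, member, term = forwarded
        self.latest_terms[member] = term
        # Second stage: G is the sum of the latest terms from every member.
        return sum(self.latest_terms.values())

g = GroupNode()
for sid, value in {"A": 10.0, "B": 4.0, "C": 1.0, "D": 2.0}.items():
    total = g.ingest(first_stage(sid, value))

print(total)   # 2*10 + 0.5*4 + 1*1 + 3*2
```

When a new value for one stream arrives, only that stream's term is recomputed and forwarded; the group node re-sums in memory.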

Group Gx create, with members A,B,C

Group Gx member delete

Implementation
Single unit of deployment.
Thresholding + HTTP callouts = customized actions.
Data is persisted with a TTL into Cassandra for disk reclamation.
Cassandra compaction is done daily in an off-peak window.
A JavaScript engine is used to define groups / formulae on the fly.
5-node cluster, 8 vCPU, 32 GB RAM, 2 TB disk.
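A threshold check paired with an HTTP callout might look like the sketch below. The URL, the JSON payload shape, and the function name `build_callout` are hypothetical; the slides do not describe Conflux's actual callout format. The request is only built, not sent, so the sketch runs offline.

```python
import json
import urllib.request
from typing import Optional

def build_callout(url: str, stream_id: str, metric: str,
                  value: float, limit: float) -> Optional[urllib.request.Request]:
    """Return a POST request describing the breach, or None if under the limit."""
    if value <= limit:
        return None
    body = json.dumps({
        "id": stream_id,
        "metric": metric,
        "value": value,
        "limit": limit,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical endpoint; urllib.request.urlopen(req) would actually send it.
req = build_callout("http://alerts.example.invalid/hook", "vm-42", "cpu", 0.93, 0.80)
print(req.get_method(), req.full_url, req.data.decode())
```

The receiving endpoint decides what the customized action is, for example paging an operator or scaling a workload.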

Results
1-node ingestion rate with approximately 60% CPU:
Recovery: single node failure with a 5-node cluster

Run#  Avg CPU before  Msg/s before  Max CPU in recovery  Avg CPU after
1     63%             1647          93%                  75%
2     64%             1587          97%                  88%
3     62%             1688          96%                  72%
4                     1711          100%                 86%
5     61%             1649          95%                  80%

Conclusion : How is Conflux different
Conflux routes packets using consistent hashing to ensure that all related streams of a group or formula end up on the same node. This allows for fast in-memory evaluation using cached data on one node.
Using the same consistent hash function for message routing as for persistence ensures that reads and writes are always on local disk.
Consistent hashing also ensures that read/write locality is preserved in case of failure.

Future work
Tree-group for load balancing.
Tree-group fast updates, compaction.
Pure-dynamic groups defined by a function, e.g. all nodes whose CPU > 80%.

FAQ
Does more RAM per node help? => To an extent.
Does more CPU per node help? => Oh yes!!
What if a rack dies? => VM affinity, anti-affinity.
What if a datacenter dies? => Tough luck.
Why not Spark, Heron, Samza?
Can I go across geographies? => No, backplane IP ensures superfast connections.
