Published by Camilla Wright, modified over 6 years ago
1
Conflux: Distributed, real-time actionable insights on high-volume data streams
Vinay Eswara, Jai Krishna, Gaurav Srivastava
2nd December 2016
2
Introduction
A monitoring system based on time-series data was needed for a cloud-scale data center, with support for multi-tenancy.
Design configuration maximums (per the Configuration Maximums reference): 50,000 VMs, of which 30,000 are powered on; 85 metrics per VM, each emitted every 20 seconds.
Data volume: (30,000 VMs x 85 metrics x 1 KB packets) / 20 s ≈ 127 MB/s, or about 11 TB per day, not accounting for compression.
CONFIDENTIAL
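A quick check of the data-volume arithmetic above, as a sketch. It assumes decimal units (1 KB = 1,000 bytes, 1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which is what reproduces the slide's ~127 MB/s and ~11 TB/day; the constant names are illustrative.

```python
# Back-of-the-envelope ingest rate for the design maximums on this slide.
# Assumes decimal units: 1 KB = 1,000 bytes, 1 MB = 1e6 bytes, 1 TB = 1e12 bytes.
POWERED_ON_VMS = 30_000
METRICS_PER_VM = 85
PACKET_BYTES = 1_000      # one ~1 KB packet per metric sample
INTERVAL_S = 20           # each metric is emitted every 20 seconds
SECONDS_PER_DAY = 86_400

bytes_per_second = POWERED_ON_VMS * METRICS_PER_VM * PACKET_BYTES / INTERVAL_S
mb_per_second = bytes_per_second / 1e6
tb_per_day = bytes_per_second * SECONDS_PER_DAY / 1e12

print(f"{mb_per_second:.1f} MB/s, {tb_per_day:.1f} TB/day")  # 127.5 MB/s, 11.0 TB/day
```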
3
Objective
Group streams arbitrarily in real time, e.g. all customer VMs, capacity utilization, or accounting workloads.
Compute machine-learning-style models fast, e.g. $M = A \times s_1^{a'} + B \times s_2^{b'} + C \times s_3^{c'}$, where $s_1$, $s_2$, $s_3$ are streams.
Handle updates to model functions and groups fast, while remaining highly available, horizontally scalable, and easy to deploy using VM templates.
4
Existing solutions
Twitter: Heron, Kestrel
Google: MillWheel, Photon
Apache: Spark, Storm, Samza
5
Background: naive solution - Modulo
With ID % num servers (3) = server number, users u0 to u9 are spread over servers S0, S1, S2.
Server S2 crashes, and rehashing occurs with ID % num servers (2) = server number over S0 and S1.
Users not related to the crash are shuffled too: u3, u4, u9.
Is it possible to redistribute only the users homed on the crashed server?
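The modulo example can be reproduced directly. This small sketch places users u0 to u9 with `ID % num_servers`, drops to two servers after S2 crashes, and lists the users that moved even though they were never on S2; the function name `assign` is illustrative.

```python
def assign(user_ids, num_servers):
    """Naive placement: server number = ID % num_servers."""
    return {uid: uid % num_servers for uid in user_ids}

users = range(10)              # u0 .. u9
before = assign(users, 3)      # servers S0, S1, S2
after = assign(users, 2)       # S2 has crashed; only S0 and S1 remain

# Users that were NOT on the crashed server S2 but still changed servers.
needlessly_moved = [u for u in users if before[u] != 2 and before[u] != after[u]]
print(needlessly_moved)  # [3, 4, 9]
```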
6
Background: consistent hashing primer
[Figure: nodes and users placed on a consistent hash ring]
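A minimal consistent hash ring, as a sketch of the primer rather than Conflux's actual code: nodes and keys are hashed onto the same ring, and a key belongs to the first node clockwise from it. MD5 is used here only as a convenient, stable placement hash (an assumption; any uniform hash works). Removing S2 moves only the users that S2 owned, which answers the question from the modulo example.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit position on the ring; used for placement only, not security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Minimal consistent hash ring: a key belongs to the first node clockwise."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def owner(self, key: str) -> str:
        positions = [h for h, _ in self._ring]
        # First node strictly after the key's position, wrapping around the ring.
        i = bisect.bisect_right(positions, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["S0", "S1", "S2"])
users = [f"u{i}" for i in range(10)]
before = {u: ring.owner(u) for u in users}
ring.remove("S2")                       # S2 crashes
after = {u: ring.owner(u) for u in users}

# Only users homed on the crashed node S2 move; everyone else stays put.
moved = [u for u in users if before[u] != after[u]]
assert all(before[u] == "S2" for u in moved)
print(moved)
```

The exact list printed depends on where MD5 places the three nodes; the invariant checked by the assertion does not.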
7
Vnodes in Practice, node failure
8
Vnodes in Practice, node failure
9
Vnodes in Practice, node failure
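The vnode slides are diagram-only, so the following is a generic sketch of virtual nodes, not Conflux's configuration: each physical node owns many positions on the ring (64 per node is an arbitrary, hypothetical choice), so when one node fails its streams are spread across all survivors instead of landing on a single neighbor. The node names and stream IDs are hypothetical.

```python
import bisect
import hashlib
from collections import Counter

def _hash(key: str) -> int:
    # 64-bit ring position from MD5; used for placement only, not security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class VNodeRing:
    """Consistent hash ring in which every physical node owns many virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=64):
        self._ring = sorted(
            (_hash(f"{node}#{v}"), node)
            for node in nodes
            for v in range(vnodes_per_node)
        )
        self._positions = [h for h, _ in self._ring]

    def remove(self, node):
        # Drop every virtual node belonging to the failed physical node.
        self._ring = [(h, n) for h, n in self._ring if n != node]
        self._positions = [h for h, _ in self._ring]

    def owner(self, key: str) -> str:
        i = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[i][1]

nodes = [f"N{i}" for i in range(5)]              # a 5-node cluster
ring = VNodeRing(nodes)
keys = [f"stream-{i}" for i in range(10_000)]    # hypothetical stream IDs
before = {k: ring.owner(k) for k in keys}

ring.remove("N2")                                # N2 fails
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
assert all(before[k] == "N2" for k in moved)     # only N2's streams move
print(Counter(after[k] for k in moved))          # N2's load, split across survivors
```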
10
Definitions:
Packet: the atomic unit of input and output in Conflux; a set of (ID, Metric, Timestamp, Value) tuples.
Stream: a logically unbounded sequence of tuples bearing the same ID.
Routing: the process of consistently hashing the ID in each packet over the live nodes in the Conflux cluster to decide which node the packet is delivered to.
Metric: an individual, timestamped, measurable property of a phenomenon being observed.
Note: all timestamps are UTC and client-provided.
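The definitions map naturally onto a small data model. This is a sketch with hypothetical names (`MetricTuple`, `Packet`, `split_into_streams`, and the sample IDs and metrics): a packet carries tuples, each ID identifies a stream, and naive or non-UTC timestamps are rejected to enforce the UTC note.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class MetricTuple:
    """One (ID, Metric, Timestamp, Value) tuple."""
    id: str               # stream ID
    metric: str
    timestamp: datetime   # client-provided, must be UTC
    value: float

    def __post_init__(self):
        # utcoffset() is None for naive datetimes, so those are rejected too.
        if self.timestamp.utcoffset() != timedelta(0):
            raise ValueError(f"timestamp must be UTC: {self.timestamp!r}")

Packet = list[MetricTuple]   # a packet carries a set of tuples

def split_into_streams(packet: Packet) -> dict[str, list[MetricTuple]]:
    """Group a packet's tuples by ID, each ID's tuples ordered by timestamp."""
    streams = defaultdict(list)
    for t in packet:
        streams[t.id].append(t)
    return {sid: sorted(ts, key=lambda t: t.timestamp) for sid, ts in streams.items()}

t0 = datetime(2016, 12, 2, 12, 0, tzinfo=timezone.utc)
packet = [
    MetricTuple("vm-42", "cpu.usage", t0 + timedelta(seconds=20), 0.71),
    MetricTuple("vm-42", "cpu.usage", t0, 0.65),
    MetricTuple("vm-7", "mem.usage", t0, 0.40),
]
streams = split_into_streams(packet)
print({sid: [t.value for t in ts] for sid, ts in streams.items()})
# {'vm-42': [0.65, 0.71], 'vm-7': [0.4]}
```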
11
Method: consistent hashing in Conflux
Each stream has a unique ID; the consistent hash of that ID determines the Conflux node that handles the stream.
This shards the universe of streams across the nodes: cache partitioning.
Failure: batch acknowledgements mean that an unacknowledged batch is retransmitted.
Failure: Cassandra replication makes the data available locally again on the new owner, since the hashes match!
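Why "the hashes match" matters can be sketched under a few stated assumptions: one token per node, the same ring for message routing and for storage, and replicas placed on the owner plus the next node clockwise (SimpleStrategy-style, replication factor 2). An unacknowledged batch is re-routed after the dead owner is dropped, and it lands on the node that already holds the replica. `send_batch`, the `alive` set standing in for acknowledgements, and the sample data are hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """One token per node; the same ring is assumed for routing and for storage."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def walk(self, key):
        """All live nodes, clockwise, starting with the key's owner."""
        if not self._ring:
            raise LookupError("no live nodes")
        positions = [h for h, _ in self._ring]
        start = bisect.bisect_right(positions, _hash(key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(len(self._ring))]

    def owner(self, key):
        return self.walk(key)[0]

    def replicas(self, key, rf=2):
        # Owner plus the next rf-1 nodes clockwise (SimpleStrategy-style placement).
        return self.walk(key)[:rf]

def send_batch(ring, alive, inboxes, stream_id, batch):
    """Deliver a batch to the stream's owner, retransmitting until it is acked."""
    while True:
        node = ring.owner(stream_id)
        if node in alive:                  # stands in for "batch acknowledged"
            inboxes.setdefault(node, []).append(batch)
            return node
        ring.remove(node)                  # no ack: mark node dead, re-route, retransmit

nodes = [f"N{i}" for i in range(5)]
ring = HashRing(nodes)
stream_id = "vm-42"                        # hypothetical stream ID
first_owner = ring.owner(stream_id)
replica_nodes = ring.replicas(stream_id, rf=2)

alive = set(nodes) - {first_owner}         # the owner crashes
inboxes = {}
batch = [("vm-42", "cpu.usage", "2016-12-02T12:00:00Z", 0.71)]
new_owner = send_batch(ring, alive, inboxes, stream_id, batch)

# The retransmitted batch lands on the node that already holds the replica.
assert new_owner == replica_nodes[1]
print(first_owner, "->", new_owner)
```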
12
Groups
A set of streams, identified by their IDs, is a group: e.g. G = A + B + C + D.
Conflux treats a group itself as a stream with ID ‘G’. This allows group composition, e.g. GoG = G1 + G2 + G3 (a group of groups). When ingesting a packet with some ID ‘X’ that belongs to group ‘G’, Conflux simply retransmits the packet with its ID changed to ‘G’. This is called feed-forward.
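A minimal, single-process sketch of feed-forward with group composition. The membership map, the groups G1, G2, GoG, and the recursive `ingest` function are hypothetical; in Conflux each re-transmitted packet would be routed through the hash ring to the node owning the group ID rather than handled by a local call. The sketch assumes group membership is acyclic.

```python
from collections import defaultdict

# Hypothetical membership cache: stream or group ID -> groups it belongs to.
membership = {
    "A": ["G1"], "B": ["G1"], "C": ["G2"],
    "G1": ["GoG"], "G2": ["GoG"],
}

def ingest(packet_id, value, delivered):
    """Record delivery, then feed the packet forward under each parent group's ID."""
    delivered[packet_id].append(value)
    for group in membership.get(packet_id, []):
        ingest(group, value, delivered)   # same payload, ID rewritten to the group's

delivered = defaultdict(list)
ingest("A", 5.0, delivered)
ingest("C", 2.0, delivered)
print(dict(delivered))
# {'A': [5.0], 'G1': [5.0], 'GoG': [5.0, 2.0], 'C': [2.0], 'G2': [2.0]}
```

Because a group is just another ID, a group of groups needs no special handling: the packet is fed forward once per level.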
13
Merging streams based on groups
Streams A, B, C, D are hashed to different nodes of the consistent hash ring.
Membership is cached at each node: A=>G, B=>G, C=>G, D=>G.
Data is re-transmitted with group ID ‘G’: feed-forward.
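On the node that owns ‘G’, the fed-forward data has to be merged. The slides do not say how, so this is a sketch under assumptions: values are merged per timestamp by summing (G = A + B + C + D), and the original member ID travels with the payload so the node can tell members apart and ignore duplicate retransmits. `GroupAccumulator` and the sample values are hypothetical.

```python
from collections import defaultdict

class GroupAccumulator:
    """Runs on the node that owns group ID 'G': merges fed-forward member data."""

    def __init__(self, members):
        self.members = set(members)
        self._by_ts = defaultdict(dict)   # timestamp -> {member ID: value}

    def receive(self, member_id, timestamp, value):
        """Accept one fed-forward value; return G's value once all members reported."""
        if member_id not in self.members:
            raise KeyError(member_id)
        slot = self._by_ts[timestamp]
        slot[member_id] = value           # a duplicate retransmit just overwrites
        if set(slot) == self.members:
            del self._by_ts[timestamp]
            return sum(slot.values())
        return None

acc = GroupAccumulator(["A", "B", "C", "D"])
ts = 1480680000   # hypothetical UTC epoch second
arrivals = [("A", 1.0), ("B", 2.0), ("C", 3.0), ("B", 2.0), ("D", 4.0)]
results = [acc.receive(m, ts, v) for m, v in arrivals]
print(results)  # [None, None, None, None, 10.0]
```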
14
Models / Formulae
Streams A, B, C, D are hashed to different nodes of the consistent hash ring.
Membership and the first stage of computation are cached at each node: $X \times A^{x}$ => G, $Y \times B^{y}$ => G, $Z \times C^{z}$ => G, $W \times D^{w}$ => G.
Data is re-transmitted with group ID ‘G’ (feed-forward), so that $G = X \times A^{x} + Y \times B^{y} + \cdots$
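A sketch of the two-stage evaluation, assuming each term has the form coefficient times stream value raised to an exponent, as in $X \times A^{x}$. Stage 1 runs on each member stream's node and feeds its term forward; stage 2 runs on G's node and sums the terms. The coefficients, exponents, and latest values are hypothetical.

```python
def first_stage(coeff, exponent, value):
    """Runs on the member stream's node: one term of the model, e.g. X * A**x."""
    return coeff * value ** exponent

def second_stage(terms):
    """Runs on G's node: combines the fed-forward terms into G."""
    return sum(terms)

# Hypothetical (coefficient, exponent) per member: (X, x) for A, (Y, y) for B, ...
model = {"A": (2.0, 1.0), "B": (1.0, 2.0), "C": (0.5, 3.0), "D": (4.0, 0.5)}
latest = {"A": 3.0, "B": 2.0, "C": 2.0, "D": 16.0}   # hypothetical latest values

fed_forward = [first_stage(c, e, latest[s]) for s, (c, e) in model.items()]
g = second_stage(fed_forward)
print(fed_forward, g)  # [6.0, 4.0, 4.0, 16.0] 30.0
```

Only one number per member crosses the network per evaluation, and a change to one coefficient touches only that member's node.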
15
Group Gx create, with members A,B,C
16
Group Gx member delete
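The create and delete slides are diagram-only; one plausible reading, sketched here as an assumption, is that a group change only updates the membership caches (A=>G style entries) on the nodes owning the affected members. `MembershipCaches` is hypothetical, and the CRC32-modulo placement is a toy stand-in for the consistent hash ring.

```python
import zlib
from collections import defaultdict

class MembershipCaches:
    """Per-node membership caches: node -> stream ID -> groups it feeds into."""

    def __init__(self, owner):
        self.owner = owner   # maps a stream ID to the node that owns it
        self.caches = defaultdict(lambda: defaultdict(set))

    def create_group(self, group, members):
        # Only the nodes owning the members need an update.
        for member in members:
            self.caches[self.owner(member)][member].add(group)

    def delete_member(self, group, member):
        self.caches[self.owner(member)][member].discard(group)

    def groups_of(self, stream_id):
        return set(self.caches[self.owner(stream_id)][stream_id])

# Toy placement standing in for the consistent hash ring (hypothetical).
caches = MembershipCaches(owner=lambda sid: f"N{zlib.crc32(sid.encode()) % 5}")
caches.create_group("Gx", ["A", "B", "C"])
caches.delete_member("Gx", "B")
print(caches.groups_of("A"), caches.groups_of("B"))  # {'Gx'} set()
```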
17
Implementation
Single unit of deployment.
Thresholding + HTTP callouts = customized actions.
Data is persisted into Cassandra with a TTL so that disk space is reclaimed.
Cassandra compaction is done daily in an off-peak window.
A JavaScript engine is used to define groups / formulae on the fly.
5-node cluster; each node has 8 vCPU, 32 GB RAM, 2 TB disk.
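A sketch of "thresholding + HTTP callouts", assuming a simple per-sample rule (fire whenever a value exceeds its threshold) and a JSON POST as the callout. `check_threshold`, `make_http_callout`, the alert fields, and any endpoint URL are hypothetical; only standard-library `urllib.request` calls are used. The callout is injectable, so the usage below runs without a network.

```python
import json
import urllib.request

def make_http_callout(url):
    """Return a callout that POSTs the alert as JSON to url (hypothetical endpoint)."""
    def callout(alert):
        req = urllib.request.Request(
            url,
            data=json.dumps(alert).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    return callout

def check_threshold(stream_id, timestamp, value, threshold, callout):
    """Fire the callout when value exceeds threshold; return whether it fired."""
    if value > threshold:
        callout({"id": stream_id, "timestamp": timestamp,
                 "value": value, "threshold": threshold})
        return True
    return False

# Usage with a stub callout; in practice pass make_http_callout("http://...").
fired = []
check_threshold("vm-42", "2016-12-02T12:00:00Z", 0.93, 0.80, fired.append)
check_threshold("vm-42", "2016-12-02T12:00:20Z", 0.55, 0.80, fired.append)
print(len(fired))  # 1
```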
18
Results
1-node ingestion rate at approximately 60% CPU:
Recovery: single node failure in a 5-node cluster
Run | Avg CPU before | Msg/s before | Max CPU in recovery | Avg CPU after
1   | 63%            | 1647         | 93%                 | 75%
2   | 64%            | 1587         | 97%                 | 88%
3   | 62%            | 1688         | 96%                 | 72%
4   | (not given)    | 1711         | 100%                | 86%
5   | 61%            | 1649         | 95%                 | 80%
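Summary figures for the recovery table, computed only from the rows above; run 4's missing "Avg CPU before" cell is skipped rather than guessed.

```python
# Rows: (run, avg CPU before %, msg/s before, max CPU in recovery %, avg CPU after %).
# None marks the missing cell in run 4.
runs = [
    (1, 63, 1647, 93, 75),
    (2, 64, 1587, 97, 88),
    (3, 62, 1688, 96, 72),
    (4, None, 1711, 100, 86),
    (5, 61, 1649, 95, 80),
]

def mean(values):
    present = [v for v in values if v is not None]   # skip missing cells
    return sum(present) / len(present)

cpu_before = mean(r[1] for r in runs)
msgs_before = mean(r[2] for r in runs)
cpu_after = mean(r[4] for r in runs)
print(cpu_before, msgs_before, cpu_after)  # 62.5 1656.4 80.2
```

If the failed node's load were spread evenly over the four survivors and CPU scaled linearly with load (both assumptions), one would expect roughly 62.5 x 5/4 ≈ 78% average CPU afterwards, close to the measured mean.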
19
Conclusion: how is Conflux different?
Conflux routes with consistent hashing so that all related streams of a group or formula end up on the same node. This allows fast in-memory evaluation using data cached on one node. Using the same consistent hash function for message routing as for persistence ensures that reads and writes always go to local disk. Consistent hashing also preserves read/write locality in case of failure.
20
Future work
Tree groups for load balancing.
Fast updates and compaction for tree groups.
Pure-dynamic groups defined by a function, e.g. all nodes whose CPU > 80%.
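Since pure-dynamic groups are future work, this is only a sketch of the idea: membership is recomputed from the latest values via a predicate instead of being a fixed list. `dynamic_group` and the sample CPU values are hypothetical.

```python
def dynamic_group(latest_cpu, predicate):
    """Members are recomputed from the latest values instead of a fixed list."""
    return {sid for sid, cpu in latest_cpu.items() if predicate(cpu)}

latest_cpu = {"vm-1": 0.91, "vm-2": 0.42, "vm-3": 0.85}   # hypothetical values
hot = dynamic_group(latest_cpu, lambda cpu: cpu > 0.80)   # CPU > 80%
print(sorted(hot))  # ['vm-1', 'vm-3']
```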
21
Q & A
22
FAQ
Does more RAM per node help? => To an extent.
Does more CPU per node help? => Oh yes!!
What if a rack dies? => VM affinity / anti-affinity rules.
What if a datacenter dies? => Tough luck.
Why not Spark, Heron, Samza?
Can I go across geographies? => No; the backplane IP network ensures superfast connections.