Intuitions for Scaling Data-Centric Architectures

Presentation on theme: "Intuitions for Scaling Data-Centric Architectures"— Presentation transcript:

1 Intuitions for Scaling Data-Centric Architectures
Ben Stopford, Confluent Inc.

2 Intuitions for Scale
"Intuition does not come to the unprepared mind" (A.E.)

3 Locality & Sequential Addressing

4 Computers work best with sequential workloads
Disk buffer, page cache, L3 cache, L2 cache, L1 cache. Pre-fetch is your friend.

5 Random vs. Sequential Addressing
An HDD can do ~300 random reads/sec, or ~200MB/s reading sequentially. For 100-byte rows that makes sequential access roughly 7000x faster. Memory is also much faster when accessed sequentially, and the cache hierarchy aids sequential access; at high throughputs, prefetching actually works against random access. The JVM highlights this further: arrays of objects are around 50x slower to traverse than arrays of primitives (although this is partly a size effect). The upshot: sequential disk ~ random memory, so aim for as much sequential access, and as little random access, as possible.
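The ~7000x claim is just arithmetic over the slide's own numbers (300 random reads/sec vs. 200MB/s sequential, 100-byte rows); a quick check:

```python
# Back-of-the-envelope check of the slide's figures (assumed numbers).
random_reads_per_sec = 300        # HDD random reads per second
seq_bytes_per_sec = 200 * 10**6   # 200MB/s sequential throughput
row_size = 100                    # 100-byte rows

random_rows_per_sec = random_reads_per_sec        # one row per seek
seq_rows_per_sec = seq_bytes_per_sec / row_size   # rows scanned per second

speedup = seq_rows_per_sec / random_rows_per_sec
print(round(speedup))  # ~6667, i.e. roughly the 7000x on the slide
```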

6 Random RAM ~ Sequential Disk
This isn't just disk: the same pattern holds up through the cache hierarchy (L3, L2, L1, with roughly 10-100x between levels). Random RAM ~ sequential disk.

7 Files

8 We can write sequentially to a file quickly

9 Reading Efficiently: Scan, or Position & Scan (pages)

10 Avoid Random Reads

11 Writing Tradeoffs: Append-Only Journal vs. Update in Place
Append-Only Journal (sequential IO): append data to the tail of the file; updates are appended too, rather than applied in place. To read, either scan the whole file in reverse chronological order, or address records by offset if the fields are fixed-width. Sequential access is fast, so write performance is great, and reads are good for O(n) operations (aggregations etc.), but not so good for selective queries.
Update in Place, Ordered File (random IO): great for row-based addressing, but update performance is poor.
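A minimal in-memory sketch of the append-only journal (the class and the length-prefixed record framing are illustrative, not from the talk):

```python
class AppendOnlyJournal:
    """Sketch: append records sequentially, read back by offset."""
    def __init__(self):
        self.data = bytearray()   # stands in for the file on disk

    def append(self, record: bytes) -> int:
        """Append at the tail (sequential IO); return the record's offset."""
        offset = len(self.data)
        # Length-prefix each record so variable-width fields stay addressable.
        self.data += len(record).to_bytes(4, "big") + record
        return offset

    def read(self, offset: int) -> bytes:
        """Random read by offset (used sparingly; scans stay sequential)."""
        n = int.from_bytes(self.data[offset:offset + 4], "big")
        return bytes(self.data[offset + 4:offset + 4 + n])

j = AppendOnlyJournal()
o1 = j.append(b"v1")
o2 = j.append(b"v2")   # an "update" is just a later record
assert j.read(o2) == b"v2"
```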

12 Supporting Lookups

13 Add Indexes for Selectivity
Keep the data in an append-only heap file. The index is a set of [name: offset] pairs sorted by name (bob, dave, fred, harry, mike, steve, vince); binary search it to get the offsets of matching rows. Trees are similar.
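A sketch of the idea, assuming an in-memory heap "file" and a sorted name-to-offset index (the names and layout are invented for illustration):

```python
import bisect

# Heap file: records in arrival order. Index: (name, offset) sorted by name.
heap = [("mike", "row-m"), ("bob", "row-b"), ("fred", "row-f")]
index = sorted((name, off) for off, (name, _) in enumerate(heap))
names = [name for name, _ in index]

def lookup(name):
    """Binary-search the index, then one random read into the heap file."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return heap[index[i][1]][1]
    return None

assert lookup("bob") == "row-b"
assert lookup("steve") is None
```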

14 Goodbye Sequential Write Performance
Keeping the index sorted turns sequential IO into significant random IO in almost all tree implementations; those that avoid it suffer other problems, like write amplification.

15 Option A: Put Index in Memory
Index in RAM; data on disk.

16 Option B: Use a chronology of small index files
Batch up writes, sort each batch, and write it to disk as a small index file, with older files sitting behind it.

17 …with tricks to optimise out the need for random IO
Keep per-file metadata and a Bloom filter in RAM; the index files themselves stay on disk.
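One of those tricks can be sketched as a tiny Bloom filter, which answers "definitely absent" or "maybe present" so most index files never need a disk read (the sizes and hashing scheme here are arbitrary):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, rare false positives."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = [False] * bits

    def _positions(self, key: str):
        # Derive k bit positions by salting one cryptographic hash.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, key: str):
        for p in self._positions(key):
            self.array[p] = True

    def might_contain(self, key: str) -> bool:
        return all(self.array[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("bob")
assert bf.might_contain("bob")   # always true for added keys
# Absent keys are usually (not always) rejected without any disk IO.
```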

18 Log Structured Merge Trees
A collection of small, immutable indexes. Append-only: de-duplicate by merging files. Low-memory index structures increase read performance. The net effect is to shift the problem of random access from a write concern to a read concern.
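The mechanics above can be sketched in a few lines (a toy, with Python dicts standing in for the files on disk):

```python
class TinyLSM:
    """LSM sketch: buffer writes in memory, flush sorted immutable
    'files', read newest-first so later writes win, merge to de-dupe."""
    def __init__(self, flush_at=2):
        self.memtable = {}
        self.files = []          # immutable sorted files, oldest first
        self.flush_at = flush_at

    def put(self, k, v):
        self.memtable[k] = v                     # in-memory write
        if len(self.memtable) >= self.flush_at:  # one sequential flush
            self.files.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, k):
        if k in self.memtable:
            return self.memtable[k]
        for f in reversed(self.files):           # newest file first
            if k in f:
                return f[k]
        return None

    def compact(self):
        """Merge files, de-duplicating: newer values win."""
        merged = {}
        for f in self.files:                     # oldest to newest
            merged.update(f)
        self.files = [dict(sorted(merged.items()))]

db = TinyLSM()
db.put("a", 1); db.put("b", 2)    # triggers first flush
db.put("a", 3); db.put("c", 4)    # triggers second flush
db.compact()
assert db.get("a") == 3 and len(db.files) == 1
```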

19 Option C: Brute Force
Use a 'column per file' arrangement, keeping the same row order in each file. This is well suited to aggregations that examine all values of a column, but not all columns. Compress the columns, and keep them compressed for as long as possible; stream them through a tight loop (merge join).
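A toy version of the column-per-file layout, with invented data, to show why aggregations only touch the columns they need:

```python
# Each column lives in its own array ("file"), all in the same row order.
col_a = [10, 20, 30, 40]        # e.g. price
col_b = ["x", "y", "x", "y"]    # e.g. category
col_c = [1, 1, 2, 2]            # e.g. store id

# An aggregation streams only the columns it needs, in a tight loop:
total = sum(a for a, b in zip(col_a, col_b) if b == "x")
assert total == 40              # rows 0 and 2: 10 + 30; col_c never read

# Reconstructing a full row (late materialisation) uses the shared rowid:
row2 = (col_a[2], col_b[2], col_c[2])
assert row2 == (30, "x", 2)
```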

20 Option C: Columnar
A merge join streams the compressed columns, which all share a common row order.

21 Brute Force, by Column
Less IO: read only the columns needed, and read them compressed. Columns held in row order allow merge joins via rowid, predicates can operate directly on compressed data, and materialisation of full rows is deferred as late as possible.

22 Many of the most scalable technologies play to one of these core efficiencies

23 Riak, Mongo etc.: index in RAM, data on disk

24 Kafka ("Queues are Databases", Jim Gray)

25 HBase, Cassandra, RocksDB etc.: LSM trees

26 Redshift etc., Parquet (Hadoop): columnar storage

27 Parallelism: Partitioning & Replication

28 Partitioning - KV
K-V stores route each query to a single endpoint by key. Long-running computations divide and conquer (MapReduce style). Shared-nothing designs have limitations with secondary indexes; HBase never added them.
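Single-endpoint routing can be sketched as hash partitioning (the node names and hashing scheme here are invented):

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]   # hypothetical cluster

def route(key: str) -> str:
    """Hash-partition: each key maps to exactly one node, so a
    point lookup hits a single endpoint."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return NODES[h % len(NODES)]

# Every caller agrees on the owner of a key:
assert route("bob") == route("bob")
assert route("bob") in NODES
# A secondary-index query ("everyone whose city is X") has no such key,
# so it must fan out to all shards - the concurrency limit on slide 30.
```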

29 Partitioning - Batch: Divide and conquer

30 Partitioning: Concurrency Limits
Use of secondary indexes can limit concurrency at scale

31 Replication

32 Replication
Replication provides one route out of this. Replicas isolate load, scaling out concurrency for general workloads, and obviously provide redundancy too. If asynchronous, replication trades off against consistency (CAP).

33 Atomicity & Ordering
These can be expensive. Strong consistency is expensive, and serialisability is close to unachievable: in a distributed system it costs a great deal and trades off with availability (CAP).

34 Solution: Avoid, Isolate or Embrace Disorder (Bloom etc.)
Two options: isolate consistency in an atomic (mutable) part of the system, or embrace inconsistency in an immutable part. Bloom etc. default to disorderly and force order at the 'last responsible moment'.

35 Circling Synchronous, Mutable State
Trapped in the Persist & Query pattern… in a fully ACID world

36 Separating Paradigms - CQRS
CQRS (Command Query Responsibility Segregation): the client sends commands to a write-side database, which asynchronously feeds an immutable, denormalised/precomputed read-side database that serves queries.
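A minimal CQRS sketch, with all names invented: commands append to a write-side log, a projector denormalises it into a read view, and queries only touch the view:

```python
from collections import defaultdict

command_log = []                       # write side: append-only log
read_view = defaultdict(int)           # read side: precomputed totals

def handle_command(account: str, amount: int):
    """Write path: fast, append-only, no read-side work."""
    command_log.append(("deposit", account, amount))

def run_projector():
    """Denormalise the log into the read view.
    In a real system this would tail the log asynchronously."""
    read_view.clear()
    for _, account, amount in command_log:
        read_view[account] += amount

def query_balance(account: str) -> int:
    return read_view[account]          # cheap read, possibly stale

handle_command("alice", 100)
handle_command("alice", 50)
run_projector()
assert query_balance("alice") == 150
```

The read side is eventually consistent: a query between a command and the next projector run sees the older view, which is exactly the asynchrony the slide calls out.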

37 Druid
Separates a write-optimised section (the realtime node) from a read-optimised one (the history node); a query hits both.

38 Operational/Analytic Bridge
Data flows from mutable operational stores (SQL, NoSQL) through a stream into immutable, denormalised views (search etc.) that clients query.

39 Lambda Architecture: Separating Stream & Batch
All your data feeds both a fast stream layer and a batch layer; a serving layer combines their output to answer queries.
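The three layers can be sketched with invented data, where a cutoff separates what the batch layer has already processed from what only the stream layer has seen:

```python
# All your data: a log of (metric, count) events.
all_events = [("clicks", 1), ("clicks", 1), ("clicks", 1)]
batch_cutoff = 2   # the batch layer has processed events [0, cutoff)

batch_view = sum(n for _, n in all_events[:batch_cutoff])   # slow, complete
stream_view = sum(n for _, n in all_events[batch_cutoff:])  # fast, recent

def query():
    """Serving layer: merge the batch and stream results."""
    return batch_view + stream_view

assert query() == 3   # neither layer alone sees all the data
```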

40 Stream Data Platforms
All your data flows through Kafka; stream processors build views (search, columnar, Hadoop) that clients query.

41 Isolate consistency concerns, Leverage in-flight data, Promote immutable replicas
Sys 1, Sys 2 and Sys 3, connected by a stream.

42 Things we Like

43 Treating state as an immutable chronology
An ordered history over time.

44 Listening and reacting to things as they are written

45 Replaying things that happened before
Replay history to enrich views or regenerate state.
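Replay falls out naturally when state is a fold over an immutable history; a sketch with an invented event format:

```python
# An immutable chronology of events (format invented for illustration).
history = [("set", "k", 1), ("set", "k", 2), ("del", "k", None)]

def replay(events):
    """Regenerate state by folding over the event history."""
    state = {}
    for op, key, value in events:
        if op == "set":
            state[key] = value
        elif op == "del":
            state.pop(key, None)
    return state

assert replay(history) == {}             # full replay: key was deleted
assert replay(history[:2]) == {"k": 2}   # state as of any point in time
```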

46 Avoiding (or Isolating) the need to mutate
Keep the mutable part separate from the immutable part.

47 Read-optimising the immutable
Denormalise

48 Primitive operations for Shards and Replicas (sync/async)

49 Being able to reason about time in an asynchronous world

50 Blending the utility of different tools in a single data platform
Sys 1 Stream Sys 2 Sys 3

51 Thanks!
Slides available at benstopford.com

