Intuitions for Scaling Data-Centric Architectures
Ben Stopford, Confluent Inc.
Intuitions for Scale
"Intuition does not come to the unprepared mind." (A.E.)
Locality & Sequential Addressing
Computers work best with sequential workloads
Disk buffer -> page cache -> L3 cache -> L2 cache -> L1 cache. Pre-fetch is your friend.
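To make this concrete, a minimal sketch (my own illustration, not from the deck): sum the same list in sequential and in shuffled order. In CPython the gap is muted by interpreter overhead, but the shuffled pass should still be measurably slower, because it defeats the prefetcher.

```python
import random
import time

N = 10_000_000
data = list(range(N))

sequential = range(N)               # prefetch-friendly access pattern
shuffled = list(range(N))
random.shuffle(shuffled)            # defeats the prefetcher

def scan(order):
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return time.perf_counter() - start

print(f"sequential: {scan(sequential):.2f}s")
print(f"shuffled:   {scan(shuffled):.2f}s")
```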
Random vs. Sequential Addressing
- An HDD can do ~300 random reads/sec, or ~200MB/s sequentially.
- For 100B rows that is 200MB/s / 100B = ~2M rows/sec sequentially vs. ~300 rows/sec randomly: roughly 7000x faster.
- Memory is also many times faster when accessed sequentially; the cache hierarchy aids sequential access.
- Prefetching actually acts against random access at high throughputs.
- The JVM highlights this further: arrays of objects are around 50x slower to traverse than arrays of primitives (although this is a size thing too).
- => Sequential disk ~ random memory.
- => Use as much sequential access, and as little random access, as possible.
Random RAM ~ Sequential Disk
This isn't just disk: each step down the cache hierarchy (L1 -> L2 -> L3 -> RAM) costs roughly 10-100x. Random RAM ~ sequential disk.
Files
We can write sequentially to a file quickly
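For instance (a trivial sketch; the file name and record format are illustrative): opening a file in append mode and writing to its tail is a purely sequential workload.

```python
# Appending records to the tail of a file: purely sequential IO.
def append_records(path, records):
    with open(path, "ab") as f:     # "ab": append-only, binary
        for record in records:
            f.write(record + b"\n")

append_records("journal.log", [b"v1:bob", b"v1:dave", b"v2:bob"])
```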
Reading Efficiently: either scan the whole file, or position first and then scan (pages).
Avoid Random Reads
Writing Tradeoffs: Append-Only Journal (Sequential IO) vs. Update in Place (Random IO)
- Append-only journal: append data to the tail of the file, sequentially. Updates are appended too; to read, scan the file in reverse chronological order so the latest version wins. Use offsets to address records if fields are fixed-width. Sequential access is fast: great write performance, and good for aggregate functions and other O(n) operations, but not so good for selective queries. (See the sketch below.)
- Update in place (ordered file, random IO): great for row-based addressing, but poor update performance.
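A minimal sketch of the append-only variant (record format and file name are my own): an update is just another append, and a read scans in reverse chronological order so the latest version wins.

```python
import json

def put(path, key, value):
    # Updates are appends: sequential IO only.
    with open(path, "a") as f:
        f.write(json.dumps({"k": key, "v": value}) + "\n")

def get(path, key):
    # Scan the journal in reverse chronological order; newest version wins.
    with open(path) as f:
        lines = f.readlines()
    for line in reversed(lines):
        record = json.loads(line)
        if record["k"] == key:
            return record["v"]
    return None

put("journal.jsonl", "bob", "v1")
put("journal.jsonl", "bob", "v2")       # the "update" is an append
print(get("journal.jsonl", "bob"))      # -> v2
```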
Supporting Lookups
Add Indexes for Selectivity
[Figure: an append-only heap file, with an index over entries bob, dave, fred, harry, mike, steve, vince.]
Heap file, append only. The index holds [name:offset] pairs, sorted by name; binary search the index to get the offsets of matching rows. Trees are similar. (A sketch follows.)
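A toy version of the slide's picture (names and payloads invented): an append-only heap file, plus a sorted list of [name, offset] pairs searched with binary search.

```python
import bisect

heap = []       # the heap file: records in arrival order, append only
index = []      # (name, offset) pairs, kept sorted by name

def insert(name, payload):
    offset = len(heap)
    heap.append((name, payload))            # sequential append
    bisect.insort(index, (name, offset))    # random IO to keep the index sorted

def lookup(name):
    # Binary search the index, then follow offsets into the heap file.
    i = bisect.bisect_left(index, (name,))
    matches = []
    while i < len(index) and index[i][0] == name:
        matches.append(heap[index[i][1]])
        i += 1
    return matches

for n in ["mike", "bob", "fred", "bob"]:
    insert(n, f"row for {n}")
print(lookup("bob"))                        # both 'bob' rows, via the index
```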
Goodbye Sequential Write Performance
[Figure: sequential IO appending to the heap file, but random IO maintaining the sorted index.]
This necessitates significant random IO in almost all tree implementations; those that avoid it suffer other problems, like write amplification.
Option A: Put Index in Memory
[Figure: the index held in RAM, the data file on disk.]
Option B: Use a chronology of small index files
Writes are sorted and batched up in memory, then written to disk as a small index file; older files accumulate behind it.
…with tricks to optimise out the need for random IO
[Figure: per-file metadata and a bloom filter held in RAM; the index files themselves on disk.]
Log Structured Merge Trees
- A collection of small, immutable indexes
- Append only; de-duplicate by merging files
- Low-memory index structures increase read performance
- Shift the problem of random access from a "write" concern to a "read" concern (a toy sketch follows)
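A toy sketch of the idea (a plain Python set stands in for the bloom filter, and JSON files for the sorted file format): writes hit an in-memory memtable and are flushed as small, sorted, immutable files; reads check newest-first, skipping files whose summary says the key cannot be present.

```python
import json, os

class ToyLSM:
    def __init__(self, dirpath, memtable_limit=4):
        self.dir, self.limit = dirpath, memtable_limit
        self.memtable, self.files, self.summaries = {}, [], []
        os.makedirs(dirpath, exist_ok=True)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # Flush the memtable as a small, sorted, immutable file.
        path = os.path.join(self.dir, f"sst_{len(self.files)}.json")
        with open(path, "w") as f:                   # sequential write
            json.dump(dict(sorted(self.memtable.items())), f)
        self.files.append(path)
        self.summaries.append(set(self.memtable))    # bloom-filter stand-in
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for path, summary in zip(reversed(self.files), reversed(self.summaries)):
            if key not in summary:                   # skip without touching disk
                continue
            with open(path) as f:
                segment = json.load(f)
            if key in segment:                       # newest file wins
                return segment[key]
        return None

db = ToyLSM("/tmp/toy_lsm")
for i in range(10):
    db.put(f"k{i}", i)
db.put("k1", 100)                                    # newer version shadows older
print(db.get("k1"))                                  # -> 100
```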
Option C: Brute Force
- Use a 'column per file' arrangement, keeping the same row ordering in each file.
- Well suited to aggregations etc. that examine all values for a column, but not all columns.
- Compress columns, and keep them compressed for as long as possible.
- Stream columns in a tight loop (merge join); see the sketch below.
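A minimal sketch of the layout (directory and file names invented): one file per column, all files in the same row order, so a query streams only the columns it needs.

```python
import os

rows = [(1, "bob", 10), (2, "dave", 20), (3, "fred", 30)]
columns = ["id", "name", "score"]
os.makedirs("/tmp/colstore", exist_ok=True)

# Write: one file per column, the same row ordering in each.
for i, col in enumerate(columns):
    with open(f"/tmp/colstore/{col}.col", "w") as f:
        for row in rows:
            f.write(f"{row[i]}\n")

# Read: only the columns the query needs, streamed in lockstep
# (a merge join on implicit row position).
def scan(*cols):
    files = [open(f"/tmp/colstore/{c}.col") for c in cols]
    try:
        for values in zip(*files):
            yield tuple(v.rstrip("\n") for v in values)
    finally:
        for f in files:
            f.close()

# An aggregation touches one file, not the whole table.
print(sum(int(s) for (s,) in scan("score")))     # -> 60
```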
Option C: Columnar
[Figure: compressed column files A, B, C streamed in row order and re-assembled via a merge join.]
Brute Force, by Column: less IO; read by column; compressed
- Held in row order => merge joins via rowid
- Predicates can operate on compressed data (see the sketch below)
- Late materialisation
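A minimal sketch of operating on compressed data, assuming run-length encoding (my choice of codec): the predicate is evaluated once per run rather than once per row, and rowids are only materialised for matching runs.

```python
# Run-length encode a column, then evaluate a predicate run-by-run:
# one comparison covers a whole run, and rowids are materialised late.
def rle_encode(values):
    runs, prev, count = [], values[0], 0
    for v in values:
        if v == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = v, 1
    runs.append((prev, count))
    return runs

def matching_rowids(runs, predicate):
    rowid = 0
    for value, count in runs:
        if predicate(value):                 # one test per run, not per row
            yield from range(rowid, rowid + count)
        rowid += count

country = ["UK"] * 4 + ["DE"] * 3 + ["UK"] * 2
runs = rle_encode(country)                   # [('UK', 4), ('DE', 3), ('UK', 2)]
print(list(matching_rowids(runs, lambda v: v == "DE")))   # -> [4, 5, 6]
```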
Many of the most scalable technologies play to one of these core efficiencies
Riak, Mongo etc
[Figure: index in RAM, data on disk.]
Kafka ("Queues are Databases", Jim Gray)
HBase, Cassandra, RocksDB etc (LSM)
Redshift etc, Parquet (Hadoop)
[Figure: columnar layout, one file per column.]
Parallelism: Partitioning & Replication
Partitioning - KV
- K-V stores -> single-endpoint query routing: hash the key to pick the partition (see the sketch below).
- Divide and conquer (MR style) for long-running computations.
- Shared nothing has limitations with secondary indexes; HBase never added them.
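A minimal sketch of single-endpoint routing (node names are invented): hashing the key means any one lookup touches exactly one partition. Real stores typically use consistent hashing or a fixed partition count, so that resizing the cluster does not reshuffle every key.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]          # hypothetical cluster

def route(key, nodes=NODES):
    # Hash the key and pick one partition: a single endpoint per lookup.
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

for key in ["bob", "dave", "fred"]:
    print(key, "->", route(key))
```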
Partitioning - Batch
Divide and conquer: partition the data, compute on each partition in parallel, then combine. A sketch follows.
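A minimal divide-and-conquer sketch (data and pool are invented): each partition computes a partial result in parallel, and a final step combines them, MR style.

```python
from concurrent.futures import ThreadPoolExecutor

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]      # data already partitioned

def partial_sum(partition):
    return sum(partition)                            # the per-partition "map"

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(partial_sum, partitions))

print(sum(partials))                                 # the combine step -> 45
```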
Partitioning: Concurrency Limits
Use of secondary indexes can limit concurrency at scale: a query on a secondary attribute cannot be routed to a single partition, so it must be scattered to all of them. See the sketch below.
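A minimal sketch of the limit (rows invented): a primary-key lookup can be routed to one node, but a secondary-index query on city has to do work on every partition, so each such query consumes capacity cluster-wide.

```python
partitions = {
    "node-0": [{"id": 1, "city": "London"}],
    "node-1": [{"id": 2, "city": "Paris"}, {"id": 3, "city": "London"}],
    "node-2": [{"id": 4, "city": "Berlin"}],
}

def by_city(city):
    hits, nodes_touched = [], 0
    for node, rows in partitions.items():    # scatter to every node
        nodes_touched += 1
        hits += [r for r in rows if r["city"] == city]
    return hits, nodes_touched               # then gather

hits, touched = by_city("London")
print(hits)        # matches live on two different partitions
print(touched)     # -> 3: every node did work for a single query
```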
Replication
Replication provides one route out of this.
- Replicas isolate load -> scales out concurrency for general workloads.
- Obviously provides redundancy etc. too.
- If asynchronous, it trades off against consistency (CAP); a toy illustration follows.
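A toy illustration of the asynchronous trade-off (all names invented): the leader acknowledges writes before the follower has applied them, so a read served by the replica can be stale.

```python
# A toy async replica: the leader acks writes before the follower
# has applied them, so reads routed to the replica can be stale.
leader_log, follower_applied = [], 0
leader_state, follower_state = {}, {}

def write(key, value):                       # acked without waiting (async)
    leader_log.append((key, value))
    leader_state[key] = value

def replicate_one():                         # the follower applies with lag
    global follower_applied
    if follower_applied < len(leader_log):
        k, v = leader_log[follower_applied]
        follower_state[k] = v
        follower_applied += 1

write("bob", 1)
write("bob", 2)
replicate_one()                              # only the first write has shipped
print(leader_state["bob"], follower_state["bob"])   # -> 2 1 (stale read)
```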
Atomicity & Ordering
These can be expensive. Strong consistency is expensive: serialisability is all but unachievable, in a distributed system it is very expensive, and it trades off with availability (CAP).
Solution: Avoid, Isolate or embrace disorder (Bloom etc)
Two options:
=> Isolate consistency to an atomic, mutable core.
=> Embrace inconsistency over immutable data: Bloom etc. default to disorderly, forcing order at the 'last responsible moment'.
Circling Synchronous, Mutable State
Trapped in the Persist & Query pattern… in a fully ACID world
Separating Paradigms - CQRS
CQRS (Command Query Responsibility Segregation): the client's commands go to a write-optimised store; changes flow on, asynchronously and immutably, into a second store that is denormalised/precomputed for queries. A toy sketch follows.
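A toy sketch of the shape (event format invented): commands append immutable events to a channel, and an asynchronous projector folds them into a denormalised read view.

```python
import queue, threading

events = queue.Queue()                       # the async channel
read_view = {}                               # denormalised, query-optimised

def handle_command(user, action):            # write side: append only
    events.put({"user": user, "action": action})

def projector():                             # read side: precompute the view
    while True:
        e = events.get()
        if e is None:                        # sentinel to stop the demo
            break
        read_view.setdefault(e["user"], []).append(e["action"])

t = threading.Thread(target=projector)
t.start()
handle_command("bob", "login")
handle_command("bob", "purchase")
events.put(None)                             # drain for the demo
t.join()
print(read_view)                             # {'bob': ['login', 'purchase']}
```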
DRUID: separating write-optimised and read-optimised sections. A query hits both the realtime node and the history node.
Operational/Analytic Bridge
[Figure: clients write to a mutable operational side; data streams out as immutable, denormalised views into search, SQL and NoSQL stores.]
Lambda Architecture: Separating Stream & Batch
[Figure: all your data feeds both a fast stream layer and a batch layer; a serving layer combines their views to answer queries.]
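A toy sketch of the serving layer (the views are invented): the batch view is complete but stale, the stream view covers only what has arrived since the last batch run, and a query merges the two.

```python
batch_view = {"bob": 10, "dave": 5}    # recomputed periodically from all data
stream_view = {"bob": 2, "fred": 1}    # incremental counts since the last run

def query(user):
    # Serving layer: batch result plus the stream layer's fresh delta.
    return batch_view.get(user, 0) + stream_view.get(user, 0)

print(query("bob"))    # -> 12: stale batch total plus fresh stream delta
print(query("fred"))   # -> 1: only the stream layer has seen fred so far
```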
Stream Data Platforms
[Figure: all your data flows through Kafka; stream processors build views for clients, and feed search, columnar and Hadoop stores.]
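A minimal sketch of the consuming side, assuming the kafka-python client, a broker at localhost:9092 and a topic named "events" (all assumptions, not from the deck): replay the log from the beginning and fold it into a derived view.

```python
from kafka import KafkaConsumer      # assumption: kafka-python is installed

consumer = KafkaConsumer(
    "events",                        # hypothetical topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",    # the log is replayable: start at the front
)

view = {}                            # a derived, read-optimised view
for message in consumer:             # runs until interrupted
    key = (message.key or b"").decode()
    view[key] = view.get(key, 0) + 1 # e.g. count events per key
```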
Isolate consistency concerns, leverage in-flight data, promote immutable replicas
[Figure: systems 1, 2 and 3 connected by a shared stream.]
Things we Like
Treating state as an immutable chronology
[Figure: a series of immutable states laid out along a time axis.]
Listening and reacting to things as they are written
Replaying things that happened before
Replay history to enrich views and to regenerate state.
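A minimal sketch (events invented): with an immutable history, current state is just a fold over everything that happened, and state as of any earlier point falls out of replaying a prefix.

```python
history = [
    {"type": "deposit", "amount": 100},
    {"type": "withdraw", "amount": 30},
    {"type": "deposit", "amount": 5},
]

def replay(events):
    # Same history, same state, every time.
    balance = 0
    for e in events:
        if e["type"] == "deposit":
            balance += e["amount"]
        elif e["type"] == "withdraw":
            balance -= e["amount"]
    return balance

print(replay(history))        # current state -> 75
print(replay(history[:2]))    # state as of an earlier point -> 70
```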
Avoiding (or Isolating) the need to mutate
[Figure: a small mutable core isolated from the immutable remainder.]
Read-optimising the immutable
Denormalise
Primitive operations for Shards and Replicas (sync/async)
Being able to reason about time in an asynchronous world
Blending the utility of different tools in a single data platform
[Figure: systems 1, 2 and 3 connected by a shared stream.]
Thanks! Slides available @ benstopford.com