Intuitions for Scaling Data-Centric Architectures

Presentation on theme: "Intuitions for Scaling Data-Centric Architectures"— Presentation transcript:

1 Intuitions for Scaling Data-Centric Architectures
Ben Stopford, Confluent Inc.

2 Intuitions for Scale
"Intuition does not come to the unprepared mind" (A.E.)

3 Locality & Sequential Addressing

4 Computers work best with sequential workloads
Disk buffer, page cache, L3 cache, L2 cache, L1 cache. Pre-fetch is your friend.

5 Random vs. Sequential Addressing
An HDD can do ~300 random reads/sec, or ~200MB/s reading sequentially. For 100-byte rows that makes sequential access roughly 7000x faster. Memory is also much faster when accessed sequentially, and the cache hierarchy aids sequential access; at high throughputs, prefetching actually works against random access. The JVM highlights this further: arrays of objects are around 50x slower to traverse than arrays of primitives (although this is partly a size effect). The upshot: sequential disk ~ random memory, so aim for as much sequential access, and as little random access, as possible.
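The ~7000x claim is just arithmetic over the slide's own numbers (300 random reads/sec vs. 200MB/s sequential, 100-byte rows); a quick check:

```python
# Back-of-the-envelope check of the slide's figures (assumed numbers).
random_reads_per_sec = 300        # HDD random reads per second
seq_bytes_per_sec = 200 * 10**6   # 200MB/s sequential throughput
row_size = 100                    # 100-byte rows

random_rows_per_sec = random_reads_per_sec        # one row per seek
seq_rows_per_sec = seq_bytes_per_sec / row_size   # rows scanned per second

speedup = seq_rows_per_sec / random_rows_per_sec
print(round(speedup))  # ~6667, i.e. roughly the 7000x on the slide
```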

6 Random RAM ~ Sequential Disk
This isn't just disk: the same pattern holds up through the cache hierarchy (L3, L2, L1, with roughly 10-100x between levels). Random RAM ~ sequential disk.

7 Files

8 We can write sequentially to a file quickly

9 Reading Efficiently: Scan, or Position & Scan (pages)

10 Avoid Random Reads

11 Writing Tradeoffs: Append-Only Journal vs. Update in Place
Append-Only Journal (sequential IO): append data to the tail of the file; updates are appended too, rather than applied in place. To read, either scan the whole file in reverse chronological order, or address records by offset if the fields are fixed-width. Sequential access is fast, so write performance is great, and reads are good for O(n) operations (aggregations etc.), but not so good for selective queries.
Update in Place, Ordered File (random IO): great for row-based addressing, but update performance is poor.
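A minimal in-memory sketch of the append-only journal (the class and the length-prefixed record framing are illustrative, not from the talk):

```python
class AppendOnlyJournal:
    """Sketch: append records sequentially, read back by offset."""
    def __init__(self):
        self.data = bytearray()   # stands in for the file on disk

    def append(self, record: bytes) -> int:
        """Append at the tail (sequential IO); return the record's offset."""
        offset = len(self.data)
        # Length-prefix each record so variable-width fields stay addressable.
        self.data += len(record).to_bytes(4, "big") + record
        return offset

    def read(self, offset: int) -> bytes:
        """Random read by offset (used sparingly; scans stay sequential)."""
        n = int.from_bytes(self.data[offset:offset + 4], "big")
        return bytes(self.data[offset + 4:offset + 4 + n])

j = AppendOnlyJournal()
o1 = j.append(b"v1")
o2 = j.append(b"v2")   # an "update" is just a later record
assert j.read(o2) == b"v2"
```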

12 Supporting Lookups

13 Add Indexes for Selectivity
Keep the data in an append-only heap file. The index is a set of [name: offset] pairs sorted by name (bob, dave, fred, harry, mike, steve, vince); binary search it to get the offsets of matching rows. Trees are similar.
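A sketch of the idea, assuming an in-memory heap "file" and a sorted name-to-offset index (the names and layout are invented for illustration):

```python
import bisect

# Heap file: records in arrival order. Index: (name, offset) sorted by name.
heap = [("mike", "row-m"), ("bob", "row-b"), ("fred", "row-f")]
index = sorted((name, off) for off, (name, _) in enumerate(heap))
names = [name for name, _ in index]

def lookup(name):
    """Binary-search the index, then one random read into the heap file."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return heap[index[i][1]][1]
    return None

assert lookup("bob") == "row-b"
assert lookup("steve") is None
```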

14 Goodbye Sequential Write Performance
Keeping the index sorted turns sequential IO into significant random IO in almost all tree implementations; those that avoid it suffer other problems, like write amplification.

15 Option A: Put Index in Memory
Index in RAM; data on disk.

16 Option B: Use a chronology of small index files
Batch up writes, sort each batch, and write it to disk as a small index file, with older files sitting behind it.

17 …with tricks to optimise out the need for random IO
Keep per-file metadata and a Bloom filter in RAM; the index files themselves stay on disk.
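One of those tricks can be sketched as a tiny Bloom filter, which answers "definitely absent" or "maybe present" so most index files never need a disk read (the sizes and hashing scheme here are arbitrary):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, rare false positives."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = [False] * bits

    def _positions(self, key: str):
        # Derive k bit positions by salting one cryptographic hash.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, key: str):
        for p in self._positions(key):
            self.array[p] = True

    def might_contain(self, key: str) -> bool:
        return all(self.array[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("bob")
assert bf.might_contain("bob")   # always true for added keys
# Absent keys are usually (not always) rejected without any disk IO.
```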

18 Log Structured Merge Trees
A collection of small, immutable indexes. Append-only: de-duplicate by merging files. Low-memory index structures increase read performance. The net effect is to shift the problem of random access from a write concern to a read concern.
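The mechanics above can be sketched in a few lines (a toy, with Python dicts standing in for the files on disk):

```python
class TinyLSM:
    """LSM sketch: buffer writes in memory, flush sorted immutable
    'files', read newest-first so later writes win, merge to de-dupe."""
    def __init__(self, flush_at=2):
        self.memtable = {}
        self.files = []          # immutable sorted files, oldest first
        self.flush_at = flush_at

    def put(self, k, v):
        self.memtable[k] = v                     # in-memory write
        if len(self.memtable) >= self.flush_at:  # one sequential flush
            self.files.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, k):
        if k in self.memtable:
            return self.memtable[k]
        for f in reversed(self.files):           # newest file first
            if k in f:
                return f[k]
        return None

    def compact(self):
        """Merge files, de-duplicating: newer values win."""
        merged = {}
        for f in self.files:                     # oldest to newest
            merged.update(f)
        self.files = [dict(sorted(merged.items()))]

db = TinyLSM()
db.put("a", 1); db.put("b", 2)    # triggers first flush
db.put("a", 3); db.put("c", 4)    # triggers second flush
db.compact()
assert db.get("a") == 3 and len(db.files) == 1
```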

19 Option C: Brute Force
Use a 'column per file' arrangement, keeping the same row order in each file. This is well suited to aggregations that examine all values of a column, but not all columns. Compress the columns, and keep them compressed for as long as possible; stream them through a tight loop (merge join).
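A toy version of the column-per-file layout, with invented data, to show why aggregations only touch the columns they need:

```python
# Each column lives in its own array ("file"), all in the same row order.
col_a = [10, 20, 30, 40]        # e.g. price
col_b = ["x", "y", "x", "y"]    # e.g. category
col_c = [1, 1, 2, 2]            # e.g. store id

# An aggregation streams only the columns it needs, in a tight loop:
total = sum(a for a, b in zip(col_a, col_b) if b == "x")
assert total == 40              # rows 0 and 2: 10 + 30; col_c never read

# Reconstructing a full row (late materialisation) uses the shared rowid:
row2 = (col_a[2], col_b[2], col_c[2])
assert row2 == (30, "x", 2)
```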

20 Option C: Columnar
A merge join streams the compressed columns, which all share a common row order.

21 Brute Force, by Column
Less IO: read only the columns needed, and read them compressed. Columns held in row order allow merge joins via rowid, predicates can operate directly on compressed data, and materialisation of full rows is deferred as late as possible.

22 Many of the most scalable technologies play to one of these core efficiencies

23 Riak, Mongo etc.: index in RAM, data on disk

24 Kafka ("Queues are Databases", Jim Gray)

25 HBase, Cassandra, RocksDB etc.: LSM trees

26 Redshift etc., Parquet (Hadoop): columnar storage

27 Parallelism: Partitioning & Replication

28 Partitioning - KV
K-V stores route each query to a single endpoint by key. Long-running computations divide and conquer (MapReduce style). Shared-nothing designs have limitations with secondary indexes; HBase never added them.
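Single-endpoint routing can be sketched as hash partitioning (the node names and hashing scheme here are invented):

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]   # hypothetical cluster

def route(key: str) -> str:
    """Hash-partition: each key maps to exactly one node, so a
    point lookup hits a single endpoint."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return NODES[h % len(NODES)]

# Every caller agrees on the owner of a key:
assert route("bob") == route("bob")
assert route("bob") in NODES
# A secondary-index query ("everyone whose city is X") has no such key,
# so it must fan out to all shards - the concurrency limit on slide 30.
```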

29 Partitioning - Batch: Divide and conquer

30 Partitioning: Concurrency Limits
Use of secondary indexes can limit concurrency at scale

31 Replication

32 Replication
Replication provides one route out of this. Replicas isolate load, scaling out concurrency for general workloads, and obviously provide redundancy too. If asynchronous, replication trades off against consistency (CAP).

33 Atomicity & Ordering
These can be expensive. Strong consistency is expensive, and serialisability is close to unachievable: in a distributed system it costs a great deal and trades off with availability (CAP).

34 Solution: Avoid, Isolate or Embrace Disorder (Bloom etc.)
Two options: isolate consistency in an atomic (mutable) part of the system, or embrace inconsistency in an immutable part. Bloom etc. default to disorderly and force order at the 'last responsible moment'.

35 Circling Synchronous, Mutable State
Trapped in the Persist & Query pattern… in a fully ACID world

36 Separating Paradigms - CQRS
CQRS (Command Query Responsibility Segregation): the client sends commands to a write-side database, which asynchronously feeds an immutable, denormalised/precomputed read-side database that serves queries.
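A minimal CQRS sketch, with all names invented: commands append to a write-side log, a projector denormalises it into a read view, and queries only touch the view:

```python
from collections import defaultdict

command_log = []                       # write side: append-only log
read_view = defaultdict(int)           # read side: precomputed totals

def handle_command(account: str, amount: int):
    """Write path: fast, append-only, no read-side work."""
    command_log.append(("deposit", account, amount))

def run_projector():
    """Denormalise the log into the read view.
    In a real system this would tail the log asynchronously."""
    read_view.clear()
    for _, account, amount in command_log:
        read_view[account] += amount

def query_balance(account: str) -> int:
    return read_view[account]          # cheap read, possibly stale

handle_command("alice", 100)
handle_command("alice", 50)
run_projector()
assert query_balance("alice") == 150
```

The read side is eventually consistent: a query between a command and the next projector run sees the older view, which is exactly the asynchrony the slide calls out.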

37 Druid
Separates a write-optimised section (the realtime node) from a read-optimised one (the history node); a query hits both.

38 Operational/Analytic Bridge
Data flows from mutable operational stores (SQL, NoSQL) through a stream into immutable, denormalised views (search etc.) that clients query.

39 Lambda Architecture: Separating Stream & Batch
All your data feeds both a fast stream layer and a batch layer; a serving layer combines their output to answer queries.
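The three layers can be sketched with invented data, where a cutoff separates what the batch layer has already processed from what only the stream layer has seen:

```python
# All your data: a log of (metric, count) events.
all_events = [("clicks", 1), ("clicks", 1), ("clicks", 1)]
batch_cutoff = 2   # the batch layer has processed events [0, cutoff)

batch_view = sum(n for _, n in all_events[:batch_cutoff])   # slow, complete
stream_view = sum(n for _, n in all_events[batch_cutoff:])  # fast, recent

def query():
    """Serving layer: merge the batch and stream results."""
    return batch_view + stream_view

assert query() == 3   # neither layer alone sees all the data
```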

40 Stream Data Platforms
All your data flows through Kafka; stream processors build views (search, columnar, Hadoop) that clients query.

41 Isolate consistency concerns, Leverage in-flight data, Promote immutable replicas
Sys 1, Sys 2 and Sys 3, connected by a stream.

42 Things we Like

43 Treating state as an immutable chronology
An ordered history over time.

44 Listening and reacting to things as they are written

45 Replaying things that happened before
Replay history to enrich views or regenerate state.
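Replay falls out naturally when state is a fold over an immutable history; a sketch with an invented event format:

```python
# An immutable chronology of events (format invented for illustration).
history = [("set", "k", 1), ("set", "k", 2), ("del", "k", None)]

def replay(events):
    """Regenerate state by folding over the event history."""
    state = {}
    for op, key, value in events:
        if op == "set":
            state[key] = value
        elif op == "del":
            state.pop(key, None)
    return state

assert replay(history) == {}             # full replay: key was deleted
assert replay(history[:2]) == {"k": 2}   # state as of any point in time
```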

46 Avoiding (or Isolating) the need to mutate
Keep the mutable part separate from the immutable part.

47 Read-optimising the immutable
Denormalise

48 Primitive operations for Shards and Replicas (sync/async)

49 Being able to reason about time in an asynchronous world

50 Blending the utility of different tools in a single data platform
Sys 1 Stream Sys 2 Sys 3

51 Thanks!
Slides available at benstopford.com

