Published by Jocelyn Richard. Modified over 8 years ago.
2
An Introduction to Super-Scalability
3
But first…
4
1 ENIAC, 1 Teletype
5
1 Mainframe, N Terminals
6
N Servers, N Terminals
7
N Servers, N PCs
8
N Web Servers, N Browsers
9
N Web Servers, N AJAX Apps
10
N Clusters, N AJAX Apps
11
N Clusters, N*M Phones
12
N Cloudlets, N*M Phones
19
CPU, Disk, Memory, Network
20
Time / Throughput, Space / Capacity
21
Time / Throughput, Space / Capacity, Complexity, Locking
23
(but how to scale?)
24
Just make it bigger (vertical scaling)
25
(super-scalability)
27
Not Super
One big data store
One big memory store
Make it bigger
Make it redundant
E.g. Full activity logging
Partitioning
Sharding / Hashing
Growth = Add Partition
Tradeoff: Splitting Partitions
Tradeoff: Redundancy becomes a distribution problem
[Diagram: replicated partitions A, A, B, B, C, C, …]
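A minimal sketch of the Sharding / Hashing idea from this slide, in Python. The helper names `partition_for` and `moved_keys`, the `user:N` keys, and the choice of MD5 as a stable hash are illustrative assumptions, not something the slides prescribe. It also hints at the "Splitting Partitions" tradeoff: with naive modulo hashing, adding a partition relocates many keys.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def moved_keys(keys, old_n: int, new_n: int):
    """Keys whose partition changes when growing from old_n to new_n partitions."""
    return [k for k in keys if partition_for(k, old_n) != partition_for(k, new_n)]

keys = [f"user:{i}" for i in range(1000)]
# Growth = add partition, but plain modulo hashing remaps many existing keys.
relocated = moved_keys(keys, 3, 4)
print(f"{len(relocated)} of {len(keys)} keys move when going from 3 to 4 partitions")
```

Schemes such as consistent hashing exist to reduce this movement; the sketch only shows why the tradeoff appears.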
28
Not Super
Number of objects increase
As relations increase, add time or space requirements
Common with graph problems
E.g. PageRank
Distribution
Chop up problem / workload
Map/Reduce
Tradeoff: coordination
Tradeoff: network
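The "chop up problem / workload" and Map/Reduce bullets can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions: the word-count task and the function names `chunk`, `map_count`, and `reduce_counts` are hypothetical, and a real system would run the map step on separate nodes and pay the network cost the slide mentions.

```python
from collections import Counter
from functools import reduce

def chunk(items, size):
    """Chop the workload into fixed-size pieces."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_count(words):
    """Map step: each worker counts its own chunk independently."""
    return Counter(words)

def reduce_counts(left, right):
    """Reduce step: merge two partial results (this is the coordination cost)."""
    return left + right

words = "a b a c b a".split()
partials = [map_count(c) for c in chunk(words, 2)]  # could run on separate nodes
total = reduce(reduce_counts, partials, Counter())
print(total)
```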
29
Not Super
Tune your code
Tune your database
Tune your network
Better hardware
Optimization
As fast as possible
Can’t scale as fast as growth
Specialization – ONE thing
Caching - Reduces work in trade for space
Tradeoff: space
Tradeoff: coordination
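As a rough illustration of "Caching - Reduces work in trade for space", the sketch below uses Python's standard `functools.lru_cache`. The function `expensive_lookup` and the `calls` counter are hypothetical stand-ins for slow work; they are not from the slides.

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=1024)  # space bound: at most 1024 cached results
def expensive_lookup(item_id: int) -> int:
    """Stand-in for slow work (hypothetical); counts real executions."""
    global calls
    calls += 1
    return item_id * item_id

for _ in range(3):
    expensive_lookup(7)

print(calls)  # the work ran once; repeats were served from memory
```

The coordination tradeoff shows up as soon as the underlying data can change: some component then has to decide when cached entries are stale.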
30
Not Super
One at a time
Serialized access
Parallelizing / Estimating
Separate reads & writes
Non-locking estimation
Reduce contention
Tradeoff: space
Tradeoff: coordination
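One way to picture "separate reads & writes" with "non-locking estimation" is a counter split into one slot per writer. The `ShardedCounter` class below is a hypothetical sketch, not a technique named in the slides: each writer owns a slot, so writers never contend, and readers sum the slots without a lock and accept a value that may lag slightly.

```python
import threading

class ShardedCounter:
    """Hypothetical counter: one slot per writer, so writers never contend."""

    def __init__(self, num_slots: int):
        self.slots = [0] * num_slots  # tradeoff: space grows with writers

    def increment(self, slot: int) -> None:
        # Only the owning writer touches its slot, so no lock is needed.
        self.slots[slot] += 1

    def estimate(self) -> int:
        # Readers sum without locking; the value may lag in-flight writes.
        return sum(self.slots)

def worker(counter: ShardedCounter, slot: int, n: int) -> None:
    for _ in range(n):
        counter.increment(slot)

counter = ShardedCounter(4)
threads = [threading.Thread(target=worker, args=(counter, s, 10_000)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.estimate())  # exact once all writers have finished
```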
33
Partitions: Data & Processing
Sharding
Worker Processes
Coordination: Distribution & Ordering
Queues & Managers
Separate Read/Write Access
What does this make the system look like?
34
And now…
35
Atomicity – all or nothing
Consistency – always correct
Isolation – changesets executed independently
Durability – once committed, stays so
Really hard to scale in one big block (although SSDs + RAM help!)
36
(it depends)
37
Basically Available
Soft State
Eventual Consistency
A node will either eventually get a change or retire
Well…still need conflict resolution
BASE is NOT ACID (get it?)
39
Choose TWO:
Consistency
Availability
Partition tolerance
[Diagram: two Managers, Replica 1, Replica 2, Client 1, Client 2; label "Double Outage!"]
42
Log, Profile, Tune, Test, Divide, Compare, Partition
No, really, log a lot
43
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
45
Separate operations for:
Command – perform an action
Query – returns data about state
Promotes simpler programs
Allows Command Queues
Reduces locking
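The command/query split above might look like the following Python sketch. The `Account` class and its method names are hypothetical. Commands only enqueue a change and return nothing, queries only read, and a single consumer drains the command queue, which is one way the separation can reduce locking.

```python
from collections import deque

class Account:
    """Hypothetical example object with separate command and query methods."""

    def __init__(self) -> None:
        self._balance = 0
        self._commands = deque()  # command queue: writes are applied in order

    def deposit(self, amount: int) -> None:
        """Command: requests a state change, returns nothing."""
        self._commands.append(amount)

    def process_commands(self) -> None:
        """A single consumer applies queued commands, so writers need no lock."""
        while self._commands:
            self._balance += self._commands.popleft()

    def balance(self) -> int:
        """Query: returns state, changes nothing."""
        return self._balance

account = Account()
account.deposit(50)
account.deposit(25)
pending_view = account.balance()   # queued commands are not visible yet
account.process_commands()
print(pending_view, account.balance())
```

Note that queries can observe slightly old state until the queue is drained, which ties in with the eventual-consistency slides earlier in the deck.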
46
SaaS: Applications
PaaS: Storage, Identity, Runtime, Queue / Bus
IaaS: Compute, Block Data, Network
47
Component    Example
Compute      Amazon EC2, Azure Web/Worker Roles
Storage      Amazon S3, Azure TableStore
Network      Any CDN
48
Component    Example
Database     SQL Azure, Postgres, MySQL
NoSQL        Cassandra, Redis, BigTable, MongoDB
Cache        Memcache
Queue        Azure Service Bus
Processing   Hadoop, Storm
49
Salesforce? (Also sort of a platform) Whateva!
50
Cassandra
51
A “scalable” key-value store
Automatic partitioning
Automatic replicas
55
Worse than SQL Tuning?
57
Get user by user id
Get item by item id
Get all the items that a particular user likes
Get all the users who like a particular item
58
Can’t get all the items that a particular user likes (without a giant scan)
59
The N-M relationship is modeled with two tables, but properties require secondary lookups.
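A rough sketch of the two-table model, using plain Python dictionaries as stand-ins for Cassandra tables. All names and sample data (`users`, `items`, `items_by_user`, `users_by_item`, `like`, `liked_titles`) are illustrative assumptions. Each "like" is written twice, once per lookup direction, and fetching an item's title still needs a second lookup because the mapping rows hold only ids.

```python
from collections import defaultdict

# Primary tables, keyed by id (names and fields are illustrative).
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
items = {"i1": {"title": "Kettle"}, "i2": {"title": "Lamp"}}

# The N-M "likes" relationship is written twice, once per lookup direction.
items_by_user = defaultdict(set)
users_by_item = defaultdict(set)

def like(user_id: str, item_id: str) -> None:
    items_by_user[user_id].add(item_id)
    users_by_item[item_id].add(user_id)

def liked_titles(user_id: str) -> list:
    # The mapping row only holds ids, so each title is a secondary lookup.
    return sorted(items[i]["title"] for i in items_by_user[user_id])

like("u1", "i1")
like("u1", "i2")
like("u2", "i1")
print(liked_titles("u1"))
print(sorted(users_by_item["i1"]))
```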
60
Can put some data in the indexes if your queries need it. (Or serialize data.)
61
SuperColumns let you store other dimensions of data. (eek?)
62
Composite (sorted) column keys let you do neat things like time-order the mapping.
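To illustrate why sorted composite keys help with time ordering, the sketch below keeps one row's columns sorted by `(timestamp, item_id)` using Python's `bisect` module. It is an in-memory analogy, not Cassandra's API; the names `row`, `add_like`, and `likes_since`, and the integer timestamps, are assumptions.

```python
import bisect

# One row per user; columns are kept sorted by (timestamp, item_id).
row = []

def add_like(timestamp: int, item_id: str) -> None:
    bisect.insort(row, (timestamp, item_id))

def likes_since(timestamp: int) -> list:
    # Range scan over the sorted keys: everything at or after `timestamp`.
    start = bisect.bisect_left(row, (timestamp, ""))
    return [item for _, item in row[start:]]

add_like(300, "i3")
add_like(100, "i1")
add_like(200, "i2")
print(likes_since(200))
```

Because the keys stay sorted, "most recent likes" becomes a contiguous slice rather than a scan of the whole row.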
63
Roll your own model – see www.datastax.com for great data model articles
64
Each Tuple has a Timestamp
Last change wins
Requires clock synchronization
(Working on other strategies)
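The "last change wins" rule can be sketched as a merge of two replicas. This is a simplified illustration, not Cassandra's implementation: `merge_lww`, the `{key: (timestamp, value)}` layout, and the integer timestamps are assumptions, and ties here simply keep the first replica's value.

```python
def merge_lww(a: dict, b: dict) -> dict:
    """Merge two replicas of {key: (timestamp, value)}; the newer timestamp wins."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

replica_1 = {"color": (10, "red"), "size": (12, "L")}
replica_2 = {"color": (11, "blue"), "size": (9, "M")}
print(merge_lww(replica_1, replica_2))
```

The sketch also shows why clock synchronization matters: if one writer's clock runs fast, its older change can carry a larger timestamp and silently overwrite a newer one.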
65
But wait, there’s more….
66
N*M*Q Cloudlets, N*M*Q Devices
67
It’s coming. Can your servers handle it?
68
Arduino, Netduino, Raspberry Pi ($25)
69
Cross-thing sharing, Data storage, Analysis
70
Communication, Network Effect, Analytics
71
Self-sufficient unit of scale
All components required to operate a portion of workload
Known performance characteristics
Known cost to interact with other cells
72
How big is your project?
73
50,000 doctors
100 editors
500GB of data
Does it matter?