An Introduction to Super-Scalability


1

2 An Introduction to Super-Scalability

3 But first…

4 1 ENIAC, 1 Teletype

5 1 Mainframe, N Terminals

6 N Servers, N Terminals

7 N Servers, N PCs

8 N Web Servers, N Browsers

9 N Web Servers, N AJAX Apps

10 N Clusters, N AJAX Apps

11 N Clusters, N*M Phones

12 N Cloudlets, N*M Phones

13

14

15

16

17

18

19 CPU, Disk, Memory, Network

20 Time / Throughput; Space / Capacity

21 Time / Throughput; Space / Capacity; Complexity; Locking

22

23 (but how to scale?)

24 Just make it bigger (vertical scaling)

25 (super-scalability)

26

27 Not Super: One big data store; One big memory store; Make it bigger; Make it redundant. E.g. Full activity logging. Partitioning: Sharding / Hashing; Growth = Add Partition. Tradeoff: Splitting Partitions. Tradeoff: Redundancy becomes a distribution problem. (Diagram: partitions A, B, C, …, each shown twice.)
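A minimal sketch of the sharding / hashing idea on this slide, assuming simple modulo hashing (the slides do not say which scheme the speaker had in mind). The helper names `partition_for` and `moved_keys` and the `user:N` key format are hypothetical. It also measures the "Splitting Partitions" tradeoff: how many keys change partition when one partition is added.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def moved_keys(keys, old_n: int, new_n: int) -> int:
    """Count keys whose partition changes when growing from old_n to new_n."""
    return sum(
        1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n)
    )


# Hypothetical key set, used only to illustrate the cost of adding a partition.
keys = [f"user:{i}" for i in range(10_000)]
moved = moved_keys(keys, 4, 5)
print(f"{moved} of {len(keys)} keys move when going from 4 to 5 partitions")
```

With plain modulo hashing, most keys land on a different partition after growth, which is one reason splitting partitions is listed as a tradeoff. Schemes such as consistent hashing are a common way to reduce that movement.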

28 Not Super: Number of objects increases; As relations increase, add time or space requirements; Common with graph problems. E.g. PageRank. Distribution: Chop up problem / workload; Map/Reduce. Tradeoff: coordination. Tradeoff: network.
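The "chop up problem / workload" and Map/Reduce points can be sketched in a few lines. This is an illustration only, not PageRank itself: it counts inbound links per page from a small made-up edge list. The function names are hypothetical, and the map steps run sequentially here, whereas a real system would run them on separate workers.

```python
from collections import Counter
from functools import reduce


def chunked(items, size):
    """Split the workload into fixed-size chunks."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def map_chunk(edges):
    """Map step: count inbound links within one chunk of (source, target) edges."""
    return Counter(target for _source, target in edges)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Made-up link graph for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]

partials = [map_chunk(chunk) for chunk in chunked(edges, 2)]
inbound = reduce(reduce_counts, partials, Counter())
```

The partial counters are what would cross the network between workers, which is where the coordination and network tradeoffs on the slide come from.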

29 Not Super: Tune your code; Tune your database; Tune your network; Better hardware. Optimization. As fast as possible. Can’t scale as fast as growth. Specialization – ONE thing. Caching – Reduces work in trade for space. Tradeoff: space. Tradeoff: coordination.
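A small sketch of "caching reduces work in trade for space", assuming Python's standard `functools.lru_cache`. The `expensive_lookup` function is a hypothetical stand-in for slow work, and the `calls` counter exists only to show how often the real work runs.

```python
from functools import lru_cache

calls = 0  # how many times the underlying work actually ran


@lru_cache(maxsize=1024)  # space spent: up to 1024 remembered results
def expensive_lookup(item_id: int) -> int:
    """Stand-in for slow work (hypothetical)."""
    global calls
    calls += 1
    return item_id * item_id


# 1000 requests spread over only 10 distinct ids.
results = [expensive_lookup(i % 10) for i in range(1000)]
info = expensive_lookup.cache_info()
```

Here the work runs once per distinct id and every other request is served from memory. The coordination tradeoff shows up when the underlying data changes: each cache holding a stale copy has to be invalidated (for this decorator, `expensive_lookup.cache_clear()`).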

30 Not Super: One at a time; Serialized access. Parallelizing / Estimating: Separate reads & writes; Non-locking estimation; Reduce contention. Tradeoff: space. Tradeoff: coordination.
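One way to picture "non-locking estimation" and "reduce contention" is a counter split into per-worker slots. This is a sketch under assumptions, not something the slides specify: the `ShardedCounter` class is hypothetical, and it relies on each slot having exactly one writer.

```python
import threading


class ShardedCounter:
    """Each writer owns one slot, so writers never contend with each other."""

    def __init__(self, num_shards: int):
        self._slots = [0] * num_shards

    def increment(self, shard: int, amount: int = 1) -> None:
        # Only the owning worker writes this slot, so no lock is taken.
        self._slots[shard] += amount

    def estimate(self) -> int:
        # Readers sum the slots without stopping writers. The total may lag
        # while writers are active and is exact once they have finished.
        return sum(self._slots)


def worker(counter: ShardedCounter, shard: int, n: int) -> None:
    for _ in range(n):
        counter.increment(shard)


counter = ShardedCounter(num_shards=4)
threads = [
    threading.Thread(target=worker, args=(counter, shard, 10_000))
    for shard in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.estimate()
```

The space tradeoff is one slot per writer instead of one shared value; the coordination tradeoff is that a reader has to combine the slots and accept a slightly stale answer while writes are in flight.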

31

32

33 Partitions: Data & Processing; Sharding; Worker Processes. Coordination: Distribution & Ordering; Queues & Managers. Separate Read/Write Access. What does this make the system look like?

34 And now…

35 Atomicity – all or nothing; Consistency – always correct; Isolation – changesets executed independently; Durability – once committed, stays so. Really hard to scale in one big block (although SSDs + RAM help!)

36 (it depends)

37 Basically Available; Soft State; Eventual Consistency. A node will either eventually get a change or retire. Well…still need conflict resolution. BASE is NOT ACID (get it?)

38

39 Choose TWO: Consistency, Availability, Partition tolerance. (Diagram: two Managers, Replica 1, Replica 2, Client 1, Client 2; "Double Outage!")

40

41

42 Log, Profile, Tune, Test, Divide, Compare, Partition. No, really, log a lot.

43 (The fallacies of distributed computing:) 1. The network is reliable. 2. Latency is zero. 3. Bandwidth is infinite. 4. The network is secure. 5. Topology doesn't change. 6. There is one administrator. 7. Transport cost is zero. 8. The network is homogeneous.

44

45 Separate operations for: Command – perform an action; Query – returns data about state. Promotes simpler programs. Allows Command Queues. Reduces locking.
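The command / query split on this slide can be sketched as a class whose commands change state and return nothing, and whose queries return data and change nothing. The `Inventory` class and its method names are hypothetical; the queue shows how commands can be applied later from a single place.

```python
from collections import deque


class Inventory:
    def __init__(self):
        self._stock = {}
        self._commands = deque()

    # Commands: perform an action, return nothing.
    def submit(self, item: str, delta: int) -> None:
        """Queue a stock change instead of applying it immediately."""
        self._commands.append((item, delta))

    def process_commands(self) -> None:
        """Apply queued commands in order. Only this method mutates stock."""
        while self._commands:
            item, delta = self._commands.popleft()
            self._stock[item] = self._stock.get(item, 0) + delta

    # Queries: return data about state, change nothing.
    def stock_of(self, item: str) -> int:
        return self._stock.get(item, 0)

    def pending(self) -> int:
        return len(self._commands)


inv = Inventory()
inv.submit("widget", 5)
inv.submit("widget", -2)
before = inv.stock_of("widget")  # commands are queued, not yet applied
inv.process_commands()
after = inv.stock_of("widget")
```

Because only `process_commands` writes to the stock, the read path in this sketch needs no lock of its own, which is the sense in which the separation reduces locking.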

46 SaaS: Applications. PaaS: Storage, Identity, Runtime, Queue / Bus. IaaS: Compute, Block Data, Network.

47 Component   Example
   Compute     Amazon EC2, Azure Web/Worker Roles
   Storage     Amazon S3, Azure TableStore
   Network     Any CDN

48 Component   Example
   Database    SQL Azure, Postgres, MySQL
   NoSQL       Cassandra, Redis, BigTable, MongoDB
   Cache       Memcache
   Queue       Azure Service Bus
   Processing  Hadoop, Storm

49 Salesforce? (Also sort of a platform) Whateva!

50 Cassandra

51 A “scalable” key-value store. Automatic partitioning. Automatic replicas.

52

53

54

55 Worse than SQL Tuning?

56

57 Get user by user id; Get item by item id; Get all the items that a particular user likes; Get all the users who like a particular item.

58 Can’t get all the items that a particular user likes (without a giant scan)

59 N-M relationship is modeled with two tables. But Properties require secondary lookups.
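A sketch of the two-table model, using plain Python dictionaries as stand-ins for the tables rather than any Cassandra API. All names and sample data (`items_by_user`, `users_by_item`, "Kettle", and so on) are hypothetical. It covers the four queries from the earlier slide and shows why properties need a secondary lookup.

```python
# Entity tables: look up a user or an item by its id.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
items = {"i1": {"title": "Kettle"}, "i2": {"title": "Lamp"}}

# Two mapping tables, one per query direction.
items_by_user = {}  # user id -> set of item ids
users_by_item = {}  # item id -> set of user ids


def like(user_id: str, item_id: str) -> None:
    """Write the relationship in both directions so neither read needs a scan."""
    items_by_user.setdefault(user_id, set()).add(item_id)
    users_by_item.setdefault(item_id, set()).add(user_id)


def titles_liked_by(user_id: str) -> list:
    """The mapping holds only ids, so each title needs a secondary lookup."""
    return sorted(items[i]["title"] for i in items_by_user.get(user_id, ()))


like("u1", "i1")
like("u1", "i2")
like("u2", "i1")
```

Each like costs two writes, and reading a property such as the title costs one extra lookup per item.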

60 Can put some data in the indexes if your queries need it. (Or serialize data.)

61 SuperColumns let you store other dimensions of data. (eek?)

62 Composite (sorted) column keys let you do neat things like time-order the mapping.
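To make the time-ordering idea concrete, here is a sketch that keeps one row's columns sorted by a composite `(timestamp, item_id)` key. It imitates the idea with a sorted Python list and the standard `bisect` module; it is not Cassandra's API, and the `LikesRow` class is hypothetical.

```python
import bisect


class LikesRow:
    """One row whose columns are kept sorted by a (timestamp, item_id) key."""

    def __init__(self):
        self._columns = []  # sorted list of (timestamp, item_id)

    def add(self, timestamp: int, item_id: str) -> None:
        """Insert a column at its sorted position."""
        bisect.insort(self._columns, (timestamp, item_id))

    def since(self, timestamp: int) -> list:
        """Items liked at or after `timestamp`, oldest first."""
        # (timestamp,) sorts before every (timestamp, item_id) pair.
        start = bisect.bisect_left(self._columns, (timestamp,))
        return [item for _ts, item in self._columns[start:]]

    def latest(self, n: int) -> list:
        """The n most recently liked items, newest first."""
        newest_first = list(reversed(self._columns))
        return [item for _ts, item in newest_first[:n]]


row = LikesRow()
row.add(300, "i3")
row.add(100, "i1")
row.add(200, "i2")
```

Because the columns are stored in key order, a time-range read is a slice of the row rather than a scan plus a sort.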

63 Roll your own model – see www.datastax.com for great data model articles

64 Each Tuple has a Timestamp. Last change wins. Requires clock synchronization. (Working on other strategies)
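"Last change wins" can be sketched as a merge of two replicas where each value carries its timestamp. The `merge_lww` function, the integer timestamps, and the sample data are hypothetical; this shows the general rule, not Cassandra's implementation.

```python
def merge_lww(replica_a: dict, replica_b: dict) -> dict:
    """Merge two replicas of {key: (timestamp, value)}; the newer timestamp wins.

    On an exact timestamp tie this sketch keeps replica_a's value.
    """
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Made-up replicas that disagree on both keys.
a = {"email": (10, "ann@old.example"), "city": (12, "Oslo")}
b = {"email": (15, "ann@new.example"), "city": (11, "Bergen")}
merged = merge_lww(a, b)
```

The rule only compares timestamps, so a node whose clock runs behind can lose a write that really happened later. That is why the slide lists clock synchronization as a requirement.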

65 But wait, there’s more….

66 N*M*Q Cloudlets, N*M*Q Devices

67 It’s coming. Can your servers handle it?

68 Arduino, Netduino, Raspberry Pi ($25)

69 Cross-thing sharing; Data storage; Analysis

70 Communication; Network Effect; Analytics

71 Self-sufficient unit of scale. All components required to operate a portion of workload. Known performance characteristics. Known cost to interact with other cells.

72 How big is your project?

73 50,000 doctors; 100 editors; 500GB of data. Does it matter?

