The Art of Database Sharding Maxym Kharchenko Amazon.com.

1 The Art of Database Sharding Maxym Kharchenko Amazon.com

2 Whoami
Started as a database kernel developer
  – Network database: db_VISTA
ORACLE DBA for ~10-12 years
  – Starting with ORACLE 8
Last 3 years: Sr. Persistence Engineer @Amazon.com
OCM, ORACLE Ace Associate
Blog: http://intermediatesql.com
Twitter: @maxymkh

3 Agenda
The “big data” scaling problem
Solving scaling with “sharding”
Practical sharding
Your sharding experience: good and bad

4 How to scale a database
[Chart: Old System, New System, Problem; years 2013 to 2017]

5 The Big Data problem

6 Vertical Scaling

7 Scaling Up …

9 Scaled!

10 “Scaling up” math: System capabilities 2+2=3

11 “Scaling up” math: System cost 2+2=7

12 Scale out, not up

13 Use lots of cheap machines, not bigger machines

14 Commodity hardware = $$$$$$$

15 Distributed System

18 Distributed computing is hard

19 Shared Nothing (“Sharded”) System

20 Sharding is (relatively) easy

21 Split your data into small independent chunks, and run each chunk on cheap commodity hardware

22 How to split your data
[Diagram: Data]

23 How to split your data

27 Vertical Partitioning

30 Horizontal Partitioning

31 Sharding

32
    CREATE TABLE books (
      id     number PRIMARY KEY,
      title  varchar2(200),
      author varchar2(200)
    );

33 Sharding (illustrative pseudo-SQL)
    CREATE TABLE books (
      id     number PRIMARY KEY,
      title  varchar2(200),
      author varchar2(200)
    )
    SHARD BY ( )
    (
      SPLIT SIZE evenly
      SPLIT LOAD evenly
      DISCOURAGE CROSS SHARD ACCESS
      DISCOURAGE DATA MOVE
      USING 4 DATABASES
    );

34 Split size evenly
    SHARD BY LIST ( first_letter(author) ) ( SPLIT SIZE evenly );
[Shards: A-G | H-M | N-T | U-Z]
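
A minimal Python sketch of the LIST layout above. It assumes `first_letter()` means the uppercase first character of the author string (the slide does not define it) and that each shard owns an inclusive letter range; `SHARD_RANGES`, `first_letter` and `list_shard` are hypothetical names.

```python
# Shard boundaries from the slide: inclusive letter ranges A-G, H-M, N-T, U-Z
SHARD_RANGES = [("A", "G"), ("H", "M"), ("N", "T"), ("U", "Z")]

def first_letter(author: str) -> str:
    """Hypothetical helper mirroring first_letter(author) on the slide."""
    return author.strip()[0].upper()

def list_shard(author: str) -> int:
    """Return the index of the shard whose letter range contains the author's first letter."""
    letter = first_letter(author)
    for shard_id, (low, high) in enumerate(SHARD_RANGES):
        if low <= letter <= high:
            return shard_id
    raise ValueError(f"no shard owns authors starting with {letter!r}")

for author in ["Frank Herbert", "Isaac Asimov", "Stanislaw Lem", "Ursula K. Le Guin"]:
    print(author, "->", list_shard(author))
```

The split follows key ranges, not row counts, so each shard's size depends on how author names are distributed across letters. Names starting with a digit or an accented letter fall outside every range in this sketch and raise an error.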

35 Split load evenly
    SHARD BY RANGE (id) ( SPLIT SIZE evenly SPLIT LOAD evenly );
[Shards: 1-100 | 101-200 | 201-300 | 301-400]
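
A range lookup is a binary search over the range boundaries. This sketch assumes ids are positive integers handed out in increasing order (an assumption; the slide only shows the ranges), and `RANGE_UPPER_BOUNDS` and `range_shard` are hypothetical names. It also suggests why range sharding can split size evenly without splitting load evenly: if recent rows are the busy ones, they all sit on the last shard.

```python
import bisect

# Inclusive upper bound of each id range on the slide: 1-100, 101-200, 201-300, 301-400
RANGE_UPPER_BOUNDS = [100, 200, 300, 400]

def range_shard(book_id: int) -> int:
    """Return the index of the shard whose id range contains book_id."""
    if book_id < 1:
        raise ValueError(f"id {book_id} is below the first range")
    shard_id = bisect.bisect_left(RANGE_UPPER_BOUNDS, book_id)
    if shard_id == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"id {book_id} is above the last range")
    return shard_id

# With an ever-increasing id, the newest rows all land on the last shard
newest = {range_shard(book_id) for book_id in range(350, 401)}
print(newest)
```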

36 Split load evenly
    SHARD BY HASH (id) ( SPLIT SIZE evenly SPLIT LOAD evenly );
[Shards: 0 | 1 | 2 | 3]
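
Hash sharding replaces the range lookup with arithmetic. The sketch below uses MD5 from Python's `hashlib` as a stand-in for the unspecified hash function, because Python's built-in `hash()` is randomized per process for strings and would route the same key differently on different application servers. `stable_hash` and `hash_shard` are hypothetical names.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

def stable_hash(key) -> int:
    """Deterministic 64-bit hash of the key's string form (same on every process)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def hash_shard(book_id: int) -> int:
    """SHARD BY HASH (id): shard = hash(id) mod number of shards."""
    return stable_hash(book_id) % NUM_SHARDS

# Sequential ids spread across all four shards in roughly equal numbers
counts = Counter(hash_shard(book_id) for book_id in range(1, 10_001))
print(sorted(counts.items()))
```

The shard counts come out roughly equal, which is the "split load evenly" property; the price is that neighbouring ids no longer live on the same shard.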

37 Discourage cross shard access
    SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS );
    SELECT title FROM books WHERE id = 34567876;

38 Discourage cross shard access
    SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS );
    SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
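
When the table is sharded by id but the query filters by author, no single shard can answer it. A minimal scatter-gather sketch, using Python lists as stand-in shards and hypothetical sample rows: every shard runs the filter and sorts its own matches, and `heapq.merge` combines the sorted partial results so the global ORDER BY title still holds. `hash_shard`, `BOOKS` and `titles_by_author` are hypothetical names.

```python
import hashlib
import heapq

NUM_SHARDS = 4

def hash_shard(key) -> int:
    """Stable hash of the key's string form, mod the number of shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Hypothetical sample rows: (id, title, author)
BOOKS = [
    (1, "Foundation", "Isaac Asimov"),
    (2, "I, Robot", "Isaac Asimov"),
    (3, "Dune", "Frank Herbert"),
    (4, "The Gods Themselves", "Isaac Asimov"),
    (5, "The Caves of Steel", "Isaac Asimov"),
]

# SHARD BY HASH (id): a book's shard depends only on its id
shards = [[] for _ in range(NUM_SHARDS)]
for book in BOOKS:
    shards[hash_shard(book[0])].append(book)

def titles_by_author(author):
    """author is not the shard key, so every shard runs the query (scatter)
    and the sorted partial results are merged (gather) to keep ORDER BY title."""
    partials = []
    for shard in shards:  # one round trip per shard
        partials.append(sorted(title for _, title, a in shard if a == author))
    return list(heapq.merge(*partials)), len(partials)

titles, shards_queried = titles_by_author("Isaac Asimov")
print(shards_queried, titles)
```

Every shard is queried even if only one of them holds matching rows, which is presumably the cost that DISCOURAGE CROSS SHARD ACCESS is meant to avoid.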

39 Discourage cross shard access
    SHARD BY HASH (author) ( DISCOURAGE CROSS SHARD ACCESS );
[Shards: 0 | 1 | 2 | 3]
    SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
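
With author as the shard key, the same query is routed to exactly one shard and sorted locally. This is a minimal sketch with the same hypothetical sample rows and helper names as a plain-list model of the shards, not a real database.

```python
import hashlib

NUM_SHARDS = 4

def hash_shard(key) -> int:
    """Stable hash of the key's string form, mod the number of shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Hypothetical sample rows: (id, title, author)
BOOKS = [
    (1, "Foundation", "Isaac Asimov"),
    (2, "I, Robot", "Isaac Asimov"),
    (3, "Dune", "Frank Herbert"),
    (4, "The Gods Themselves", "Isaac Asimov"),
    (5, "The Caves of Steel", "Isaac Asimov"),
]

# SHARD BY HASH (author): all books by one author share a shard
shards = [[] for _ in range(NUM_SHARDS)]
for book in BOOKS:
    shards[hash_shard(book[2])].append(book)

def titles_by_author(author):
    """author is the shard key, so exactly one shard runs the query and sorts locally."""
    shard = shards[hash_shard(author)]
    return sorted(title for _, title, a in shard if a == author)

print(titles_by_author("Isaac Asimov"))
```

The trade-off mirrors the previous slide: a lookup by id alone now has to ask every shard, so the shard key is best chosen to match the most common access path.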

40 Discourage data move
    SHARD BY mod(hash(author), 4) ( DISCOURAGE DATA MOVE );
[Shards: 0 | 1 | 2 | 3]

41 Discourage data move
    SHARD BY mod(hash(author), 6) ( DISCOURAGE DATA MOVE );
[Shards: 0 | 1 | 2 | 3 | 4 | 5]

42 Resharding
    Hash  Mod/4  Mod/6
       1      1      1
       2      2      2
       3      3      3
       4      0      4
       5      1      5
       6      2      0
       7      3      1
       8      0      2
       9      1      3
      10      2      4
      11      3      5
      12      0      0
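
The table can be checked mechanically. This sketch counts how many hash values change shard when the modulus changes (`moved_fraction` is a hypothetical helper). Over the 12 rows above only hashes 1, 2, 3 and 12 stay put, so 8 of 12 rows move; over any multiple of 12 consecutive hash values the fraction is exactly two thirds, although the two new shards only need to end up with one third of the data.

```python
def moved_fraction(old_shards: int, new_shards: int, hashes) -> float:
    """Fraction of hash values whose shard changes when mod(hash, N) goes from old to new N."""
    hashes = list(hashes)
    moved = sum(1 for h in hashes if h % old_shards != h % new_shards)
    return moved / len(hashes)

print(moved_fraction(4, 6, range(1, 13)))       # the 12 rows on the slide
print(moved_fraction(4, 6, range(1, 120_001)))  # many more hash values
print(moved_fraction(4, 8, range(1, 120_001)))  # doubling still moves half
```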

43 Physical and Logical shards
    SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE );
[Diagram: logical shards spread across DB 1, DB 2, DB 3, DB 4]
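
With many more logical shards than databases, the hash modulus (1200) never changes; growing the cluster only edits a small bucket-to-database map. A minimal sketch, assuming a contiguous initial assignment and a bucket count that divides evenly by the number of databases; `initial_map` and `rebalance` are hypothetical helpers. Going from 4 to 6 databases moves 400 of the 1200 buckets, one third, which is exactly what the two new databases have to end up holding.

```python
from collections import Counter

LOGICAL_BUCKETS = 1200  # fixed for the life of the system, as on the slide

def initial_map(num_dbs: int) -> dict:
    """Assign contiguous blocks of logical buckets to physical DBs 1..num_dbs."""
    per_db = LOGICAL_BUCKETS // num_dbs
    return {bucket: bucket // per_db + 1 for bucket in range(LOGICAL_BUCKETS)}

def rebalance(bucket_map: dict, num_dbs: int) -> dict:
    """Move just enough buckets onto under-filled DBs so every DB holds the same count.
    Only the bucket -> DB map changes; hash(key) % LOGICAL_BUCKETS never does."""
    target = LOGICAL_BUCKETS // num_dbs
    new_map = dict(bucket_map)
    load = Counter(new_map.values())
    receivers = [db for db in range(1, num_dbs + 1) if load[db] < target]
    for bucket in sorted(new_map):
        src = new_map[bucket]
        if receivers and load[src] > target:
            dst = receivers[0]
            new_map[bucket] = dst
            load[src] -= 1
            load[dst] += 1
            if load[dst] == target:
                receivers.pop(0)
    return new_map

old = initial_map(4)     # 300 buckets per DB
new = rebalance(old, 6)  # 200 buckets per DB
moved = sum(old[b] != new[b] for b in old)
print(moved, sorted(Counter(new.values()).items()))
```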

44 Executing queries
    import hashlib

    def shard_query(sql, binds, shard_key):
        """ Execute query in the correct db """
        # Stable hash: Python's built-in hash() of a str is salted per process,
        # so different app servers would route the same key differently
        digest = hashlib.md5(shard_key.encode("utf-8")).digest()
        shard_hash = int.from_bytes(digest[:8], "big")
        logical_bucket = shard_hash % TOTAL_BUCKETS
        physical_db = memcached_get_db(logical_bucket)
        return execute_query(physical_db, sql, binds)

    SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
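
A self-contained version of the same routing, for experimentation. A plain dict stands in for the memcached lookup and two in-memory SQLite databases stand in for physical shards; these are assumptions for illustration, and `BUCKET_TO_DB`, `DATABASES` and `bucket_for` are hypothetical names. Placeholders use SQLite's `?` style rather than ORACLE bind variables.

```python
import hashlib
import sqlite3

TOTAL_BUCKETS = 1200

# Stand-in for memcached_get_db(): logical bucket -> physical database name
BUCKET_TO_DB = {b: ("db1" if b < TOTAL_BUCKETS // 2 else "db2") for b in range(TOTAL_BUCKETS)}

# Two in-memory SQLite databases play the physical shards
DATABASES = {name: sqlite3.connect(":memory:") for name in ("db1", "db2")}
for conn in DATABASES.values():
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")

def bucket_for(shard_key: str) -> int:
    """Stable hash of the shard key, mod the number of logical buckets."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TOTAL_BUCKETS

def shard_query(sql, binds, shard_key):
    """Execute the query in the database that owns shard_key and return its rows."""
    physical_db = BUCKET_TO_DB[bucket_for(shard_key)]
    return DATABASES[physical_db].execute(sql, binds).fetchall()

sample_books = [(1, "Foundation", "Isaac Asimov"), (2, "Dune", "Frank Herbert"),
                (3, "I, Robot", "Isaac Asimov")]
for book in sample_books:
    shard_query("INSERT INTO books VALUES (?, ?, ?)", book, book[2])  # author is the key

rows = shard_query("SELECT title FROM books WHERE author = ? ORDER BY title",
                   ("Isaac Asimov",), "Isaac Asimov")
print(rows)
```

Because both Asimov books were written with the same shard key, they live in the same database and the SELECT needs only one of them.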

45 Implementing Shards: Standbys
[Diagram: Unsharded database and its Standby; standbys become Shard 1 and Shard 2; Apps: Read Only; Drop non-qualifying data]
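
One way to read this slide, sketched under assumptions: every standby starts as a full copy of the unsharded database, and each one, once opened as a shard, deletes the rows it does not own. In-memory SQLite databases play the standbys, and a Python function registered with `sqlite3`'s `create_function` plays the sharding rule; `shard_of`, `BOOKS` and `make_full_copy` are hypothetical names.

```python
import hashlib
import sqlite3

NUM_SHARDS = 2

def shard_of(author: str) -> int:
    """Hypothetical sharding rule: stable hash of author, mod the number of shards."""
    digest = hashlib.md5(author.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Hypothetical contents of the unsharded database: (id, title, author)
BOOKS = [
    (1, "Foundation", "Isaac Asimov"),
    (2, "Dune", "Frank Herbert"),
    (3, "Neuromancer", "William Gibson"),
    (4, "Solaris", "Stanislaw Lem"),
]

def make_full_copy() -> sqlite3.Connection:
    """Stand-in for a standby: a complete copy of the unsharded database."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("shard_of", 1, shard_of)
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", BOOKS)
    return conn

shards = []
for shard_id in range(NUM_SHARDS):
    conn = make_full_copy()  # standby opened read-write as the new shard
    # Drop non-qualifying data: keep only the rows this shard owns
    conn.execute("DELETE FROM books WHERE shard_of(author) != ?", (shard_id,))
    shards.append(conn)

for shard_id, conn in enumerate(shards):
    print(shard_id, conn.execute("SELECT title FROM books ORDER BY id").fetchall())
```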

46 Implementing Shards: Tables
[Diagram: Apps; Shard 1 holds Tab A; Shard 2 gets MV A, which becomes Tab A; Read Only]
    Create materialized view … as select … from a@shard1
    Drop materialized view … preserve table
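
The table move can be modeled as a handful of steps. The sketch below is a hypothetical in-memory model (the `Cluster` class and its methods are illustrative, not ORACLE APIs); each step's comment names the ORACLE statement from the slide it stands in for. In a real system applications would presumably keep writing to shard 1 between the first copy and the read-only window, which is why the copy is refreshed once more before the switch.

```python
class Cluster:
    """Hypothetical model: shard name -> {table name -> list of rows}."""

    def __init__(self):
        self.shards = {"shard1": {"A": []}, "shard2": {}}
        self.location = {"A": "shard1"}  # where applications send each table's queries
        self.read_only = set()

    def insert(self, table, row):
        if table in self.read_only:
            raise PermissionError(f"table {table} is read only while it moves")
        self.shards[self.location[table]][table].append(row)

    def refresh_copy(self, table, src, dst):
        # Stands in for a complete refresh of materialized view A on dst
        self.shards[dst][table] = list(self.shards[src][table])

    def move_table(self, table, src, dst):
        # 1. create materialized view ... as select ... from a@shard1
        self.refresh_copy(table, src, dst)
        # 2. Applications go read only for this table
        self.read_only.add(table)
        # 3. Final refresh picks up rows written since step 1
        self.refresh_copy(table, src, dst)
        # 4. drop materialized view ... preserve table: the copy becomes a normal
        #    table on dst, applications are pointed at it, the old copy is removed
        self.location[table] = dst
        del self.shards[src][table]
        self.read_only.discard(table)

cluster = Cluster()
cluster.insert("A", (1, "written before the move"))
cluster.move_table("A", "shard1", "shard2")
cluster.insert("A", (2, "written after the move"))
print(cluster.location, cluster.shards)
```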

47 Implementing Shards: Moving “data head”
[Diagram: Apps; Shard 1 and Shard 2, later joined by Shard 3 and Shard 4]
    Time   Logical Shard   Physical Shard
    2011   1,2,3,4         1
    2011   5,6,7,8         2
    2012   1,2             1
    2012   3,4             3
    2012   5,6             2
    2012   7,8             4
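
Reading the final table as a routing rule: from 2012, new data for logical shards 3, 4, 7 and 8 goes to the new physical shards 3 and 4, while their 2011 data stays where it was, so only the "head" of the data moves. A minimal lookup under that reading; `ROUTING` and `physical_shard` are hypothetical names, and the year stands for whatever time attribute the application actually routes on.

```python
# (effective from year, logical shards, physical shard), copied from the table above
ROUTING = [
    (2011, {1, 2, 3, 4}, 1),
    (2011, {5, 6, 7, 8}, 2),
    (2012, {1, 2}, 1),
    (2012, {3, 4}, 3),
    (2012, {5, 6}, 2),
    (2012, {7, 8}, 4),
]

def physical_shard(logical: int, year: int) -> int:
    """Use the most recent routing entry for this logical shard that starts on or before year."""
    candidates = [(start, phys) for start, logicals, phys in ROUTING
                  if logical in logicals and start <= year]
    if not candidates:
        raise LookupError(f"no routing for logical shard {logical} in {year}")
    return max(candidates)[1]

print(physical_shard(3, 2011), physical_shard(3, 2012), physical_shard(8, 2013))
```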

48 Data protection
[Diagram: App connected to Shard 1, Shard 2, Shard 3 and Shard 4, each with its own standby (Stb 1 to Stb 4)]

49 Why shards are awesome (potentially)
Unlimited scaling
Local ACID + relational
Better maintenance
Eggs not in one basket
“Apples to apples comparison” with other shards

50 Why shards are NOT so great
More systems
  – Power, rack space etc.
  – Needs automation … bad
  – More likely to fail overall
Some operations become difficult:
  – Transactions across shards
  – Foreign keys across shards
More work:
  – Applications, developers, DBAs
  – High skill, DIY everything

51 Takeaways
More > Bigger
ORACLE is still cool

52 Thank you!
maxym@amazon.com
Twitter: @maxymkh
Blog: http://intermediatesql.com

