Published by Felicity Barker. Modified over 9 years ago.
1
The Art of Database Sharding
Maxym Kharchenko
Amazon.com
2
Whoami
Started as a database kernel developer
–Network database: db_VISTA
ORACLE DBA for ~10-12 years
–Starting with ORACLE 8
Last 3 years: Sr. Persistence Engineer @Amazon.com
OCM, ORACLE Ace Associate
Blog: http://intermediatesql.com
Twitter: @maxymkh
3
Agenda
The “big data” scaling problem
Solving scaling with “sharding”
Practical sharding
Your sharding experience: Good and bad
4
How to scale a database
Chart labels: Old System, New System, Problem. Axis: 2013, 2014, 2015, 2016, 2017
5
The Big Data problem
6
Vertical Scaling
7
Scaling Up …
9
Scaled!
10
“Scaling up” math: System capabilities 2+2=3
11
“Scaling up” math: System cost 2+2=7
12
Scale out, not up
13
Use lots of cheap machines, not bigger machines
14
Commodity hardware = $$$$$$$
15
Distributed System
18
Distributed computing is hard
19
Shared Nothing (“Sharded”) System
20
Sharding is (relatively) easy
21
Split your data into small independent chunks and run each chunk on cheap commodity hardware
22
How to split your data Data
23
How to split your data
27
Vertical Partitioning
30
Horizontal Partitioning
31
Sharding
32
CREATE TABLE books (
    id number PRIMARY KEY,
    title varchar2(200),
    author varchar2(200)
);
33
Sharding
CREATE TABLE books (
    id number PRIMARY KEY,
    title varchar2(200),
    author varchar2(200)
)
SHARD BY ( )
(
    SPLIT SIZE evenly
    SPLIT LOAD evenly
    DISCOURAGE CROSS SHARD ACCESS
    DISCOURAGE DATA MOVE
    USING 4 DATABASES
);
34
Split size evenly
SHARD BY LIST ( first_letter(author) )
(
    SPLIT SIZE evenly
);
Shards: A-G, H-M, N-T, U-Z
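A minimal sketch of what list sharding on the author's first letter could look like in application code. The four letter ranges come from the slide; the names `LETTER_RANGES` and `list_shard` are hypothetical, and `first_letter` is only assumed to mean "first character, upper-cased".

```python
# Letter ranges from the slide: A-G, H-M, N-T, U-Z (one range per shard).
LETTER_RANGES = [("A", "G"), ("H", "M"), ("N", "T"), ("U", "Z")]


def first_letter(author: str) -> str:
    """Return the upper-cased first character of the author name."""
    return author.strip()[:1].upper()


def list_shard(author: str) -> int:
    """Return the index of the letter range holding the author's first letter."""
    letter = first_letter(author)
    for shard, (low, high) in enumerate(LETTER_RANGES):
        if low <= letter <= high:
            return shard
    raise ValueError(f"no shard configured for author {author!r}")
```

Whether the shards end up the same size depends entirely on how author names are distributed across the alphabet, so the ranges would need to be chosen from real data.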
35
Split load evenly
SHARD BY RANGE (id)
(
    SPLIT SIZE evenly
    SPLIT LOAD evenly
);
Shards: 1-100, 101-200, 201-300, 301-400
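Range sharding on `id` can be sketched as a binary search over the upper bound of each range. The bounds are the ones shown on the slide; `UPPER_BOUNDS` and `range_shard` are hypothetical names.

```python
from bisect import bisect_left

# Inclusive upper bound of each range from the slide: 1-100, 101-200, 201-300, 301-400.
UPPER_BOUNDS = [100, 200, 300, 400]


def range_shard(book_id: int) -> int:
    """Return the index of the range that contains book_id."""
    if book_id < 1 or book_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"id {book_id} is outside every configured range")
    # bisect_left finds the first bound that is >= book_id.
    return bisect_left(UPPER_BOUNDS, book_id)
```

Equal-width ranges hold equal numbers of ids, but that does not guarantee equal load: if ids are assigned sequentially, the newest rows all land in the last range.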
36
Split load evenly
SHARD BY HASH (id)
(
    SPLIT SIZE evenly
    SPLIT LOAD evenly
);
Shards: 0, 1, 2, 3
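A sketch of hash sharding, assuming an MD5-based hash so that the same key maps to the same shard in every process (Python's built-in `hash()` is randomized per process for strings). The slide does not say which hash function is used; `stable_hash`, `hash_shard` and `shard_sizes` are hypothetical names.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # shards 0..3, as on the slide


def stable_hash(key) -> int:
    """Hash a key to a non-negative integer that is identical across processes."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_shard(key) -> int:
    """Return the shard number (0..NUM_SHARDS-1) for a key."""
    return stable_hash(key) % NUM_SHARDS


def shard_sizes(keys) -> Counter:
    """Count how many of the given keys land on each shard."""
    return Counter(hash_shard(k) for k in keys)
```

Calling `shard_sizes(range(1, 401))` gives a quick check of how evenly a set of ids spreads over the four shards.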
37
Discourage cross shard access
SHARD BY HASH (id)
(
    DISCOURAGE CROSS SHARD ACCESS
);
SELECT title FROM books WHERE id = 34567876;
38
Discourage cross shard access
SHARD BY HASH (id)
(
    DISCOURAGE CROSS SHARD ACCESS
);
SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
39
Discourage cross shard access
SHARD BY HASH (author)
(
    DISCOURAGE CROSS SHARD ACCESS
);
Shards: 0, 1, 2, 3
SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
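The difference between the last few slides can be sketched with in-memory lists standing in for the four shards. When the query filters on the shard key (`author`), one shard answers it; when it filters on anything else, every shard has to be asked and the sorted results merged. All names here are hypothetical, and the MD5-based hash is an assumption.

```python
import hashlib
import heapq

NUM_SHARDS = 4

# Each shard is modeled as a plain list of (author, title) rows.
shards = [[] for _ in range(NUM_SHARDS)]


def shard_of(author: str) -> int:
    """Hash the shard key (author) to a shard number."""
    digest = hashlib.md5(author.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def insert_book(author: str, title: str) -> None:
    """Store the row on the shard that owns this author."""
    shards[shard_of(author)].append((author, title))


def titles_by_author(author: str) -> list:
    """Filter on the shard key: exactly one shard is touched."""
    rows = shards[shard_of(author)]
    return sorted(title for a, title in rows if a == author)


def titles_containing(word: str) -> list:
    """Filter on a non-key column: query every shard, then merge the sorted results."""
    per_shard = [sorted(t for _, t in rows if word in t) for rows in shards]
    return list(heapq.merge(*per_shard))
```

The second function is the cross-shard access the slide wants to discourage: its cost grows with the number of shards, while the first stays constant.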
40
Discourage data move
SHARD BY mod(hash(author), 4)
(
    DISCOURAGE DATA MOVE
);
Shards: 0, 1, 2, 3
41
Discourage data move
SHARD BY mod(hash_function(author), 6)
(
    DISCOURAGE DATA MOVE
);
Shards: 0, 1, 2, 3, 4, 5
42
Resharding
Hash  Mod/4  Mod/6
1     1      1
2     2      2
3     3      3
4     0      4
5     1      5
6     2      0
7     3      1
8     0      2
9     1      3
10    2      4
11    3      5
12    0      0
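The table can be checked with a few lines of code: a key has to move whenever its shard number under `mod 4` differs from its shard number under `mod 6`. The function name `moved_keys` is hypothetical.

```python
def moved_keys(hashes, old_shards: int, new_shards: int) -> list:
    """Return the hash values whose shard changes when the modulus changes."""
    return [h for h in hashes if h % old_shards != h % new_shards]


hashes = range(1, 13)             # the 12 hash values from the table
moved = moved_keys(hashes, 4, 6)  # -> [4, 5, 6, 7, 8, 9, 10, 11]
fraction_moved = len(moved) / len(hashes)
```

Only hashes 1, 2, 3 and 12 keep their shard, so 8 of the 12 keys (two thirds) would have to be copied to a different database.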
43
Physical and Logical shards
SHARD BY mod(hash(author), 1200)
(
    DISCOURAGE DATA MOVE
);
Physical databases: DB 1, DB 2, DB 3, DB 4
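One way to picture the two levels: keys hash into 1200 fixed logical buckets, and a separate map assigns each bucket to a physical database. Adding a database then means reassigning whole buckets, and a key never changes its bucket. This is a sketch under assumptions: the contiguous initial layout and the rebalancing rule are illustrative choices, not taken from the slides, and `initial_map` and `add_database` are hypothetical names.

```python
from collections import Counter

TOTAL_BUCKETS = 1200  # logical shards; fixed for the life of the system


def initial_map(num_dbs: int) -> dict:
    """Assign contiguous runs of buckets to databases 1..num_dbs.

    Assumes TOTAL_BUCKETS is divisible by num_dbs.
    """
    per_db = TOTAL_BUCKETS // num_dbs
    return {bucket: bucket // per_db + 1 for bucket in range(TOTAL_BUCKETS)}


def add_database(bucket_map: dict, new_db: int) -> dict:
    """Move just enough buckets to new_db so every database holds an equal share."""
    counts = Counter(bucket_map.values())
    target = TOTAL_BUCKETS // (len(counts) + 1)
    surplus = {db: n - target for db, n in counts.items()}
    new_map = dict(bucket_map)
    for bucket, db in bucket_map.items():
        if surplus[db] > 0:          # this database still holds too many buckets
            new_map[bucket] = new_db
            surplus[db] -= 1
    return new_map
```

Going from 4 to 5 databases this way moves 240 of the 1200 buckets (one fifth of the data), and the bucket number computed from the key stays valid throughout.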
44
Executing queries
def shard_query(sql, binds, shard_key):
    """Execute query in the correct db"""
    # hash() must give the same value in every process
    shard_hash = hash(shard_key)
    # Map the hash to a logical bucket, then look up its physical database
    logical_bucket = shard_hash % TOTAL_BUCKETS
    physical_db = memcached_get_db(logical_bucket)
    return execute_query(physical_db, sql, binds)

SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
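The slide's function depends on helpers it does not define. Below is a self-contained sketch of the same flow, with stand-ins that are all assumptions: in-memory SQLite databases replace the physical shards, a plain dict replaces the memcached lookup, and an MD5-based function replaces `hash`.

```python
import hashlib
import sqlite3

TOTAL_BUCKETS = 1200
NUM_DBS = 4

# Stand-in for the physical shards: one in-memory SQLite database each.
databases = {n: sqlite3.connect(":memory:") for n in range(1, NUM_DBS + 1)}
for conn in databases.values():
    conn.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )

# Stand-in for the memcached lookup: logical bucket -> physical database number.
BUCKET_TO_DB = {b: b * NUM_DBS // TOTAL_BUCKETS + 1 for b in range(TOTAL_BUCKETS)}


def stable_hash(key: str) -> int:
    """Hash that is identical in every process (unlike built-in hash() for str)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_query(sql: str, binds: tuple, shard_key: str) -> list:
    """Execute the statement in the database that owns shard_key."""
    logical_bucket = stable_hash(shard_key) % TOTAL_BUCKETS
    conn = databases[BUCKET_TO_DB[logical_bucket]]
    cursor = conn.execute(sql, binds)
    conn.commit()
    return cursor.fetchall()
```

A usage example: insert with `shard_query("INSERT INTO books (title, author) VALUES (?, ?)", ("Foundation", "Isaac Asimov"), "Isaac Asimov")`, then read back with `shard_query("SELECT title FROM books WHERE author = ? ORDER BY title", ("Isaac Asimov",), "Isaac Asimov")`. Both calls pass the author as the shard key, so they reach the same database.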
45
Implementing Shards: Standbys
Diagram labels: Unsharded, Standby, Shard 1, Shard 2, Apps, Read Only
Drop non-qualifying data
46
Implementing Shards: Tables
Diagram labels: Apps, Shard 1, Tab A, Shard 2, MV A, Tab A, Read Only
Create materialized view … as select … from a@shard1
Drop materialized view … preserve table
47
Implementing Shards: Moving “data head”
Diagram labels: Apps, Shard 1, Shard 2, Shard 3, Shard 4

Logical Shard  Physical Shard
(1,2,3,4)      1
(5,6,7,8)      2

Time  Logical Shard  Physical Shard
2011  (1,2,3,4)      1
2011  (5,6,7,8)      2
2012  (1,2)          1
2012  (3,4)          3
2012  (5,6)          2
2012  (7,8)          4
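One possible reading of the time-stamped table, offered as an assumption rather than the author's stated design: rows keep living where they were written, and only rows created from 2012 onward follow the new mapping, so no existing data has to be copied. Under that reading, routing needs the row's creation year as well as its logical shard. `ROUTING` and `physical_shard` are hypothetical names.

```python
# (first year the rule applies, logical shards covered, physical shard),
# copied from the table on the slide.
ROUTING = [
    (2011, (1, 2, 3, 4), 1),
    (2011, (5, 6, 7, 8), 2),
    (2012, (1, 2), 1),
    (2012, (3, 4), 3),
    (2012, (5, 6), 2),
    (2012, (7, 8), 4),
]


def physical_shard(created_year: int, logical_shard: int) -> int:
    """Route by the newest rule set that was in force when the row was created."""
    applicable = [year for year, _, _ in ROUTING if year <= created_year]
    if not applicable:
        raise LookupError(f"no routing rules in force for {created_year}")
    effective_year = max(applicable)
    for year, logical_shards, physical in ROUTING:
        if year == effective_year and logical_shard in logical_shards:
            return physical
    raise LookupError(f"logical shard {logical_shard} is not mapped")
```

With this table, a row in logical shard 3 written in 2011 is found on physical shard 1, while a row in the same logical shard written in 2012 is found on physical shard 3.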
48
Data protection
Diagram labels: App; Shard 1, Shard 2, Shard 3, Shard 4; Stb 1, Stb 2, Stb 3, Stb 4
49
Why shards are awesome
(potentially) Unlimited scaling
Local ACID + relational
Better maintenance
Eggs not in one basket
“Apples to apples comparison” with other shards
50
Why shards are NOT so great
More systems
–Power, rack space etc
–Needs automation … bad
–More likely to fail overall
Some operations become difficult:
–Transactions across shards
–Foreign keys across shards
More work:
–Applications, developers, DBAs
–High skill, DIY everything
51
Takeaways
More > Bigger
ORACLE is still cool
52
Thank you!
maxym@amazon.com
Twitter: @maxymkh
Blog: http://intermediatesql.com