
Slide 1: Distributed Databases (DDBs), Chap. 25

Slide 2: Distributed Databases
Distributed systems goal: to offer local DB autonomy at geographically distributed locations.
– Multiple CPUs, each with its own DBMS, but the data is distributed.

Slide 3: Advantages of DDBs
– Distributed nature of some DB applications (e.g., bank branches)
– Increased reliability and availability: if a site fails, data replicated at more than one site is still available
– Data sharing combined with local control
– Improved performance: smaller DBs exist at each site
– Easier expansion

Slide 4: Disadvantages
– Increased complexity
– Additional functions are needed:
  – distinguish global vs. local queries; access remote sites, transmit queries and data
  – keep track of data distribution and replication
  – execution strategies when data is at more than one site: which copy to access
  – maintain consistency of the copies
  – recover from site crashes

Slide 5: Architectures: Parallel vs. Distributed
– Distributed systems goal: to offer local DB autonomy at geographically distributed locations
– Parallel systems goal: to construct a faster centralized computer
  – improve performance through parallelization
  – distribution of data is governed by performance
  – processing and I/O proceed simultaneously

Slide 6: Parallel DBSs
– Shared-memory multiprocessor
  – goal: get N times as much work done with N CPUs
  – MIMD or SIMD; equal access to the same data; massively parallel
– Shared-nothing parallel
  – data is split among the CPUs; each node has its own CPU and memory; the work of a transaction is divided; nodes communicate over high-speed networks (LANs); homogeneous machines
  – a CPU plus its memory is called a site

Slide 7: Distributed DBSs (DDBSs)
– Distributed DB: shared nothing
  – lower communication rates (WAN)
  – possibly heterogeneous machines
– Homogeneous DDBS: the same DBMS at every site
– Heterogeneous DDBS: different DBMSs; need ODBC and standard SQL

Slide 8: (figure only; textbook slide 25-8)

Slide 9: Heterogeneous Distributed DBSs (HDDBs)
– The data is distributed and each site has its own DBMS (ORACLE at one site, DB2 at another, etc.)
– Need ODBC and standard SQL
– Usually a transaction manager is responsible for cooperation among the sites; it must coordinate each distributed transaction
– Need data conversion and the ability to access data at other sites

Slide 10: Federated DB (FDBS)
– A federated DB is a multidatabase: an autonomous collection of cooperating DBSs that are heterogeneous
– Preexisting DBs form a new database
– Each DB specifies an import/export schema (a view), keeping a partial view of the total schema
– Each DB has its own local users, local transparency, and DBA
  – appears centralized to local autonomous users
  – appears distributed to global users


Slide 12: FDBSs: Data Integration
– A fully integrated, logical composite of all constituent databases
– Potential problems:
  – incompatible data types or query syntax
  – semantically equivalent but differently named parts of the schemas
– Use views as the reference for data integration
– Five-level schema architecture
  – component schema: translates a local schema into the common database model

Slide 13: DDB Architectures
– Client-Server
  – the client sends a request for service (strict: fixed roles)
  – 3-tier architecture
– P2P
  – every site can act as a server, storing part of the DB, and as a client, requesting service

Slide 14: (figure only; textbook slide 25-14)

Slide 15: DDBS Issues
Issues in DDBSs are covered in the slides that follow.

Slide 16: Data Fragments
– A whole relation can be stored at one site, or
– Data fragments: logical units of the DB assigned for storage at various sites
  – horizontal fragmentation: a subset of the tuples of the relation (a select)
  – vertical fragmentation: keeps only certain attributes of the relation (a project); each fragment needs the PK
(A small sketch of both operations follows.)
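As a minimal sketch (relations represented as Python lists of dicts; the sample data and names are hypothetical, not from the slides), horizontal fragmentation is a filter and vertical fragmentation is a projection that retains the PK:

    # Hypothetical Employee relation; ssn is the PK.
    employees = [
        {"ssn": 1, "name": "Ann", "salary": 4000, "dno": 4},
        {"ssn": 2, "name": "Bob", "salary": 6000, "dno": 5},
    ]

    def horizontal_fragment(relation, predicate):
        """Horizontal fragment: the subset of tuples satisfying a condition (select)."""
        return [t for t in relation if predicate(t)]

    def vertical_fragment(relation, attrs, pk="ssn"):
        """Vertical fragment: keep only certain attributes, always including the PK (project)."""
        keep = set(attrs) | {pk}
        return [{a: t[a] for a in keep} for t in relation]

    dept4 = horizontal_fragment(employees, lambda t: t["dno"] == 4)
    names = vertical_fragment(employees, ["name"])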

Slide 17: Fragments, cont'd
– Horizontal fragments:
  – complete: the fragments' conditions together include every tuple
  – disjoint: each tuple is a member of only one fragment (e.g., salary < 5000 and dno = 4)
– Vertical fragments:
  – complete if L1 ∪ L2 ∪ ... ∪ Ln = attributes of R, where Li ∩ Lj = PK(R)
(A sketch that checks the horizontal conditions follows.)
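A small self-contained check of the two horizontal-fragmentation conditions (the sample relation and predicates are illustrative assumptions):

    def check_horizontal(relation, predicates):
        """Complete: every tuple satisfies at least one fragment predicate.
        Disjoint: no tuple satisfies more than one."""
        complete = all(any(p(t) for p in predicates) for t in relation)
        disjoint = all(sum(1 for p in predicates if p(t)) <= 1 for t in relation)
        return complete, disjoint

    rel = [{"ssn": 1, "dno": 4}, {"ssn": 2, "dno": 5}]
    preds = [lambda t: t["dno"] == 4, lambda t: t["dno"] != 4]
    print(check_horizontal(rel, preds))  # (True, True)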

Slide 18: Replication
– Full vs. partial replication
– Which copy should be accessed?
– Replication improves performance for global queries, but updates are a problem
– Must ensure consistency of the replicated copies of the data

Slide 19: Example of Replication/Fragmentation
Example fragments for the company DB:
– site 1 (company headquarters) gets the entire DB
– sites 2 and 3 get horizontal fragments based on department number

Slide 20: (figure only; textbook slide 25-20)

Slide 21: (figure only; textbook slide 25-21)

Slide 22: To Process a Query
– Must use a data dictionary that includes information on the distribution of data among the servers
– Parse the user query
  – it is decomposed into independent site queries
  – each site query is sent to the appropriate server site
  – each site processes its local query and sends the result to the result site
  – the result site combines the results of the subqueries
– Must ensure atomicity
(A sketch of this pipeline follows.)
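A rough sketch of the dispatch-and-combine pipeline (the catalog layout, executor, and combiner are stand-ins introduced for illustration, not part of the slides):

    # Hypothetical fragment catalog: which server site stores which fragment.
    catalog = {"EMP_DNO4": "site2", "EMP_DNO5": "site3"}

    def process_query(site_queries, execute_at, combine):
        """Send each decomposed site query to the server holding its fragment,
        gather the partial results, and combine them at the result site."""
        partials = [execute_at(catalog[frag], q) for frag, q in site_queries]
        return combine(partials)

    # Stand-in executor; a real one would ship the query over the network.
    rows = process_query(
        [("EMP_DNO4", "site query 1"), ("EMP_DNO5", "site query 2")],
        execute_at=lambda site, q: [f"rows of {q!r} from {site}"],
        combine=lambda parts: [r for p in parts for r in p],
    )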

Slide 23: Additional Issues in DDBSs
– A distributed transaction is needed if the data is distributed
– The transaction manager (TM) knows where the data is stored
– The TM decides which site starts the local components of the transaction
– Distributed query processing

Slide 24: Distributed Query Processing
– If the query is read-only, no 2PC is needed
– If the data is horizontally fragmented, decompose the query into subqueries that can be processed in parallel
  – one site (the result site) accepts all the results and puts them together
– The result site must also handle ORDER BY and COUNT

Slide 25: Query Parallelism
– Decompose a query into parts that can be executed in parallel at several sites (intra-query parallelism)
– If shared nothing and horizontally fragmented:
  SELECT name, phone FROM account WHERE age > 65
  – decompose into K different queries, one per fragment
  – the result site accepts all the partial results and puts them together (ORDER BY, COUNT)
(A sketch of the combine step follows.)
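A minimal sketch of the result-site combine step, assuming each site returns its rows already sorted on the ORDER BY key; heapq.merge then yields the global order, and a global COUNT is just the sum of the per-site counts (sample rows are made up):

    import heapq

    def combine_order_by(partials, key):
        """Merge per-site results, each already sorted on `key`,
        into one globally sorted result."""
        return list(heapq.merge(*partials, key=key))

    def combine_count(partial_counts):
        """A global COUNT(*) is the sum of the per-fragment counts."""
        return sum(partial_counts)

    s1 = [{"name": "Ann", "phone": "555-0001"}]
    s2 = [{"name": "Bob", "phone": "555-0002"}]
    merged = combine_order_by([s1, s2], key=lambda r: r["name"])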

Slide 26: Query Parallelism: Joins
What if the query is a join and a table is fragmented?
– A difficult problem when the table's fragments are at different sites
– Must collect all values of the join attribute at one site
– Then broadcast the join-attribute values only to the relevant sites
  – e.g., if site 1 holds values 1-10 and site 2 holds values 11-20, the values are sent only to the sites that can match them
– The matching tuples are returned and the join is performed
– Example: A |X| B with fragments A1, A2, A3 and B1, B2, B3, each at a different site (similar to a semi-join |X):
  – A1, A2, A3 are sent to S1
  – S1 sends the FK (join-attribute) values to the B1, B2, B3 sites
  – the B1, B2, B3 sites send back the tuples whose PK matches values from A1, A2, A3
  – the join is performed at S1

Slide 27: Distributed Query Processing
– If the data is not fragmented, process whatever operations you can locally (select, project)
– A query can involve joins between data at multiple sites, so data must be transferred over the network
– The transfer of data is the highest cost
– Algorithms try to minimize the amount of data transferred (the general problem is NP-hard)

Slide 28: Distributed Query Processing
Assume R1 |X| R2 → Site 3 (S3), with R1 at S1 and R2 at S2. Options:
a) R1 → S3 and R2 → S3; do the join at S3
b) R1 → S2; execute the join at S2; send the result to S3
c) R2 → S1; execute the join at S1; send the result to S3

Slide 29: Semi-join (|X)
– Send the join attribute from Si to Sj
– Select the tuples at Sj that match the keys sent, and send those tuples back to Si
– Perform the join at Si

Slide 30: Example Using a Semi-join
π dname,lname (Department |X| mgrssn=ssn Employee)
Assume Department is R1 (at S1) and Employee is R2 (at S2):
1) At S2: E' ← π ssn (Employee)  (project the join attribute); send E' to S1
2) At S1: E'' ← π mgrssn,dname (Department |X| mgrssn=ssn E')  (join with Department, keep only the needed attributes); send E'' to S2
3) At S2: π dname,lname (E'' |X| mgrssn=ssn Employee)  (join with Employee, keep the final attributes); send the result to the result site S3
(A sketch in code follows.)
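A sketch of the same semi-join in Python (relations as lists of dicts; the attribute names follow the example, while the sample data is made up):

    # Department at S1, Employee at S2 (hypothetical sample data).
    department = [{"dname": "Research", "mgrssn": 2}]
    employee = [{"ssn": 1, "lname": "Smith"}, {"ssn": 2, "lname": "Wong"}]

    # Step 1, at S2: project the join attribute and ship it to S1.
    e1 = {t["ssn"] for t in employee}

    # Step 2, at S1: join with Department, keep only the needed
    # attributes, and ship the reduced relation to S2.
    e2 = [{"mgrssn": d["mgrssn"], "dname": d["dname"]}
          for d in department if d["mgrssn"] in e1]

    # Step 3, at S2: join with Employee, keep the final attributes,
    # and send the result to the result site.
    result = [{"dname": d["dname"], "lname": e["lname"]}
              for d in e2 for e in employee if e["ssn"] == d["mgrssn"]]
    print(result)  # [{'dname': 'Research', 'lname': 'Wong'}]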

Slide 31: Distributed Query Processing (Hevner & Yao)
Costs:
– time to compute the results
– time to transmit the data
Local data requires only processing time. Assume transmission time is the most costly; the goal is to minimize transmission time.

Slide 32: Distributed Query Processing
Assume R3 = R1 |X| att=att R2, with R1 at S1, R2 at S2, and the result relation R3 sent to the result site S.
Amounts of data to transmit:
– R1 is 1 MB
– R2 is 250 KB
– R3 is 50 KB
Using a semi-join:
– π att (R1) is 100 KB
– π att (R2) is 50 KB
– R1 |X att R2 (R1 reduced by the semi-join) is 10 KB
– R2 |X att R1 (R2 reduced by the semi-join) is 25 KB

Slide 33: Distributed Query Processing
Show all the possibilities for the previous problem and determine the best distributed query-processing strategy. (A sketch that enumerates the options follows.)
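A small sketch that enumerates the transfer cost of each candidate strategy under the sizes on the previous slide. The strategy set (options (a)-(c) plus the two semi-join plans) and the assumption that the 50 KB result must always be shipped to S are mine, not the slides':

    # Sizes in KB, from the previous slide.
    R1, R2, R3 = 1000, 250, 50
    proj_R1, proj_R2 = 100, 50   # pi_att(R1), pi_att(R2)
    semi_R1, semi_R2 = 10, 25    # R1 |X att R2, R2 |X att R1

    strategies = {
        "a) ship R1 and R2 to S, join there": R1 + R2,
        "b) ship R1 to S2, join, result to S": R1 + R3,
        "c) ship R2 to S1, join, result to S": R2 + R3,
        "semi-join reducing R1 (pi(R2) to S1, R1 |X R2 to S2)": proj_R2 + semi_R1 + R3,
        "semi-join reducing R2 (pi(R1) to S2, R2 |X R1 to S1)": proj_R1 + semi_R2 + R3,
    }
    for name, kb in sorted(strategies.items(), key=lambda kv: kv[1]):
        print(f"{kb:5d} KB  {name}")
    # Under these sizes, the semi-join that reduces R1 transmits the
    # least data: 50 + 10 + 50 = 110 KB.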

Slide 34: Why Are Checkpoints/Recovery Important?
– Checkpoints/recovery in HPC: IBM used different checkpoint/recovery schemes
– Recovering from a crash took several days because different schemes were used and there was no prior knowledge of them
– This needs to be studied

Slide 35: Concurrency Control and Recovery in Distributed Databases
Distributed transactions inherit the ACID properties. Problems:
– failures at individual sites and on communication links
– distributed deadlock (usually resolved by timeout)
– distributed commit (use 2PC)
– concurrency control with multiple copies of the data

Slide 36: Distributed Concurrency Control Using Locking
– Centralized vs. distributed solutions
– Definition: a distinguished copy is the copy with which the locks for a data item are associated

Slide 37: Distributed Locking: Primary Site Technique
– The distinguished copy of every data item is at one primary site
– All locks are kept at this site and all lock requests are sent there
– The site acts as the coordinator for all DB items
– For a write lock, the DBMS must update all copies
– With a read lock from the primary site, the transaction can access its local copy
– Disadvantages: all locking requests go to the same site (a bottleneck), and if the primary site fails, all transactions must be aborted and restarted
(A toy sketch follows.)
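A toy sketch of the primary-site idea: one in-memory lock table stands in for the primary site (real systems track lock modes per transaction, queue waiters, and so on; all names here are illustrative):

    # All lock state lives at the primary site, in a single table.
    lock_table = {}  # item -> (mode, set of holding transactions)

    def acquire(item, mode, txn):
        """Grant a lock from the primary site's table only.
        Sketch: no queuing; a conflicting request is simply refused."""
        held = lock_table.get(item)
        if held is None:
            lock_table[item] = (mode, {txn})
            return True
        if mode == "read" and held[0] == "read":
            held[1].add(txn)     # shared read locks are compatible
            return True
        return False             # conflict: the caller must wait and retry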

Slide 38: Distributed Locking: Primary Site with Backup
– All lock information is kept at both sites, so a failure is less of a problem
– Disadvantage: slower, since there are two sites to communicate with

Slide 39: Distributed Locking: Primary Copy Technique
– Distributes the responsibility: the distinguished copies of different items are at different sites
– A new coordinator can also be chosen after a failure (an election in which a site chooses itself as the new coordinator site)

Slide 40: Distributed Locking: Voting
– No distinguished copy
– A lock request is sent to all sites holding a copy of the data
– If the lock is granted by a majority of the copies, the requester holds the lock and informs the sites holding copies that the lock is granted
– If a majority is not obtained, the request times out and a cancel is sent to all sites
– Disadvantage: lots of message traffic
(A sketch follows.)
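A minimal sketch of the majority-voting decision (the per-site calls are stand-ins; a real implementation would send messages and enforce the timeout):

    def request_lock(item, copy_sites, try_lock_at, cancel_at):
        """Voting: ask every site holding a copy of `item`; hold the
        lock only if a majority of the copies grant it."""
        granted = [s for s in copy_sites if try_lock_at(s, item)]
        if len(granted) > len(copy_sites) / 2:
            return True          # majority: hold the lock, inform the sites
        for s in granted:        # no majority: cancel the partial grants
            cancel_at(s, item)
        return False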

Slide 41: Distributed Commit (Chap. 23.6)
– If a transaction updates data at different sites, when should it commit?
– The difficult part is committing together, so that we have atomicity
– States of basic transactions:
  – active: can enter either of the other two states
  – committed: cannot leave this state
  – aborted: cannot leave this state

Slide 42: States (diagram): active → committed; active → aborted

Slide 43: Commit Problems
To commit a distributed transaction we need a coordinator (originator) site (S1 in our example).
Given a transaction T with components T1 and T2, where T1 executes at site S1 and T2 executes at site S2:
a) If S1 commits locally and sends a commit message to S2, but S2 crashes, then T2 aborts before completing its subtask: not atomic
b) If both T1 and T2 are ready to commit and S1 sends the message to commit, but T2 crashes during the commit, the transaction is not atomic if T2 can't recover

Slide 44: How to Solve These Problems
– Need an additional state: the ready state
– When T is done and ready to commit, it goes to the ready state
– In the ready state, a transaction can go to either the commit or the abort state
– Place a prepare record in the log buffer, force-written to the log file, so that the transaction can be reconstructed

Slide 45: States (diagram): active → ready to commit → committed; ready to commit → aborted; active → aborted

Slide 46: Use Two-Phase Commit (2PC)
– Used by commercial DBSs to achieve a coordinated commit of distributed transactions
– Need a coordinator, usually a site that is participating in the transaction
– Only needed if there are updates

Slide 47: 2PC Phase 1
– When each subquery concludes, it signals the coordinator
– If the coordinator doesn't hear from everyone (timeout), it sends abort
– Otherwise the coordinator has received "done" from all sites, and it sends messages asking them to prepare their site transactions
– Each site writes its changes to its log file
  – if the log write succeeds, the site sends "ready to commit" and enters the prepared state
  – if it fails, or the site cannot commit for other reasons, the site sends "cannot commit"

Slide 48: 2PC Phase 2
– If the coordinator does not receive "ready to commit" from all sites, it sends abort
– Otherwise all prepare requests succeeded, so the coordinator sends commit
– The sites record the commit in their logs and update the DB
Message flow: subquery concluded → ready to commit (prepared state) → commit → commit recorded
(A sketch of both phases follows.)
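A compact sketch of both phases from the coordinator's side (the participant calls are stand-ins; a real implementation would also force-write coordinator log records and treat a timeout as a "no" vote):

    def two_phase_commit(participants, prepare, commit, abort):
        """2PC coordinator. Phase 1: ask every participant to prepare
        (force its log and vote). Phase 2: commit only if all voted yes."""
        if all(prepare(p) for p in participants):
            for p in participants:   # every site is prepared: commit all
                commit(p)
            return "committed"
        for p in participants:       # some site cannot commit: abort all
            abort(p)
        return "aborted"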

Slide 49: Solved by 2PC?
a) If S1 commits locally and sends commit to S2, but S2 crashes, T2 aborts and the result is not atomic.
With 2PC:
– if T1 completes, T1 is only prepared, not committed
– if T2 crashes, T2 is unsuccessful
– T1 can still abort from the prepared state

Slide 50: Solved by 2PC?
b) If both T1 and T2 are ready to commit and S1 sends the message to commit, but T2 crashes before the commit completes, the transaction is not atomic.
With 2PC:
– if both T1 and T2 complete successfully, T1 and T2 are both prepared
– T1 is able to commit; if T2 crashes during the commit, it can recover its prepared state and commit, using the log

Slide 51: Any Problems with 2PC?
– 2PC is a blocking protocol (a participant blocks until commit or rollback is received)
– In most practical applications the blocking caused by 2PC is rare, so most systems use 2PC
– However, when could 2PC block?

Slide 52: 2PC Blocking
When could 2PC block?
– The problem can occur when a site is in the prepared state
– The site voted to commit
– It cannot commit until it receives commit from the coordinator
– It cannot change its vote and abort
– The participant is blocked until it receives a message from the coordinator; what if the coordinator fails?
– (The coordinator can also block while waiting for replies from the participants; what about a timeout?)
Solution: divide the second (commit) phase into prepare-to-commit and commit

Slide 53: 3PC
Message flow: subquery concluded → ready to commit (prepared state) → prepare to commit → commit → commit recorded
In the prepare-to-commit state:
– a maximum timeout period is set (used if the coordinator crashes and the sites don't hear from it)
– the sites are allowed to communicate with each other
– a site can still commit or abort and release its locks
(A sketch follows.)
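A sketch of the coordinator side with the extra round (stand-in calls as before; the point is only the added prepare-to-commit step between the vote and the final commit):

    def three_phase_commit(participants, prepare, pre_commit, commit, abort):
        """3PC coordinator: prepare (vote), then prepare-to-commit, then
        commit. The extra round lets a participant that times out in the
        prepare-to-commit state proceed without blocking forever."""
        if not all(prepare(p) for p in participants):
            for p in participants:
                abort(p)
            return "aborted"
        for p in participants:   # all voted yes: announce prepare-to-commit
            pre_commit(p)
        for p in participants:   # final decision
            commit(p)
        return "committed"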

