Information Resources Management, April 17, 2001. Agenda: Administrivia, Database Architectures.

1 Information Resources Management April 17, 2001

2 Agenda • Administrivia • Database Architectures

3 Administrivia • Homework #8

4 Database Architectures • Centralized • Client-Server • Parallel - single site • Distributed - multiple sites

5 Database Architectures [diagram labels: Centralized, (Parallel), Distributed, Client-Server, Function, Data]

6 Centralized • PC, Mini, or Mainframe • Single Database • Single Database Manager • One or More Users • Data and Function in One Place

7 Client-Server • PCs to Mainframes to Minis • PC to PC • Mainframe to Mainframe • Use Desktop Processing Power • Better User Interface • Greater Functionality • Retain Centralized Control of Data

8 Client-Server: Basic Model [diagram: Client sends Request to Server; Server returns Result]

9 Servers • Supercomputer • Mainframe • Mini • PC Server • All retain all data

10 Client-Server Architecture [diagram labels: Data, Function, Server (Back-End), Client (Front-End), Thin Client, Fat Client]

11 Functionality • Presentation • I/O Processing • Validation • Business Rules • Application Logic • Data Management • Validation • Error Handling

12 “Thin” Client • Presentation Services Only • Accept Input • Format Output • Display • Server does all processing

13 “Fat” Client • Presentation • Validation • Application Logic - Programs • Data Management • Send SQL to Server • Server is just DBMS

14 “In Between” Client • Client: Presentation; Some Application Logic • Server: Some Application Logic; Data Management and Services

15 Benefits of Client-Server • Use Local Processing Power • Better User Interface • Some Functionality if System Down • Use Sunk Costs of PCs • Support Reengineering • Support Intranets • Flexibility, Scalability, Customizability

16 Challenges of Client-Server • Cost of (Upgraded) PCs • Network Reliance • Distributing Application Updates • Management of Complex System • Problem Identification & Resolution • Application Partitioning

17 Other Client-Server Architectures • Traditional is Two-Tiered (client-server) • Three-Tiered: Client - Application Server - DB Server, e.g. (PC - Mini - Mainframe) or (PC - PC Server - Mainframe) • Beyond Three: PC - PC Server - Web Server - Mini - Mainframe

18 Client-Server vs. Distributed • Client-Server: Application Distribution • Distributed: Data Distribution • Often, “client-server” is used to refer to either application distribution or data distribution or both.

19 Middleware • What if: Multiple databases (sources) need to be accessed from a single client? Different kinds of clients? Mix of clients and servers? Want to take advantage of existing base of applications (legacy systems)?

20 Middleware • Fat Clients just send SQL transactions • Other types of transactions may be needed based on the server (system)

21 Middleware: software that shields applications from the complexity of the operating environment. [diagram labels: Client, Middleware, System (Legacy), System (Legacy)]

22 Types of Middleware • Transaction Processing (TP) Monitor • Database Middleware • Remote Procedure Call (RPC) • Message-Oriented Middleware (MOM) • Object-Request Brokers (CORBA - ORB)

23 TP Monitor • Synchronous - sender must wait • Queuing • Message Delivery • Insured Delivery • Either Direction

24 Database Middleware • Variety of Clients/Platforms • Variety of Servers/DBMSs/Platforms • Specific to DB transactions (SQL)

25 Message-Oriented Middleware (MOM) • Asynchronous - clients do not wait • Queues & Queue Management/Recovery • Message Delivery • Insured Delivery • Either Direction (like email or EDI, but only for transactions)

26 Advantages of Middleware • Leverage sunk costs (legacy systems) • Reduce development cost • Reduce development time • Increase responsiveness • Improve overall systems management • Consolidate diffuse information

27 Challenges of Middleware • Cost • Session management - Transaction state • Security • Network reliance • Diversity of systems - lack of standards • Constant technology change • Availability of talent • Middleware Management

28 Parallel and Distributed • Client-Server is an attempt to improve performance • Reduce time to execute a transaction: Parallel • Reduce time to get the data: Distributed

29 Parallel Systems • Single site for data • Very Large databases • Operations performed simultaneously

30 Parallel Database Architectures • Shared Memory • Shared Disk • Shared Nothing • Hierarchical

31 Shared Memory [diagram: processors P, P, P; memory M]

32 Shared Memory • Advantages: Extremely efficient communications • Disadvantages: Max of 32/64 processors; Bus becomes bottleneck

33 Shared Disk [diagram: processors P, P, P; memories M, M, M]

34 Shared Disk • Advantages: No bus bottleneck; Fault tolerance provided • Disadvantages: Disk access becomes bottleneck

35 Shared Nothing [diagram: processors P, P, P; memories M, M, M]

36 Shared Nothing • Advantages: No disk bottleneck; Highly scalable • Disadvantages: High communication overhead/cost between processors and to another processor’s data

37 Hierarchical [diagram: processors P, P, P, P, P; memories M, M, M]

38 Hierarchical • Advantages: Best of all worlds • Disadvantages: Worst of all worlds; Some high communication overhead/cost between subsystems; Complexity

39 Distributed Databases • Client-Server - distribute functionality • What about distributing data?

40 Distributed Databases • Overview • Distributed Storage • Distributed Queries • Distributed Transactions • Multidatabase (Middleware)

41 Distributed Databases • Multiple locations • Single logical database • Several physical databases • Network connections

42 Advantages • Sharing across locations • Local control • Availability

43 Challenges • Development costs • People & Equipment • Testing • Problem identification & resolution • Technical expertise • Network dependence • Increased processing overhead

44 Distributed Data Storage • Replication • Fragmentation • Both

45 Replication • Data is repeated • Spectrum of options available: temporary replication of specific rows; replicate infrequently changed data; replicate by site (central site - all / each local site - their data only); full replication (everything everywhere)

46 Concerns with Replication • Availability needed • Amount of parallelism in reads • Overhead of updates • Keeping replicas updated • Conflicting updates

47 Fragmentation • Partitioning • Divide data into subsets based on need • Have to be able to pull back together to get original tables

48 Fragmentation • Horizontal: by rows; specified conditions • Vertical: by column; each requires primary key (or created key) • Mixed: by row and column
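
The two fragmentation styles, and the requirement that fragments can be pulled back together into the original table, can be sketched in a few lines of Python. This is a minimal illustration, not a DBMS feature: the `EMPLOYEES` rows, the column names, and all function names are hypothetical.

```python
# Hypothetical Employee rows; the column names are illustrative assumptions.
EMPLOYEES = [
    {"emp_id": 1, "name": "Ana", "site": "NY", "salary": 70000},
    {"emp_id": 2, "name": "Bo", "site": "Pgh", "salary": 65000},
    {"emp_id": 3, "name": "Cy", "site": "NY", "salary": 82000},
]


def horizontal_fragments(rows, column):
    """Split by rows: one fragment per distinct value of `column`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Split by columns: keep `columns`, always carrying the primary key."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def rejoin_horizontal(fragments, key):
    """Rebuild the table as the union of the horizontal fragments."""
    rows = [row for fragment in fragments.values() for row in fragment]
    return sorted(rows, key=lambda row: row[key])


def rejoin_vertical(key, *fragments):
    """Rebuild full rows by joining vertical fragments on the primary key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]
```

Carrying `emp_id` in every vertical fragment is what makes the join in `rejoin_vertical` possible, which is why each vertical fragment requires the primary key.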

49 Fragmentation & Replication • Repeat as necessary: replicate fragments; fragment replicas • Don’t lose track of what you have and where it is!

50 Network Transparency • Distributing data should not require that the user know where or how it’s been distributed. • The database should be seen as a single entity no matter how fragmented and replicated it becomes.

51 Network Transparency • Some DBMSs are starting to provide this level of functionality so transparency exists even at the program level, but in many cases this “transparency” must be programmed into the applications. • It must always be designed into the database.

52 Distributed Queries • How do you query data that is everywhere?

53 Efficiency vs. Overhead • Splitting the query apart • Keeping track of the data/locations • Making sure everything gets executed • Putting the results back together • Generating network traffic • Handling partial results

54 Distributed Queries • Full replication can avoid the overhead • Huge increase in update overhead • Parallel execution no longer possible • Additional costs of replication

55 Example • 5 sites - NY, Pgh, Chicago, Dallas, Los Angeles • Data fragmented by site - no replication • Query (in Pgh): SELECT Name, MAX(Salary) FROM Employee

56 Option 1 - High Bandwidth 1. Have all sites send their full employee tables to Pgh. 2. Build a temporary employee table. 3. Run the query against this table.

57 Option 2 - Not so High Bandwidth 1. Examine the query and determine it can be run separately at each location and the results combined. 2. Submit just the query to each location. 3. Wait for the results from each city. 4. As results return, build a temporary table (5 rows only). 5. Find the max using the temporary table.
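
Option 2 can be sketched as follows, assuming the query is meant to return the name and salary of the single highest-paid employee. The `SITES` data and both function names are hypothetical; in a real system `local_max` would run at each remote site and only its one-row result would cross the network.

```python
# Hypothetical per-site fragments of Employee, as (name, salary) rows.
SITES = {
    "NY": [("Ana", 70000), ("Cy", 82000)],
    "Pgh": [("Bo", 65000)],
    "Chicago": [("Di", 91000), ("Ed", 58000)],
    "Dallas": [("Flo", 77000)],
    "Los Angeles": [("Gus", 88000)],
}


def local_max(rows):
    """Runs at one site: its highest-paid row, or None for an empty fragment."""
    return max(rows, key=lambda row: row[1], default=None)


def global_max(sites):
    """Runs at the querying site: combine one result row per site."""
    temp_table = []  # at most one row per site (5 rows in the example)
    for rows in sites.values():
        result = local_max(rows)
        if result is not None:
            temp_table.append(result)
    return max(temp_table, key=lambda row: row[1], default=None)
```

The decomposition works because the maximum of the per-site maxima equals the maximum over the whole table; an aggregate such as an average would need each site to return a sum and a count instead.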

58 Distributed Transactions • Transaction Types • Coordinators • Commit Protocols • Concurrency Controls • Deadlocks

59 Transaction Types • Local - transaction only needs local data • Global - transaction uses non-local data • My global becomes someone else’s local • Either type of transaction must still have ACID properties - global is the concern

60 System Structure • Things to do: 1. Process local transactions (transaction manager) 2. Process and track global transactions (transaction coordinator)

61 Global Processing 1. Recognize as global 2. Break up transaction 3. Distribute pieces 4. Assemble results 5. Coordinate termination 6. Handle problems

62 Coordinator of Coordinators • Coordinate among sites • Detect problems • Attempt to fix • Share status with others

63 Coordinator Failure • Backup Coordinator: receives all messages - maintains state; monitors coordinator; automatically takes over if coordinator down; avoids delays - increases overhead • Election: highest pre-assigned number
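
The election rule (highest pre-assigned number wins) might look like the sketch below. The function name and the idea of passing in a set of currently reachable sites are assumptions made for illustration; the message exchange a real election needs is omitted.

```python
def elect_coordinator(site_numbers, alive):
    """Return the reachable site with the highest pre-assigned number, or None.

    site_numbers: dict mapping site name -> pre-assigned election number.
    alive: set of site names that currently respond.
    """
    candidates = [site for site in site_numbers if site in alive]
    if not candidates:
        return None
    return max(candidates, key=lambda site: site_numbers[site])
```

For example, with numbers `{"NY": 5, "Pgh": 4, "Chicago": 3}` and NY down, Pgh would take over as coordinator.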

64 Commit Protocols • Two-Phase • Three-Phase • All sites must commit or all sites have to rollback • Replicated data only

65 Two-Phase Commit • Phase 1: Send PREPARE to all sites; sites respond READY or ABORT • Phase 2: If all sites READY, COMMIT locally and send COMMITs; if not READY or time expires, ROLLBACK locally and send ROLLBACK
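
The coordinator's side of two-phase commit can be sketched in Python. This is a simplified in-memory model, not a production protocol: the `Site` class and its `can_commit` and `responds` flags are hypothetical stand-ins for real participants, a timeout is modeled as a `None` vote, and the logging to stable storage that a real implementation needs is left out.

```python
class Site:
    """Hypothetical participant; the flags simulate its vote and reachability."""

    def __init__(self, name, can_commit=True, responds=True):
        self.name = name
        self.can_commit = can_commit
        self.responds = responds
        self.state = "ACTIVE"

    def prepare(self):
        """Phase 1: vote READY or ABORT; None models a timeout."""
        if not self.responds:
            return None
        self.state = "READY" if self.can_commit else "ABORT"
        return self.state

    def finish(self, decision):
        """Phase 2: apply the coordinator's COMMIT or ROLLBACK."""
        if self.responds:
            self.state = decision


def two_phase_commit(sites):
    """Coordinator logic: COMMIT only if every site answers READY."""
    votes = [site.prepare() for site in sites]  # Phase 1: send PREPARE
    decision = "COMMIT" if all(v == "READY" for v in votes) else "ROLLBACK"
    for site in sites:  # Phase 2: send the decision to all sites
        site.finish(decision)
    return decision
```

A single ABORT vote or a single missing reply is enough to roll back every site, which is what gives the all-or-nothing outcome.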

66 Two-Phase Commit [diagram: Coordinator, Site] Site requests commit

67 Two-Phase Commit - Phase 1 [diagram: Coordinator, Site] Send PREPARE - all sites

68 Two-Phase Commit - Phase 1 [diagram: Coordinator, Site] Sites respond READY

69 Two-Phase Commit - Phase 2 [diagram: Coordinator, Site] COMMIT locally

70 Two-Phase Commit - Phase 2 [diagram: Coordinator, Site] Send COMMIT - all sites

71 Two-Phase Commit - Phase 1 [diagram: Coordinator, Site] Site responds ABORT or does not respond

72 Two-Phase Commit - Phase 2 [diagram: Coordinator, Site] ROLLBACK locally

73 Two-Phase Commit - Phase 2 [diagram: Coordinator, Site] Send ROLLBACK - all sites

74 Site Failure - Recovery • COMMIT and ROLLBACK as normal • If READY only: check with coordinator or other sites; either COMMIT or ROLLBACK; if no one found, ROLLBACK

75 Coordinator Failure • Ask the sites: if one has COMMIT, then REDO; if one has ROLLBACK, then UNDO; if one doesn’t have READY, UNDO • If all READY only: coordinator must decide; sites must wait and locks are held; “blocking” occurs
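
The rules the surviving sites apply after a coordinator failure reduce to a small decision function. The sketch below assumes each site reports one of the state names used in these slides, plus a hypothetical `"ACTIVE"` for a site that never voted; the function name and the `"BLOCKED"` return value are illustrative.

```python
def recover_decision(site_states):
    """Decide the outcome from the last logged state of each surviving site."""
    states = set(site_states)
    if "COMMIT" in states:
        return "REDO"     # some site saw COMMIT, so the decision was commit
    if "ROLLBACK" in states:
        return "UNDO"     # some site saw ROLLBACK
    if states != {"READY"}:
        return "UNDO"     # a site never voted READY, so commit was impossible
    return "BLOCKED"      # all READY: only the coordinator knows the outcome
```

The last branch is the blocking case: every site has promised to commit, none knows the verdict, and locks stay held until the coordinator returns.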

76 Three-Phase Commit • Phase 1: Send PREPARE; sites respond READY or ABORT • Phase 2: If all sites READY, send PRECOMMIT; else, ROLLBACK; sites must ACKNOWLEDGE • Phase 3: If at least K sites ACKNOWLEDGE, send COMMIT
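
The three phases can be traced with a small function that returns the messages the coordinator would send. This is only a sketch: the function name and its inputs are hypothetical, and because the slide does not say what happens when fewer than K sites acknowledge, the sketch simply stops after PRECOMMIT in that case.

```python
def three_phase_commit(votes, acks, k):
    """Return the sequence of messages the coordinator sends.

    votes: dict mapping site -> "READY" or "ABORT" (Phase 1 replies).
    acks:  set of sites that acknowledged PRECOMMIT (Phase 2 replies).
    k:     number of acknowledgements required before COMMIT.
    """
    # Phase 1: any vote other than READY forces a rollback.
    if any(vote != "READY" for vote in votes.values()):
        return ["PREPARE", "ROLLBACK"]
    # Phase 2: everyone is READY, so announce the intent to commit.
    messages = ["PREPARE", "PRECOMMIT"]
    # Phase 3: commit once at least k participating sites have acknowledged.
    if len(set(acks) & set(votes)) >= k:
        messages.append("COMMIT")
    return messages
```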

77 Coordinator Failure • Three-Phase Commit prevents blocking • If coordinator fails: new coordinator is selected; sites queried to determine status; new coordinator resumes

78 Network Partitioning • Network split creates two separate networks • Each “half” selects a coordinator • Coordinators make independent decisions • Result could be different decisions • Resolution of network problem may create need to resolve database problems

79 Concurrency Control • Single Lock Manager • Multiple Lock Managers

80 Single Lock Manager • One site for all locking • All other sites must go to it • Can read from anywhere • Updates must be to all copies • Advantages: Simple, Easy deadlock detection • Disadvantages: Bottleneck, Vulnerability

81 Simple Multiple Lock Mgrs • Each site locks a unique partition of the data (non-replicated data) • Advantages: Fairly simple, reduced bottlenecks • Disadvantages: Complicated deadlock detection

82 Majority Protocol • Each site locks its own data (replication possible) • Request owner for lock on data that isn’t local • When multiple owners, n/2 + 1 (majority) must provide the lock • Advantages: No bottlenecks • Disadvantages: More messages sent, Complicated deadlock detection, More deadlocks (each gets 1/2)
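
The n/2 + 1 rule can be sketched as below. The data layout (`replica_locks` mapping each site to its own lock table) and the function names are assumptions. The sketch is also simplified to be all-or-nothing: it takes no locks unless a majority is available, whereas a real implementation acquires locks one reply at a time, which is how two transactions can each end up holding half and deadlock.

```python
def majority_needed(replica_count):
    """Locks required under the majority protocol: floor(n/2) + 1."""
    return replica_count // 2 + 1


def try_lock(item, transaction, replica_locks):
    """Lock `item` for `transaction` if a majority of replicas can grant it.

    replica_locks: dict mapping site -> {item: holding transaction}.
    """
    granted = [site for site, locks in replica_locks.items()
               if locks.get(item) in (None, transaction)]
    if len(granted) < majority_needed(len(replica_locks)):
        return False
    for site in granted:
        replica_locks[site][item] = transaction
    return True
```

Because any two majorities of the same replica set overlap in at least one site, two transactions can never both hold a majority lock on the same item.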

83 Biased Protocol • Reduced form of Majority Protocol • For a READ, only need any single lock • For a WRITE, need all locks • Advantages: No bottlenecks, Reduced traffic • Disadvantages: Update traffic, Deadlocks
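
A rough way to compare the two protocols is to count how many replica sites must grant a lock before an operation can proceed. The function below is an illustrative sketch with hypothetical names; it counts lock grants only and ignores the request and release messages themselves.

```python
def lock_requests(protocol, operation, replicas):
    """Number of replica sites that must grant a lock for the operation."""
    if protocol == "majority":
        return replicas // 2 + 1                       # same for READ and WRITE
    if protocol == "biased":
        return 1 if operation == "READ" else replicas  # cheap reads, costly writes
    raise ValueError(f"unknown protocol: {protocol}")
```

With five replicas, a read needs three grants under the majority protocol but one under the biased protocol, while a write needs three versus five. This is the trade behind "reduced traffic" for reads and "update traffic" for writes.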

84 Primary Copy • Site designated to hold “primary” copy • Multiple sites • Replicated Data • All locks through that site • Advantages: Fairly simple, reduced bottlenecks • Disadvantages: Vulnerability, Complicated deadlock detection

85 Other Than Locking • Timestamps: centralized generation; local generation • Timestamp tests determine ability to read or write
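
The slide does not spell out the timestamp tests, so the sketch below uses the standard basic timestamp-ordering rules as an assumption: each data item remembers the largest timestamps that have read and written it, and an operation from an older transaction that arrives too late is rejected. A single shared counter models centralized generation; all names are hypothetical.

```python
import itertools

_clock = itertools.count(1)  # centralized generation: one shared counter


def new_timestamp():
    """Hand out the next unique, increasing timestamp."""
    return next(_clock)


class Item:
    """A data item tracking the youngest reader and writer seen so far."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def can_read(item, ts):
    """Reject a read if a younger transaction has already overwritten the item."""
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True


def can_write(item, ts):
    """Reject a write if a younger transaction has already read or written it."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True
```

For local generation, a common approach is to pair a local counter with a unique site identifier so that timestamps from different sites never collide.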

86 Deadlocks & Distributed Data • Centralized: One Site • Distributed • Centralized - same advantages and disadvantages as other centralized control (database or locking)

87 Distributed Deadlock Detection • Each site tracks all transactions accessing its own data • Dummy transaction for transactions that originated here but are executing elsewhere • If deadlock found that includes dummy transaction: must send deadlock information to other sites; they check for deadlock; may have to pass on to another site
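
One way to picture this is with wait-for graphs, where an edge (A, B) means transaction A waits for B. The sketch below is a simplified model, not a full distributed algorithm: the `"EX"` node is a hypothetical stand-in for the dummy transaction, and "sending deadlock information" is modeled by merging edge lists.

```python
EXTERNAL = "EX"  # dummy transaction standing in for work at other sites


def has_cycle(edges):
    """Detect a cycle in a wait-for graph given as (waiter, holder) pairs."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
    visiting, done = set(), set()

    def visit(node):
        if node in visiting:
            return True   # came back to a node on the current path
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def needs_forwarding(local_edges):
    """True if the only local cycle runs through the dummy transaction."""
    real = [edge for edge in local_edges if EXTERNAL not in edge]
    return has_cycle(local_edges) and not has_cycle(real)
```

If site A holds edges `[("EX", "T1"), ("T1", "T2"), ("T2", "EX")]`, its only cycle passes through the dummy, so it forwards its information. Once the receiving site combines those edges with its own real edge `("T2", "T1")`, the cycle T1, T2, T1 confirms a global deadlock.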

88 Homework #9 • Continuing with the Carnegie Library • Client/Server • Distributed Database

