Information Resources Management
April 17, 2001

Agenda
- Administrivia
- Database Architectures

Administrivia
- Homework #8

Database Architectures
- Centralized
- Client-Server
- Parallel - single site
- Distributed - multiple sites

Database Architectures
[Figure: Centralized (Parallel), Distributed, and Client-Server plotted against Function and Data axes]

Centralized
- PC, Mini, or Mainframe
- Single Database
- Single Database Manager
- One or More Users
- Data and Function in One Place

Client-Server
- PCs to Mainframes to Minis
- PC to PC
- Mainframe to Mainframe
- Use Desktop Processing Power
- Better User Interface
- Greater Functionality
- Retain Centralized Control of Data

Client-Server: Basic Model
[Figure: Client sends a Request to the Server; the Server returns a Result]

Servers
- Supercomputer
- Mainframe
- Mini
- PC Server
- All retain all data

Client-Server Architecture
[Figure: Function and Data divided between Client (Front-End) and Server (Back-End), ranging from Thin Client to Fat Client]

Functionality
- Presentation
- I/O Processing
- Validation
- Business Rules
- Application Logic
- Data Management
- Validation
- Error Handling

“Thin” Client
- Presentation Services Only
  - Accept Input
  - Format Output
  - Display
- Server does all processing

“Fat” Client
- Presentation
- Validation
- Application Logic - Programs
- Data Management
- Send SQL to Server
- Server is just DBMS

“In Between” Client
- Client
  - Presentation
  - Some Application Logic
- Server
  - Some Application Logic
  - Data Management and Services

Benefits of Client-Server
- Use Local Processing Power
- Better User Interface
- Some Functionality if System Down
- Use Sunk Costs of PCs
- Support Reengineering
- Support Intranets
- Flexibility, Scalability, Customizability

Challenges of Client-Server
- Cost of (Upgraded) PCs
- Network Reliance
- Distributing Application Updates
- Management of Complex System
- Problem Identification & Resolution
- Application Partitioning

Other Client-Server Architectures
- Traditional is Two-Tiered (client-server)
- Three-Tiered
  - Client - Application Server - DB Server
  - (PC - Mini - Mainframe)
  - (PC - PC Server - Mainframe)
- Beyond Three
  - PC - PC Server - Web Server - Mini - Mainframe

Client-Server vs. Distributed
- Client-Server: Application Distribution
- Distributed: Data Distribution
Often, “client-server” is used to refer to application distribution, data distribution, or both.

Middleware
- What if
  - Multiple databases (sources) need to be accessed from a single client?
  - Different kinds of clients?
  - Mix of clients and servers?
  - Want to take advantage of existing base of applications (legacy systems)?

Middleware
- Fat Clients just send SQL transactions
- Other types of transactions may be needed based on the server (system)

Middleware
Software that shields applications from the complexity of the operating environment.
[Figure: Client connected through Middleware to two (Legacy) Systems]

Types of Middleware
- Transaction Processing (TP) Monitor
- Database Middleware
- Remote Procedure Call (RPC)
- Message-Oriented Middleware (MOM)
- Object-Request Brokers
  - (CORBA - ORB)

TP Monitor
- Synchronous - sender must wait
- Queuing
- Message Delivery
- Assured Delivery
- Either Direction

Database Middleware
- Variety of Clients/Platforms
- Variety of Servers/DBMSs/Platforms
- Specific to DB transactions (SQL)

Message-Oriented Middleware (MOM)
- Asynchronous - clients do not wait
- Queues & Queue Management/Recovery
- Message Delivery
- Assured Delivery
- Either Direction (like email or EDI, only with transactions)
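
The asynchronous, queue-based behaviour listed above can be illustrated with a toy in-memory queue. This is only a sketch under simple assumptions, not any particular MOM product; `MessageQueue`, `send`, and `deliver_all` are hypothetical names invented for the example.

```python
from collections import deque


class MessageQueue:
    """Toy in-memory queue standing in for message-oriented middleware."""

    def __init__(self):
        self._pending = deque()

    def send(self, message):
        # Asynchronous: the sender enqueues and returns immediately,
        # without waiting for the receiver to process the message.
        self._pending.append(message)

    def deliver_all(self, handler):
        # Later, the receiver drains the queue in arrival order.
        results = []
        while self._pending:
            results.append(handler(self._pending.popleft()))
        return results


queue = MessageQueue()
queue.send({"op": "debit", "amount": 50})
queue.send({"op": "credit", "amount": 50})
processed = queue.deliver_all(lambda m: (m["op"], m["amount"]))
```

A synchronous TP monitor would instead make the sender wait inside `send` until the handler had run.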

Advantages of Middleware
- Leverage sunk costs (legacy systems)
- Reduce development cost
- Reduce development time
- Increase responsiveness
- Improve overall systems management
- Consolidate diffuse information

Challenges of Middleware
- Cost
- Session management - Transaction state
- Security
- Network reliance
- Diversity of systems - lack of standards
- Constant technology change
- Availability of talent
- Middleware Management

Parallel and Distributed
- Client-Server is an attempt to improve performance
- Reduce time to execute a transaction
  - Parallel
- Reduce time to get the data
  - Distributed

Parallel Systems
- Single site for data
- Very Large databases
- Operations performed simultaneously

Parallel Database Architectures
- Shared Memory
- Shared Disk
- Shared Nothing
- Hierarchical

Shared Memory
[Figure: processors (P) sharing a single memory (M)]

- Advantages
  - Extremely efficient communications
- Disadvantages
  - Max of 32/64 processors
  - Bus becomes bottleneck

Shared Disk
[Figure: processors (P), each with its own memory (M), sharing disks]

- Advantages
  - No bus bottleneck
  - Fault tolerance provided
- Disadvantages
  - Disk access becomes bottleneck

Shared Nothing
[Figure: processors (P), each with its own memory (M) and its own disk]

- Advantages
  - No disk bottleneck
  - Highly scalable
- Disadvantages
  - High communication overhead/cost
    - Between processors
    - To another processor’s data

Hierarchical
[Figure: groups of processors (P) and memories (M) combined into subsystems]

Hierarchical
- Advantages
  - Best of all worlds
- Disadvantages
  - Worst of all worlds
  - Some high communication overhead/cost
    - Between subsystems
  - Complexity

Distributed Databases
- Client-Server - distribute functionality
- What about distributing data?

Distributed Databases
- Overview
- Distributed Storage
- Distributed Queries
- Distributed Transactions
- Multidatabase (Middleware)

Distributed Databases
- Multiple locations
- Single logical database
- Several physical databases
- Network connections

Advantages
- Sharing across locations
- Local control
- Availability

Challenges
- Development costs
- People & Equipment
- Testing
- Problem identification & resolution
- Technical expertise
- Network dependence
- Increased processing overhead

Distributed Data Storage
- Replication
- Fragmentation
- Both

Replication
- Data is repeated
- Spectrum of options available
  - Temporary replication of specific rows
  - Replicate infrequently changed data
  - Replicate by site
    - Central site - all / each local site - their data only
  - Full replication
    - Everything everywhere

Concerns with Replication
- Availability needed
- Amount of parallelism in reads
- Overhead of updates
  - Keeping replicas updated
  - Conflicting updates

Fragmentation
- Partitioning
- Divide data into subsets based on need
- Have to be able to pull back together to get original tables

Fragmentation
- Horizontal
  - by rows
  - specified conditions
- Vertical
  - by column
  - each requires primary key (or created key)
- Mixed
  - by row and column
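
The two fragmentation styles, and the requirement that fragments can be pulled back together, can be sketched on a small in-memory table. The `employees` rows and the helper names below are made up for illustration; a real DBMS would do this with selections, projections, unions, and joins.

```python
employees = [
    {"id": 1, "name": "Ann", "site": "NY", "salary": 70},
    {"id": 2, "name": "Bob", "site": "Pgh", "salary": 60},
    {"id": 3, "name": "Cy", "site": "NY", "salary": 80},
]


def horizontal_fragment(rows, predicate):
    """Split rows into those matching the condition and the rest."""
    return ([r for r in rows if predicate(r)],
            [r for r in rows if not predicate(r)])


def vertical_fragment(rows, key, columns):
    """Project the given columns; the key is kept in every fragment."""
    return [{key: r[key], **{c: r[c] for c in columns}} for r in rows]


def join_on_key(left, right, key):
    """Rebuild rows from two vertical fragments by matching the key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


ny, others = horizontal_fragment(employees, lambda r: r["site"] == "NY")
personal = vertical_fragment(employees, "id", ["name", "site"])
payroll = vertical_fragment(employees, "id", ["salary"])

# Horizontal fragments recombine by union, vertical ones by a key join.
rebuilt_h = sorted(ny + others, key=lambda r: r["id"])
rebuilt_v = join_on_key(personal, payroll, "id")
```

Keeping `id` in both vertical fragments is what makes the join possible, which is why each vertical fragment needs the primary key.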

Fragmentation & Replication
- Repeat as necessary:
  - Replicate fragments
  - Fragment replicas
- Don’t lose track of what you have and where it is!

Network Transparency
- Distributing data should not require that the user know where or how it’s been distributed.
- The database should be seen as a single entity no matter how fragmented and replicated it becomes.

Network Transparency
- Some DBMSs are starting to provide this level of functionality so transparency exists even at the program level, but in many cases this “transparency” must be programmed into the applications.
- It must always be designed into the database.

Distributed Queries
- How do you query data that is everywhere?

Efficiency vs. Overhead
- Splitting the query apart
- Keeping track of the data/locations
- Making sure everything gets executed
- Putting the results back together
- Generating network traffic
- Handling partial results

Distributed Queries
- Full replication can avoid the overhead
- Huge increase in update overhead
- Parallel execution no longer possible
- Additional costs of replication

Example
- 5 sites - NY, Pgh, Chicago, Dallas, Los Angeles
- Data fragmented by site - no replication
- Query (in Pgh): SELECT Name, MAX(Salary) FROM Employee

Option 1 - High Bandwidth
1. Have all sites send their full employee tables to Pgh.
2. Build a temporary employee table.
3. Run the query against this table.

Option 2 - Not so High Bandwidth
1. Examine the query and determine it can be run separately at each location and the results combined.
2. Submit just the query to each location.
3. Wait for the results from each city.
4. As results return, build a temporary table (5 rows only).
5. Find the max using the temporary table.
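
Option 2 works because a maximum can be computed per fragment and then combined. A minimal sketch follows, with each site's fragment modelled as a list of (name, salary) tuples; the names and salaries are invented, and the "network" is just a dictionary lookup.

```python
site_tables = {
    "NY": [("Ann", 70), ("Cy", 80)],
    "Pgh": [("Bob", 60)],
    "Chicago": [("Dee", 95), ("Eli", 50)],
    "Dallas": [("Fay", 65)],
    "Los Angeles": [("Gus", 90)],
}


def local_max(rows):
    """Runs at one site: return its highest-paid (name, salary) row."""
    return max(rows, key=lambda row: row[1])


def global_max(tables):
    # Steps 2-4: send the query to each site and collect one row per site.
    temp_table = [local_max(rows) for rows in tables.values()]
    # Step 5: find the max over the small temporary table.
    return temp_table, max(temp_table, key=lambda row: row[1])


temp_table, winner = global_max(site_tables)
```

Only five rows cross the network here, versus every employee row under Option 1.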

Distributed Transactions
- Transaction Types
- Coordinators
- Commit Protocols
- Concurrency Controls
- Deadlocks

Transaction Types
- Local - transaction only needs local data
- Global - transaction uses non-local data
  - My global becomes someone else’s local
- Either type of transaction must still have ACID properties - global is the concern

System Structure
- Things to do:
  1. Process local transactions (transaction manager)
  2. Process and track global transactions (transaction coordinator)

Global Processing
1. Recognize as global
2. Break up transaction
3. Distribute pieces
4. Assemble results
5. Coordinate termination
6. Handle problems

Coordinator of Coordinators
- Coordinate among sites
- Detect problems
- Attempt to fix
- Share status with others

Coordinator Failure
- Backup Coordinator
  - receives all messages - maintains state
  - monitors coordinator
  - automatically takes over if coordinator down
  - avoids delays - increases overhead
- Election
  - highest pre-assigned number
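
The election rule (highest pre-assigned number wins) can be sketched as below. The function name and the `is_alive` callback are assumptions for illustration; a real election also has to exchange messages and handle sites that fail during the election.

```python
def elect_coordinator(site_numbers, is_alive):
    """Return the highest pre-assigned number among responding sites."""
    candidates = [n for n in site_numbers if is_alive(n)]
    if not candidates:
        raise RuntimeError("no live site available to coordinate")
    return max(candidates)


sites = [1, 2, 3, 4, 5]
failed = {5}  # the old coordinator is down
new_coordinator = elect_coordinator(sites, lambda n: n not in failed)
```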

Commit Protocols
- Two-Phase
- Three-Phase
- All sites must commit or all sites must roll back
- Replicated data only

Two-Phase Commit
- Phase 1
  - Send PREPARE to all sites
  - Sites respond READY or ABORT
- Phase 2
  - If all sites READY
    - COMMIT locally - Send COMMITs
  - If not READY or time expires
    - ROLLBACK locally - Send ROLLBACK

Two-Phase Commit (figure sequence: Coordinator and Sites, commit path)
- Site requests commit
- Phase 1: Coordinator sends PREPARE to all sites
- Phase 1: Sites respond READY
- Phase 2: Coordinator COMMITs locally
- Phase 2: Coordinator sends COMMIT to all sites

Two-Phase Commit (figure sequence: Coordinator and Sites, rollback path)
- Phase 1: A site responds ABORT or does not respond
- Phase 2: Coordinator ROLLBACKs locally
- Phase 2: Coordinator sends ROLLBACK to all sites
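
Both paths of the two-phase commit walk-through above can be sketched in a few lines. This is a simplified, single-process model: the `Site` class and its `vote` field are hypothetical, a missing response is modelled as `None` instead of a real timeout, and logging and failures are left out.

```python
class Site:
    def __init__(self, name, vote="READY"):
        self.name, self.vote, self.state = name, vote, "ACTIVE"

    def prepare(self):
        # A real site would force its log to disk before answering READY.
        self.state = self.vote
        return self.vote  # "READY", "ABORT", or None for no response

    def finish(self, decision):
        self.state = decision


def two_phase_commit(sites):
    # Phase 1: send PREPARE to all sites and gather the responses.
    votes = [site.prepare() for site in sites]
    # Phase 2: commit only if every site answered READY; a missing
    # response (None, standing in for a timeout) counts as not READY.
    decision = "COMMIT" if all(v == "READY" for v in votes) else "ROLLBACK"
    for site in sites:
        site.finish(decision)
    return decision


ok = two_phase_commit([Site("NY"), Site("Pgh")])
aborted = two_phase_commit([Site("NY"), Site("Dallas", vote="ABORT")])
timed_out = two_phase_commit([Site("NY"), Site("Los Angeles", vote=None)])
```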

Site Failure - Recovery
- COMMIT and ROLLBACK as normal
- If READY only
  - Check with coordinator or other sites
  - Either COMMIT or ROLLBACK
  - If no one found, ROLLBACK

Coordinator Failure
- Ask the sites
  - If one has COMMIT, then REDO
  - If one has ROLLBACK, then UNDO
  - If one doesn’t have READY, UNDO
  - If all READY only
    - Coordinator must decide
    - Sites must wait and locks are held
    - “Blocking” occurs
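
The decision rules above amount to a small case analysis over the states the surviving sites have logged. A sketch, assuming each site reports one of four state strings (the function name and the "ACTIVE" and "BLOCKED" labels are invented for the example):

```python
def resolve_after_coordinator_failure(site_states):
    """Decide the outcome from the surviving sites' last logged states.

    Each state is one of "COMMIT", "ROLLBACK", "READY", or "ACTIVE"
    (ACTIVE meaning the site never logged READY).
    """
    if "COMMIT" in site_states:
        return "REDO"     # the decision was commit; finish it everywhere
    if "ROLLBACK" in site_states:
        return "UNDO"     # the decision was rollback
    if any(state != "READY" for state in site_states):
        return "UNDO"     # a site never voted READY, so no commit was sent
    return "BLOCKED"      # all READY: only the coordinator knows the outcome
```

The last case is the blocking problem: every site has promised to commit if told to, so none can safely decide alone.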

Three-Phase Commit
- Phase 1
  - Send PREPARE
  - Sites respond READY or ABORT
- Phase 2
  - If all sites READY, send PRECOMMIT
  - Else, ROLLBACK
  - Sites must ACKNOWLEDGE
- Phase 3
  - If at least K sites ACKNOWLEDGE, send COMMIT
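
The coordinator's side of the three phases can be sketched as a function that returns the messages it would send. The function and parameter names are assumptions; the slides do not say what happens when fewer than K sites acknowledge, so this sketch simply stops without sending COMMIT in that case.

```python
def three_phase_commit(votes, acks, k):
    """Return the list of messages the coordinator sends, in order.

    votes: each site's Phase 1 answer ("READY" or "ABORT").
    acks:  whether each site acknowledged PRECOMMIT in Phase 2.
    k:     minimum number of acknowledgements required to commit.
    """
    sent = ["PREPARE"]                      # Phase 1
    if not all(v == "READY" for v in votes):
        return sent + ["ROLLBACK"]          # Phase 2, rollback path
    sent.append("PRECOMMIT")                # Phase 2
    if sum(acks) >= k:
        sent.append("COMMIT")               # Phase 3
    return sent
```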

Coordinator Failure
- Three-Phase Commit prevents blocking
- If coordinator fails
  - New coordinator is selected
  - Sites queried to determine status
  - New coordinator resumes

Network Partitioning
- Network split creates two separate networks
- Each “half” selects a coordinator
- Coordinators make independent decisions
- Result could be different decisions
- Resolution of network problem may create need to resolve database problems

Concurrency Control
- Single Lock Manager
- Multiple Lock Managers

Single Lock Manager
- One site for all locking
- All other sites must go to it
- Can read from anywhere
- Updates must be to all copies
- Advantages: Simple, Easy deadlock detection
- Disadvantages: Bottleneck, Vulnerability

Simple Multiple Lock Managers
- Each site locks a unique partition of the data
  - non-replicated data
- Advantages: Fairly simple, reduced bottlenecks
- Disadvantages: Complicated deadlock detection

Majority Protocol
- Each site locks its own data
  - replication possible
- Request owner for lock on data that isn’t local
- When multiple owners, n/2 + 1 (majority) must provide the lock
- Advantages: No bottlenecks
- Disadvantages: More messages sent, Complicated deadlock detection, More deadlocks (each gets 1/2)
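
The n/2 + 1 rule can be checked with a short sketch. It assumes integer division for n/2 and models each replica site's answer as a boolean; the function names are hypothetical.

```python
def majority_needed(n_copies):
    """n/2 + 1 with integer division: the smallest strict majority."""
    return n_copies // 2 + 1


def lock_granted(grants):
    """grants: one boolean per replica site, True if it gave the lock."""
    return sum(grants) >= majority_needed(len(grants))


# Two transactions each holding half of four copies: neither has a
# majority, which is the "each gets 1/2" deadlock case.
half_each = lock_granted([True, True, False, False])
```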

Biased Protocol
- Reduced form of Majority Protocol
- For a READ, only need any single lock
- For a WRITE, need all locks
- Advantages: No bottlenecks, Reduced traffic
- Disadvantages: Update traffic, Deadlocks
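
The read-one, write-all rule is simple enough to state directly in code. As in the other sketches, replica answers are modelled as booleans and the function name is an assumption.

```python
def biased_lock_granted(mode, grants):
    """mode: "READ" or "WRITE"; grants: one boolean per replica site."""
    if mode == "READ":
        return any(grants)   # any single copy's lock is enough
    if mode == "WRITE":
        return all(grants)   # every copy must be locked
    raise ValueError(f"unknown lock mode: {mode}")
```

Reads are cheap, but one unreachable or busy replica is enough to hold up a write.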

Primary Copy
- Site designated to hold “primary” copy
- Multiple sites
- Replicated Data
- All locks through that site
- Advantages: Fairly simple, reduced bottlenecks
- Disadvantages: Vulnerability, Complicated deadlock detection

Other Than Locking
- Timestamps
  - Centralized generation
  - Local generation
- Timestamp tests determine ability to read or write
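
The slides do not spell out the timestamp tests. One common form is the basic timestamp-ordering rule, sketched here as an assumption: each data item remembers the largest read and write timestamps seen so far, and an operation from an older transaction that arrives too late is rejected.

```python
from dataclasses import dataclass


@dataclass
class Item:
    read_ts: int = 0   # largest timestamp of any transaction that read it
    write_ts: int = 0  # largest timestamp of any transaction that wrote it


def try_read(item, ts):
    if ts < item.write_ts:
        return False   # a younger transaction already wrote the item
    item.read_ts = max(item.read_ts, ts)
    return True


def try_write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return False   # a younger transaction already read or wrote it
    item.write_ts = ts
    return True
```

With local generation, timestamps are usually made globally unique by pairing a local counter with a site identifier.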

Deadlocks & Distributed Data
- Centralized
  - One Site
- Distributed
- Centralized - same advantages and disadvantages as other centralized control (database or locking)

Distributed Deadlock Detection
- Each site tracks all transactions accessing its own data
- Dummy transaction for transactions that originated here but are executing elsewhere
- If deadlock found that includes dummy transaction
  - Must send deadlock information to other sites
  - They check for deadlock
  - May have to pass on to another site
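
One way to picture the per-site check is a local wait-for graph with an extra node standing for the dummy transaction. The sketch below assumes that representation; `EXTERNAL`, `find_cycle`, and `check_site` are invented names, and the actual forwarding of information between sites is not modelled.

```python
EXTERNAL = "EX"  # dummy node for transactions executing at other sites


def find_cycle(wait_for):
    """Return one cycle in a wait-for graph {txn: [txns it waits on]}, or None."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):]
        for nxt in wait_for.get(node, []):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


def check_site(wait_for):
    cycle = find_cycle(wait_for)
    if cycle is None:
        return "NO_DEADLOCK"
    # A cycle through the dummy node is only a possible deadlock:
    # other sites must check using the forwarded information.
    return "FORWARD" if EXTERNAL in cycle else "DEADLOCK"
```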

Homework #9
- Continuing with the Carnegie Library
  - Client/Server
  - Distributed Database
