Lecture 11 Distributed Databases and Cloud computing

Definitions
Distributed Database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link.
Decentralized Database: A collection of independent databases on non-networked computers.

Reasons for Distributed Database
Local business units want control over data. Consolidate data across local databases for integrated decision making. Reduce telecommunications costs. Reduce the risk of telecommunications failures.

Distributed Database Options
Homogeneous - Same DBMS at each node.
  Autonomous - Independent DBMSs.
  Non-autonomous - Central, coordinating DBMS.
Heterogeneous - Different DBMSs at different nodes.
  Gateways - Simple paths are created to other databases without the benefits of one logical database.

Distributed database environments

Homogeneous, Non-Autonomous Database
Data is distributed across all the nodes. The same DBMS runs at each node. All data is managed by the distributed DBMS (there is no exclusively local data). All access is through one global schema. The global schema is the union of all the local schemas.
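The idea that the global schema is the union of the local schemas, and that users reach data through it without naming a site, can be sketched as below. This is an illustration only: the site names, tables and the helpers `build_global_schema` and `locate` are hypothetical, and the sketch assumes each table lives at exactly one node.

```python
# Hypothetical local schemas: site name -> {table name -> column list}.
LOCAL_SCHEMAS = {
    "site_a": {"customer": ["cust_id", "name", "region"]},
    "site_b": {"orders": ["order_id", "cust_id", "amount"]},
    "site_c": {"product": ["prod_id", "description", "price"]},
}


def build_global_schema(local_schemas):
    """Union the local schemas into one global schema.

    Each table name maps to (site, columns). Because this sketch assumes
    every table is stored at exactly one site, a duplicate name is rejected.
    """
    global_schema = {}
    for site, tables in local_schemas.items():
        for table, columns in tables.items():
            if table in global_schema:
                raise ValueError(f"table {table!r} defined at more than one site")
            global_schema[table] = (site, list(columns))
    return global_schema


def locate(global_schema, table):
    """Return the site holding a table; the user only supplies the table name."""
    site, _columns = global_schema[table]
    return site


GLOBAL = build_global_schema(LOCAL_SCHEMAS)
print(sorted(GLOBAL))            # ['customer', 'orders', 'product']
print(locate(GLOBAL, "orders"))  # site_b
```

The lookup through `GLOBAL` is what gives users one logical view: a query names `orders`, and the distributed DBMS, not the user, resolves it to `site_b`.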

Focus on the Following Heterogeneous Environment
Data is distributed across all the nodes. Different DBMSs may be used at each node. Local access uses the local DBMS and the local schema. Remote access uses the global schema.

Objectives and Trade-offs
Location Transparency - Users do not have to know where the data is located. Local Autonomy - A local site can operate with its database when the central site is down. Synchronous Distributed Database - All copies of the same data are always identical. Asynchronous Distributed Database - Some data inconsistency is tolerated.
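The difference between a synchronous and an asynchronous distributed database can be sketched with one data item copied to several sites. This is a single-process toy model, not how any particular DBMS propagates updates; the class `ReplicatedItem` and its methods are hypothetical names chosen for illustration.

```python
class ReplicatedItem:
    """One data item copied to several sites (hypothetical toy model)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.pending = []  # (site, value) updates not yet applied

    def write_sync(self, value):
        # Synchronous: every copy is updated as part of the write,
        # so all copies are identical when the call returns.
        for site in self.copies:
            self.copies[site] = value

    def write_async(self, origin, value):
        # Asynchronous: only the originating copy changes now; updates
        # for the other sites are queued and applied later.
        self.copies[origin] = value
        for site in self.copies:
            if site != origin:
                self.pending.append((site, value))

    def propagate(self):
        # Apply the queued updates, bringing all copies back in line.
        for site, value in self.pending:
            self.copies[site] = value
        self.pending.clear()

    def consistent(self):
        return len(set(self.copies.values())) == 1


item = ReplicatedItem(["NY", "LA", "CHI"], 100)
item.write_sync(120)
print(item.consistent())  # True
item.write_async("NY", 150)
print(item.consistent())  # False: LA and CHI still hold 120
item.propagate()
print(item.consistent())  # True
```

The window between `write_async` and `propagate` is the "tolerated inconsistency" of the asynchronous option; the synchronous option avoids it at the cost of touching every site on every write.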

Advantages of Distributed Database
Increased reliability and availability. Local control over data. Modular growth. Lower communication costs. Faster response for certain queries.

Disadvantages of Distributed Database
Software cost and complexity. Processing overhead. Data integrity exposure. Slower response for certain queries.

Options for Distributing a Database
Data replication. Horizontal partitioning. Vertical partitioning. Combinations of the above.

Data Replication Advantages
Reliability. Fast response. May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals). Decouples nodes (transactions proceed even if some nodes are down). Reduced network traffic at prime time (if updates can be delayed).

Data Replication Disadvantages
Additional storage space requirements. Additional time for update operations. Complexity and cost of updating. Integrity exposure: users may get incorrect data if the replicated copies are not updated simultaneously. Therefore, replication is better suited to non-volatile data.

Types of Data Replication
Snapshot Replication - Changes are periodically sent to a master site, which sends an updated snapshot out to the other sites. Near Real-Time Replication - Update orders are broadcast without requiring confirmation. Pull Replication - Each site controls when it receives updates.
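Two of these replication types can be contrasted in a small sketch: in snapshot replication the master decides when every site is refreshed, while in pull replication each site decides for itself. The classes `MasterSite` and `ReplicaSite`, the site names and the data are hypothetical, and near real-time replication is not modeled here.

```python
class MasterSite:
    """Master site that collects changes sent by other sites (hypothetical)."""

    def __init__(self):
        self.data = {}

    def receive_change(self, key, value):
        # Sites periodically send their local changes to the master.
        self.data[key] = value

    def push_snapshot(self, sites):
        # Snapshot replication: the master chooses when to send a full
        # copy of its current data out to every other site.
        for site in sites:
            site.data = dict(self.data)


class ReplicaSite:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def pull(self, master):
        # Pull replication: this site chooses when to copy the master's data.
        self.data = dict(master.data)


master = MasterSite()
boston, denver = ReplicaSite("Boston"), ReplicaSite("Denver")

master.receive_change("price:widget", 10)
master.push_snapshot([boston, denver])     # scheduled snapshot
master.receive_change("price:widget", 12)  # arrives after the snapshot

print(boston.data["price:widget"])  # 10 (stale until the next snapshot)
denver.pull(master)                 # Denver decides to refresh now
print(denver.data["price:widget"])  # 12
```

Boston's stale price is the data timeliness issue in miniature: between snapshots, replicas can serve out-of-date values.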

Issues in Data Replication Use
Data timeliness: replicated copies may be out of date. Replication is useful if the DBMS cannot reference data from more than one node. Batched updates can cause performance problems. Updates are complicated by heterogeneous DBMSs or database designs. Telecommunications speeds may limit mass updates.

Horizontal Partitioning
Different records (rows) of a file are stored at different sites. Advantages - Data is stored close to where it is used. Local access can be optimized. Security: each site holds only the data relevant to it. Disadvantages - Accessing data across partitions is slower and more complex. No data replication.
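Horizontal partitioning can be illustrated with a customer file split by region: each site stores complete rows, but only its own region's rows. The data, the partitioning key `region`, and the helpers below are hypothetical; a real distributed DBMS would do this routing internally.

```python
# Hypothetical customer rows, partitioned horizontally by region.
CUSTOMERS = [
    {"cust_id": 1, "name": "Ada", "region": "East"},
    {"cust_id": 2, "name": "Ben", "region": "West"},
    {"cust_id": 3, "name": "Cy", "region": "East"},
]


def partition_by_region(rows):
    """Assign each complete row to the site for its region."""
    sites = {}
    for row in rows:
        sites.setdefault(row["region"], []).append(row)
    return sites


SITES = partition_by_region(CUSTOMERS)


def local_query(region):
    # Fast path: the data is stored at the site where it is used.
    return [row["name"] for row in SITES[region]]


def global_query():
    # Crossing partitions: every site must be visited and the results
    # combined (a union of the partitions).
    return sorted(row["name"] for rows in SITES.values() for row in rows)


print(local_query("East"))  # ['Ada', 'Cy']
print(global_query())       # ['Ada', 'Ben', 'Cy']
```

Note that each row exists at exactly one site, which is why horizontal partitioning on its own provides no replication.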

Vertical Partitioning
Different columns of a file are stored at different sites. Advantages and disadvantages are the same as for horizontal partitioning, except that combining data across partitions is more difficult because it requires joins.
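The join needed to recombine vertical partitions can be sketched as follows, assuming (hypothetically) an employee table whose HR columns live at one site and whose payroll column lives at another, with the primary key `emp_id` kept at both.

```python
# Hypothetical employee table split vertically: both sites keep the
# primary key emp_id, but each stores a different set of columns.
PAYROLL_SITE = {1: {"salary": 50000}, 2: {"salary": 62000}}
HR_SITE = {1: {"name": "Ada", "dept": "Sales"}, 2: {"name": "Ben", "dept": "IT"}}


def rebuild_rows(left, right):
    """Join two vertical partitions on their shared key to recover full rows."""
    rows = []
    for emp_id in sorted(left.keys() & right.keys()):  # keys present at both sites
        row = {"emp_id": emp_id}
        row.update(left[emp_id])
        row.update(right[emp_id])
        rows.append(row)
    return rows


for row in rebuild_rows(HR_SITE, PAYROLL_SITE):
    print(row)
# {'emp_id': 1, 'name': 'Ada', 'dept': 'Sales', 'salary': 50000}
# {'emp_id': 2, 'name': 'Ben', 'dept': 'IT', 'salary': 62000}
```

Unlike the horizontal case, where results are simply unioned, every full row here requires matching keys across sites, which is the extra difficulty the slide refers to.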

Factors in Choice of Distributed Strategy
Funding, autonomy, security. Site data referencing patterns. Growth and expansion needs. Technological capabilities. Costs of managing complex technologies. Need for reliable service.

Cloud Computing
Cloud computing is the latest evolution of Internet-based computing. Its potential benefits are substantial. However, attaining these benefits requires that every aspect of the cloud platform support the key design principles of the cloud model. One of the core design principles is dynamic scalability: the ability to provision and decommission servers on demand. Unfortunately, most of today’s database servers cannot satisfy this requirement.

Key Benefits of Cloud Computing:
Lower costs: All resources, including expensive networking equipment, servers and IT personnel, are shared, which reduces costs, especially for small to mid-sized applications and prototypes.
Dynamic scalability: Most applications experience spikes in traffic. Instead of over-buying your own equipment to accommodate these spikes, you can rely on cloud services that scale smoothly and efficiently to handle them under a more cost-effective pay-as-you-go model.
Simplified maintenance: Upgrades and backups are rapidly deployed across the shared infrastructure.
Large-scale testing: Cloud computing makes large-scale prototyping and load testing much easier. You can easily spawn 1,000 servers in the cloud to load test your application and then release them as soon as you are done.
Faster development: Cloud computing platforms provide many of the core services that, under traditional development models, would normally be built in house. These services, plus templates and other tools, can significantly accelerate the development cycle.

Evolving Cloud Database Requirements
Cloud database usage patterns are evolving, and business adoption of these technologies is accelerating that evolution. Initially, cloud databases served consumer applications. These early applications prioritized read access because the ratio of reads to writes was very high, so delivering high-performance reads was the primary purchase criterion. This is changing. Consumer-centric cloud database applications have evolved with the adoption of Web 2.0 technologies, and user-generated content, particularly in the form of social networking, has placed more emphasis on updates. Reads still outnumber writes, but the gap is narrowing, and support for transactional business applications shrinks it further. Business applications also demand that the cloud database be ACID compliant, providing Atomicity, Consistency, Isolation and Durability.

The Achilles Heel of Cloud Databases
Dynamic scalability, one of the core principles of cloud computing, has proven to be a particular problem for databases. The reason is simple: most databases use a shared-nothing architecture, which relies on splitting (partitioning) the data into separate silos, one per server. Adding or removing a server therefore means repartitioning the data and moving it between servers, which makes on-demand provisioning slow and disruptive.
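Why partitioning gets in the way of resizing can be made concrete with a sketch that counts how many rows change servers when a cluster grows. The placement rule here (key modulo server count) is an assumption chosen for simplicity; real shared-nothing systems use various partitioning schemes.

```python
def owner(key, num_servers):
    # Hypothetical placement rule: simple modulo partitioning.
    return key % num_servers


def keys_moved(keys, old_n, new_n):
    """Count keys whose home server changes when the cluster is resized."""
    return sum(1 for k in keys if owner(k, old_n) != owner(k, new_n))


keys = range(1200)
print(keys_moved(keys, 2, 3))  # 800: two thirds of the rows change servers
print(keys_moved(keys, 3, 2))  # 800 again when shrinking back to two
```

With modulo placement, growing from two to three servers relocates two thirds of the data. Schemes such as consistent hashing reduce the share that moves to roughly one server's worth, but some data still has to migrate every time the cluster is resized.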

Are Replicated Tables the Answer?
Since data partitioning and cloud databases are inherently incompatible, Amazon, Facebook and Google have taken another approach to the cloud database challenge. They have created persistence engines (technically not databases) that abandon typical ACID compliance in favor of replicated tables of data that store and retrieve information while supporting dynamic, or elastic, scalability. Google offers BigTable, Amazon has SimpleDB and Facebook is working on Cassandra. However, these engines are not a replacement for a real database, and they do not address corporate cloud computing requirements.

The Shared-Disk Database Architecture is Ideal for Cloud Databases
The shared-disk database architecture, which eliminates the need to partition data, is ideal for cloud databases. Shared-disk databases allow clusters of low-cost servers to use a single collection of data, typically served up by a Storage Area Network (SAN) or Network Attached Storage (NAS). All of the data is available to all of the servers; there is no partitioning of the data. As a result, if your query takes 0.5 seconds on two servers, you can dynamically add a third server and the same query might then take 0.35 seconds. In other words, shared-disk databases support elastic scalability. In addition to elastic scalability, the shared-disk DBMS architecture has other important advantages that make it very appealing for deployment in the cloud.
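The shared-disk argument can be sketched as a cluster whose servers all read one store, so adding a server moves no data. The class `SharedDiskCluster` is a hypothetical toy model, and the timing arithmetic assumes, for illustration only, that the work would take 1.0 second on a single server and divides perfectly evenly. Under that assumption the ideal three-server time is about 0.333 seconds, so the slide's 0.35-second figure corresponds to roughly 95% scaling efficiency.

```python
class SharedDiskCluster:
    """Servers that all read one shared store (stand-in for a SAN or NAS)."""

    def __init__(self, storage, num_servers):
        self.storage = storage  # one copy of the data, shared by all servers
        self.servers = num_servers

    def read(self, server_id, key):
        # Any server can read any row; there is no partition map to consult.
        if not 0 <= server_id < self.servers:
            raise ValueError("no such server")
        return self.storage[key]

    def add_server(self):
        # The new server attaches to the same storage, so no rows are copied.
        self.servers += 1
        return 0  # rows moved

    def ideal_query_time(self, single_server_seconds):
        # Best case: the work divides evenly across servers (linear speedup).
        return single_server_seconds / self.servers


cluster = SharedDiskCluster(storage={"row1": "data"}, num_servers=2)
print(cluster.add_server())                     # 0
print(cluster.read(2, "row1"))                  # data (read by the new server)
print(round(cluster.ideal_query_time(1.0), 3))  # 0.333
efficiency = (0.5 / 0.35) / (3 / 2)             # actual speedup / ideal speedup
print(round(efficiency, 2))                     # 0.95
```

The contrast with the partitioned case is the return value of `add_server`: resizing changes how many servers share the work, not where the data lives.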

Conclusion
Whether you are assembling, managing or developing on a cloud computing platform, you need a cloud-compatible database. Shared-nothing databases require data partitioning, which is structurally incompatible with dynamic scalability, a core foundation of cloud computing. The shared-disk database architecture, on the other hand, does support elastic scalability. It also supports other cloud objectives such as lower costs for hardware, maintenance, tuning and support.
