
2 Phase Commit Protocol
In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes participating in a distributed atomic transaction on whether to commit or abort (roll back) the transaction; it is a specialized type of consensus protocol. The protocol achieves its goal even in many cases of temporary system failure (for example, process, network node, or communication failures), and is therefore widely used. To support recovery from failure (automatic in most cases), the participants log the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist, differing mainly in their logging strategies and recovery mechanisms. Although recovery procedures are meant to run only rarely, they make up a substantial portion of the protocol, because many possible failure scenarios must be considered and handled.
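The role of the log in recovery can be illustrated with a minimal sketch of what a participant does after restarting. It assumes the log is simply a list of record names in write order, and the labels "PREPARED", "COMMIT", and "ABORT" are hypothetical; real systems use richer log formats. The key case is a participant that logged PREPARED (it voted Yes) but no decision: it cannot decide on its own and must ask the coordinator, which is why 2PC can block while the coordinator is unavailable.

```python
from enum import Enum


class Action(Enum):
    COMMIT = "commit"
    ABORT = "abort"
    ASK_COORDINATOR = "ask coordinator"


def recover(log: list[str]) -> Action:
    """Decide what a restarted participant must do, given its log records in write order.

    The record names "PREPARED", "COMMIT" and "ABORT" are hypothetical labels for this sketch.
    """
    if "COMMIT" in log:
        return Action.COMMIT           # decision was logged: redo the commit
    if "ABORT" in log:
        return Action.ABORT            # decision was logged: redo the abort
    if "PREPARED" in log:
        return Action.ASK_COORDINATOR  # voted Yes, outcome unknown: "in doubt", must wait
    return Action.ABORT                # never voted Yes: safe to abort unilaterally


print(recover(["PREPARED"]))  # Action.ASK_COORDINATOR
```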

Phases of 2 Phase Commit Protocol
In a "normal execution" of a single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases:
1. The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all of the transaction's participating processes (called participants, cohorts, or workers) to take the steps necessary for either committing or aborting the transaction, and asks each one to vote either "Yes" (commit), if the participant's local portion of the transaction executed properly, or "No" (abort), if a problem was detected with the local portion.
2. The commit phase, in which, based on the cohorts' votes, the coordinator decides to commit (only if all voted "Yes") or to abort the transaction (otherwise), and notifies all the cohorts of the result. The cohorts then carry out the required action (commit or abort) on their local transactional resources (also called recoverable resources, e.g., database data) and on their respective portions of the transaction's other output, if any.
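The two phases can be sketched as an in-process simulation. This is a minimal sketch, assuming participants are plain Python objects and that messages are ordinary method calls; `Participant`, its `can_commit` flag, and `two_phase_commit` are hypothetical names. A real coordinator would also force-write its decision to its own log before starting phase 2, and would handle timeouts and lost messages.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical cohort; can_commit says whether its local work finished properly."""
    name: str
    can_commit: bool
    log: list[str] = field(default_factory=list)
    state: str = "active"

    def prepare(self) -> bool:
        """Phase 1: log PREPARED and vote Yes, or abort locally and vote No."""
        if self.can_commit:
            self.log.append("PREPARED")  # must reach stable storage before voting Yes
            return True
        self.log.append("ABORT")
        self.state = "aborted"
        return False

    def finish(self, decision: str) -> None:
        """Phase 2: apply the coordinator's decision to local resources."""
        if self.state == "aborted":
            return  # already rolled back when it voted No
        self.log.append(decision)
        self.state = "committed" if decision == "COMMIT" else "aborted"


def two_phase_commit(participants: list[Participant]) -> str:
    """Coordinator: collect votes, decide, then notify every cohort."""
    votes = [p.prepare() for p in participants]     # phase 1: commit-request (voting)
    decision = "COMMIT" if all(votes) else "ABORT"  # commit only if everyone voted Yes
    for p in participants:                          # phase 2: commit
        p.finish(decision)
    return decision


cohorts = [Participant("db1", True), Participant("db2", False)]
print(two_phase_commit(cohorts))   # ABORT
print([p.state for p in cohorts])  # ['aborted', 'aborted']
```

A single "No" vote is enough to abort the whole transaction, while a commit requires unanimity; that asymmetry is what makes the outcome atomic across all cohorts.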

Query transformation
Distributed query optimization is an Oracle feature that reduces the amount of data transferred between sites when a transaction retrieves data from remote tables referenced in a distributed SQL statement. It uses Oracle's cost-based optimization to find or generate SQL expressions that extract only the necessary data from remote tables, process that data at a remote site (or sometimes at the local site), and send the results to the local site for final processing. This transfers less data than shipping all the table data to the local site for processing. With cost-based optimizer hints such as DRIVING_SITE, NO_MERGE, and INDEX, you can control where Oracle processes the data and how it accesses the data.
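The saving from processing data at the remote site can be shown with a toy model. This is not Oracle's optimizer: it simulates a hypothetical remote site with an in-memory SQLite database (the `orders` table, its `region` column, and the function names are assumptions) and simply counts how many rows cross the "network" under two plans.

```python
import sqlite3


def make_remote_site() -> sqlite3.Connection:
    """Hypothetical remote site: an in-memory SQLite database with an 'orders' table."""
    remote = sqlite3.connect(":memory:")
    remote.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    remote.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("EU" if i % 10 == 0 else "US", float(i)) for i in range(1000)],
    )
    return remote


def ship_everything(remote: sqlite3.Connection) -> tuple[float, int]:
    """Naive plan: transfer the whole table, then filter and aggregate locally."""
    rows = remote.execute("SELECT id, region, amount FROM orders").fetchall()
    total = sum(amount for _id, region, amount in rows if region == "EU")
    return total, len(rows)  # (result, rows transferred)


def push_down(remote: sqlite3.Connection) -> tuple[float, int]:
    """Optimized plan: let the remote site filter and aggregate, ship only the result."""
    rows = remote.execute("SELECT SUM(amount) FROM orders WHERE region = ?", ("EU",)).fetchall()
    return rows[0][0], len(rows)


remote = make_remote_site()
print(ship_everything(remote))  # (49500.0, 1000)
print(push_down(remote))        # (49500.0, 1)
```

Both plans return the same answer; only the volume of transferred data differs, which is the quantity distributed query optimization tries to minimize.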

Join Processing
The join operation is one of the fundamental relational database query operations. It retrieves information from two relations by combining their tuples; conceptually, a join is a selection applied to the Cartesian product of the two relations. The join is one of the most difficult operations to implement efficiently, because no predefined links between relations are required to exist (as they are in network and hierarchical systems). The join is the principal relational algebra operation for combining related tuples from relations with different attribute schemes. Since it is executed frequently and is expensive, much research effort has been applied to the optimization of join processing.
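To make the cost argument concrete, here are two classic equi-join algorithms side by side: a nested-loop join, which literally filters the Cartesian product, and a hash join, one of the standard optimizations. This is an illustrative sketch over hypothetical in-memory relations (tuples are dicts keyed by attribute name), not how any particular DBMS implements joins.

```python
from collections import defaultdict

# Hypothetical sample relations.
dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"},
        {"deptno": 40, "dname": "OPERATIONS"}]
emp = [{"ename": "CLARK", "deptno": 10}, {"ename": "SMITH", "deptno": 20},
       {"ename": "FORD", "deptno": 20}]


def nested_loop_join(r, s, attr):
    """Test every pair of tuples: |R| * |S| comparisons (a filtered Cartesian product)."""
    return [(a, b) for a in r for b in s if a[attr] == b[attr]]


def hash_join(r, s, attr):
    """Build a hash table on S once, then probe it for each R tuple: about |R| + |S| steps."""
    buckets = defaultdict(list)
    for b in s:
        buckets[b[attr]].append(b)
    return [(a, b) for a in r for b in buckets.get(a[attr], [])]


pairs = hash_join(dept, emp, "deptno")
print([(d["dname"], e["ename"]) for d, e in pairs])
# [('ACCOUNTING', 'CLARK'), ('RESEARCH', 'SMITH'), ('RESEARCH', 'FORD')]
```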

Semi Join
A “semi-join” between two tables returns rows from the first table for which one or more matches are found in the second table. The difference between a semi-join and a conventional join is that each row of the first table is returned at most once: even if the second table contains two matches for a row in the first table, only one copy of the row is returned. Semi-joins are written using the EXISTS or IN constructs.
Suppose you have the DEPT and EMP tables in the SCOTT schema and you want a list of departments with at least one employee. You could write the query with a conventional join:

    SELECT D.deptno, D.dname
    FROM   dept D
    JOIN   emp E ON E.deptno = D.deptno
    ORDER  BY D.deptno;

Unfortunately, if a department has 400 employees, that department will appear in the query output 400 times. You could eliminate the duplicate rows with the DISTINCT keyword, but that would make Oracle do more work than necessary. What you really want is a semi-join between the DEPT and EMP tables instead of a conventional join:

    SELECT D.deptno, D.dname
    FROM   dept D
    WHERE  EXISTS (SELECT 1
                   FROM   emp E
                   WHERE  E.deptno = D.deptno)
    ORDER  BY D.deptno;
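The difference between the two query shapes can be checked with Python's built-in sqlite3 module. This sketch runs on SQLite rather than Oracle, and the rows are a small illustrative set loosely modelled on SCOTT's DEPT and EMP tables, not the full sample data. The conventional join is written with JOIN ... ON, which is equivalent to listing both tables in FROM with the join condition in WHERE; an IN variant of the semi-join is included for comparison.

```python
import sqlite3

# Illustrative rows loosely modelled on the SCOTT DEPT/EMP tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (30, 'SALES'), (40, 'OPERATIONS');
    INSERT INTO emp VALUES (1, 'CLARK', 10), (2, 'SMITH', 20), (3, 'FORD', 20),
                           (4, 'ALLEN', 30), (5, 'WARD', 30), (6, 'BLAKE', 30);
""")

# Conventional join: one output row per matching employee.
join_rows = conn.execute("""
    SELECT D.deptno, D.dname FROM dept D JOIN emp E ON E.deptno = D.deptno
    ORDER BY D.deptno
""").fetchall()

# Semi-join with EXISTS: each department at most once.
semi_rows = conn.execute("""
    SELECT D.deptno, D.dname FROM dept D
    WHERE EXISTS (SELECT 1 FROM emp E WHERE E.deptno = D.deptno)
    ORDER BY D.deptno
""").fetchall()

# The same semi-join written with IN.
in_rows = conn.execute("""
    SELECT deptno, dname FROM dept
    WHERE deptno IN (SELECT deptno FROM emp)
    ORDER BY deptno
""").fetchall()

print(len(join_rows))  # 6
print(semi_rows)       # [(10, 'ACCOUNTING'), (20, 'RESEARCH'), (30, 'SALES')]
```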

The above query lists the departments that have at least one employee. Whether a department has one employee or 100, it appears only once in the query output. Moreover, Oracle can move on to the next department as soon as it finds the first employee in a department, instead of finding all of the employees in each department.
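The early exit can be sketched as a nested-loop semi-join that stops scanning EMP at the first match. This is a simplified illustration over hypothetical in-memory rows; Oracle may choose other semi-join methods, so it is not a description of Oracle's actual execution plan. The function also counts comparisons to show the work saved.

```python
def semi_join(depts, emps):
    """Return each department that has at least one employee, at most once.

    The inner scan stops at the first match. Also returns how many comparisons were made.
    """
    result, comparisons = [], 0
    for d in depts:
        for e in emps:
            comparisons += 1
            if e["deptno"] == d["deptno"]:
                result.append(d)
                break  # one match is enough; skip the rest of EMP for this department
    return result, comparisons


depts = [{"deptno": 10}, {"deptno": 20}, {"deptno": 40}]
emps = [{"deptno": 10}, {"deptno": 10}, {"deptno": 20}, {"deptno": 20}]
rows, comparisons = semi_join(depts, emps)
print([d["deptno"] for d in rows], comparisons)  # [10, 20] 8
```

A full nested-loop join over the same rows would make 3 × 4 = 12 comparisons and return four rows; the semi-join stops after 8 comparisons and returns each qualifying department once.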

NoSQL
NoSQL encompasses a wide variety of database technologies that were developed in response to growth in the volume of data stored about users, objects, and products, in the frequency with which this data is accessed, and in performance and processing needs. Relational databases, by contrast, were not originally designed to cope with the scale and agility challenges that modern applications face, nor were they built to take advantage of the cheap storage and processing power available today.

Types of NoSQL Databases
• Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
• Graph stores are used to store information about networks, such as social connections. Examples include Neo4j and HyperGraphDB.
• Key-value stores are the simplest NoSQL databases. Every item in the database is stored as an attribute name (or "key") together with its value. Examples include Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as a list, set, or hash, which adds functionality.
• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store related columns of data together (in column families) instead of storing whole rows.
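The four data models can be contrasted using plain Python dictionaries as stand-ins. This is only a conceptual sketch: the names and sample records are hypothetical, and nothing here reflects the API of any of the products named above.

```python
import json

# Key-value store: the value is opaque to the database and is fetched by key only.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document store: the value is a structured, possibly nested document with queryable fields.
doc_store = {
    "42": {"name": "Ada", "address": {"city": "London"}, "languages": ["en", "fr"]},
}

# Graph store: nodes plus explicit relationships (here, "follows" edges).
follows = {"ada": {"grace", "alan"}, "grace": {"ada"}, "alan": set()}

# Wide-column store: row key -> column family -> column -> value; rows may differ in columns.
wide_store = {
    "user42": {"profile": {"name": "Ada"}, "visits": {"2024-01-01": 3}},
    "user43": {"profile": {"name": "Alan", "city": "Manchester"}},
}


def mutual_follows(graph: dict[str, set[str]], person: str) -> set[str]:
    """A typical graph query: people whom `person` follows and who follow `person` back."""
    return {other for other in graph[person] if person in graph.get(other, set())}


print(json.loads(kv_store["user:42"])["city"])  # London
print(doc_store["42"]["address"]["city"])       # London
print(mutual_follows(follows, "ada"))           # {'grace'}
```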

Benefits of NoSQL
Compared with relational databases, NoSQL databases are often easier to scale out and can offer better performance for certain workloads, and their data models address several needs that the relational model was not designed for:
• Large volumes of structured, semi-structured, and unstructured data
• Agile sprints, quick iteration, and frequent code pushes
• Object-oriented programming that is easy to use and flexible
• Efficient, scale-out architecture instead of expensive, monolithic architecture

SQL vs NoSQL
SQL databases are usually called relational databases (RDBMS), whereas NoSQL databases are usually called non-relational or distributed databases. SQL databases are table-based: they represent data as tables made up of rows. NoSQL databases are document stores, key-value stores, graph databases, or wide-column stores, and they do not require a predefined schema that all data must adhere to. In other words, SQL databases have a predefined schema, whereas NoSQL databases have a dynamic schema suited to unstructured data. SQL databases are typically scaled vertically, by increasing the horsepower of the hardware; NoSQL databases are scaled horizontally, by adding database servers to the pool of resources to spread the load.
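Horizontal scaling depends on partitioning (sharding) the data across servers. The following is a minimal sketch, assuming hash partitioning on the key with a fixed number of servers. `ShardedStore` and `shard_for` are hypothetical names, and each server is simulated by a dictionary.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash.

    hashlib is used because Python's built-in hash() of str is randomized per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store spread across several servers (one dict per server)."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, object]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
print(store.get("user:7"))             # {'id': 7}
print([len(s) for s in store.shards])  # rows per server, roughly balanced
```

With plain modulo hashing, changing the number of servers moves most keys to a different shard. Production systems such as Cassandra instead use consistent hashing on a token ring to limit how much data moves when nodes join or leave.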

SQL databases use SQL (Structured Query Language), which is very powerful, for defining and manipulating data. In NoSQL databases, queries are focused on collections of documents. There is no standard query language: proposals such as UnQL (Unstructured Query Language) exist, and query syntax varies from database to database.
SQL database examples: MySQL, Oracle, SQLite, PostgreSQL, and Microsoft SQL Server. NoSQL database examples: MongoDB, Bigtable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB.
For complex queries: SQL databases are a good fit for environments with many complex queries, whereas NoSQL databases are generally a poorer fit. At a high level, NoSQL databases lack standard interfaces for complex queries, and their query languages are typically less expressive than SQL.
For the type of data to be stored: SQL databases are not the best fit for hierarchical data. NoSQL databases fit hierarchical data better because they store data as key-value pairs, similar to JSON. NoSQL databases are also widely preferred for very large datasets (big data); HBase is an example.
For scalability: in most typical situations, SQL databases scale vertically, so you handle increasing load by adding CPU, RAM, SSD capacity, and so on to a single server. NoSQL databases scale horizontally, so you can add a few more servers to your NoSQL infrastructure to handle heavy traffic.
For heavily transactional applications: SQL databases are the best fit for heavy-duty transactional applications, because they are more stable and guarantee atomicity and data integrity. While you can use NoSQL databases for transactions, they are still not comparable, and not stable enough, under high load and for complex transactional applications.
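The atomicity guarantee of SQL databases can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch: the `account` table and the `transfer` function are hypothetical, and SQLite stands in for a heavy-duty server RDBMS. The credit is deliberately applied before the debit, so a failing debit shows the earlier change being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both UPDATEs take effect, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK failed on the debit; the credit is rolled back too
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account"))


print(transfer(conn, "alice", "bob", 30), balances(conn))  # True {'alice': 70, 'bob': 80}
print(transfer(conn, "bob", "alice", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```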