CMSC Cluster Computing Basics


CMSC 34702-1 Cluster Computing Basics
Junchen Jiang, The University of Chicago
October 8, 2018

MapReduce: Simplified Data Processing on Large Clusters
The Google File System
Bigtable: A Distributed Storage System for Structured Data
Cassandra - A Decentralized Structured Storage System

Consistency, Availability, Partition Tolerance
[Diagram: Replica 1 holds x; Replica 2 holds x.]

Consistency, Availability, Partition Tolerance
Consistency: Any read must return the last written value.
[Diagram: set(y) is issued; Replica 1 holds y, Replica 2 holds x.]

Consistency, Availability, Partition Tolerance
Consistency: Any read must return the last written value.
[Diagram: set(y) is issued; Replica 1 holds y, Replica 2 holds y.]

Consistency, Availability, Partition Tolerance
Consistency: Any read must return the last written value.
[Diagram: set(y) is issued and get() returns y; Replica 1 holds y, Replica 2 holds y.]

Consistency, Availability, Partition Tolerance
Consistency: Any read must return the last written value.
Availability: Every request must result in a response.
[Diagram: get() returns x; Replica 1 holds x, Replica 2 holds x.]

Consistency, Availability, Partition Tolerance
Consistency: Any read must return the last written value.
Availability: Every request must result in a response.
Partition Tolerance: The network can lose any messages between servers.
[Diagram: get() returns y; Replica 1 holds y, Replica 2 holds x.]
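The tension between the three properties can be made concrete with a toy model. The following is a minimal sketch, not taken from the slides: the class names, the `prefer_consistency` flag, and the choice to send writes to Replica 1 and reads to Replica 2 are all assumptions made for illustration. When the link between the replicas is down, the store either rejects the write (giving up availability) or accepts it and serves a stale read (giving up consistency).

```python
class Replica:
    """One copy of a single stored value (hypothetical helper)."""

    def __init__(self, value):
        self.value = value


class TwoReplicaStore:
    """Toy two-replica register; every name here is an illustrative assumption."""

    def __init__(self, initial, prefer_consistency):
        self.r1 = Replica(initial)
        self.r2 = Replica(initial)
        self.partitioned = False  # True means messages between replicas are lost
        self.prefer_consistency = prefer_consistency

    def set(self, value):
        # Assumption: writes arrive at Replica 1.
        if self.partitioned and self.prefer_consistency:
            # Refuse the write rather than let the replicas diverge.
            raise RuntimeError("unavailable: cannot reach Replica 2")
        self.r1.value = value
        if not self.partitioned:
            self.r2.value = value  # replicate only when the network works

    def get(self):
        # Assumption: reads arrive at Replica 2.
        return self.r2.value


# Availability-first store during a partition: the write succeeds,
# but a read served by Replica 2 still sees the old value.
ap = TwoReplicaStore("x", prefer_consistency=False)
ap.partitioned = True
ap.set("y")
stale = ap.get()  # "x", not the last written value "y"
```

With `prefer_consistency=True` the same `set("y")` call raises instead, so the read stays correct but the request gets no useful response.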

Cassandra: Gossip-based consensus protocol
Consistency: Any read must return the last written value.
Availability: Every request must result in a response.
Partition Tolerance: The network can lose any messages between servers.
[Diagram: set(y) is issued, but get() returns x.]

Bigtable: Paxos-based consensus protocol (Chubby)
Consistency: Any read must return the last written value.
Availability: Every request must result in a response.
Partition Tolerance: The network can lose any messages between servers.
[Diagram: set(y) is sent to a Chubby master, which is connected to three Chubby slaves.]
Service is unavailable until a quorum is reached.
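The quorum condition on this slide can be sketched as a simple majority check. This is only an illustration of the availability rule, not Paxos and not Chubby's actual implementation; the function and class names and the five-node group size are assumptions.

```python
def has_quorum(reachable, total):
    """Return True when a strict majority of the group is reachable."""
    return reachable > total // 2


class QuorumRegister:
    """Toy register that accepts writes only with a majority (hypothetical)."""

    def __init__(self, total, initial):
        self.total = total
        self.value = initial

    def set(self, value, reachable):
        # Without a majority the service refuses the request, which keeps
        # the stored value consistent at the cost of availability.
        if not has_quorum(reachable, self.total):
            raise RuntimeError("unavailable: no quorum")
        self.value = value


reg = QuorumRegister(total=5, initial="x")  # assumed group of five nodes
reg.set("y", reachable=3)                   # 3 of 5 is a majority: accepted
```

A call such as `reg.set("z", reachable=2)` would raise, mirroring the slide's point that the service stays unavailable until a quorum is reached.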

Is it possible to achieve all three simultaneously?
Consistency: Any read must return the last written value.
Availability: Every request must result in a response.
Partition Tolerance: The network can lose any messages between servers.
[Diagram labels: Cassandra, Bigtable, Impossible.]
Unfortunately, no (CAP theorem).
(Eric Brewer. https://people.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf)

This Class: Stream Processing