CMSC 34702-1 Cluster Computing Basics

Presentation transcript:

CMSC 34702-1 Cluster Computing Basics Junchen Jiang The University of Chicago October 8, 2018

MapReduce: Simplified Data Processing on Large Clusters; The Google File System; Bigtable: A Distributed Storage System for Structured Data; Cassandra - A Decentralized Structured Storage System

Consistency, Availability, Partition Tolerance. [Diagram: Replica 1 holds x; Replica 2 holds x.]

Consistency, Availability, Partition Tolerance. Consistency: Any read must return the last written value. [Diagram: set(y); Replica 1 holds y, Replica 2 holds x.]

Consistency, Availability, Partition Tolerance. Consistency: Any read must return the last written value. [Diagram: set(y); Replica 1 holds y, Replica 2 holds y.]

Consistency, Availability, Partition Tolerance. Consistency: Any read must return the last written value. [Diagram: set(y), then get() returns y; Replica 1 holds y, Replica 2 holds y.]

Consistency, Availability, Partition Tolerance. Consistency: Any read must return the last written value. Availability: Every request must result in a response. [Diagram: get() returns x; Replica 1 holds x, Replica 2 holds x.]

Consistency, Availability, Partition Tolerance. Consistency: Any read must return the last written value. Availability: Every request must result in a response. Partition Tolerance: Network can lose any messages between servers. [Diagram: get() returns y; Replica 1 holds y, Replica 2 holds x.]
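The three definitions above can be tried out on a toy two-replica store. This is only an illustrative sketch: the class name TwoReplicaStore, the partitioned flag, and the rule that writes land on replica 1 first are assumptions made for the example, not something the slides specify.

```python
class Replica:
    """Holds a single value, like the boxes labelled Replica 1 / Replica 2."""
    def __init__(self, value):
        self.value = value


class TwoReplicaStore:
    """Hypothetical model: writes land on replica 1 and are copied to replica 2."""
    def __init__(self, initial):
        self.r1 = Replica(initial)
        self.r2 = Replica(initial)
        self.partitioned = False  # True: messages between replicas are lost

    def set(self, value):
        self.r1.value = value
        if not self.partitioned:
            self.r2.value = value  # replication message was delivered

    def get(self, replica):
        # Always answers (availability), even if the replica is out of date.
        return replica.value


store = TwoReplicaStore("x")
store.set("y")
healthy_read = store.get(store.r2)       # replication worked, read is current

store.partitioned = True
store.set("z")
partitioned_read = store.get(store.r2)   # replica 2 never saw "z": stale read
```

With the network intact, a read from either replica sees the last write. Once the partition starts, replica 2 still responds, but with an older value, which is exactly the consistency violation the definitions describe.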

Cassandra: Gossip-based consensus protocol. Consistency: Any read must return the last written value. Availability: Every request must result in a response. Partition Tolerance: Network can lose any messages between servers. [Diagram: set(y), then get() returns x.]
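A rough way to picture the behavior on this slide is a generic gossip-style exchange in which every replica answers immediately and replicas converge later. The sketch below is not Cassandra's actual protocol; the (timestamp, value) pairs, the gossip_round helper, and the keep-the-newest merge rule are assumptions chosen for illustration.

```python
import random


def gossip_round(replicas, rng):
    """Each replica contacts one random peer; both keep the newer write."""
    for i in range(len(replicas)):
        j = rng.choice([k for k in range(len(replicas)) if k != i])
        newest = max(replicas[i], replicas[j])  # tuples compare by timestamp first
        replicas[i] = replicas[j] = newest


rng = random.Random(0)
replicas = [(1, "x")] * 4   # (timestamp, value) stored on each of 4 replicas
replicas[0] = (2, "y")      # set(y) has reached only replica 0 so far

stale_read = replicas[3][1]  # get() on replica 3 responds at once, with "x"

rounds = 0
while len(set(replicas)) > 1 and rounds < 100:  # bound is a safety net only
    gossip_round(replicas, rng)
    rounds += 1

converged_read = replicas[3][1]  # after gossip, replica 3 has the newer write
```

The read before gossip is available but stale; after enough rounds every replica holds the newest write. That trade (respond now, agree later) is the one the slide attributes to Cassandra.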

Bigtable: Paxos-based consensus protocol (Chubby). Consistency: Any read must return the last written value. Availability: Every request must result in a response. Partition Tolerance: Network can lose any messages between servers. [Diagram: set(y) sent to a Chubby master with three Chubby slaves.] Service is unavailable until a quorum is reached.
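The quorum rule on this slide can be sketched as a majority check that gates every request. This is a simplified illustration, not Chubby's or Paxos's real interface: the names has_quorum and QuorumStore and the cluster size of 5 are assumptions made for the example.

```python
def has_quorum(reachable, cluster_size):
    """True when a strict majority of the cluster can be reached."""
    return reachable > cluster_size // 2


class QuorumStore:
    """Hypothetical store that refuses requests when no majority is reachable."""
    def __init__(self, cluster_size, initial):
        self.cluster_size = cluster_size
        self.reachable = cluster_size  # servers the master can currently contact
        self.value = initial

    def set(self, value):
        if not has_quorum(self.reachable, self.cluster_size):
            raise RuntimeError("service unavailable: no quorum")
        self.value = value

    def get(self):
        if not has_quorum(self.reachable, self.cluster_size):
            raise RuntimeError("service unavailable: no quorum")
        return self.value


store = QuorumStore(cluster_size=5, initial="x")  # size 5 is an assumed example

store.reachable = 3      # 3 of 5 is still a majority
store.set("y")

store.reachable = 2      # a partition leaves only a minority reachable
try:
    store.set("z")
    rejected = False
except RuntimeError:
    rejected = True      # the write is refused rather than risking inconsistency
```

Here the store gives up availability during the partition: the minority side returns an error instead of a possibly stale or conflicting answer.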

Is it possible to achieve all three simultaneously? Consistency: Any read must return the last written value. Availability: Every request must result in a response. Partition Tolerance: Network can lose any messages between servers. [Diagram labels: Cassandra, Bigtable, Impossible.] Unfortunately, no (CAP Theorem; Eric Brewer, https://people.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf).

This Class: Stream Processing