Introduction to Distributed Systems. Hongfei Yan (闫宏飞), School of EECS, Peking University. 7/22/2010.


Introduction to Distributed System Parallelization & Synchronization

Parallelization Idea (2)

Parallelization Idea (3)

Parallelization Idea (4)

Process synchronization refers to the coordination of simultaneous threads or processes to complete a task in the correct runtime order and avoid unexpected race conditions.

And if you thought I was joking …

What is Wrong With This?

Thread 1:
void foo() {
  x++;
  y = x;
}

Thread 2:
void bar() {
  y++;
  x++;
}

If the initial state is y = 0, x = 6, what happens after these threads finish running?

Multithreaded = Unpredictability

When we run a multithreaded program, we don't know what order threads run in, nor do we know when they will interrupt one another. Many things that look like "one step" operations actually take several steps under the hood:

Thread 1:
void foo() {
  eax = mem[x];
  inc eax;
  mem[x] = eax;
  ebx = mem[x];
  mem[y] = ebx;
}

Thread 2:
void bar() {
  eax = mem[y];
  inc eax;
  mem[y] = eax;
  eax = mem[x];
  inc eax;
  mem[x] = eax;
}
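The interleaving hazard above can be made concrete. The following Python sketch (the slides use C-like pseudocode; the step decomposition here is illustrative) enumerates every legal interleaving of the two threads' hidden machine steps, starting from y = 0, x = 6, and collects the distinct final states:

```python
from itertools import combinations

# The two threads from the slide, decomposed into hidden machine steps
# (load, increment, store). We enumerate every legal interleaving of
# those steps and collect the distinct final states.

def interleavings(n_a, n_b):
    """Yield every merge of n_a 'A' steps with n_b 'B' steps that
    preserves each thread's internal order."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        order = ["B"] * total
        for i in a_slots:
            order[i] = "A"
        yield order

def run(order):
    s = {"x": 6, "y": 0}   # shared memory
    r = {}                 # per-thread "registers" (hidden temporaries)
    # Thread A: x++; y = x
    a_steps = [
        lambda: r.__setitem__("ta", s["x"]),      # load x
        lambda: s.__setitem__("x", r["ta"] + 1),  # store x+1
        lambda: s.__setitem__("y", s["x"]),       # y = x
    ]
    # Thread B: y++; x++
    b_steps = [
        lambda: r.__setitem__("tb", s["y"]),      # load y
        lambda: s.__setitem__("y", r["tb"] + 1),  # store y+1
        lambda: r.__setitem__("tc", s["x"]),      # load x
        lambda: s.__setitem__("x", r["tc"] + 1),  # store x+1
    ]
    ia = ib = 0
    for who in order:
        if who == "A":
            a_steps[ia](); ia += 1
        else:
            b_steps[ib](); ib += 1
    return s["x"], s["y"]

outcomes = {run(o) for o in interleavings(3, 4)}
```

Running it yields more than one possible final state, which is exactly the unpredictability the slide warns about.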

Multithreaded = Unpredictability

This applies to more than just integers:
Pulling work units from a queue
Reporting work back to master unit
Telling another thread that it can begin the "next phase" of processing
… All require synchronization!

Synchronization Primitives
Semaphore
Barriers
Lock

Thread 1:
void foo() {
  sem.lock();
  x++;
  y = x;
  sem.unlock();
}

Thread 2:
void bar() {
  sem.lock();
  y++;
  x++;
  sem.unlock();
}
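As a runnable counterpart to the slide's pseudocode, here is a minimal Python sketch (the worker and counter names are illustrative) in which a single lock makes each read-modify-write atomic; without the lock, some increments could be lost to the race shown earlier:

```python
import threading

# Guard the shared variable with one lock so each critical section
# runs atomically; this mirrors the sem.lock()/sem.unlock() pairs
# on the slide.

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # acquire ... release around the update
            counter += 1

threads = [threading.Thread(target=worker, args=(50_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place the final count is deterministic regardless of how the scheduler interleaves the two threads.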

Too Much Synchronization? Deadlock

Thread A:
semaphore1.lock();
semaphore2.lock();
/* use data guarded by semaphores */
semaphore1.unlock();
semaphore2.unlock();

Thread B:
semaphore2.lock();
semaphore1.lock();
/* use data guarded by semaphores */
semaphore1.unlock();
semaphore2.unlock();

(Image: RPI CSCI.4210 Operating Systems notes)
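A common escape from this deadlock, sketched in Python (names here are illustrative): impose a global order on the locks and have every thread acquire them in that order, so the circular wait can never form:

```python
import threading

# Both threads take lock1 before lock2. With a single global lock
# order, the hold-and-wait cycle from the slide is impossible.

lock1, lock2 = threading.Lock(), threading.Lock()
shared = []

def safe_worker(name):
    with lock1:        # always acquired first
        with lock2:    # always acquired second
            shared.append(name)

a = threading.Thread(target=safe_worker, args=("A",))
b = threading.Thread(target=safe_worker, args=("B",))
a.start(); b.start()
a.join(timeout=5); b.join(timeout=5)
```

Both threads finish promptly; had one thread taken the locks in the opposite order (as Thread B does on the slide), the two could block each other forever.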

And if you thought I was joking …

The Moral: Be Careful!
Synchronization is hard
Need to consider all possible shared state
Must keep locks organized and use them consistently and correctly
Knowing there are bugs may be tricky; fixing them can be even worse!
Keeping shared state to a minimum reduces total system complexity

Introduction to Distributed System Fundamentals of Networking

TCP, UDP, IP, router, switch, HTTP, FTP, port, socket, protocol, firewall, gateway

What makes this work?
Underneath the socket layer are several more protocols
Most important are TCP and IP (which are used hand-in-hand so often, they're often spoken of as one protocol: TCP/IP)

TCP/IP inventors Cerf and Kahn received the 2004 Turing Award
Vinton G. Cerf
Robert E. Kahn

Why is This Necessary?
Not actually tube-like "underneath the hood"
Unlike the circuit-switched phone system, the packet-switched Internet uses many routes at once

Networking Issues If a party to a socket disconnects, how much data did they receive? Did they crash? Or did a machine in the middle? Can someone in the middle intercept/modify our data? Traffic congestion makes switch/router topology important for efficient throughput

Introduction to Distributed System Distributed Systems

Outline Models of computation Connecting distributed modules Failure & reliability

Models of Computation: Flynn's Taxonomy
SISD (single instruction, single data): single-threaded process
SIMD (single instruction, multiple data): vector processing
MISD (multiple instruction, single data): pipeline architecture
MIMD (multiple instruction, multiple data): multi-threaded processes
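A toy illustration of the SISD/SIMD distinction, with plain Python lists standing in for hardware vector lanes (a conceptual sketch, not real vector hardware):

```python
# The same element-wise add expressed two ways: SISD style (one
# instruction stream stepping through the data one item at a time)
# and SIMD style (one logical operation applied across all lanes).

def add_sisd(a, b):
    out = []
    for i in range(len(a)):      # one datum per "instruction"
        out.append(a[i] + b[i])
    return out

def add_simd(a, b):
    # one logical instruction over the whole vector
    return [x + y for x, y in zip(a, b)]
```

Both produce the same result; the taxonomy classifies how the work is organized, not what is computed.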

SISD (diagram: one processor issuing one instruction stream over a single stream of data items)

SIMD (diagram: one instruction stream applied in lockstep to many data elements D0 … Dn)

MIMD (diagram: multiple processors, each running its own instruction stream over its own data)

(diagram: a single machine: one processor fetching instructions and data from memory, with an interface to the external world)

(diagram: a shared-memory multiprocessor: several processors, each with its own interface to the external world, sharing one memory for instructions and data)

(diagram: two shared-memory multiprocessors joined by a network)

(diagram: a distributed system: independent machines, each with its own processor and memory, communicating over a network)

Outline Models of computation Connecting distributed modules Failure & reliability

System Organization
Having one big memory would make it a huge bottleneck
It eliminates all of the parallelism

CTA (Candidate Type Architecture): Memory is Distributed

Interconnect Networks Bottleneck in the CTA is transferring values from one local memory to another Interconnect network design very important; several options are available Design constraint: How to minimize interconnect network usage?

A Brief History …
"Massively parallel architectures" start rising in prominence
Message Passing Interface (MPI) and other libraries developed
Bandwidth was a big problem, for external interconnect networks in particular

A Brief History … 1995-Today
Cluster/grid architecture increasingly dominant
Special node machines eschewed in favor of COTS technologies
Web-wide cluster software
Companies like Google take this to the extreme (10,000 node clusters)

More About Interconnects Several types of interconnect possible Bus Crossbar Torus Tree

Interconnect Bus
Simplest possible layout
Not realistically practical: too much contention
Little better than "one big memory"

Crossbar
All processors have "input" and "output" lines
Crossbar connects any input to any output
Allows for very low contention, but lots of wires, complexity
Will not scale to many nodes

Toroidal networks
Nodes are connected to their logical neighbors
Node-node transfer may include intermediaries
Reasonable trade-off for space/scalability
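The cost of those intermediaries is easy to quantify. Assuming a square n × n torus with four-neighbor wraparound links (an illustrative model), the minimum hop count between two nodes is the wraparound-aware Manhattan distance:

```python
# Minimum hops between two nodes on an n x n torus: in each dimension
# a message can go the direct way or wrap around the edge, whichever
# is shorter.

def torus_hops(src, dst, n):
    (r1, c1), (r2, c2) = src, dst
    dr = abs(r1 - r2)
    dc = abs(c1 - c2)
    return min(dr, n - dr) + min(dc, n - dc)
```

For example, opposite corners of a 4 x 4 torus are only two hops apart thanks to the wraparound links, versus six hops on a plain mesh.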

Tree
Switch nodes transfer data "up" or "down" the tree
Hierarchical design keeps "short" transfers fast, with incremental cost for longer transfers
Aggregate bandwidth demands are often very large at the top
Most natural layout for most cluster networks today

Outline Models of computation Connecting distributed modules Failure & reliability

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." -- Leslie Lamport

Lamport's research contributions have laid the foundations of the theory of distributed systems:
logical clocks
Byzantine failures
Paxos for consensus
…

Reliability Demands
Support partial failure
Total system must support graceful decline in application performance rather than a full halt
Data Recoverability
If components fail, their workload must be picked up by still-functioning units
Individual Recoverability
Nodes that fail and restart must be able to rejoin the group activity without a full group restart

Reliability Demands
Consistency
Concurrent operations or partial internal failures should not cause externally visible nondeterminism
Scalability
Adding increased load to a system should not cause outright failure, but a graceful decline
Increasing resources should support a proportional increase in load capacity
Security
The entire system should be impervious to unauthorized access
Requires considering many more attack vectors than single-machine systems

Ken Arnold, CORBA designer: "Failure is the defining difference between distributed and local programming"

Component Failure Individual nodes simply stop

Data Failure Packets omitted by overtaxed router Or dropped by full receive-buffer in kernel Corrupt data retrieved from disk or net

Network Failure
External & internal links can die
Some can be routed around in ring or mesh topology
Star topology may cause individual nodes to appear to halt
Tree topology may cause "split"
Messages may be sent multiple times, not at all, or in corrupted form…

Timing Failure
Temporal properties may be violated
Lack of "heartbeat" message may be interpreted as component halt
Clock skew between nodes may confuse version-aware data readers

Byzantine Failure
Difficult-to-reason-about circumstances arise
Commands sent to a foreign node are not confirmed: what can we conclude about the state of the system?

Malicious Failure
Malicious (or maybe naïve) operator injects invalid or harmful commands into system

Correlated Failures Multiple CPUs/hard drives from same manufacturer lot may fail together Power outage at one data center may cause demand overload at failover center

Preparing for Failure Distributed systems must be robust to these failure conditions But there are lots of pitfalls …

The Eight Design Fallacies
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
-- Peter Deutsch and James Gosling, Sun Microsystems

Dealing With Component Failure
Use heartbeats to monitor component availability
"Buddy" or "Parent" node is aware of desired computation and can restart it elsewhere if needed
Individual storage nodes should not be the sole owner of data
Pitfall: How do you keep replicas consistent?
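A heartbeat monitor can be sketched in a few lines; the class and method names here are illustrative, not from the slides. Nodes periodically report timestamps, and any node silent for longer than the timeout is presumed failed so a buddy node can restart its work elsewhere:

```python
# Minimal heartbeat-based failure detector: track the last time each
# node checked in, and suspect any node whose last heartbeat is older
# than the timeout.

class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, node, now):
        """Record a heartbeat from `node` at time `now`."""
        self.last_seen[node] = now

    def suspected_dead(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return {n for n, t in self.last_seen.items()
                if now - t > self.timeout}
```

Note the pitfall from the timing-failure slide: a slow network can make a live node look dead, so real systems treat this as suspicion, not certainty.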

Dealing With Data Failure
Data should be check-summed and verified at several points
Never trust another machine to do your data validation!
Sequence identifiers can be used to ensure commands and packets are not lost
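Both ideas fit in a short sketch: each message carries a CRC32 of its payload plus a sequence number, so the receiver can detect corruption and gaps. The message format here is an assumption made for illustration:

```python
import zlib

# Each message carries a CRC32 checksum (detects corruption) and a
# sequence number (detects loss).

def make_msg(seq, payload):
    return {"seq": seq, "payload": payload, "crc": zlib.crc32(payload)}

def check_stream(msgs):
    """Return (corrupted_seqs, missing_seqs) for a received stream."""
    corrupted = [m["seq"] for m in msgs
                 if zlib.crc32(m["payload"]) != m["crc"]]
    seen = {m["seq"] for m in msgs}
    missing = sorted(set(range(min(seen), max(seen) + 1)) - seen) if seen else []
    return corrupted, missing
```

The receiver recomputes every checksum itself, in the spirit of "never trust another machine to do your data validation".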

Dealing With Network Failure
Have well-defined split policy
Networks should routinely self-discover topology
Well-defined arbitration/leader election protocols determine authoritative components
Inactive components should gracefully clean up and wait for network rejoin
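One hedge against split-brain after a network split, sketched minimally (a stand-in for a real bully- or Paxos-style protocol, not an implementation of either): only a partition holding a majority of the cluster may elect a leader, and within it the live node with the highest id wins:

```python
# Quorum-gated leader election: a minority partition elects no one
# and stays inactive until the network rejoins, so two sides of a
# split can never both claim authority.

def elect_leader(partition, cluster_size):
    """partition: set of node ids reachable on this side of the split."""
    if len(partition) * 2 <= cluster_size:
        return None            # minority side: clean up and wait to rejoin
    return max(partition)      # deterministic choice: highest live id
```

With a five-node cluster, a three-node partition elects a leader while the two-node side waits, which is one concrete form of a well-defined split policy.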

Dealing With Other Failures Individual application-specific problems can be difficult to envision Make as few assumptions about foreign machines as possible Design for security at each step

TPS: Definition
A system that handles transactions coming from several sources concurrently
Transactions are "events that generate and modify data stored in an information system for later retrieval"
To be considered a transaction processing system, the system must pass the ACID test

Key Features of TPS: ACID
"ACID" is the acronym for the features a TPS must support:
Atomicity – A set of changes must all succeed or all fail
Consistency – Changes to data must leave the data in a valid state when the full change set is applied
Isolation – The effects of a transaction must not be visible until the entire transaction is complete
Durability – After a transaction has been committed successfully, the state change must be permanent

Atomicity & Durability What happens if we write half of a transaction to disk and the power goes out?

Logging: The Undo Buffer
1. Database writes to log the current values of all cells it is going to overwrite
2. Database overwrites cells with new values
3. Database marks log entry as committed
If the database crashes during step 2, we use the log to roll back the tables to their prior state
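The three steps above can be sketched directly. This toy in-memory database (class and method names are illustrative) logs old values before overwriting and rolls back any uncommitted transaction on recovery:

```python
# Undo-buffer logging in miniature: log old values, overwrite, then
# mark committed. Recovery after a crash rolls back any transaction
# whose log entry was never marked committed.

class UndoLogDB:
    def __init__(self):
        self.cells = {}
        self.log = None   # (old_values, committed) for one transaction

    def begin(self, updates):
        # step 1: log current values of every cell we will overwrite
        self.log = ({k: self.cells.get(k) for k in updates}, False)
        # step 2: overwrite cells with the new values
        self.cells.update(updates)

    def commit(self):
        # step 3: mark the log entry committed
        old, _ = self.log
        self.log = (old, True)

    def recover(self):
        # crash before commit: restore the prior state from the log
        if self.log and not self.log[1]:
            old, _ = self.log
            for k, v in old.items():
                if v is None:
                    self.cells.pop(k, None)   # cell did not exist before
                else:
                    self.cells[k] = v
        self.log = None
```

Recovery after a crash mid-transaction restores exactly the pre-transaction values, giving atomicity; a committed transaction survives recovery untouched, giving durability.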

Consistency: Data Types Data entered in databases have rigorous data types associated with them, and explicit ranges Does not protect against all errors (entering a date in the past is still a valid date, etc), but eliminates tedious programmer concerns

Consistency: Foreign Keys Database designers declare that fields are indices into the keys of another table Database ensures that target key exists before allowing value in source field
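Sketched as code, with an in-memory table standing in for the database (the function and field names are illustrative): the insert is refused unless the referenced key already exists in the target table:

```python
# Foreign-key enforcement in one function: before a row may store a
# value in its source field, the target key must exist.

def insert_with_fk(table, row, fk_field, target_keys):
    if row[fk_field] not in target_keys:
        raise ValueError(f"foreign key violation: {row[fk_field]!r}")
    table.append(row)
```

A real database performs the same existence check (plus locking) inside the engine, so application code cannot create dangling references.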

Isolation Using mutual-exclusion locks, we can prevent other processes from reading data we are in the process of writing When a database is prepared to commit a set of changes, it locks any records it is going to update before making the changes

Faulty Locking Locking alone does not ensure isolation! Changes to table A are visible before changes to table B – this is not an isolated transaction

Two-Phase Locking After a transaction has released any locks, it may not acquire any new locks Effect: The lock set owned by a transaction has a “growing” phase and a “shrinking” phase
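The growing/shrinking rule can be enforced mechanically. This minimal sketch (illustrative, not a full lock manager) tracks one transaction's lock set and rejects any acquire after the first release:

```python
# Two-phase locking as a per-transaction rule: once the transaction
# releases any lock (entering its shrinking phase), acquiring a new
# lock is an error.

class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def acquire(self, lock):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add(lock)

    def release(self, lock):
        self.held.discard(lock)
        self.shrinking = True   # the growing phase is over for good
```

This is what rules out the faulty-locking schedule from the previous slide: the transaction cannot release the lock on table A and later grab a new lock for table B.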

Relationship to Distributed Computing
At the heart of a TPS is usually a large database server
Several distributed clients may connect to this server at various points in time
Database may be spread across multiple servers, but must still maintain ACID

Conclusions Parallel systems evolved toward current distributed systems usage Hard to avoid failure Determine what is reasonable to plan for Keep protocols as simple as possible Be mindful of common pitfalls