CC5212-1 Procesamiento Masivo de Datos, Otoño 2014. Aidan Hogan. Lecture III: 2014/03/23.

CC5212-1 Procesamiento Masivo de Datos, Otoño 2014
Aidan Hogan, aidhog@gmail.com
Lecture III: 2014/03/23

Lab 1.1: Mensaje
New deadline: Tuesday 10am
−1 (out of 10) for every day late after that

TYPES OF DISTRIBUTED SYSTEMS …

Client–Server Model
– Client makes a request to the server
– Server acts and responds
(For example: email, WWW, printing, etc.)

Client–Server: Three-Tier Server
Tiers: Data, Logic, Presentation
– HTTP GET: total salary of all employees
– SQL: query the salary of all employees
– Add all the salaries
– Create HTML page
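The three tiers can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite table; the table name `employee`, the sample rows and the function names are all hypothetical and do not come from the lecture.

```python
import sqlite3


def make_database():
    # Data tier: an in-memory table with made-up rows (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [("Ana", 1000), ("Luis", 1500), ("Marta", 2000)],
    )
    conn.commit()
    return conn


def query_salaries(conn):
    # Data tier: "SQL: query the salary of all employees".
    return [row[0] for row in conn.execute("SELECT salary FROM employee")]


def total_salary(conn):
    # Logic tier: "Add all the salaries".
    return sum(query_salaries(conn))


def render_page(total):
    # Presentation tier: "Create HTML page".
    return f"<html><body><p>Total salary: {total}</p></body></html>"


def handle_get(conn):
    # Stands in for the server handling the client's HTTP GET.
    return render_page(total_salary(conn))
```

Calling `handle_get(make_database())` returns an HTML string containing the summed salaries. In a real deployment each tier would typically run as a separate process or machine.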

Peer-to-Peer: Unstructured
Example query: “Pixie’s new album?”
(For example: Kazaa, Gnutella)

Peer-to-Peer: Structured (DHT)
Circular DHT:
– Only aware of neighbours
– O(n) lookups
Implement shortcuts:
– Skips ahead
– Enables binary-search-like behaviour
– O(log(n)) lookups
[Figure: ring of nodes with IDs 000, 001, 010, 011, 100, 101, 110, 111; query “Pixie’s new album?” directed at 111]
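The difference between the two lookup costs can be seen by counting hops on a ring. The sketch below assumes that every position 0..n-1 on the ring is occupied and that each node keeps shortcuts to the nodes 1, 2, 4, ... positions ahead (a simplification in the spirit of Chord-style finger tables); the function names are hypothetical.

```python
def hops_neighbours_only(start, target, n):
    """Hops needed when each node only knows its next neighbour: O(n)."""
    hops = 0
    node = start
    while node != target:
        node = (node + 1) % n
        hops += 1
    return hops


def hops_with_shortcuts(start, target, n):
    """Hops needed when each node can also skip ahead by powers of two."""
    hops = 0
    node = start
    while node != target:
        remaining = (target - node) % n
        # Take the largest power-of-two jump that does not overshoot.
        jump = 1
        while jump * 2 <= remaining:
            jump *= 2
        node = (node + jump) % n
        hops += 1
    return hops
```

On the eight-node ring, going from 000 to 111 takes 7 hops with neighbours only and 3 hops with shortcuts (jumps of 4, 2 and 1). Each shortcut hop at least halves the remaining distance, which is where the binary-search-like O(log(n)) behaviour comes from.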

Desirable Criteria for Distributed Systems
Transparency:
– Appears as one machine
Flexibility:
– Supports more machines, more applications
Reliability:
– System doesn’t fail when a machine does
Performance:
– Quick runtimes, quick processing
Scalability:
– Handles more machines/data efficiently

LIMITATIONS OF DISTRIBUTED SYSTEMS: EIGHT FALLACIES

Eight Fallacies
By L. Peter Deutsch (1994) and James Gosling (1997)
“Essentially everyone, when they first build a distributed application, makes the following eight assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences.” – L. Peter Deutsch
Each fallacy is a false statement!

Based on our experience, what might these fallacies of distributed computing be?

1. The network is reliable
Machines fail, connections fail, firewalls eat messages.
– flexible routing
– retry messages
– acknowledgements!
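Retrying until an acknowledgement arrives can be sketched as follows. The `FlakyChannel` class is a hypothetical stand-in for an unreliable network that silently drops its first few messages; nothing here is a real networking API.

```python
class FlakyChannel:
    """Hypothetical channel that drops the first `drops` messages sent."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def send(self, message):
        # Returns True only when the message arrived and was acknowledged.
        if self.drops > 0:
            self.drops -= 1
            return False
        self.delivered.append(message)
        return True


def send_with_retry(channel, message, max_attempts=5):
    """Resend until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    raise ConnectionError(f"no acknowledgement after {max_attempts} attempts")
```

With `FlakyChannel(drops=2)`, `send_with_retry` succeeds on the third attempt. The sketch assumes that only messages are lost; if acknowledgements can be lost too, a retry may deliver the same message twice, so receivers would need to tolerate duplicates.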

2. Latency is zero
[Figure: M1: Store X; M2: Copy X from M1]
There are significant communication delays.
– avoid “races”
– local order ≠ remote order
– acknowledgements
– minimise remote calls: batch data!
– avoid waiting: multiple threads
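The benefit of batching can be illustrated with a toy cost model. The `RemoteStore` class below is hypothetical: it charges a fixed latency per remote call and ignores the transfer time of the data itself, which is an assumption made only to isolate the effect of round trips.

```python
class RemoteStore:
    """Hypothetical remote key-value store with a fixed per-call latency."""

    def __init__(self, latency_ms):
        self.latency_ms = latency_ms
        self.data = {}
        self.elapsed_ms = 0  # simulated time spent waiting on the network

    def put(self, key, value):
        # One remote call per item: pay the latency every time.
        self.elapsed_ms += self.latency_ms
        self.data[key] = value

    def put_batch(self, items):
        # One remote call for the whole batch: pay the latency once.
        self.elapsed_ms += self.latency_ms
        self.data.update(items)
```

Storing 100 items one at a time with a 50 ms latency costs 5000 ms of simulated waiting, whereas a single `put_batch` call costs 50 ms and leaves the store in the same state.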

3. Bandwidth is infinite
[Figure: M1: Copy X (10GB), shown twice between M1 and M2]
Limited in the amount of data that can be transferred.
– avoid resending data
– direct connections
– caching!!
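A minimal sketch of caching to avoid resending data, assuming a hypothetical `fetch` callable that stands in for an expensive remote copy. The class name and the byte counter are illustrative only.

```python
class CachingClient:
    """Keeps a local copy of each fetched object so it is transferred once."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical remote fetch: key -> bytes
        self.cache = {}
        self.bytes_transferred = 0

    def get(self, key):
        if key not in self.cache:
            # Cache miss: pay the transfer cost once and remember the result.
            data = self.fetch(key)
            self.bytes_transferred += len(data)
            self.cache[key] = data
        return self.cache[key]
```

Requesting the same key twice transfers the data only once. The sketch never invalidates entries, so a cached copy can become stale if the remote data changes.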

4. The network is secure
[Figure: M1: Send Medical History]
The network is vulnerable to hackers, eavesdropping, viruses, etc.
– send sensitive data directly
– isolate hacked nodes: hack one node ≠ hack all nodes
– authenticate messages
– secure connections
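One common way to authenticate messages is a keyed hash (HMAC). The sketch below uses Python's standard `hmac` and `hashlib` modules and assumes that sender and receiver already share a secret key; how that key is distributed is outside the sketch, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag for the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign(key, message), tag)
```

A receiver that calls `verify` rejects any message altered in transit or sent by a party without the key. HMAC provides authentication and integrity only; it does not hide the message contents, so sensitive data would still need an encrypted connection.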

5. Topology doesn’t change
[Figure: machines M1 to M5; message to M5 through M2, M3, M4]
How machines are physically connected may change (“churn”)!
– avoid fixed routing: next-hop routing?
– abstract physical addresses
– flexible content structure

6. There is one administrator
Different machines have different policies!
– Beware of firewalls!
– Don’t assume the most recent version: backwards compatibility

7. Transport cost is zero
It costs time/money to transport data: not just bandwidth.
– (again) minimise redundant data transfer: avoid shuffling data, caching
– direct connection
– compression?
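Compression before transfer can be sketched with the standard `zlib` module. The function names `pack` and `unpack` are hypothetical, and the saving depends entirely on how repetitive the data is.

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Compress a payload before sending it over the network."""
    return zlib.compress(payload)


def unpack(blob: bytes) -> bytes:
    """Restore the original payload on the receiving side."""
    return zlib.decompress(blob)
```

Repetitive text shrinks considerably, while data that is already compressed or random may not shrink at all. Compression also costs CPU time on both ends, hence the question mark on the slide.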

8. The network is homogeneous
Devices and connections are not uniform.
– interoperability! Java vs. .NET?
– route for speed, not hops
– load-balancing

Eight Fallacies (to avoid)
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
The severity of each fallacy varies between scenarios!
Which fallacies apply/do not apply for:
– Gigabit ethernet LAN?
– BitTorrent?
– The Web?
– Laboratorio II?

LIMITATIONS OF DISTRIBUTED COMPUTING: CAP THEOREM

But first … ACID
For traditional (non-distributed) databases …
1. Atomicity:
– Transactions all or nothing: fail cleanly
2. Consistency:
– Doesn’t break constraints/rules
3. Isolation:
– Parallel transactions act as if sequential
4. Durability:
– System remembers changes
Have you heard of ACID guarantees in a database class?

What is CAP?
Three guarantees a distributed system could make:
1. Consistency:
– All nodes have a consistent view of the system
2. Availability:
– Every read/write is acted upon
3. Partition-tolerance:
– The system works even if messages are lost

A Distributed System (Replication)

Consistency
[Figure: node state “There’s 891 users in ‘M’”]

Availability
[Figure: query “How many users start with ‘M’?”; response 891]

Partition-Tolerance
[Figure: query “How many users start with ‘M’?”; response 891]

The CAP Question
Can a distributed system guarantee consistency (all nodes have the same up-to-date view), availability (every read/write is acted upon) and partition-tolerance (the system works even if messages are lost) at the same time?
What do you think: can it, or not?

The CAP Answer

The CAP “Proof”
[Figure: query “How many users start with ‘M’?”; node states “There’s 891 users in ‘M’” and “There’s 892 users in ‘M’”; response 891]

The CAP “Proof” (in boring words)
Consider machines m1 and m2 on either side of a partition:
– If an update is allowed on m2 (Availability), then m1 cannot see the change (loses Consistency)
– To make sure that m1 and m2 have the same, up-to-date view (Consistency), neither m1 nor m2 can accept any requests/updates (loses Availability)
– Thus, only when m1 and m2 can communicate (loses Partition-tolerance) can Availability and Consistency be guaranteed
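The argument above can be replayed with a toy model of two replicas. Everything in this sketch is an assumption made for illustration: the class name `TwoReplicas`, the `mode` flag and the `Unavailable` exception are hypothetical, and the "CP" mode only refuses updates during a partition (reads still return the last agreed value), which is a milder rule than refusing all requests.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request."""


class TwoReplicas:
    """Toy model of machines m1 and m2 holding a replicated user count."""

    def __init__(self, mode, users=891):
        self.mode = mode  # "AP" or "CP" (hypothetical flag)
        self.view = {"m1": users, "m2": users}
        self.partitioned = False

    def update(self, replica, users):
        if not self.partitioned:
            # Replicas can talk: apply the update everywhere.
            for name in self.view:
                self.view[name] = users
        elif self.mode == "AP":
            # Stay available: accept locally, so the replicas now disagree.
            self.view[replica] = users
        else:
            # Stay consistent: refuse the update, giving up availability.
            raise Unavailable("cannot reach the other replica")

    def read(self, replica):
        return self.view[replica]
```

With a partition in place, an "AP" system accepts the update on m2 and then answers 891 from m1 and 892 from m2, while a "CP" system raises `Unavailable` and both replicas keep answering 891. Without a partition, the update reaches both replicas and neither guarantee has to be given up.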

The CAP Theorem
A distributed system cannot guarantee consistency (all nodes have the same up-to-date view), availability (every read/write is acted upon) and partition-tolerance (the system works even if messages are lost) at the same time.
(“Proof” as shown on the previous slide)

The CAP Triangle
C, A, P: choose two

CAP Systems
[Figure: C, A and P overlap pairwise; all three together: no intersection]
– CA: Guarantees to give a correct response, but only while the network works fine (Centralised / Traditional)
– CP: Guarantees responses are correct even if there are network failures, but a response may fail (Weak availability)
– AP: Always provides a “best-effort” response, even in the presence of network failures (Eventual consistency)

CA System
[Figure: query “How many users start with ‘M’?”; node states “There’s 891 users in ‘M’” and “There’s 892 users in ‘M’”; response 892]

CP System
[Figure: query “How many users start with ‘M’?”; node state “There’s 891 users in ‘M’”; response 891]

AP System
[Figure: query “How many users start with ‘M’?”; node states “There’s 891 users in ‘M’” and “There’s 892 users in ‘M’”; response 891]

BASE (AP)
Basically Available
– Pretty much always “up”
Soft State
– Replicated, cached data
Eventual Consistency
– Stale data tolerated, for a while
Examples: Amazon, eBay, Google, DNS …

The CAP Theorem
C, A in CAP ≠ C, A in ACID
Simplified model:
– Partitions are rare
– Systems may be a mix of CA/CP/AP
– C/A/P often continuous in reality!
But the concept is useful and frequently discussed:
– How to handle partitions? Availability? Or consistency?

LABS PREP: AIDAN LEARNS SPANISH

Word Count
Help me learn Spanish!
What are the top 500 most common words in Spanish?

Help me learn Spanish!
How should we design the distributed system? (For now it will be in-memory.)
– How can we distribute the word count?
– How can we call the machines / send the data?
– How can we merge the word counts?
– How to implement in the lab?

Distributed Systems have limitations
Eight fallacies and what they mean:
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

Distributed Systems have limitations
CAP Theorem: A distributed system cannot guarantee consistency (all nodes have the same up-to-date view and will give a correct answer), availability (every request is acted upon) and partition-tolerance (the system works even if messages are lost) at the same time.

CAP Systems (recap)
[Figure: C, A and P overlap pairwise; all three together: no intersection]
– CA: Guarantees to give a correct response, but only while the network works fine (Centralised / Traditional)
– CP: Guarantees responses are correct even if there are network failures, but a response may fail (Weak availability)
– AP: Always provides a “best-effort” response, even in the presence of network failures (Eventual consistency)

Design of a Distributed Algorithm
– How to distribute/split data for processing
– Embarrassingly parallel execution
– How to merge data (naively for now)
– How to help me learn Spanish
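The split, count and merge steps can be sketched in memory as follows. This is not the lab solution: threads stand in for machines, the input is a plain list of lines, and the function names (`split`, `count_words`, `merge`, `top_words`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(lines, parts):
    """Distribute the lines round-robin into `parts` chunks."""
    return [lines[i::parts] for i in range(parts)]


def count_words(lines):
    """Count words in one chunk; needs no data from any other chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(partials):
    """Naive merge: add up the partial counts in one place."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def top_words(lines, k, workers=3):
    """Count each chunk in parallel, merge, and return the k most common."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, split(lines, workers)))
    return merge(partials).most_common(k)
```

For example, `top_words(["la casa es grande", "la casa es roja", "el perro es grande"], 1)` gives `[("es", 3)]`. The counting step is embarrassingly parallel because each chunk is processed independently; the naive merge, by contrast, funnels every partial count through a single place.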

Questions?
