CC5212-1 PROCESAMIENTO MASIVO DE DATOS OTOÑO 2014 Aidan Hogan Lecture III: 2014/03/23.


1 CC5212-1 PROCESAMIENTO MASIVO DE DATOS OTOÑO 2014 Aidan Hogan aidhog@gmail.com Lecture III: 2014/03/23

2 Lab 1.1: Mensaje
New deadline: Tuesday 10am
−1 (out of 10) for every day late after that

3 TYPES OF DISTRIBUTED SYSTEMS …

4 Client–Server Model
– Client makes request to server
– Server acts and responds
(For example: Email, WWW, Printing, etc.)

5 Client–Server: Three-Tier Server
Server tiers: Data, Logic, Presentation
– HTTP GET: Total salary of all employees
– SQL: Query salary of all employees
– Add all the salaries
– Create HTML page
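As a rough illustration of the three tiers, the request on this slide could be sketched in Python as follows. The employee records and all function names are hypothetical stand-ins: an in-memory list replaces the SQL database, and a plain string replaces a real HTTP response.

```python
from typing import List

# Data tier: a hypothetical in-memory table standing in for a SQL database.
EMPLOYEES = [
    {"name": "Ana", "salary": 1000},
    {"name": "Luis", "salary": 1500},
    {"name": "Marta", "salary": 2000},
]


def query_salaries() -> List[int]:
    """Data tier: roughly 'SELECT salary FROM employees'."""
    return [row["salary"] for row in EMPLOYEES]


def total_salary() -> int:
    """Logic tier: add all the salaries."""
    return sum(query_salaries())


def handle_get() -> str:
    """Presentation tier: create the HTML page sent back to the client."""
    return f"<html><body>Total salary: {total_salary()}</body></html>"
```

Each tier only calls the one directly beneath it, so the storage could be swapped without touching the page rendering.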

6 Peer-to-Peer: Unstructured
Pixies’ new album?
(For example: Kazaa, Gnutella)

7 Peer-to-Peer: Structured (DHT)
Circular DHT:
– Only aware of neighbours
– O(n) lookups
Implement shortcuts:
– Skips ahead
– Enables binary-search-like behaviour
– O(log(n)) lookups
[Figure: ring of nodes 000, 001, 010, 011, 100, 101, 110, 111; the query “Pixie’s new album?” is routed to 111]
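The difference between the two lookup costs can be sketched with a toy ring in Python. This is a simplified model, not a real DHT protocol: it assumes every identifier from 0 to n−1 is occupied, that n is a power of two, and that the shortcuts point 1, 2, 4, ... positions ahead. The function names are made up for the example.

```python
def ring_hops(start: int, target: int, n: int = 8) -> int:
    """Hops when each node only knows its clockwise neighbour."""
    hops = 0
    node = start
    while node != target:
        node = (node + 1) % n
        hops += 1
    return hops


def shortcut_hops(start: int, target: int, n: int = 8) -> int:
    """Hops when each node also knows the nodes 1, 2, 4, ... positions ahead."""
    hops = 0
    node = start
    while node != target:
        remaining = (target - node) % n
        # Take the largest power-of-two jump that does not overshoot the target.
        step = 1
        while step * 2 <= remaining:
            step *= 2
        node = (node + step) % n
        hops += 1
    return hops
```

Going from node 000 to node 111 takes 7 hops around the plain ring but 3 hops with shortcuts (jumps of 4, 2 and 1), which is the binary-search-like behaviour mentioned on the slide.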

8 Desirable Criteria for Distributed Systems
– Transparency: Appears as one machine
– Flexibility: Supports more machines, more applications
– Reliability: System doesn’t fail when a machine does
– Performance: Quick runtimes, quick processing
– Scalability: Handles more machines/data efficiently

9 LIMITATIONS OF DISTRIBUTED SYSTEMS: EIGHT FALLACIES

10 Eight Fallacies
By L. Peter Deutsch (1994) and James Gosling (1997)
“Essentially everyone, when they first build a distributed application, makes the following eight assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences.” (L. Peter Deutsch)
Each fallacy is a false statement!

11 What might these fallacies of distributed computing be based on our experience?

12 1. The network is reliable
Machines fail, connections fail, firewall eats messages
– flexible routing
– retry messages
– acknowledgements!
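A minimal sketch of "retry messages" combined with "acknowledgements" might look like the following. The channel is a toy stand-in for a real network: `make_flaky_channel` and `send_with_retry` are hypothetical names, and a lost message is modelled simply as a send call that returns no acknowledgement.

```python
from typing import Callable


def send_with_retry(send: Callable[[str], bool], message: str,
                    max_attempts: int = 5) -> int:
    """Resend until the send function reports an acknowledgement.

    Returns the number of attempts used; raises if no ack ever arrives.
    """
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    raise ConnectionError(f"no acknowledgement after {max_attempts} attempts")


def make_flaky_channel(drops: int) -> Callable[[str], bool]:
    """Toy channel that silently drops the first `drops` messages."""
    state = {"remaining": drops}

    def send(message: str) -> bool:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return False  # message lost, so no acknowledgement comes back
        return True       # message delivered and acknowledged

    return send
```

In a real system the receiver would also need to cope with duplicates, since a message may arrive while only its acknowledgement is lost.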

13 2. Latency is zero
[Figure: M1: Store X; M2: Copy X from M1]
There are significant communication delays
– avoid “races”
– local order ≠ remote order
– acknowledgements
– minimise remote calls: batch data!
– avoid waiting: multiple threads

14 3. Bandwidth is infinite
[Figure: M1 and M2; M1: Copy X (10GB), repeated]
Limited in amount of data that can be transferred
– avoid resending data
– direct connections
– caching!!
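The caching idea can be sketched as a thin client-side wrapper that remembers what it has already transferred. `CachingClient` and its `fetch` argument are hypothetical: `fetch` stands for whatever expensive remote copy the system would otherwise repeat.

```python
from typing import Callable, Dict


class CachingClient:
    """Keeps a local copy of each value so it is transferred at most once."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch              # assumed to be the costly remote call
        self._cache: Dict[str, bytes] = {}
        self.remote_calls = 0            # counts actual transfers

    def get(self, key: str) -> bytes:
        if key not in self._cache:
            self.remote_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]
```

Asking for X twice then costs one transfer instead of two. The price is that a cached copy can become stale if X changes remotely, which this sketch does not handle.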

15 4. The network is secure
[Figure: M1: Send Medical History]
Network is vulnerable to hackers, eavesdropping, viruses, etc.
– send sensitive data directly
– isolate hacked nodes: hack one node ≠ hack all nodes
– authenticate messages
– secure connections

16 5. Topology doesn’t change
[Figure: machines M1, M2, M3, M4, M5; Message M5 through M2, M3, M4]
How machines are physically connected may change (“churn”)!
– avoid fixed routing: next-hop routing?
– abstract physical addresses
– flexible content structure

17 6. There is one administrator
Different machines have different policies!
– Beware of firewalls!
– Don’t assume most recent version: backwards compatibility

18 7. Transport cost is zero
It costs time/money to transport data: not just bandwidth
(Again)
– minimise redundant data transfer: avoid shuffling data, caching
– direct connection
– compression?

19 8. The network is homogeneous
Devices and connections are not uniform
– interoperability! Java vs. .NET?
– route for speed, not hops
– load-balancing

20 Eight Fallacies (to avoid)
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Severity of fallacies varies in different scenarios!
Which fallacies apply/do not apply for:
– Gigabit ethernet LAN?
– BitTorrent
– The Web
– Laboratorio II

21 LIMITATIONS OF DISTRIBUTED COMPUTING: CAP THEOREM

22 But first … ACID
For traditional (non-distributed) databases …
1. Atomicity: Transactions all or nothing: fail cleanly
2. Consistency: Doesn’t break constraints/rules
3. Isolation: Parallel transactions act as if sequential
4. Durability: System remembers changes
Have you heard of ACID guarantees in a database class?

23 What is CAP?
Three guarantees a distributed system could make
1. Consistency: All nodes have a consistent view of the system
2. Availability: Every read/write is acted upon
3. Partition-tolerance: The system works even if messages are lost

24 A Distributed System (Replication)

25 Consistency There’s 891 users in ‘M’

26 Availability How many users start with ‘M’ 891

27 Partition-Tolerance How many users start with ‘M’ 891

28 The CAP Question Can a distributed system guarantee consistency (all nodes have the same up-to-date view), availability (every read/write is acted upon) and partition-tolerance (the system works even if messages are lost) at the same time? What do you think? Can a distributed system guarantee consistency and availability and partition-tolerance at the same time, or not?

29 The CAP Answer

30 The CAP “Proof” How many users start with ‘M’ There’s 891 users in ‘M’ 891 There’s 892 users in ‘M’

31 The CAP “Proof” (in boring words)
Consider machines m1 and m2 on either side of a partition:
– If an update is allowed on m2 (Availability), then m1 cannot see the change (loses Consistency)
– To make sure that m1 and m2 have the same, up-to-date view (Consistency), neither m1 nor m2 can accept any requests/updates (loses Availability)
– Thus, only when m1 and m2 can communicate (loses Partition-tolerance) can Availability and Consistency be guaranteed
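The argument can be replayed with a toy two-replica model in Python. This is only an illustration under strong assumptions: replication is instantaneous while the link is up, the partition is a simple flag, and the `Cluster` and `Replica` classes and the "CP"/"AP" mode strings are invented for the example. The counts 891 and 892 follow the running example in the slides.

```python
class Replica:
    def __init__(self, count: int):
        self.count = count  # e.g. number of users starting with 'M'


class Cluster:
    """Two replicas, m1 and m2, that may be separated by a partition."""

    def __init__(self, mode: str, count: int = 891):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.m1 = Replica(count)
        self.m2 = Replica(count)
        self.partitioned = False

    def write(self, replica: Replica, count: int) -> bool:
        """Try to update one replica; returns False if the write is refused."""
        other = self.m2 if replica is self.m1 else self.m1
        if self.partitioned:
            if self.mode == "CP":
                return False       # refuse the update: loses Availability
            replica.count = count  # accept locally only: loses Consistency
            return True
        replica.count = count
        other.count = count        # replicate while the link is up
        return True
```

With the partition flag set, a CP cluster rejects the update and both replicas still say 891, while an AP cluster accepts it and the replicas disagree (891 versus 892). Only without a partition does the write succeed on both.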

32 The CAP Theorem
A distributed system cannot guarantee consistency (all nodes have the same up-to-date view), availability (every read/write is acted upon) and partition-tolerance (the system works even if messages are lost) at the same time.
(“Proof” as shown on previous slide)

33 The CAP Triangle
C, A, P: Choose Two

34 CAP Systems
[Figure: C, A and P, with no common intersection]
– CA: Guarantees to give a correct response but only while network works fine (Centralised / Traditional)
– CP: Guarantees responses are correct even if there are network failures, but response may fail (Weak availability)
– AP: Always provides a “best-effort” response even in presence of network failures (Eventual consistency)

35 CA System How many users start with ‘M’ There’s 891 users in ‘M’ There’s 892 users in ‘M’ 892

36 CP System How many users start with ‘M’ There’s 891 users in ‘M’ 891

37 AP System How many users start with ‘M’ There’s 891 users in ‘M’ 891 There’s 892 users in ‘M’

38 BASE (AP)
– Basically Available: Pretty much always “up”
– Soft State: Replicated, cached data
– Eventual Consistency: Stale data tolerated, for a while
Amazon, eBay, Google, DNS …

39 The CAP Theorem
C,A in CAP ≠ C,A in ACID
Simplified model
– Partitions are rare
– Systems may be a mix of CA/CP/AP
– C/A/P often continuous in reality!
But concept useful/frequently discussed:
– How to handle Partitions? Availability? or Consistency?

40 LABS PREP: AIDAN LEARNS SPANISH

41 Word Count
Help me learn Spanish!
What are the top 500 most common words in Spanish?

42 Help me learn Spanish!
How should we design the distributed system? (for now it will be in-memory)
– How can we distribute the word count?
– How can we call the machines / send the data?
– How can we merge the word counts?
– How to implement in the lab?

43 RECAP

44 Distributed Systems have limitations
Eight fallacies and what they mean
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

45 Distributed Systems have limitations CAP Theorem A distributed system cannot guarantee consistency (all nodes have the same up-to-date view and will give a correct answer), availability (every request is acted upon) and partition-tolerance (the system works even if messages are lost) at the same time.

46 CAP Systems
[Figure: C, A and P, with no common intersection]
– CA: Guarantees to give a correct response but only while network works fine (Centralised / Traditional)
– CP: Guarantees responses are correct even if there are network failures, but response may fail (Weak availability)
– AP: Always provides a “best-effort” response even in presence of network failures (Eventual consistency)

47 Design of a Distributed Algorithm
– How to distribute/split data for processing
– Embarrassingly parallel execution
– How to merge data (naively for now)
– How to help me learn Spanish
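One possible shape for the split, count and merge steps is sketched below in Python. It is not the lab solution: worker threads on a single machine stand in for remote machines, words are split naively on whitespace, and the function names are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split(lines: List[str], parts: int) -> List[List[str]]:
    """Deal the lines round-robin into `parts` chunks, one per worker."""
    return [lines[i::parts] for i in range(parts)]


def count_words(chunk: List[str]) -> Counter:
    """Work done independently on each 'machine': count words in its chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines: List[str], workers: int = 3,
                           top: int = 500) -> List[Tuple[str, int]]:
    chunks = split(lines, workers)
    # Embarrassingly parallel: no chunk needs data from any other chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    # Naive merge: add up all partial counts in one place.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total.most_common(top)
```

The merge step is the only point where data has to come together, so it is the natural place to start when asking which of the eight fallacies would hurt first.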

48 Questions?

