CC5212-1 P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Lecture II: 2014/03/17.

Slides:



Advertisements
Similar presentations
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Lecture III: 2014/03/23.
Advertisements

Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Chapter 2 Application Layer Computer Networking: A Top Down Approach, 5 th edition. Jim Kurose, Keith Ross Addison-Wesley, April A note on the use.
Technical Architectures
Rheeve: A Plug-n-Play Peer- to-Peer Computing Platform Wang-kee Poon and Jiannong Cao Department of Computing, The Hong Kong Polytechnic University ICDCSW.
Advanced Java Class Network Programming. Network Protocols Overview Levels of Abstraction –HTTP protocol: spoken by Web Servers and Web Clients –TCP/IP:
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Introduction to Peer-to-Peer (P2P) Systems Gabi Kliot - Computer Science Department, Technion Concurrent and Distributed Computing Course 28/06/2006 The.
Based on last years lecture notes, used by Juha Takkinen.
CSc 461/561 CSc 461/561 Peer-to-Peer Streaming. CSc 461/561 Summary (1) Service Models (2) P2P challenges (3) Service Discovery (4) P2P Streaming (5)
Peer-to-Peer (or P2P) From user to user. Peer-to-peer implies that either side can initiate a session and has equal responsibility. Corey Chan Andrew Merfeld.
A. Frank 1 Internet Resources Discovery (IRD) Peer-to-Peer (P2P) Technology (1) Thanks to Carmit Valit and Olga Gamayunov.
Component-Based Software Engineering Introducing the Bank Example Paul Krause.
1 Client-Server versus P2P  Client-server Computing  Purpose, definition, characteristics  Relationship to the GRID  Research issues  P2P Computing.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
Introduction to client/server architecture
Tiered architectures 1 to N tiers. 2 An architectural history of computing 1 tier architecture – monolithic Information Systems – Presentation / frontend,
Applied Architectures Eunyoung Hwang. Objectives How principles have been used to solve challenging problems How architecture can be used to explain and.
P2P File Sharing Systems
System Architecture & Hardware Configurations Dr. D. Bilal IS 592 Spring 2005.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2015 Lecture 2: Distributed Systems I Aidan Hogan
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 12 Slide 1 Distributed Systems Architectures.
© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public 1 Version 4.0 Application Layer Functionality and Protocols.
Introduction of P2P systems
Peer-to-Peer Networks University of Jordan. Server/Client Model What?
2: Application Layer1 Chapter 2: Application layer r 2.1 Principles of network applications r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail  SMTP,
CS 390- Unix Programming Environment CS 390 Unix Programming Environment Topics to be covered: Distributed Computing Fundamentals.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 2 ARCHITECTURES.
Csi315csi315 Client/Server Models. Client/Server Environment LAN or WAN Server Data Berson, Fig 1.4, p.8 clients network.
Architectures of distributed systems Fundamental Models
PSI Peer Search Infrastructure. Introduction What are P2P Networks? The term "peer-to-peer" refers to a class of systems and applications that employ.
Advanced Computer Networks Topic 2: Characterization of Distributed Systems.
1 Peer-to-Peer Systems r Application-layer architectures r Case study: BitTorrent r P2P Search and Distributed Hash Table (DHT)
Lecture 22: Client-Server Software Engineering
Liang, Introduction to Java Programming, Seventh Edition, (c) 2009 Pearson Education, Inc. All rights reserved Chapter 43 Remote Method Invocation.
FastTrack Network & Applications (KaZaA & Morpheus)
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
ICS362 – Distributed Systems Dr. Ken Cosh Week 2.
ITGS Network Architecture. ITGS Network architecture –The way computers are logically organized on a network, and the role each takes. Client/server network.
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2015 Lecture 3: Distributed Systems II Aidan Hogan
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Distributed Hash Tables Steve Ko Computer Sciences and Engineering University at Buffalo.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
Remote Method Invocation A Client Server Approach.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
1 RMI Russell Johnston Communications II. 2 What is RMI? Remote Method Invocation.
Malugo – a scalable peer-to-peer storage system..
System Architecture & Hardware Configurations Dr. D. Bilal IS 582 Spring 2008.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2016 Lecture 2: Distributed Systems I Aidan Hogan
Liang, Introduction to Java Programming, Fifth Edition, (c) 2005 Pearson Education, Inc. All rights reserved Chapter 29 Remote Method.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2016 Lab 2: Mensaje Aidan Hogan
System Architecture & Hardware Configurations
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2017 Lecture 2: Introduction to Distributed Systems Aidan Hogan
System Architecture & Hardware Configurations
CHAPTER 3 Architectures for Distributed Systems
Introduction to client/server architecture
#01 Client/Server Computing
CC Procesamiento Masivo de Datos Otoño Lecture 2: Distributed Systems
Chapter 40 Remote Method Invocation
Architectures of distributed systems Fundamental Models
Architectures of distributed systems Fundamental Models
Chapter 46 Remote Method Invocation
Chapter 46 Remote Method Invocation
Architectures of distributed systems Fundamental Models
CC Procesamiento Masivo de Datos Otoño Lecture 2 Distributed Systems
#01 Client/Server Computing
Presentation transcript:

CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Lecture II: 2014/03/17

MASSIVE DATA NEEDS DISTRIBUTED SYSTEMS …

Monolithic vs. Distributed Systems One machine that’s n times as powerful? n machines that are equally as powerful? vs.

Parallel vs. Distributed Systems Parallel System – often = shared memory Distributed System – often = shared nothing Memory Processor Memory Processor Memory Processor Memory

What is a Distributed System? “A distributed system is a system that enables a collection of independent computers to communicate in order to solve a common goal.”

What is a Distributed System? “An ideal distributed system is a system that makes a collection of independent computers look like one computer (to the user).”

Disadvantages of Distributed Systems Advantages Cost – Better performance/price Extensibility – Add another machine! Reliability – No central point of failure! Workload – Balance work automatically Sharing – Remote access to services Disadvantages Software – Need specialised programs Networking – Can be slow Maintenance – Debugging sw/hw a pain Security – Multiple users Parallelisation – Not always applicable

WHAT MAKES A GOOD DISTRIBUTED SYSTEM?

Distributed System Design Transparency: Abstract/hide: – Access: How different machines are accessed – Location: What machines have what/if they move – Concurrency: Access by several users – Failure: Keep it a secret from the user “An ideal distributed system is a system that makes a collection of independent computers look like one computer (to the user).”

Distributed System Design Flexibility: – Add/remove/move machines – Generic interfaces Reliability: – Fault-tolerant: recover from errors – Security: user authentication – Availability: uptime/total-time

Distributed System Design Performance: – Runtimes (processing) – Latency, throughput and bandwidth (data) Scalability – Network and infrastructure scales – Applications scale – Minimise global knowledge/bottlenecks!

Distributed System Design Transparency? Flexibility? Dependability? Performance? Scalability? Is the “Mensaje” system a good Distributed System? What other configurations would be better (if any)?

DISTRIBUTED SYSTEMS: CLIENT–SERVER ARCHITECTURE

Client–Server Model Client makes request to server Server acts and responds (For example: , WWW, Printing, etc.)

Client–Server: Thin Client Few computing resources for client: I/O – Server does the hard work (For example: PHP-heavy websites, SSH, , etc.)

Client–Server: Fat Client Fuller computing resources for client: I/O – Server sends data: computing done client-side (For example: Javascript-heavy websites, multimedia, etc.)

Client–Server: Mirror Servers User goes to any machine (replicated/mirror)

Client–Server: Proxy Server User goes to “forwarding” machine (proxy)

Server Client–Server: Three-Tier Server DataLogicPresentation HTTP GET: Total salary of all employees SQL: Query salary of all employees Add all the salaries Create HTML page

Client–Server: n-Tier Server Slide from Google’s Jeff Dean: Presentation Logical (multiple tiers) Data

DISTRIBUTED SYSTEMS: PEER-TO-PEER ARCHITECTURE

Peer-to-Peer (P2P) Client–Server Clients interact directly with a “central” server Peer-to-Peer Peers interact directly amongst themselves

Peer-to-Peer: Unstructured Ricky Martin’s new album? (For example: Kazaa, Gnutella)

Peer-to-Peer: Unstructured Pixie’s new album? (For example: Kazaa, Gnutella)

Peer-to-Peer: Structured (Central) In central server, each peer registers – Content – Address Peer requests content from server Peers connect directly Central point-of-failure (For example: Napster … central directory was shut down) Ricky Martin’s new album?

Peer-to-Peer: Structured (Hierarchical) Super-peers and peers

Peer-to-Peer: Structured (DHT) Distributed Hash Table (key,value) pairs key based on hash Query with key Insert with (key,value) Peer indexes key range Hash: 000 Hash: 111 (For example: Bittorrent’s Tracker)

Peer-to-Peer: Structured (DHT) Circular DHT: – Only aware of neighbours – O(n) lookups Implement shortcuts – Skips ahead – Enables binary-search- like behaviour – O(log(n)) lookups Pixie’s new album? 111

Peer-to-Peer: Structured (DHT) Handle peers leaving (churn) – Keep n successors New peers – Fill gaps – Replicate

Comparison of P2P Systems 2) Unstructured Search requires flooding (n lookups) Connections → n 2 No central point of failure Peers control their data Peers control neighbours 3) Structured Search follows structure (log(n) lookups) Connections →log(n) No central point of failure Peers assigned data Peers assigned neighbours For Peer-to-Peer, what are the benefits of (1) central directory vs. (2) unstructured, vs. (3) structured? 1) Central Directory Search follows directory (1 lookup) Connections → 1 Central point of failure Peers control their data No neighbours

P2P vs. Client–Server Client–Server Data lost in failure/deletes Search easier/faster Network often faster (to websites on backbones) Often central host – Data centralised – Remote hosts control data – Bandwidth centralised – Dictatorial – Can be taken off-line Peer-to-Peer May lose rare data (churn) Search difficult (churn) Network often slower (to conventional users) Multiple hosts – Data decentralised – Users (often) control data – Bandwidth decentralised – Democratic – Difficult to take off-line What the benefits of Peer-to-Peer vs. Client–Server?

DISTRIBUTED SYSTEMS: HYBRID EXAMPLE (BITTORRENT)

BitTorrent: Search Server BitTorrent Search (Server) “ ricky martin ” Client–Server

BitTorrent: Tracker BitTorrent Peer Tracker (DHT)

BitTorrent: File-Sharing

BitTorrent: Hybrid Uploader 1.Creates torrent file 2.Uploads torrent file 3.Announces on tracker 4.Monitors for downloaders 5.Connects to downloaders 6.Sends file parts Downloader 1.Searches torrent file 2.Downloads torrent file 3.Announces to tracker 4.Monitors for peers/seeds 5.Connects to peers/seeds 6.Sends & receives file parts Local / Client–Server / Structured P2P / Direct P2P (Torrent Search Engines target of law-suits)

DISTRIBUTED SYSTEMS: IN THE REAL WORLD

Real-World Architectures: Hybrid Often hybrid! – Architectures herein are simplified/idealised – No clear black-and-white (just good software!) – For example, BitTorrent mixes different paradigms – But good to know the paradigms

Physical Location: Cluster Computing Machines (typically) in a central, local location; e.g., a local LAN in a server room

Physical Location: Cluster Computing

Physical Location: Cloud Computing Machines (typically) in a central, remote location; e.g., a server farm like Amazon EC2

Physical Location: Cloud Computing

Physical Location: Grid Computing Machines in diverse locations

Physical Location: Grid Computing

Physical Locations Cluster computing: – Typically centralised, local Cloud computing: – Typically centralised, remote Grid computing: – Typically decentralised, remote

LABS PREP: JAVA RMI OVERVIEW

Why is Java RMI Important? We can use it to build distributed systems! We can process lots of data through it!

What is Java RMI? RMI = Remote Method Invocation Remote Procedure Call (RPC) for Java Ancestor of CORBA (in Java) Stub / Skeleton model (TCP/IP) Client Stub Network Server Skeleton

What is Java RMI? Stub (Client): – Sends request to skeleton: marshalls/serialises and transfers arguments – Demarshalls/deserialises response and ends call Skeleton (Server): – Passes call from stub onto the server implementation – Passes the response back to the stub Client Stub Network Server Skeleton

Stub/Skeleton Same Interface! ClientServer

Server Implements Skeleton Server

Registry Server Registry Server (typically) has a Registry: a Map Adds skeleton implementations with key (a string) SkelImpl1“sk1” “sk2”SkelImpl2 “sk2”SkelImpl3

Server Creates/Connects to Registry OR Server

Server Registers Skeleton Implementation As a Stub Server

Registry Client Connecting to Registry Client connects to registry (port, hostname/IP)! Retrieves skeleton/stub with key Client Network SkelImpl1“sk1” “sk2”SkelImpl2 “sk3”SkelImpl3 “sk2” SkelImpl2 Stub2

Client Connecting to Registry Client

Server Client Calls Remote Methods Client has stub, calls method, serialises arguments Server does processing Server returns answer; client deserialises result Client Network SkelImpl2Stub2 concat (“a”,”b”) “ab”

Client Calls Remote Methods Client

Java RMI: Remember … 1.Remote calls are pass-by-value, not pass-by- reference (objects not modified directly) 2.Everything passed and returned must be Serialisable ( implement Serializable ) 3.Every stub/skel method must throw a remote exception ( throws RemoteException ) 4.Server implementation can only throw RemoteException

RECAP

Topics Covered What is a (good) Distributed System? Client–Server model – Fat/thin client – Mirror/proxy servers – Three-tier Peer-to-Peer (P2P) model – Central directory – Unstructured – Structured (Hierarchical/DHT) – BitTorrent

Topics Covered Physical locations: – Cluster (local, centralised) vs. – Cloud (remote, centralised) vs. – Grid (remote, decentralised) Java RMI: – Client–Server model: Stub/Skeleton – Registry – Calling remote methods

Questions?