Distributed systems [Fall 2009] G22.3033-001 Lec 1: Course Introduction & Lab Intro.

Slides:



Advertisements
Similar presentations
Multiple Processor Systems
Advertisements

Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Study of Hurricane and Tornado Operating Systems By Shubhanan Bakre.
Multiple Processor Systems
CS-550: Distributed File Systems [SiS]1 Resource Management in Distributed Systems: Distributed File Systems.
Objektorienteret Middleware Presentation 2: Distributed Systems – A brush up, and relations to Middleware, Heterogeneity & Transparency.
Business Continuity and DR, A Practical Implementation Mich Talebzadeh, Consultant, Deutsche Bank
1 Principles of Reliable Distributed Systems Tutorial 12: Frangipani Spring 2009 Alex Shraer.
Introduction Dr. Ying Lu CSCE455/855 Distributed Operating Systems.
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
NFS. The Sun Network File System (NFS) An implementation and a specification of a software system for accessing remote files across LANs. The implementation.
Lecture 8 Epidemic communication, Server implementation.
Distributed (storage) systems G Lec 1: Course Introduction & Lab Intro.
DISTRIBUTED COMPUTING
MCTS Guide to Microsoft Windows Server 2008 Network Infrastructure Configuration Chapter 7 Configuring File Services in Windows Server 2008.
Microsoft Load Balancing and Clustering. Outline Introduction Load balancing Clustering.
Frangipani: A Scalable Distributed File System C. A. Thekkath, T. Mann, and E. K. Lee Systems Research Center Digital Equipment Corporation.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
PETAL: DISTRIBUTED VIRTUAL DISKS E. K. Lee C. A. Thekkath DEC SRC.
Presented by: Alvaro Llanos E.  Motivation and Overview  Frangipani Architecture overview  Similar DFS  PETAL: Distributed virtual disks ◦ Overview.
Distributed File Systems Concepts & Overview. Goals and Criteria Goal: present to a user a coherent, efficient, and manageable system for long-term data.
Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.
Introduction. Readings r Van Steen and Tanenbaum: 5.1 r Coulouris: 10.3.
CSC 456 Operating Systems Seminar Presentation (11/13/2012) Leon Weingard, Liang Xin The Google File System.
Module 12: Designing an AD LDS Implementation. AD LDS Usage AD LDS is most commonly used as a solution to the following requirements: Providing an LDAP-based.
Networked File System CS Introduction to Operating Systems.
Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / st October, 2007.
Module 12: Designing High Availability in Windows Server ® 2008.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster and powerful computers –shared memory model ( access nsec) –message passing.
Distributed File Systems
Distributed systems [Fall 2014] G Lec 1: Course Introduction.
CSE 451: Operating Systems Section 10 Project 3 wrap-up, final exam review.
1 COMPSCI 110 Operating Systems Who - Introductions How - Policies and Administrative Details Why - Objectives and Expectations What - Our Topic: Operating.
Copyright © Clifford Neuman and Dongho Kim - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Advanced Operating Systems Lecture.
CEPH: A SCALABLE, HIGH-PERFORMANCE DISTRIBUTED FILE SYSTEM S. A. Weil, S. A. Brandt, E. L. Miller D. D. E. Long, C. Maltzahn U. C. Santa Cruz OSDI 2006.
Introduction. Readings r Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 m Note: All figures from this book.
Advanced Computer Networks Topic 2: Characterization of Distributed Systems.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
Operating Systems Lecture 1 Jinyang Li. Class goals Understand how an OS works by studying its: –Design principles –Implementation realities Gain some.
1 COMPSCI 110 Operating Systems Who - Introductions How - Policies and Administrative Details Why - Objectives and Expectations What - Our Topic: Operating.
Problem-solving on large-scale clusters: theory and applications Lecture 4: GFS & Course Wrap-up.
 Course Overview Distributed Systems IT332. Course Description  The course introduces the main principles underlying distributed systems: processes,
Unit 9: Distributing Computing & Networking Kaplan University 1.
Distributed systems [Fall 2015] G Lec 1: Course Introduction.
HADOOP DISTRIBUTED FILE SYSTEM HDFS Reliability Based on “The Hadoop Distributed File System” K. Shvachko et al., MSST 2010 Michael Tsitrin 26/05/13.
6.894: Distributed Operating System Engineering Lecturers: Frans Kaashoek Robert Morris
Introduction to CS739: Distribution Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
Distributed systems [Fall 2012] G Lec 1: Course Introduction & Lab Intro.
Distributed Systems and Security: An Introduction Brad Karp and Steve Hailes UCL Computer Science CS Z03 / nd October, 2006.
© Oxford University Press 2011 DISTRIBUTED COMPUTING Sunita Mahajan Sunita Mahajan, Principal, Institute of Computer Science, MET League of Colleges, Mumbai.
Distributed File Systems Questions answered in this lecture: Why are distributed file systems useful? What is difficult about distributed file systems?
GPFS: A Shared-Disk File System for Large Computing Clusters Frank Schmuck & Roger Haskin IBM Almaden Research Center.
COP 4600 Operating Systems Fall 2010 Dan C. Marinescu Office: HEC 439 B Office hours: Tu-Th 3:30-4:30 PM.
COT 4600 Operating Systems Fall 2010 Dan C. Marinescu Office: HEC 439 B Office hours: Tu-Th 3:30-4:30 PM.
Distributed File System. Outline Basic Concepts Current project Hadoop Distributed File System Future work Reference.
Distributed Systems: Distributed File Systems Ghada Ahmed, PhD. Assistant Prof., Computer Science Dept. Web:
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
COMPSCI 110 Operating Systems
Distributed Operating Systems Spring 2004
CMSC 621: Advanced Operating Systems Advanced Operating Systems
Distributed systems [Fall 2010] G
Distributed Operating Systems
Maximum Availability Architecture Enterprise Technology Centre.
Advanced Operating Systems
Distributed systems [Fall 2016] G Jinyang Li
Outline Announcements Lab2 Distributed File Systems 1/17/2019 COP5611.
Chapter 15: File System Internals
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
Presentation transcript:

Distributed systems [Fall 2009] G Lec 1: Course Introduction & Lab Intro

Know your staff Instructor: Prof. Jinyang Li (me) –Office Hour: Tue 5-6pm (715 Bway Rm 708) TA: Bonan Min –Office Hour: Tue 3-4pm (715 Bway Rm 705)

Important addresses Class webpage: –Check regularly for announcements, reading questions Sign up for class mailing list –We will announcements using this list –You can also the entire class for questions, share information, find project member etc. Staff mailing list includes just me and Bonan

This class will teach you … Basic tools of distributed systems –Abstractions, algorithms, implementation techniques –System designs that worked Build a real system! –Synthesize ideas from many areas to build working systems Your (and my) goal: address new (unsolved) system challenges

Who should take this class? Pre-requisite: –Undergrad OS –Programming experience in C or C++ If you are not sure, do Lab1 asap.

Course readings No official textbook Lectures are based on (mostly) research papers –Check webpage for schedules Useful reference books –Principles of Computer System Design. (Saltzer and Kaashoek) –Distributed Systems (Tanenbaum and Steen) –Advanced Programming in the UNIX environment (Stevens) –UNIX Network Programming (Stevens)

Course structure Lectures –Read assigned papers before class –Answer reading questions, hand-in answers in class –Participate in class discussion 8 programming Labs –Build a networked file system with detailed guidance! Project (workload is equivalent to 2 labs) –Extend the lab file system in any way you like!

How are you evaluated? Class participation 10% –Participate in discussions, hand in answers Labs 45% Project 15% –In teams of 2 people –Demo in last class –Short paper (<=4 pages) Quizzes 30% –mid-term and final (90 minutes each)

Questions?

What are distributed systems? Examples? Multiple hosts A network cloud Hosts cooperate to provide a unified service

Why distributed systems? for ease-of-use Handle geographic separation Provide users (or applications) with location transparency: –Web: access information with a few “clicks” –Network file system: access files on remote servers as if they are on a local disk, share files among multiple computers

Why distributed systems? for availability Build a reliable system out of unreliable parts –Hardware can fail: power outage, disk failures, memory corruption, network switch failures… –Software can fail: bugs, mis-configuration, upgrade … –To achieve availability, replicate data/computation on many hosts with automatic failover

Why distributed systems? for scalable capacity Aggregate resources of many computers –CPU: Dryad, MapReduce, Grid computing –Bandwidth: Akamai CDN, BitTorrent –Disk: Frangipani, Google file system

Why distributed systems? for modular functionality Only need to build a service to accomplish a single task well. –Authentication server –Backup server.

Challenges System design –What is the right interface or abstraction? –How to partition functions for scalability? Consistency –How to share data consistently among multiple readers/writers? Fault Tolerance –How to keep system available despite node or network failures?

Challenges (continued) Different deployment scenarios –Clusters –Wide area distribution –Sensor networks Security –How to authenticate clients or servers? –How to defend against or audit misbehaving servers? Implementation –How to maximize concurrency? –What’s the bottleneck? –How to reduce load on the bottleneck resource?

A word of warning A distributed system is a system in which I can’t do my work because some computer that I’ve never even heard of has failed.” -- Leslie Lamport

Topics in this course

Case Study: Distributed file system Server(s) Client 1 Client 2Client 3 A distributed file system provides: location transparent file accesses sharing among multiple clients $ echo “test” > f2 $ ls /dfs f1 f2 $ ls /dfs f1 f2 $ cat f2 test

A simple distributed FS design A single server stores all data and handles clients’ FS requests. Client 1 Client 2Client 3

Topic: System Design What is the right interface? –possible interfaces of a storage system Disk File system Database Can the system handle a large user population? –E.g. all NYU students and faculty How to store peta-bytes of data? –Has to use more than 1 server –Idea: partition users across different servers

Topic: Consistency When C1 moves file f1 from /d1 to /d2, do other clients see intermediate results? What if both C1 and C2 want to move f1 to different places? To reduce network load, cache data at C1 –If C1 updates f1 to f1’, how to ensure C2 reads f1’ instead of f1?

Topic: Fault Tolerance How to keep the system running when some file server is down? –Replicate data at multiple servers How to update replicated data? How to fail-over among replicas? How to maintain consistency across reboots?

Topic: Security Adversary can manipulate messages –How to authenticate? Adversary may compromise machines –Can the FS remain correct despite a few compromised nodes? –How to audit for past compromises? Which parts of the system to trust? –System admins? Physical hardware? OS? Your software?

Topic: Implementation The file server should serve multiple clients concurrently –Keep (multiple) CPU(s) and network busy while waiting for disk Concurrency challenge in software: –Server threads need to modify shared state: Avoid race conditions –One thread’s request may dependent on another Avoid deadlock and livelock

Intro to programming Lab: Yet Another File System (yfs)

YFS is inspired by Frangipani Frangipani goals: –Aggregate many disks from many servers –Incrementally scalable –Automatic load balancing –Tolerates and recovers from node, network, disk failures

Frangipani Design Petal virtual disk Petal virtual disk lock server lock server Frangipani File server Frangipani File server Frangipani File server Client machines server machines

Frangipani Design Frangipani File server Petal virtual disk aggregate disks into one big virtual disk interface: put(addr, data), get(addr) replicated for fault tolerance Incrementally scalable with more servers serve file system requests use Petal to store data incrementally scalable with more servers ensure consistent updates by multiple servers replicated for fault tolerance lock server

Frangipani security Simple security model: –Runs as a cluster file system –All machines and software are trusted!

Frangipani server implements FS logic Application program: creat(“/d1/f1”, 0777) Frangipani server: 1.GET root directory’s data from Petal 2.Find inode # or Petal address for dir “/d1” 3.GET “/d1”s data from Petal 4.Find inode # or Petal address of “f1” in “/d1” 5.If not exists alloc a new block for “f1” from Petal add “f1” to “/d1”’s data, PUT modified “/d1” to Petal

Concurrent accesses cause inconsistency App: creat(“/d1/f1”, 0777) Server S1: … GET “/d1” Find file “f1” in “/d1” If not exists … PUT modified “/d1” time App: creat(“/d1/f2”, 0777) Server S2: … GET “/d1” Find file “f2” in “/d1” If not exists … PUT modified “/d1” What is the final result of “/d1”? What should it be?

Solution: use a lock service to synchronize access App: creat(“/d1/f1”, 0777) Server S1:.. GET “/d1” Find file “f1” in “/d1” If not exists … PUT modified “/d1” time App: creat(“/d1/f2”, 0777) Server S2: … GET “/d1” … LOCK(“/d1”) UNLOCK(“/d1”) LOCK(“/d1”)

Putting it together Frangipani File server Petal virtual disk lock server Petal virtual disk lock server Frangipani File server create (“/d1/f1”) 1. LOCK “/d1” 3. PUT “/d1” 2. GET “/d1” 4. UNLOCK “/d1”

NFS (or AFS) architecture Simple clients –Relay FS calls to the server –LOOKUP, CREATE, REMOVE, READ, WRITE … NFS server implements FS functions NFS client NFS server

NFS messages for reading a file

Why use file handles in NSF msg, not file names? What file does client 1 read? –Local UNIX fs: client 1 reads dir2/f –NFS using filenames: client 1 reads dir1/f –NFS using file handles: client 1 reads dir2/f File handles refer to actual file object, not names

Frangipani vs. NFS FrangipaniNFS Scale storage Scale serving capacity Fault tolerance Add Petal nodes Buy more disks Add Frangipani Manually partition servers FS namespace among multiple servers Data is replicated Use RAID On multiple Petal Nodes

User-level kernel App FUSE syscall YFS: simplified Frangipani Extent server yfs server lock server yfs server Single extent server to store data Communication using remote procedure calls (RPC)

Lab schedule L1: lock server –Programming w/ threads –RPC semantics L2: basic file server L3: Sharing of files L4: Locking L5: Caching locks L6: Caching extend server+consistency L7: Paxos L8: Replicated lock server L9: Project: extend yfs Due every week Due every other week

L1: lock server Lock service consists of: –Lock server: grant a lock to clients, one at a time –Lock client: talk to server to acquire/release locks Correctness: –At most one lock is granted to any client Additional Requirement: –acquire() at client does not return until lock is granted