IT344 – Operating Systems Winter 2011, Dale Rowe.

IT344: Operating Systems Winter 2011 19: Distributed Systems
Dale Rowe, 265C CTB
Office Hours: Wed 10:00 – 11:00, Thurs 9:00 – 11:00, Fri 10:00 – 12:00
TA: Wade Boden (Tues 9-11, Thurs 9-11, Fri 10-2)

What is a “distributed system”?
Very broad definition: loosely-coupled to tightly-coupled.
Nearly all systems today are distributed in some way:
  they use
  they access files over a network
  they access printers over a network
  they’re backed up over a network
  they share other physical or logical resources
  they cooperate with other people on other machines
  they access the web
  they receive video, audio, etc.

Distributed systems are now a requirement
Economics dictate that we buy small computers.
Everyone needs to communicate.
We need to share physical devices (printers) as well as information (files, etc.).
Many applications are by their nature distributed (bank teller machines, airline reservations, ticket purchasing).
To solve the largest problems, we will need to get large collections of small machines to cooperate (parallel programming).

Loosely-coupled systems
Each system is a completely autonomous, independent system, connected to others on the network.

Even today, most distributed systems are loosely-coupled
Independent OSes
No implicit trust between computers
May have some shared resources
Each system has a unique view of the others (dependencies)
Communication is slow and may be infrequent

Closely-coupled systems
A closely-coupled system looks to the user as if it were a centralized timesharing system, except that it’s constructed out of a distributed collection of hardware and software components.

A distributed system becomes more “closely-coupled” as it
  appears more uniform in nature
  runs a “single” operating system
  has a single security domain
  shares all logical resources (e.g., files)
  shares all physical resources (CPUs, memory, disks, printers, etc.)

Tightly-coupled systems
A “tightly-coupled” system usually refers to a multiprocessor system that:
  runs a single copy of the OS with a single job queue
  has a single address space
  usually has a single bus or backplane to which all processors and memories are connected
  has very low communication latency
  has processors that communicate through shared memory

Some issues in distributed systems
Transparency (how visible is the distribution?)
Security
Reliability
Performance
Scalability
Programming models
Communication models

Distributed File Systems
The most common distributed services:
  printing
  files
  computation
Basic idea of distributed file systems: support network-wide sharing of files and devices (disks).
Generally provide a “traditional” view: a centralized shared local file system.
But with a distributed implementation: read blocks from remote hosts, instead of from local disks.

Issues
What is the basic abstraction?
  remote file system? open, close, read, write, …
  remote disk? read block, write block
Naming
  how are files named?
  are those names location transparent? (is the file location visible to the user?)
  are those names location independent? (do the names change if the file moves? do the names change if the user moves?)

Caching
  caching exists for performance reasons
  where are file blocks cached? on the file server? on the client machine? both?
Sharing and coherency
  what are the semantics of sharing?
  what happens when a cached block/file is modified?
  how does a node know when its cached blocks are out of date?

Replication
  replication can exist for performance and/or availability
  can there be multiple copies of a file in the network?
  if multiple copies, how are updates handled?
  what if there’s a network partition and clients work on separate copies?
Performance
  what is the cost of remote operation?
  what is the cost of file sharing?
  how does the system scale as the number of clients grows?
  what are the performance limitations: network, CPU, disks, protocols, data copying?

Example: SUN Network File System (NFS)
The Sun Network File System (NFS) has become a common standard for distributed UNIX file access.
NFS runs over LANs (even over WANs, slowly).
Basic idea: allow a remote directory to be “mounted” (spliced) onto a local directory.
This gives access to that remote directory and all its descendants as if they were part of the local hierarchy.
It is pretty much exactly like a “local mount” or “link” on UNIX, except for implementation and performance (no, we didn’t really learn about these, but they’re obvious).

For instance, I mount /u4/teng on Node1 onto /students/foo on Node2:
  users on Node2 can then access this directory as /students/foo
  if I had a file /u4/teng/myfile, users on Node2 see it as /students/foo/myfile
Just as, on a local system, I might link /groups/it344/www/10wi/ as /u4/teng/it344 to allow easy access to my web data from my class home directory.
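The path translation implied by the mount example can be sketched in a few lines of Python. This is only an illustration of the idea, not how NFS is implemented: the `MOUNTS` table and the `resolve` function are hypothetical names, and the table holds just the one mount from the example.

```python
import posixpath

# Hypothetical mount table on Node2: local mount point -> (server, remote directory).
MOUNTS = {"/students/foo": ("Node1", "/u4/teng")}

def resolve(local_path, mounts=MOUNTS):
    """Map a path on the local node to (server, path on that server)."""
    path = posixpath.normpath(local_path)
    # Try the longest mount point first so nested mounts would resolve correctly.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, remote_dir = mounts[mount_point]
            # Splice the part below the mount point onto the remote directory.
            return server, remote_dir + path[len(mount_point):]
    # No mount point matched: the path refers to a local file.
    return "local", path
```

With this table, `resolve("/students/foo/myfile")` yields the server `Node1` and the remote path `/u4/teng/myfile`, while a path outside the mount point stays local.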

NFS implementation
NFS defines a set of RPC operations for remote file access:
  searching a directory
  reading directory entries
  manipulating links and directories
  reading/writing files
Every node may be both a client and server.

NFS defines new layers in the Unix file system
The virtual file system (VFS) provides a standard interface, using v-nodes as file handles. A v-node describes either a local or remote file.
[Figure: file system layers. System Call Interface on top of the Virtual File System; below it, UFS (local files, with the buffer cache / i-node table) and NFS (remote files, using RPCs to other server nodes); RPC requests from remote clients, and server responses.]

NFS caching / sharing
On an open, the client asks the server whether its cached blocks are up to date.
Once a file is open, different clients can write it and get inconsistent data.
Modified data is flushed back to the server every 30 seconds.
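The check-on-open behavior, and the stale read it allows, can be sketched as follows. This is a toy model under stated assumptions: `NfsServer` and `NfsClient` are hypothetical classes, a per-file version counter stands in for whatever freshness information the real server reports, and the periodic 30-second flush is left out.

```python
class NfsServer:
    """Toy server: stores file contents plus a version counter per file."""
    def __init__(self):
        self.data = {}      # file name -> contents
        self.version = {}   # file name -> number of writes so far

    def write(self, name, contents):
        self.data[name] = contents
        self.version[name] = self.version.get(name, 0) + 1

    def get_version(self, name):
        return self.version.get(name, 0)

    def read(self, name):
        return self.data.get(name, ""), self.get_version(name)


class NfsClient:
    """Toy client: validates its cache only when a file is opened."""
    def __init__(self, server):
        self.server = server
        self.cache = {}     # file name -> (contents, version)

    def open(self, name):
        cached = self.cache.get(name)
        # Ask the server whether the cached copy is still current.
        if cached is None or cached[1] != self.server.get_version(name):
            self.cache[name] = self.server.read(name)

    def read(self, name):
        # Reads after open are served from the cache without asking the server.
        return self.cache[name][0]
```

If another client writes the file after the open, `read` keeps returning the old contents until the next `open`, which is the inconsistency the slide describes.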

Example: CMU’s Andrew File System (AFS)
Developed at CMU to support all of its student computing.
Consists of workstation clients and dedicated file server machines (differs from NFS).
Workstations have local disks, used to cache files being used locally (originally whole files, subsequently 64K file chunks) (differs from NFS).
Andrew has a single name space: your files have the same names everywhere in the world (differs from NFS).
Andrew is good for distant operation because of its local disk caching: after a slow startup, most accesses are to local disk.

AFS caching/sharing
The need for scaling required a reduction of client-server message traffic.
Once a file is cached, all operations are performed locally.
On close, if the file has been modified, it is replaced on the server.
The client assumes that its cache is up to date, unless it receives a callback message from the server saying otherwise.
  On file open, if the client has received a callback on the file, it must fetch a new copy; otherwise it uses its locally-cached copy (differs from NFS).
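A minimal sketch of the callback scheme, assuming whole-file caching and in-process method calls in place of network messages. `AfsServer`, `AfsClient`, and the `fetches` counter are hypothetical names introduced for illustration only.

```python
class AfsServer:
    """Toy server: remembers which clients cache each file and notifies them on change."""
    def __init__(self):
        self.files = {}       # file name -> contents
        self.holders = {}     # file name -> set of clients holding a cached copy
        self.fetches = 0      # number of client-server fetch messages

    def fetch(self, name, client):
        self.fetches += 1
        self.holders.setdefault(name, set()).add(client)
        return self.files.get(name, "")

    def store(self, name, contents, writer):
        self.files[name] = contents
        # Send a callback to every other client that holds the old copy.
        for client in self.holders.get(name, set()) - {writer}:
            client.callback(name)
        self.holders[name] = {writer}


class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # file name -> contents
        self.stale = set()    # files for which a callback was received
        self.dirty = set()    # files modified locally since the last close

    def callback(self, name):
        self.stale.add(name)

    def open(self, name):
        # Trust the cache unless a callback said otherwise.
        if name not in self.cache or name in self.stale:
            self.cache[name] = self.server.fetch(name, self)
            self.stale.discard(name)
        return self.cache[name]

    def write(self, name, contents):
        self.cache[name] = contents   # purely local until close
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.server.store(name, self.cache[name], self)
            self.dirty.discard(name)
```

Repeated opens of an unchanged file cost no server traffic in this model; the server is contacted only on the first open, on close after a write, and on an open that follows a callback.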

Example: Berkeley Sprite File System
Unix file system developed at UCB for diskless workstations with large memories (differs from NFS, AFS).
Considers memory as a huge cache of disk blocks; memory is shared between file system and VM.
Files are permanently stored on servers; servers have a large memory that acts as a cache as well.
Several workstations can cache blocks for read-only files.
If a file is being written by more than one machine, client caching is turned off and all requests go to the server (differs from NFS, AFS).

Other Approaches
Serverless: xFS, Farsite
Highly available: GFS
Mostly read only: WWW
State, not files: SQL Server, BigTable

Example: Google File System (GFS)
[Figure: a spectrum from independence (small scale, many users, many programs) to cooperation (large scale, few users, few programs).]
© 2007 Gribble, Lazowska, Levy, Zahorjan

“Google” circa 1997 (google.stanford.edu)

Google (circa 1999)

Google data center (circa 2000)

Google new data center 2001

Google data center (3 days later)

Google Data Center Now

GFS: Google File System
Why did Google build its own FS?
Google has unique FS requirements:
  Huge read/write bandwidth
  Reliability over thousands of nodes
  Mostly operating on large data blocks
  Need efficient distributed operations
Unfair advantage: Google has control over applications, libraries and operating system

GFS Ideology
Huge amount of data
Ability to efficiently access data with low locality; a typical query reads 100s of MB of data
Large quantity of cheap machines: performance vs. performance/$, performance/W
Replication: scalability and h/w failure
Bandwidth more important than latency
Component failures are the norm rather than the exception
Atomic append operation so that multiple clients can append concurrently

GFS Usage @ Google
200+ clusters
Filesystem clusters of 1000s of machines
Pools of clients
4+ PB filesystems
40 GB/s read/write load (in the presence of frequent HW failures)

Files in GFS
Files are huge by traditional standards.
Most files are mutated by appending new data rather than overwriting existing data.
Once written, the files are only read, and often only sequentially.
Appending becomes the focus of performance optimization and atomicity guarantees.

GFS Setup
[Figure: clients, a GFS master with master replicas, misc. servers, and chunkservers 1 … N holding chunks labeled C0, C1, C2, C3, C5, …]
Master manages metadata
Data transfers happen directly between clients/chunkservers
Files broken into chunks (typically 64 MB)

Architecture
A GFS cluster consists of a single master and multiple chunk servers and is accessed by multiple clients. Each of these is typically a commodity Linux machine running a user-level server process.
Files are divided into fixed-size chunks, each identified by an immutable and globally unique 64-bit chunk handle.
For reliability, each chunk is replicated on multiple chunk servers.
The master maintains all file system metadata.
The master periodically communicates with each chunk server in HeartBeat (timer) messages to give it instructions and collect its state.
Neither the client nor the chunk server caches file data, eliminating cache coherence issues. Clients do cache metadata, however.

Architecture

Read Process
Single master vastly simplifies design.
Clients never read and write file data through the master. Instead, a client asks the master which chunk servers it should contact.
Using the fixed chunk size, the client translates the file name and byte offset specified by the application into a chunk index within the file.
It sends the master a request containing the file name and chunk index.
The master replies with the corresponding chunk handle and the locations of the replicas.
The client caches this information using the file name and chunk index as the key.
The client then sends a request to one of the replicas, most likely the closest one. The request specifies the chunk handle and a byte range within that chunk.
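The client-side steps of the read path can be sketched as below. This is a simplified model, not the GFS client library: `Master`, `GfsClient`, and the in-memory lookup table are hypothetical, the master is a local object instead of a remote server, and the final request to a chunk server is omitted.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size: 64 MB

def chunk_index(byte_offset):
    """Translate a byte offset within a file into a chunk index."""
    return byte_offset // CHUNK_SIZE

class Master:
    """Toy master: maps (file name, chunk index) to (chunk handle, replica locations)."""
    def __init__(self, table):
        self.table = table
        self.lookups = 0   # how many requests the master has served

    def lookup(self, name, index):
        self.lookups += 1
        return self.table[(name, index)]

class GfsClient:
    def __init__(self, master):
        self.master = master
        self.cache = {}    # (file name, chunk index) -> (handle, replicas)

    def locate(self, name, byte_offset):
        key = (name, chunk_index(byte_offset))
        # Only contact the master when the metadata is not cached yet.
        if key not in self.cache:
            self.cache[key] = self.master.lookup(*key)
        handle, replicas = self.cache[key]
        # The chunk server would be asked for this handle and a byte range
        # starting at this offset within the chunk.
        return handle, replicas, byte_offset % CHUNK_SIZE
```

Two reads that fall in the same chunk cost a single master lookup in this model, which is why caching chunk locations keeps the single master from becoming a bottleneck.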

Specifications
Chunk size = 64 MB.
Chunks are stored as plain Unix files on the chunk server.
With large chunks, a client can keep a persistent TCP connection to the chunk server over an extended period of time (reducing network overhead) and cache all the chunk location information to facilitate small random reads.
The master keeps the metadata in memory.
Disadvantage: small files become hotspots. Solution: higher replication for such files.

Microsoft Data Center 4.0 http://www.youtube.com/watch?v=PPnoKb9fTkA

Data center container
Microsoft $500M Chicago data center (2009)
  > 2000 servers/container (40 ft)
  150 containers
  11 diesel generators, each 2.8 megawatts
  12 chillers, each 1260 tons

Data center container
Google, IBM, HP, …

Cloud Computing Platforms

Client/server computing
Mail server/service
File server/service
Print server/service
Compute server/service
Game server/service
Music server/service
Web server/service
etc.

Peer-to-peer (p2p) systems
Napster
Gnutella (LimeWire)
  example technical challenge: self-organizing overlay network
  technical advantage of Gnutella?
  er … legal advantage of Gnutella?
Data source: Digital Music News Research Group

Summary
There are a number of issues to deal with:
  what is the basic abstraction
  naming
  caching
  sharing and coherency
  replication
  performance
No right answer! Different systems make different tradeoffs!

Performance is always an issue
There is always a tradeoff between performance and the semantics of file operations (e.g., for shared files).
Caching of file blocks is crucial in any file system; maintaining coherency is a crucial design issue.
Newer systems are dealing with issues such as disconnected operation for mobile computers.

Service Oriented Architecture
How do you allow hundreds of developers to work on a single website?

Amazon.com: The Beginning
Initially, one web server (Obidos) and one database.
[Figure: Internet, Obidos, Database]
Details: the front end consists of a web server (Apache) and “business logic” (Obidos).

Amazon: Success Disaster!
Use redundancy to scale up and improve availability.
[Figure: Internet, Amazon.com load balancer, multiple Obidos servers, Database]
As the site was growing, there was a need/demand for more features.
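One simple way a load balancer can spread requests over redundant servers is round-robin. The slides do not say which policy Amazon used, so the sketch below is only an assumed policy, and the server names are made up.

```python
import itertools

class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""
    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def route(self, request):
        # Pick the next server in the rotation and pair it with the request.
        return next(self._rotation), request
```

With three servers, the fourth request goes back to the first server. A production balancer would also have to notice failed servers and skip them, which is where the availability benefit comes from.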

Obidos
Obidos was a single monolithic C application that comprised most of Amazon.com’s functionality.
During scale-up, this model began to break down.
Lots of things become harder with more developers working on a single executable.

Problem #1: Branch Management
Merging code across branches becomes untenable.
[Figure: HelloWorld.c on release and development branches]
Blue changes depend on Red changes (which may depend on other changes…).

Problem #2: Debugging
On a failure, we would like to inspect what happened “recently”, but the change log contains numerous updates from many groups.
Bigger problem: lack of isolation. A change by one group can impact others.

Problem #3: Linker Failure
Obidos grew so large that standard build tools were failing

Service-Oriented Architecture (1)
First, decompose the monolithic web site into a set of smaller modules, called services.
Examples:
  Recommendation service
  Price service
  Catalogue service
  and MANY others

Sidebar: Modularity
Information hiding (Parnas 1972): the purpose of a module is to hide secrets.
Benefits of modularity:
  Groups can work independently (less “synchronization overhead”)
  Ease of change (we are free to change the hidden secrets)
  Ease of comprehension (can study the system at a high level of abstraction)

    public interface List { }
    // This can be an array, a linked-list,
    // or something else

Systems and Information Hiding
There is often a tension between performance and information hiding.
In OSes, performance often wins:

    struct buffer {
        // DO NOT MOVE these fields!
        // They are accessed by inline assembly that
        // assumes the current ordering.
        struct buffer* next;
        struct buffer* prev;
        int size;
        …
    };

Service Oriented Architectures (2)
Modularity + a network
Services live on disjoint sets of machines
Services communicate using RPC (remote procedure call)

Remote Procedure Call
RPC exposes a programming interface across machines:

    interface PriceService {
        float getPrice(long uniqueID);
    }

[Figure: the Client calls getPrice(); PriceImpl runs on the Server.]
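The mechanics behind such an interface can be sketched with a client-side stub and a server-side dispatcher. This is a toy under stated assumptions: JSON is used as the wire format, a plain function call stands in for the network, and `PriceServiceStub`, `handle_request`, `EXPOSED`, and the price table are all hypothetical names and data.

```python
import json

class PriceImpl:
    """Server-side implementation of the PriceService interface."""
    PRICES = {42: 19.99}   # made-up catalogue data

    def getPrice(self, uniqueID):
        return self.PRICES[uniqueID]

EXPOSED = {"getPrice"}     # methods that remote callers are allowed to invoke

def handle_request(impl, raw_request):
    """Server side: unmarshal the request, call the method, marshal the result."""
    request = json.loads(raw_request)
    if request["method"] not in EXPOSED:
        return json.dumps({"error": "unknown method"})
    result = getattr(impl, request["method"])(*request["args"])
    return json.dumps({"result": result})

class PriceServiceStub:
    """Client side: looks like a local object, but forwards each call as a message."""
    def __init__(self, transport):
        self.transport = transport   # callable: request string -> response string

    def getPrice(self, uniqueID):
        raw = json.dumps({"method": "getPrice", "args": [uniqueID]})
        return json.loads(self.transport(raw))["result"]

# Wire the stub to the implementation; a real system would send bytes over a socket.
price_impl = PriceImpl()
price_service = PriceServiceStub(lambda raw: handle_request(price_impl, raw))
```

The caller writes `price_service.getPrice(42)` exactly as it would for a local object; the marshalling, transport, and dispatch are hidden in the stub and the handler.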

SOA, Visualized
[Figure: Website, Recommendation, Price, Shopping Cart, and Catalogue services]
All services reside on separate machines.
All invocations are remote procedure calls.

Benefits of SOA
Modularity and service isolation
  This extends all the way down to the OS, programming language, build tools, etc.
Better visibility
  Administrators can monitor the interactions between services
Better resource accounting
  Who is using which resources?

Performance Issues
A webpage can require dozens of service calls, so the RPC system must be high performance.
Metrics of interest:
  Throughput
  Latency (both the average and the variance)

SLAs
Service performance is dictated by contracts called Service Level Agreements.
E.g., Service Foo must:
  have 4 9’s of availability (99.99%)
  have a median latency of 50 ms
  have a 3 9’s latency of 200 ms (99.9% of requests complete within 200 ms)
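Checking measurements against such an SLA might look like the sketch below. It reads "3 9's latency" as the 99.9th-percentile latency and uses the nearest-rank percentile definition; both are assumptions, and `percentile` and `meets_sla` are hypothetical helper names.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    # Round before taking the ceiling to guard against floating-point noise.
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]

def meets_sla(latencies_ms, successes, requests):
    """Check the three example terms for Service Foo."""
    availability = successes / requests
    return (availability >= 0.9999                      # 4 nines of availability
            and percentile(latencies_ms, 50) <= 50      # median latency of 50 ms
            and percentile(latencies_ms, 99.9) <= 200)  # 3 nines latency of 200 ms
```

The tail term is the reason variance matters as much as the average: a service with a fast median can still violate its SLA if slightly more than one request in a thousand is slow.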

Amazon and Web Services
[Figure: Sleds.com, the Amazon.com front-end website, Order Processing, Shopping Carts, Catalogue]
Allow third parties to use some (but not all) of the Amazon platform.

Searching on a Web Site

Searching Through a Web Service
    using System;

    class Program
    {
        static void Main(string[] args)
        {
            // AWSECommerceService and the ItemSearch* types are assumed to come from
            // a generated web-service proxy; they are not defined in this listing.
            AWSECommerceService service = new AWSECommerceService();

            // Build a search for books by a given author.
            ItemSearch request = new ItemSearch();
            request.SubscriptionId = "0525E2PQ81DD7ZTWTK82";
            request.Request = new ItemSearchRequest[1];
            request.Request[0] = new ItemSearchRequest();
            request.Request[0].ResponseGroup = new string[] { "Small" };
            request.Request[0].SearchIndex = "Books";
            request.Request[0].Author = "Tom Clancy";

            // The search itself is a remote call to the web service.
            ItemSearchResponse response = service.ItemSearch(request);
            Console.WriteLine(response.Items[0].Item.Length + " books written by Tom Clancy found.");
        }
    }

Other Web Services
Google
  Calendar
  Maps
  Charts
Amazon infrastructure services (cloud)
  Simple storage (disk)
  Elastic compute cloud (virtual machines)
  SimpleDB
Facebook
Ebay
…
