1
Advanced Data Management
Jiaheng Lu
Department of Computer Science, Renmin University of China
www.jiahenglu.net
2
Cloud computing
4
Distributed system
5
Outline
- Concepts and terminology
- What is distributed?
- Distributed data & objects
- Distributed execution
- Three-tier architectures
- Transaction concepts
6
What's a Distributed System?
- Centralized: everything in one place (a stand-alone PC or mainframe)
- Distributed: some parts are remote
  - distributed users
  - distributed execution
  - distributed data
7
Transparency in Distributed Systems
- Make a distributed system as easy to use and manage as a centralized system
- Give a single-system image
- Location transparency:
  - hide the fact that an object is remote
  - hide the fact that an object has moved
  - hide the fact that an object is partitioned or replicated
- The name does not change if an object is replicated, partitioned, or moved
8
Naming: The Basics
- Objects have:
  - a Globally Unique Identifier (GUID)
  - location(s) = address(es); addresses can change
  - name(s); objects can have many names
- Names are context dependent (Jim @ KGB vs. Jim @ CIA)
- Many naming systems:
  - UNC: \\node\device\dir\dir\dir\object
  - Internet: http://node.domain.root/dir/dir/dir/object
  - LDAP: ldap://ldap.domain.root/o=org,c=US,cn=dir
[Diagram: one GUID with a single address but two names, "Jim" and "James"]
9
Name Servers in Distributed Systems
- Name servers translate name + context to address (+ GUID)
- Name servers are partitioned (subtrees of the name space)
- Name servers replicate the root of the name tree
- Name servers form a hierarchy
- Distributed data from hell:
  - high read traffic
  - high reliability & availability
  - autonomy
[Diagram: name tree with a replicated root; Northern names and Southern names held by separate servers]
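A minimal sketch of hierarchical name resolution as described above: each name server owns a subtree of the name space, and lookups walk down from a (replicated) root toward the server that holds the record. The class names and the toy name space are hypothetical, not any real name service.

```python
# Hypothetical sketch: hierarchical name servers, each owning a subtree of the
# name space; lookups walk from the root toward the server that owns the name.

class NameServer:
    def __init__(self):
        self.records = {}    # full name -> (guid, address), for names owned here
        self.children = {}   # subtree prefix -> child NameServer

    def register(self, name, guid, address):
        self.records[name] = (guid, address)

    def delegate(self, prefix, child):
        self.children[prefix] = child

    def resolve(self, name):
        # Answer locally if we own the record, else forward to the child
        # server whose subtree prefix matches the name.
        if name in self.records:
            return self.records[name]
        for prefix, child in self.children.items():
            if name.startswith(prefix):
                return child.resolve(name)
        raise KeyError(f"unresolved name: {name}")

# Toy name space; a replicated root would simply be several identical roots.
root, north, south = NameServer(), NameServer(), NameServer()
root.delegate("/north/", north)
root.delegate("/south/", south)
north.register("/north/printer1", guid="g-001", address="10.0.1.5:631")

print(root.resolve("/north/printer1"))   # ('g-001', '10.0.1.5:631')
```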
10
Autonomy in Distributed Systems
- The owner of a site (or node, application, or database) wants to control it
- If my part is working, I must be able to access and manage it (reorganize, upgrade, add users, ...)
- Autonomy is essential, but difficult to implement
- It conflicts with global consistency; examples: naming, authentication, administration, ...
11
Security: The Basics
- Authentication server: subject + authenticator => (Yes + token) | No
- Security matrix: who can do what to whom
  - an access control list (ACL) is a column of the matrix
  - "who" is an authenticated ID
- In a distributed system, "who", "what", and "whom" are distributed objects
[Diagram: security matrix of subjects vs. objects, entries giving permissions]
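To make the matrix-and-ACL idea above concrete, here is a hypothetical sketch; the subjects, objects, and operations are invented for the example, and a real authorization service would sit behind an authentication step.

```python
# Hypothetical sketch of the security matrix: rows are subjects, columns are
# objects; an ACL is one column of the matrix.

from collections import defaultdict

class SecurityMatrix:
    def __init__(self):
        # matrix[obj][subject] -> set of permitted operations
        self.matrix = defaultdict(dict)

    def grant(self, subject, operation, obj):
        self.matrix[obj].setdefault(subject, set()).add(operation)

    def acl(self, obj):
        """One column of the matrix: who may do what to this object."""
        return dict(self.matrix[obj])

    def check(self, subject, operation, obj):
        return operation in self.matrix[obj].get(subject, set())

m = SecurityMatrix()
m.grant("jim", "read", "payroll_db")
m.grant("jim", "write", "payroll_db")
m.grant("james", "read", "payroll_db")

print(m.check("jim", "write", "payroll_db"))    # True
print(m.check("james", "write", "payroll_db"))  # False
print(m.acl("payroll_db"))                      # the ACL for payroll_db
```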
12
Security in Distributed Systems
- Security domain: nodes with a shared security server
- Security domains can have trust relationships:
  - A trusts B: A "believes" B when it says "this is Jim@B"
- Security domains form a hierarchy
- Delegation: passing authority to a server
  - when A asks B to do something (e.g. print a file, read a database), B may need A's authority
- Autonomy requires that each node is an authenticator and does its own security checks
- Internet today:
  - no trust among domains (firewalls, many passwords)
  - trust based on digital signatures
13
Clusters: The Ideal Distributed System
- A cluster is a distributed system, BUT with:
  - a single location, manager, and security policy
  - relatively homogeneous nodes
  - communication that is high bandwidth, low latency, and low error rate
- Clusters use distributed-system techniques for:
  - load distribution
  - storage
  - execution
  - growth
  - fault tolerance
14
Cluster: Shared What?
- Shared-memory multiprocessor: multiple processors, one memory; all devices are local (DEC, SGI, or Sequent 16x nodes)
- Shared-disk cluster: an array of nodes that all share common disks (VAXcluster + Oracle)
- Shared-nothing cluster: each device is local to a node; ownership may change (Tandem, SP2, Wolfpack)
15
Outline
- Concepts and terminology
- Why distribute?
- Distributed data & objects: partitioned, replicated
- Distributed execution
- Three-tier architectures
- Transaction concepts
16
Partitioned Data
- Break a file into disjoint groups
- Exploit data access locality: put data near its consumer
  - less network traffic
  - better response time
  - better availability
  - the owner controls the data (autonomy)
- Spread load: data or traffic may exceed a single store
[Diagram: an Orders table partitioned across N.A., S.A., Europe, and Asia]
17
How to Partition Data?
- Partition by attribute, at random, by source, or by use
- Problem: to find the data, you must have either
  - a (replicated) directory, or
  - an algorithm
- This encourages attribute-based partitioning
[Diagram: partitions for N.A., S.A., Europe, and Asia]
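As a rough illustration of the two lookup options above (a directory versus an algorithm), here is a hypothetical sketch; the region names and routing rules are made up for the example.

```python
# Hypothetical sketch: two ways to find a partition.
# 1) an algorithm (attribute-based routing on a 'region' column), and
# 2) a directory mapping keys to partitions (which must itself be replicated).

PARTITIONS = ["N.A.", "S.A.", "Europe", "Asia"]

# Option 1: algorithmic, attribute-based partitioning.
REGION_TO_PARTITION = {"US": "N.A.", "BR": "S.A.", "DE": "Europe", "CN": "Asia"}

def partition_by_attribute(order):
    return REGION_TO_PARTITION[order["region"]]

# Option 2: a directory lookup (e.g. when rows were placed randomly or by use).
directory = {}  # order_id -> partition name; would be replicated in practice

def place_randomly(order):
    part = PARTITIONS[hash(order["order_id"]) % len(PARTITIONS)]
    directory[order["order_id"]] = part
    return part

order = {"order_id": "o-42", "region": "DE"}
print(partition_by_attribute(order))  # Europe
print(place_randomly(order))          # whichever partition the hash picks
print(directory["o-42"])              # the directory finds it again later
```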
26
Replicated Data
- Place a fragment at many sites
- Pros:
  - improves availability
  - allows disconnected (mobile) operation
  - distributes load
  - reads are cheaper
- Cons:
  - N times more updates
  - N times more storage
- Placement strategies:
  - dynamic: cache on demand
  - static: place at specific sites
[Diagram: a Catalog fragment replicated at several sites]
27
Updating Replicated Data
- When a replica is updated, how do changes propagate?
- Master copy, many slave copies (SQL Server):
  - the correct value is always known (the master's)
  - change propagation can be transactional, as soon as possible, periodic, or on demand
- Symmetric, update anywhere, anytime (Access):
  - allows mobile (disconnected) updates
  - updates propagated ASAP, periodically, or on demand
  - non-serializable; colliding updates must be reconciled
  - hard to know the "real" value
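A hypothetical sketch of the master/slave scheme above: all writes go to the master, which pushes the change to every slave copy ("as soon as possible" propagation; a real system might instead propagate transactionally or lazily). The class names are invented.

```python
# Sketch of master-copy replication: the master holds the correct value and
# propagates every update to the slave copies; slaves serve cheap local reads.

class MasterReplica:
    def __init__(self):
        self.data = {}
        self.slaves = []

    def attach(self, slave):
        self.slaves.append(slave)

    def write(self, key, value):
        self.data[key] = value          # the master always holds the correct value
        for slave in self.slaves:       # propagate ASAP to every slave copy
            slave.apply(key, value)

class SlaveReplica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value          # slaves only accept changes from the master

    def read(self, key):
        return self.data.get(key)       # reads are cheap: served locally

master = MasterReplica()
s1, s2 = SlaveReplica(), SlaveReplica()
master.attach(s1)
master.attach(s2)
master.write("price:widget", 100)
print(s1.read("price:widget"), s2.read("price:widget"))  # 100 100
```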
28
Outline
- Concepts and terminology
- Why distribute?
- Distributed data & objects: partitioned, replicated
- Distributed execution: remote procedure call, queues
- Three-tier architectures
- Transaction concepts
29
Distributed Execution: Threads and Messages
- A thread is the execution unit (the software analog of CPU + memory)
- Threads execute at a node
- Threads communicate via:
  - shared memory (local)
  - messages (local and remote)
[Diagram: threads at each node sharing memory locally and exchanging messages between nodes]
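A small sketch of the two communication styles above, using Python threads: local shared memory protected by a lock, and a message queue standing in for the network. It is only illustrative.

```python
# Sketch: two threads communicating via (1) shared memory guarded by a lock,
# and (2) a message queue, the local analog of sending messages.

import threading
import queue

shared = {"counter": 0}
lock = threading.Lock()
mailbox = queue.Queue()

def worker():
    # Shared-memory communication: update a variable both threads can see.
    with lock:
        shared["counter"] += 1
    # Message-based communication: send a message to whoever is listening.
    mailbox.put("work done")

t = threading.Thread(target=worker)
t.start()
print(mailbox.get())        # "work done" (blocks until the message arrives)
t.join()
print(shared["counter"])    # 1
```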
30
Peer-to-Peer or Client-Server
- Peer-to-peer is symmetric: either side can send
- Client-server:
  - the client sends requests
  - the server sends responses
  - a simple subset of peer-to-peer
[Diagram: request/response arrows between client and server]
31
Connection-less or Connected
- Connection-less:
  - each request contains the client id, the client context, and the work request
  - the client is authenticated on each message
  - only a single response message
  - e.g. HTTP, NFS v1
- Connected (sessions):
  - open, request/reply, close
  - the client is authenticated once
  - messages arrive in order
  - can send many replies (e.g. FTP)
  - the server keeps client context (context sensitive)
  - e.g. Winsock and ODBC; HTTP is adding connections
32
Remote Procedure Call: The Key to Transparency
- An object may be local or remote
- Methods on the object work wherever it is
- Local invocation: y = pObj->f(x)
[Diagram: the call enters f() with argument x, which computes val and returns it directly to the caller]
33
Remote Procedure Call: The Key to Transparency
- Remote invocation: the same call, y = pObj->f(x), goes through a local proxy
[Diagram: the proxy marshals x and sends it to a stub on the remote node; the stub unmarshals x, invokes f() on the real object, marshals the result val back, and the proxy unmarshals it and returns it to the caller]
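A toy sketch of the proxy/stub path in the diagram above; the classes and the pickle-based "wire format" are invented for illustration and do not correspond to any particular RPC system.

```python
# Toy RPC sketch: a client-side proxy marshals the argument, a server-side
# stub unmarshals it, calls the real method, and marshals the result back.
# pickle stands in for a real wire format; the "network" is just bytes.

import pickle

class RealObject:
    def f(self, x):
        return x * 2          # the actual method, running at the server

class Stub:
    def __init__(self, obj):
        self.obj = obj

    def handle(self, request_bytes):
        method, args = pickle.loads(request_bytes)      # unmarshal the request
        result = getattr(self.obj, method)(*args)        # invoke for real
        return pickle.dumps(result)                      # marshal the reply

class Proxy:
    """Looks like the object to the caller, but forwards calls to the stub."""
    def __init__(self, stub):
        self.stub = stub

    def f(self, x):
        request = pickle.dumps(("f", (x,)))              # marshal the call
        reply = self.stub.handle(request)                # "send" over the network
        return pickle.loads(reply)                       # unmarshal the result

pObj = Proxy(Stub(RealObject()))
y = pObj.f(21)     # same shape as the local call y = pObj->f(x)
print(y)           # 42
```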
34
Transaction Object Request Broker (ORB)
- Registers servers
- Manages pools of servers
- Connects clients to servers
- Does naming and request-level authorization
- Provides transaction coordination (new feature)
- Old names: Transaction Processing Monitor, web server, NetWare, Object Request Broker
35
Outline
- Concepts and terminology
- Why distribute?
- Distributed data & objects
- Distributed execution: remote procedure call, queues
- Three-tier architectures: what and why
- Transaction concepts
36
Client/Server Interactions
- All can be done with RPC
- Request-response: the response may be many messages
- Conversational: the server keeps client context
- Dispatcher (three-tier): a complex operation at the server
- Queued: de-couples the client from the server, allows disconnected operation
37
Queued Request/Response
- Time-decouples client and server: three transactions (submit, perform, respond)
- Almost real time, ASAP processing
- Client and server communicate at each other's convenience
- Allows mobile (disconnected) operation
- Disk queues survive client and server failures
[Diagram: Client submits to a request queue, the server performs the work, the response queue returns the result to the client]
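A minimal sketch of the three-step queued interaction above; real systems use durable disk queues, while plain in-memory queues stand in here so the control flow stays visible.

```python
# Sketch of queued request/response: the client submits a request to one queue,
# the server performs the work and posts the answer to a response queue, and the
# client picks up the response later, at its own convenience.

import queue

request_queue = queue.Queue()    # a durable disk queue in a real system
response_queue = queue.Queue()

# Transaction 1: the client submits the request and can disconnect afterwards.
request_queue.put({"id": 1, "op": "price", "item": "widget"})

# Transaction 2: the server performs the work whenever it gets to it.
req = request_queue.get()
response_queue.put({"id": req["id"], "result": 100})

# Transaction 3: the client collects the response at its convenience.
print(response_queue.get())   # {'id': 1, 'result': 100}
```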
38
Why Queued Processing?
- Prioritize requests: an ambulance dispatcher favors high-priority calls
- Manage workflows
- Deferred processing in mobile apps
[Diagram: workflow Order -> Build -> Ship -> Invoice -> Pay]
39
Google Cloud computing techniques
40
The Google File System
41
The Google File System (GFS)
- A scalable distributed file system for large, distributed, data-intensive applications
- Multiple GFS clusters are currently deployed; the largest ones have:
  - 1000+ storage nodes
  - 300+ terabytes of disk storage
  - heavy access by hundreds of clients on distinct machines
42
Introduction
- Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
- The GFS design has been driven by four key observations of Google's application workloads and technological environment
43
Intro: Observations 1
1. Component failures are the norm
   - constant monitoring, error detection, fault tolerance, and automatic recovery are integral to the system
2. Huge files (by traditional standards)
   - multi-GB files are common
   - I/O operations and block sizes must be revisited
44
Intro: Observations 2
3. Most files are mutated by appending new data
   - this is the focus of performance optimization and atomicity guarantees
4. Co-designing the applications and the file system API benefits the overall system by increasing flexibility
45
The Design
- A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients
46
The Master
- Maintains all file system metadata: namespace, access control info, file-to-chunk mappings, chunk (and replica) locations, etc.
- Periodically communicates with chunkservers via HeartBeat messages to give instructions and collect their state
47
The Master (continued)
- Makes sophisticated chunk placement and replication decisions using global knowledge
- For reading and writing, a client contacts the master only to get chunk locations, then deals directly with chunkservers
- So the master is not a bottleneck for reads and writes
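A simplified sketch of the read path just described; the class and method names are invented for illustration and do not reflect the real GFS code, but the division of labor (metadata from the master, data directly from a chunkserver) follows the slides.

```python
# Simplified GFS-style read path: the client asks the master only for metadata
# (which chunk, which chunkservers hold it), then fetches the data directly
# from one of those chunkservers.

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, as on the next slide

class Master:
    def __init__(self):
        # filename -> list of chunk handles; chunk handle -> chunkserver names
        self.file_to_chunks = {"/logs/web.log": ["chunk-0001", "chunk-0002"]}
        self.chunk_locations = {"chunk-0001": ["cs1", "cs2", "cs3"],
                                "chunk-0002": ["cs2", "cs4", "cs5"]}

    def lookup(self, filename, offset):
        chunk_index = offset // CHUNK_SIZE            # which chunk holds this offset
        handle = self.file_to_chunks[filename][chunk_index]
        return handle, self.chunk_locations[handle]

class Chunkserver:
    def __init__(self):
        self.chunks = {"chunk-0001": b"GET /index.html 200\n" * 3}

    def read(self, handle, chunk_offset, length):
        return self.chunks[handle][chunk_offset:chunk_offset + length]

class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers              # name -> chunkserver object

    def read(self, filename, offset, length):
        handle, locations = self.master.lookup(filename, offset)  # metadata only
        server = self.chunkservers[locations[0]]                   # pick a replica
        return server.read(handle, offset % CHUNK_SIZE, length)    # data directly

master = Master()
client = Client(master, {"cs1": Chunkserver(), "cs2": Chunkserver(), "cs3": Chunkserver()})
print(client.read("/logs/web.log", 0, 20))   # b'GET /index.html 200\n'
```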
48
Chunkservers
- Files are broken into chunks; each chunk has an immutable, globally unique 64-bit chunk handle
- The handle is assigned by the master at chunk creation
- Chunk size is 64 MB
- Each chunk is replicated on 3 (by default) chunkservers
49
Clients
- Linked into applications via the file system API
- Communicate with the master and the chunkservers for reading and writing:
  - master interactions only for metadata
  - chunkserver interactions for data
- Cache only metadata; the data is too large to cache
50
Chunk Locations
- The master does not keep a persistent record of which chunkservers hold which chunks and replicas
- It polls chunkservers for this information at startup, and when chunkservers join or leave
- It stays up to date by controlling the placement of new chunks and through HeartBeat messages (while monitoring chunkservers)
51
Operation Log
- A record of all critical metadata changes
- Stored on the master and replicated on other machines
- Defines the order of concurrent operations
- Changes are not visible to clients until they have propagated to all replicas of the log
- Also used to recover the file system state
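A toy sketch of an operation log used for recovery, in the spirit of the slide above: metadata changes are appended to a log (which would be replicated before being acknowledged), and replaying the log rebuilds the metadata. The names and record format are hypothetical.

```python
# Toy operation log: every metadata change is appended to the log first; the
# in-memory state can always be rebuilt by replaying the log from the start.

class OperationLog:
    def __init__(self):
        self.records = []   # would be flushed to disk and to remote replicas

    def append(self, op, *args):
        self.records.append((op, args))

class Metadata:
    def __init__(self):
        self.files = {}     # filename -> list of chunk handles

    def apply(self, op, args):
        if op == "create":
            self.files[args[0]] = []
        elif op == "add_chunk":
            self.files[args[0]].append(args[1])

def replay(log):
    md = Metadata()
    for op, args in log.records:     # the log defines the order of operations
        md.apply(op, args)
    return md

log = OperationLog()
log.append("create", "/logs/web.log")
log.append("add_chunk", "/logs/web.log", "chunk-0001")

recovered = replay(log)              # recovery = replaying the log
print(recovered.files)               # {'/logs/web.log': ['chunk-0001']}
```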
52
System Interactions: Leases and Mutation Order
- Leases maintain a consistent mutation order across all chunk replicas
- The master grants a lease to one replica, called the primary
- The primary chooses the serial mutation order, and all replicas follow this order
- This minimizes management overhead for the master
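A condensed sketch of how a primary holding the lease could impose one serial order on mutations across replicas. It is illustrative only, with invented names, and ignores the data flow and failure handling of the real protocol.

```python
# Sketch: the primary (lease holder) assigns serial numbers to mutations and
# all replicas apply them in that order, so every replica sees the same order.

class Replica:
    def __init__(self, name):
        self.name = name
        self.applied = []            # mutations in the order they were applied

    def apply(self, serial, mutation):
        self.applied.append((serial, mutation))

class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def mutate(self, mutation):
        serial = self.next_serial    # the primary picks the serial order
        self.next_serial += 1
        self.apply(serial, mutation)
        for r in self.secondaries:   # every replica follows the same order
            r.apply(serial, mutation)

s1, s2 = Replica("cs2"), Replica("cs3")
primary = Primary("cs1", [s1, s2])
primary.mutate("write A")
primary.mutate("write B")
print(s1.applied == s2.applied == primary.applied)   # True: one serial order
```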
53
System Interactions: Leases and Mutation Order
54
Atomic Record Append
- The client specifies the data to write; GFS chooses the offset, appends the data to each replica at least once, and returns the chosen offset to the client
- Heavily used by Google's distributed applications: no need for a distributed lock manager
- GFS chooses the offset, not the client
55
Atomic Record Append: How?
- Follows a control flow similar to that of other mutations
- The primary tells the secondary replicas to append at the same offset as the primary
- If the append fails at any replica, the client retries it
- So replicas of the same chunk may contain different data, including duplicates (whole or in part) of the same record
56
Atomic Record Append: How? (continued)
- GFS does not guarantee that all replicas are bitwise identical
- It only guarantees that the data is written at least once as an atomic unit
- The data must be written at the same offset on all chunk replicas for success to be reported
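A toy sketch of the at-least-once behavior just described: a failed append is retried, so replicas that already succeeded pick up a duplicate record, and success is reported only once every replica has taken the record. The classes and the injected failure are invented for the example.

```python
# Toy record append with retries: if one replica fails, the client retries the
# whole append; replicas that already succeeded get a duplicate copy, which is
# why GFS promises "at least once" rather than bitwise-identical replicas.

class ChunkReplica:
    def __init__(self, fail_times=0):
        self.records = []
        self.fail_times = fail_times   # fail the first N appends (injected fault)

    def append(self, record):
        if self.fail_times > 0:
            self.fail_times -= 1
            return False
        self.records.append(record)
        return True

def record_append(replicas, record, max_retries=3):
    for _ in range(max_retries):
        results = [r.append(record) for r in replicas]
        if all(results):
            return True     # success reported only when every replica took it
    return False

replicas = [ChunkReplica(), ChunkReplica(fail_times=1), ChunkReplica()]
print(record_append(replicas, b"rec-1"))        # True, after one retry
print([len(r.records) for r in replicas])       # [2, 1, 2]: duplicates on some replicas
```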
57
Replica Placement
- The placement policy maximizes data reliability and network bandwidth utilization
- Spread replicas not only across machines, but also across racks
  - guards against machine failures, and against racks being damaged or going offline
  - reads for a chunk exploit the aggregate bandwidth of multiple racks
  - writes have to flow through multiple racks (a tradeoff made willingly)
58
Chunk Creation
- Chunks are created and placed by the master
- Placed on chunkservers with below-average disk utilization
- The master limits the number of recent "creations" on a chunkserver, because creations bring lots of writes
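A small sketch combining the placement heuristics above with the rack spreading from the previous slide: prefer below-average disk utilization, cap recent creations, and avoid putting two replicas on the same rack. The numbers and data structures are made up for illustration.

```python
# Sketch of chunk placement: prefer chunkservers with below-average disk
# utilization, skip servers with too many recent creations, and avoid putting
# two replicas of the same chunk on the same rack.

def place_chunk(servers, replicas=3, max_recent_creations=5):
    avg_util = sum(s["disk_util"] for s in servers) / len(servers)
    candidates = [s for s in servers
                  if s["disk_util"] <= avg_util
                  and s["recent_creations"] < max_recent_creations]
    candidates.sort(key=lambda s: s["disk_util"])

    chosen, used_racks = [], set()
    for s in candidates:
        if s["rack"] in used_racks:
            continue                       # spread replicas across racks
        chosen.append(s["name"])
        used_racks.add(s["rack"])
        s["recent_creations"] += 1         # creations bring writes, so count them
        if len(chosen) == replicas:
            break
    return chosen

servers = [
    {"name": "cs1", "rack": "r1", "disk_util": 0.30, "recent_creations": 1},
    {"name": "cs2", "rack": "r1", "disk_util": 0.35, "recent_creations": 0},
    {"name": "cs3", "rack": "r2", "disk_util": 0.40, "recent_creations": 2},
    {"name": "cs4", "rack": "r3", "disk_util": 0.90, "recent_creations": 0},
    {"name": "cs5", "rack": "r3", "disk_util": 0.45, "recent_creations": 0},
]
print(place_chunk(servers))   # ['cs1', 'cs3', 'cs5']: low utilization, three racks
```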
59
Detecting Stale Replicas
- The master keeps a chunk version number to distinguish up-to-date and stale replicas
- The version is increased whenever a lease is granted
- If a replica is not available, its version is not increased
- The master detects stale replicas when chunkservers report their chunks and versions
- Stale replicas are removed during garbage collection
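A small sketch of version-based staleness detection: the master bumps the chunk's version when it grants a lease, and any replica reporting an older version is flagged as stale. The data structures are invented for the example.

```python
# Sketch: the master bumps a chunk's version number on each lease grant; a
# chunkserver that missed the mutation still reports the old version, so the
# master can flag that replica as stale when the chunkserver reports in.

master_versions = {"chunk-0001": 3}        # master's notion of current versions

def grant_lease(handle):
    master_versions[handle] += 1           # new version for the new lease
    return master_versions[handle]

def find_stale(handle, reported):
    """reported: {chunkserver -> version it holds for this chunk}."""
    current = master_versions[handle]
    return [cs for cs, version in reported.items() if version < current]

grant_lease("chunk-0001")                  # version becomes 4; cs3 was down and missed it
reports = {"cs1": 4, "cs2": 4, "cs3": 3}
print(find_stale("chunk-0001", reports))   # ['cs3'] -> removed at garbage collection
```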
60
Garbage Collection
- When a client deletes a file, the master logs the deletion like other changes and renames the file to a hidden name
- The master removes files hidden for longer than 3 days when it scans the file system namespace; their metadata is erased then
- During HeartBeat messages, each chunkserver reports a subset of its chunks, and the master tells it which of those have no metadata; the chunkserver removes these chunks on its own
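A compressed sketch of the lazy deletion scheme above: delete only renames to a hidden name, a periodic namespace scan drops metadata after the grace period, and orphaned chunks reported by chunkservers are then reclaimed. The names and in-memory structures are hypothetical.

```python
# Sketch of lazy garbage collection: deletion renames the file; a later
# namespace scan erases metadata for files hidden longer than the grace period;
# chunks with no remaining metadata are reclaimed when chunkservers report them.

import time

GRACE_PERIOD = 3 * 24 * 3600                  # 3 days, in seconds

files = {"/logs/web.log": ["chunk-0001"]}     # filename -> chunk handles
hidden = {}                                   # hidden name -> (chunks, hide time)

def delete_file(name, now=None):
    now = time.time() if now is None else now
    hidden[f".deleted{name}.{int(now)}"] = (files.pop(name), now)

def scan_namespace(now=None):
    now = time.time() if now is None else now
    for hidden_name in list(hidden):
        chunks, hidden_at = hidden[hidden_name]
        if now - hidden_at > GRACE_PERIOD:
            del hidden[hidden_name]           # metadata erased; chunks become orphans

def heartbeat(reported_chunks):
    """Master's reply: which of the reported chunks are no longer referenced."""
    live = {c for chunks in files.values() for c in chunks}
    live |= {c for chunks, _ in hidden.values() for c in chunks}
    return [c for c in reported_chunks if c not in live]

t0 = 1_000_000.0
delete_file("/logs/web.log", now=t0)
scan_namespace(now=t0 + 4 * 24 * 3600)        # more than 3 days later
print(heartbeat(["chunk-0001"]))              # ['chunk-0001'] -> chunkserver may delete it
```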
61
Fault Tolerance: High Availability
- Fast recovery: the master and chunkservers can restart in seconds
- Chunk replication
- Master replication:
  - "shadow" masters provide read-only access when the primary master is down
  - mutations are not considered done until recorded on all master replicas
62
Fault Tolerance: Data Integrity
- Chunkservers use checksums to detect corrupt data
- Since replicas are not bitwise identical, each chunkserver maintains its own checksums
- For reads, the chunkserver verifies the checksum before returning the data
- Checksums are updated during writes
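A final small sketch for the checksumming above: a chunkserver keeps a checksum per block of a chunk, verifies it before serving a read, and updates it on writes. The block size, CRC choice, and class names are invented; this does not reproduce GFS's actual checksum layout.

```python
# Sketch: per-block checksums kept by the chunkserver; a read verifies the
# checksum of each block it touches before any data is returned.

import zlib

BLOCK_SIZE = 64 * 1024     # checksum granularity (illustrative only)

class ChecksummedChunk:
    def __init__(self):
        self.data = bytearray()
        self.checksums = []                     # one CRC32 per block

    def _recompute(self, block_index):
        block = self.data[block_index * BLOCK_SIZE:(block_index + 1) * BLOCK_SIZE]
        while len(self.checksums) <= block_index:
            self.checksums.append(0)
        self.checksums[block_index] = zlib.crc32(block)

    def write(self, offset, payload):
        end = offset + len(payload)
        self.data[offset:end] = payload
        for b in range(offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE + 1):
            self._recompute(b)                  # update checksums during writes

    def read(self, offset, length):
        end = offset + length
        for b in range(offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE + 1):
            block = self.data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
            if zlib.crc32(block) != self.checksums[b]:
                raise IOError(f"corrupt block {b}")   # never return corrupt data
        return bytes(self.data[offset:end])

chunk = ChecksummedChunk()
chunk.write(0, b"hello, gfs")
print(chunk.read(0, 10))          # b'hello, gfs'
chunk.data[3] ^= 0xFF             # simulate on-disk corruption
try:
    chunk.read(0, 10)
except IOError as e:
    print(e)                      # corrupt block 0
```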