
1 Advanced Data Management. Jiaheng Lu, Department of Computer Science, Renmin University of China. www.jiahenglu.net

2 Cloud computing


4 Distributed system

5 Outline
- Concepts and terminology
- What is distributed?
- Distributed data & objects
- Distributed execution
- Three-tier architectures
- Transaction concepts

6 What's a Distributed System?
- Centralized: everything in one place, on a stand-alone PC or mainframe
- Distributed: some parts remote
  - distributed users
  - distributed execution
  - distributed data

7 Transparency in Distributed Systems
- Make a distributed system as easy to use and manage as a centralized system
- Give a single-system image
- Location transparency:
  - hide the fact that an object is remote
  - hide the fact that an object has moved
  - hide the fact that an object is partitioned or replicated
- A name doesn't change if the object is replicated, partitioned, or moved

8 Naming: The Basics
- Objects have:
  - a globally unique identifier (GUID)
  - location(s) = address(es)
  - name(s)
- Addresses can change; objects can have many names
- Names are context dependent (Jim @ KGB ≠ Jim @ CIA)
- Many naming systems:
  - UNC: \\node\device\dir\dir\dir\object
  - Internet: http://node.domain.root/dir/dir/dir/object
  - LDAP: ldap://ldap.domain.root/o=org,c=US,cn=dir
(figure: names such as "Jim" and "James" resolving through a GUID to an address)

9 Name Servers in Distributed Systems
- Name servers translate name + context to address (+ GUID)
- Name servers are partitioned (subtrees of the name space)
- Name servers replicate the root of the name tree
- Name servers form a hierarchy
- Distributed data from hell:
  - high read traffic
  - high reliability & availability
  - autonomy
(figure: name tree with a replicated root and northern and southern subtrees)
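
To make the hierarchy concrete, here is a minimal sketch (not from the slides) of partitioned name resolution: each server is authoritative for one subtree and the root only delegates. The server names, the north/south split, and the address format are invented for illustration.

```python
# A minimal sketch of hierarchical, partitioned name resolution.
# Names, GUIDs, and addresses below are made up.

class NameServer:
    def __init__(self, subtree):
        self.subtree = subtree      # prefix this server is authoritative for
        self.entries = {}           # name -> (guid, address)
        self.children = {}          # child prefix -> child NameServer

    def register(self, name, guid, address):
        self.entries[name] = (guid, address)

    def delegate(self, prefix, child):
        self.children[prefix] = child

    def resolve(self, name):
        # Answer locally if we hold the entry...
        if name in self.entries:
            return self.entries[name]
        # ...otherwise forward to the child that owns the matching subtree.
        for prefix, child in self.children.items():
            if name.startswith(prefix):
                return child.resolve(name)
        raise KeyError(f"unresolved name: {name}")

# The root would be replicated in practice; one instance stands in for it here.
root = NameServer("")
north = NameServer("north.")
south = NameServer("south.")
root.delegate("north.", north)
root.delegate("south.", south)
north.register("north.printer42", guid="g-001", address="10.0.1.7:515")

print(root.resolve("north.printer42"))   # ('g-001', '10.0.1.7:515')
```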

10 Autonomy in Distributed Systems
- The owner of a site (or node, application, or database) wants to control it
- If my part is working, I must be able to access and manage it (reorganize, upgrade, add users, ...)
- Autonomy is essential, but difficult to implement: it conflicts with global consistency
- Examples: naming, authentication, administration, ...

11 Security: The Basics
- Authentication server: subject + authenticator => (Yes + token) | No
- Security matrix: who can do what to whom
  - an access control list is a column of the matrix
  - "who" is an authenticated ID
- In a distributed system, "who", "what", and "whom" are distributed objects
(figure: security matrix of subjects vs. objects, each cell holding permissions)
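
As a small illustration of the matrix idea, here is a sketch in Python; the subjects, objects, and permissions are made up, and a real system would authenticate the subject before ever consulting the matrix.

```python
# A minimal sketch of the "security matrix": the ACL of an object is one
# column of the matrix. All names and permissions are illustrative.

security_matrix = {
    # (subject,  object)         permissions
    ("alice",   "payroll.db"):   {"read", "write"},
    ("bob",     "payroll.db"):   {"read"},
    ("alice",   "printer42"):    {"use"},
}

def acl(obj):
    """Column of the matrix for one object: who can do what to it."""
    return {subj: perms for (subj, o), perms in security_matrix.items() if o == obj}

def check(subject, operation, obj):
    return operation in security_matrix.get((subject, obj), set())

print(acl("payroll.db"))                    # alice: read/write, bob: read
print(check("bob", "write", "payroll.db"))  # False
```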

12 Security in Distributed Systems
- Security domain: nodes with a shared security server
- Security domains can have trust relationships:
  - A trusts B: A "believes" B when it says "this is Jim@B"
- Security domains form a hierarchy
- Delegation: passing authority to a server
  - when A asks B to do something (e.g., print a file, read a database), B may need A's authority
- Autonomy requires: each node is an authenticator; each node does its own security checks
- Internet today: no trust among domains (firewalls, many passwords); trust based on digital signatures

13 Clusters: The Ideal Distributed System
- A cluster is a distributed system, BUT with:
  - a single location
  - one manager
  - one security policy
  - relatively homogeneous nodes
  - communication that is high-bandwidth, low-latency, and low-error-rate
- Clusters use distributed system techniques for:
  - load distribution
  - storage
  - execution
  - growth
  - fault tolerance

14 Cluster: Shared What?
- Shared-memory multiprocessor: multiple processors, one memory; all devices are local (DEC, SGI, Sequent 16x nodes)
- Shared-disk cluster: an array of nodes that all share common disks (VAXcluster + Oracle)
- Shared-nothing cluster: each device is local to a node; ownership may change (Tandem, SP2, Wolfpack)

15 Outline
- Concepts and terminology
- Why distribute?
- Distributed data & objects
  - partitioned
  - replicated
- Distributed execution
- Three-tier architectures
- Transaction concepts

16 Partitioned Data
- Break a file into disjoint groups
- Exploit data access locality: put data near its consumer
  - less network traffic
  - better response time
  - better availability
  - owner controls the data (autonomy)
- Spread load: data or traffic may exceed what a single store can handle
(figure: an Orders table partitioned across N.A., S.A., Europe, and Asia sites)

17 How to Partition Data?
- Partition by attribute, at random, by source, or by use
- Problem: to find the data you must have either
  - a directory (replicated), or
  - an algorithm
- This encourages attribute-based partitioning
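
The directory-versus-algorithm choice can be sketched as follows; the site names and the use of a hash are illustrative, not a prescription.

```python
# A minimal sketch of the two lookup strategies: a (replicated) directory
# keyed on an attribute, versus an algorithm that computes the site.

import hashlib

SITES = ["na.orders.example", "sa.orders.example",
         "eu.orders.example", "asia.orders.example"]

# Strategy 1: attribute-based partitioning with an explicit directory.
directory = {"N.A.": SITES[0], "S.A.": SITES[1],
             "Europe": SITES[2], "Asia": SITES[3]}

def locate_by_directory(region):
    return directory[region]

# Strategy 2: algorithmic placement -- no directory, but no locality either.
def locate_by_hash(order_id):
    h = int(hashlib.md5(str(order_id).encode()).hexdigest(), 16)
    return SITES[h % len(SITES)]

print(locate_by_directory("Europe"))   # eu.orders.example
print(locate_by_hash(123456))          # deterministic site for this order id
```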

26 Replicated Data
- Place a fragment at many sites
- Pros:
  - improves availability
  - allows disconnected (mobile) operation
  - distributes load
  - reads are cheaper
- Cons:
  - N times more updates
  - N times more storage
- Placement strategies:
  - dynamic: cache on demand
  - static: place at specific sites

27 Updating Replicated Data
- When a replica is updated, how do changes propagate?
- Master copy, many slave copies (SQL Server):
  - you always know the correct value (the master's)
  - change propagation can be transactional, as soon as possible, periodic, or on demand
- Symmetric, update-anywhere (Access):
  - allows mobile (disconnected) updates
  - updates propagated ASAP, periodically, or on demand
  - non-serializable: colliding updates must be reconciled
  - hard to know the "real" value
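
A minimal sketch of the master/slave style of propagation, with an explicit propagate() call standing in for "transactional / ASAP / periodic / on demand"; class and attribute names are invented, and the symmetric update-anywhere style is not shown.

```python
# Master/slave replication sketch: the master always holds the correct
# value; slaves are refreshed asynchronously.

class MasterReplica:
    def __init__(self):
        self.value = None
        self.pending = []        # changes not yet pushed to slaves
        self.slaves = []

    def update(self, value):
        self.value = value       # master always knows the correct value
        self.pending.append(value)

    def propagate(self):
        # Could run per transaction, periodically, or on demand.
        for change in self.pending:
            for slave in self.slaves:
                slave.apply(change)
        self.pending.clear()

class SlaveReplica:
    def __init__(self):
        self.value = None
    def apply(self, value):
        self.value = value

master = MasterReplica()
slave = SlaveReplica()
master.slaves.append(slave)

master.update("price=42")
print(slave.value)        # None -- not propagated yet
master.propagate()
print(slave.value)        # price=42
```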

28 Outline
- Concepts and terminology
- Why distribute?
- Distributed data & objects
  - partitioned
  - replicated
- Distributed execution
  - remote procedure call
  - queues
- Three-tier architectures
- Transaction concepts

29 Distributed Execution: Threads and Messages
- A thread is the execution unit (the software analog of CPU + memory)
- Threads execute at a node
- Threads communicate via:
  - shared memory (local)
  - messages (local and remote)

30 Peer-to-Peer or Client-Server
- Peer-to-peer is symmetric: either side can send
- Client-server:
  - the client sends requests
  - the server sends responses
  - a simple subset of peer-to-peer

31 Connection-less or Connected
- Connection-less:
  - each request contains the client id, client context, and work request
  - the client is authenticated on each message
  - only a single response message
  - e.g., HTTP, NFS v1
- Connected (sessions):
  - open - request/reply - close
  - the client is authenticated once
  - messages arrive in order
  - can send many replies (e.g., FTP)
  - the server keeps client context (context sensitive)
  - e.g., Winsock and ODBC
  - HTTP is adding connections

32 Remote Procedure Call: The Key to Transparency
- An object may be local or remote
- Methods on the object work wherever it is
- Local invocation: y = pObj->f(x);
(figure: a local call passing x to f() and returning val)

33 Remote Procedure Call: The Key to Transparency
- Remote invocation: y = pObj->f(x);
(figure: the client calls a local proxy, which marshals x and sends it to a server-side stub; the stub unmarshals x, invokes f(x) on the real object, marshals the return value, and sends it back to the proxy, which unmarshals it into y)
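
The proxy/stub/marshalling pipeline in the figure can be sketched roughly as follows, using JSON as a stand-in wire format and a direct function call in place of the network; the class names are illustrative.

```python
# RPC transparency sketch: the caller writes y = obj.f(x) and cannot tell
# whether the object is local or remote.

import json

class Server:                      # the real object, possibly remote
    def f(self, x):
        return x * 2

class Stub:                        # server side: unmarshal, invoke, marshal
    def __init__(self, obj):
        self.obj = obj
    def handle(self, wire_msg):
        req = json.loads(wire_msg)
        result = getattr(self.obj, req["method"])(*req["args"])
        return json.dumps({"result": result})

class Proxy:                       # client side: marshal, send, unmarshal
    def __init__(self, stub):
        self.stub = stub           # stands in for a network connection
    def f(self, x):
        reply = self.stub.handle(json.dumps({"method": "f", "args": [x]}))
        return json.loads(reply)["result"]

obj = Proxy(Stub(Server()))        # could just as well be a local Server()
y = obj.f(21)
print(y)                           # 42
```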

34 Transaction Object Request Broker (ORB)
- Registers servers
- Manages pools of servers
- Connects clients to servers
- Does naming and request-level authorization
- Provides transaction coordination (new feature)
- Old names: Transaction Processing Monitor, web server, NetWare, Object Request Broker

35 Outline
- Concepts and terminology
- Why distribute?
- Distributed data & objects
- Distributed execution
  - remote procedure call
  - queues
- Three-tier architectures: what and why
- Transaction concepts

36 Client/Server Interactions: All Can Be Done with RPC
- Request-response: the response may be many messages
- Conversational: the server keeps client context
- Dispatcher (three-tier): a complex operation at the server
- Queued: de-couples the client from the server and allows disconnected operation

37 Queued Request/Response
- Time-decouples client and server: three transactions (submit, perform, response)
- Almost real time, ASAP processing
- Client and server communicate at each other's convenience
- Allows mobile (disconnected) operation
- Disk queues survive client and server failures
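
A rough sketch of the three-step pattern, with plain in-memory lists standing in for the durable disk queues; the queue layout, field names, and the work itself are invented.

```python
# Queued request/response as three separate steps: submit, perform,
# and pick up the reply later.

request_queue = []     # would be a durable disk queue in a real system
response_queue = []    # ditto

def submit(client_id, work):             # step 1: client enqueues a request
    request_queue.append({"client": client_id, "work": work})

def perform():                           # step 2: server dequeues, runs, enqueues reply
    req = request_queue.pop(0)
    result = req["work"].upper()         # stand-in for the real operation
    response_queue.append({"client": req["client"], "result": result})

def collect(client_id):                  # step 3: client picks up the reply later
    for i, resp in enumerate(response_queue):
        if resp["client"] == client_id:
            return response_queue.pop(i)["result"]
    return None                          # nothing ready yet -- client may be offline

submit("c1", "build order 17")
perform()                                # server runs at its own convenience
print(collect("c1"))                     # BUILD ORDER 17
```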

38 Why Queued Processing?
- Prioritize requests: an ambulance dispatcher favors high-priority calls
- Manage workflows (e.g., Order -> Build -> Ship -> Invoice -> Pay)
- Deferred processing in mobile apps

39 Google Cloud computing techniques

40 The Google File System

41 The Google File System (GFS)
- A scalable distributed file system for large, distributed, data-intensive applications
- Multiple GFS clusters are currently deployed. The largest ones have:
  - 1000+ storage nodes
  - 300+ TB of disk storage
  - heavy access by hundreds of clients on distinct machines

42 Introduction
- GFS shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
- The GFS design has been driven by four key observations of Google's application workloads and technological environment

43 Intro: Observations 1
1. Component failures are the norm
   - constant monitoring, error detection, fault tolerance, and automatic recovery are integral to the system
2. Huge files (by traditional standards)
   - multi-GB files are common
   - I/O operations and block sizes must be revisited

44 Intro: Observations 2
3. Most files are mutated by appending new data
   - this is the focus of performance optimization and atomicity guarantees
4. Co-designing the applications and the file system API benefits the overall system by increasing flexibility

45 The Design: A cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.

46 The Master
- Maintains all file system metadata: namespace, access control info, file-to-chunk mappings, chunk (including replica) locations, etc.
- Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state

47 The Master
- Helps make sophisticated chunk placement and replication decisions, using global knowledge
- For reads and writes, the client contacts the master to get chunk locations, then deals directly with chunkservers
- The master is therefore not a bottleneck for reads and writes

48 Chunkservers
- Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle, assigned by the master at chunk creation
- Chunk size is 64 MB
- Each chunk is replicated on 3 (by default) chunkservers
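
A small sketch of how a byte offset maps onto 64 MB chunks and how a client-side read might use the master's metadata; the chunk handles, replica addresses, and dictionary-based "master" are simulated here and are not GFS's real API.

```python
# Offset -> chunk index mapping and a simulated read path:
# ask the "master" for chunk locations, then go to a chunkserver.

CHUNK_SIZE = 64 * 1024 * 1024      # 64 MB

def chunk_index(offset):
    """Which chunk of the file a byte offset falls into."""
    return offset // CHUNK_SIZE

# Simulated master metadata: (filename, chunk index) -> (handle, replica list)
master_metadata = {
    ("/logs/web.0", 0): ("0xA1B2C3", ["cs1:7000", "cs2:7000", "cs3:7000"]),
    ("/logs/web.0", 1): ("0xA1B2C4", ["cs2:7000", "cs4:7000", "cs5:7000"]),
}

def read(filename, offset):
    idx = chunk_index(offset)
    handle, replicas = master_metadata[(filename, idx)]   # ask the master
    within_chunk = offset % CHUNK_SIZE
    # A real client would now contact one of the replicas directly.
    return f"read chunk {handle} at byte {within_chunk} from {replicas[0]}"

print(read("/logs/web.0", 70 * 1024 * 1024))   # falls into chunk 1
```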

49 Clients
- Linked into applications using the file system API
- Communicate with the master and chunkservers for reading and writing:
  - master interactions only for metadata
  - chunkserver interactions for data
- Cache only metadata; the data is too large to cache

50 Chunk Locations
- The master does not keep a persistent record of which chunkservers hold which chunks and replicas
- Instead, it polls chunkservers at startup and when chunkservers join or leave
- It stays up to date by controlling the placement of new chunks and through HeartBeat messages (when monitoring chunkservers)

51 Operation Log
- A record of all critical metadata changes
- Stored on the master and replicated on other machines
- Defines the order of concurrent operations
- Changes are not visible to clients until they propagate to all chunk replicas
- Also used to recover the file system state

52 System Interactions: Leases and Mutation Order
- Leases maintain a consistent mutation order across all chunk replicas
- The master grants a lease to one replica, called the primary
- The primary chooses the serial mutation order, and all replicas follow this order
- This minimizes management overhead for the master
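
The primary-assigns-serial-numbers idea can be sketched like this; the replica classes and the mutation format are invented, and lease expiry, networking, and failures are not modeled.

```python
# Lease/mutation-order sketch: the primary chooses one serial order and
# every replica applies mutations in that order.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = []
    def apply(self, serial, mutation):
        self.data.append((serial, mutation))

class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0
    def mutate(self, mutation):
        serial = self.next_serial       # the primary chooses the order...
        self.next_serial += 1
        self.apply(serial, mutation)
        for s in self.secondaries:      # ...and every replica follows it
            s.apply(serial, mutation)
        return serial

secs = [Replica("s1"), Replica("s2")]
primary = Primary("p", secs)
primary.mutate("write A")
primary.mutate("write B")
print(secs[0].data == secs[1].data == primary.data)   # True: same order everywhere
```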

53 System Interactions: Leases and Mutation Order

54 Atomic Record Append
- The client specifies the data to write; GFS chooses and returns the offset it writes to, and appends the data to each replica at least once
- Heavily used by Google's distributed applications: no need for a distributed lock manager
- GFS chooses the offset, not the client

55 Atomic Record Append: How?
- Follows a control flow similar to ordinary mutations
- The primary tells the secondary replicas to append at the same offset as the primary
- If the append fails at any replica, the client retries it
- So replicas of the same chunk may contain different data, including duplicates (whole or in part) of the same record

56 Atomic Record Append: How?
- GFS does not guarantee that all replicas are bitwise identical
- It only guarantees that the data is written at least once as an atomic unit
- For success to be reported, the data must have been written at the same offset on all chunk replicas
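
A toy illustration of the at-least-once guarantee and the resulting duplicates: the client retries the whole append when any replica fails, so replicas that already succeeded end up holding the record more than once. The failure injection and replica structure are invented for this sketch.

```python
# At-least-once record append sketch with a simulated one-time failure.

replicas = {"r1": [], "r2": [], "r3": []}
fail_once = {"r2"}                      # r2 fails the first attempt only

def try_append(record):
    """Append to every replica; report success only if all succeed."""
    ok = True
    for name, chunk in replicas.items():
        if name in fail_once:
            fail_once.discard(name)     # fail this replica once, then recover
            ok = False
            continue
        chunk.append(record)
    return ok

def record_append(record):
    while not try_append(record):       # the client retries the whole append
        pass

record_append("event-42")
for name, chunk in replicas.items():
    print(name, chunk)
# r1 and r3 hold event-42 twice (duplicates), r2 holds it once --
# the record is on every replica at least once, which is all that is promised.
```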

57 Replica Placement
- The placement policy maximizes data reliability and network bandwidth utilization
- Spread replicas not only across machines, but also across racks
  - guards against machine failures and against racks being damaged or going offline
  - reads for a chunk exploit the aggregate bandwidth of multiple racks
  - writes have to flow through multiple racks: a tradeoff made willingly

58 Chunk Creation
- Chunks are created and placed by the master
- Placed on chunkservers with below-average disk utilization
- Limit the number of recent "creations" on a chunkserver, since creations are followed by lots of writes

59 Detecting Stale Replicas
- The master keeps a chunk version number to distinguish up-to-date and stale replicas
- The version is increased when a lease is granted
- If a replica is not available, its version is not increased
- The master detects stale replicas when chunkservers report their chunks and versions
- Stale replicas are removed during garbage collection
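
A small sketch of the version-number bookkeeping; the data structures and function names are invented, but the flow follows the slide: bump the version when granting a lease, then flag any replica that reports an older version.

```python
# Stale-replica detection sketch using chunk version numbers.

master_version = {"chunk-7": 1}                 # master's view
replica_versions = {"cs1": {"chunk-7": 1},      # each chunkserver's view
                    "cs2": {"chunk-7": 1},
                    "cs3": {"chunk-7": 1}}

def grant_lease(chunk, live_servers):
    master_version[chunk] += 1                  # new version for this mutation round
    for cs in live_servers:                     # only reachable replicas learn it
        replica_versions[cs][chunk] = master_version[chunk]

def report(cs):
    """Chunkserver reports its chunks; the master flags anything behind its own version."""
    return [c for c, v in replica_versions[cs].items() if v < master_version[c]]

grant_lease("chunk-7", live_servers=["cs1", "cs2"])   # cs3 was down at the time
print(report("cs2"))   # []
print(report("cs3"))   # ['chunk-7'] -- stale, removed later by garbage collection
```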

60 Garbage Collection
- When a client deletes a file, the master logs the deletion like other changes and renames the file to a hidden name
- The master removes files hidden for longer than 3 days when scanning the file system namespace; their metadata is also erased
- During HeartBeat messages, each chunkserver sends the master a subset of its chunks, and the master tells it which ones no longer have metadata; the chunkserver removes these chunks on its own

61 Fault Tolerance: High Availability
- Fast recovery: the master and chunkservers can restart in seconds
- Chunk replication
- Master replication:
  - "shadow" masters provide read-only access when the primary master is down
  - mutations are not considered done until recorded on all master replicas

62 Fault Tolerance: Data Integrity
- Chunkservers use checksums to detect corrupt stored data
- Since replicas are not bitwise identical, each chunkserver maintains its own checksums
- For reads, the chunkserver verifies the checksum before returning the data
- Checksums are updated during writes
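
A rough sketch of per-block checksumming on a chunkserver, assuming a 64 KB block size (as in the GFS paper) and using CRC32 as a stand-in checksum; the class and its storage layout are illustrative.

```python
# Per-block checksum sketch: verify the blocks a read touches before
# returning any data.

import zlib

BLOCK = 64 * 1024

class ChunkStore:
    def __init__(self):
        self.data = bytearray()
        self.checksums = []                     # one checksum per 64 KB block

    def write(self, payload: bytes):
        self.data += payload
        # Recompute checksums for the (simplified) whole chunk after a write.
        self.checksums = [zlib.crc32(self.data[i:i + BLOCK])
                          for i in range(0, len(self.data), BLOCK)]

    def read(self, offset, length):
        first, last = offset // BLOCK, (offset + length - 1) // BLOCK
        for b in range(first, last + 1):        # verify before returning data
            block = self.data[b * BLOCK:(b + 1) * BLOCK]
            if zlib.crc32(block) != self.checksums[b]:
                raise IOError(f"corrupt block {b}: read from another replica instead")
        return bytes(self.data[offset:offset + length])

store = ChunkStore()
store.write(b"x" * 200_000)
print(len(store.read(100_000, 1_000)))          # 1000, after checksum verification
```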


