Device-Mapper Remote Replication Target LinuxTag Berlin 2008 Heinz Mauelshagen Consulting Development Engineer

Topics
- data replication basics
- use case examples
- device-mapper architecture/overview
- device-mapper replicator architecture/overview
- mapping table syntax
- mapping table example
- dmsetup tool
- project status/future directions
- URLs

Data Replication Basics
- serves disaster recovery purposes by keeping copies of production data in multiple separate locations
- can occur at various levels (application, filesystem, block device, ...) and in active-active (read/write access in multiple locations) and active-passive forms; this presentation focuses on block device active-passive replication
- continuous data protection (CDP): production data is copied to one or more secondary sites as it is being written
- data is replicated to one or more remote sites in order to satisfy different recovery objective requirements
- fail-over from a primary (i.e. active) site to one of the secondary (i.e. passive) sites holding up-to-date data
- (almost) no non-redundant periods of time, unlike with traditional, infrequent backups on a daily/weekly/... schedule
- can be synchronous (the application has to wait for replication of the written data to finish) or asynchronous (the application receives completion when the local write has completed)

Data Replication Basics (continued...)
- synchronous replication requires low-latency, high-bandwidth interconnects to minimize application write delays (e.g. 10 kilometers already cause ~67 µs because of the speed of light vs. 10-20 µs to local cache, as estimated below; think about long-distance replication Berlin -> New York!)
- on long-distance links, latencies cannot be made negligible and bandwidth cannot be made arbitrarily high because of technical and budget constraints
- because of that, the practical use case on long-distance links is asynchronous replication; special business requirements may call for synchronous replication on short links
- replication needs to cope with (temporarily) failed remote device access
- replication properties are adjustable to address different recovery point objectives (RPOs); e.g. a 100MB maximum fallbehind in asynchronous replication causes a switch to synchronous replication when that threshold is reached
- in order to provide consistent replicas at any point in time, it is mandatory to support grouping of devices and to ensure write-ordering fidelity for the whole group (e.g. N database devices being written to); replication can then break at any given point in time while still allowing for application recovery
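Back-of-the-envelope check of the distance penalty (assuming propagation at roughly the vacuum speed of light, c ≈ 3×10^5 km/s; in optical fiber the signal is about a third slower, so real round trips are somewhat worse):

  round trip over 10 km:  t = 2 × 10 km / (3×10^5 km/s) ≈ 67 µs
  round trip Berlin -> New York (~6,400 km):  t ≈ 2 × 6,400 km / (3×10^5 km/s) ≈ 43 ms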

Example 1: Asynchronous Replication (diagram): devices A and B at the primary site in Paris (the primary is owned by a single machine or a cluster) are replicated asynchronously to the secondary site in NYC; the secondary copies are mounted only on failure of the primary.

Example 2: “Follow the Sun” (diagram): devices A and B are replicated between Paris, SFO and Tokyo; the primary role rotates between the sites in 8-hour shifts (00 GMT - 08 GMT, 08 GMT - 16 GMT, 16 GMT - 24 GMT), with the other two sites acting as secondaries.

Example 3: Sync Mirror + Async Replication (diagram): writes to devices A and B are mirrored synchronously at the primary site and additionally replicated asynchronously to the secondary site, which is mounted only on failure of the primary.

Device-Mapper Architecture (1)
- Device-Mapper aka dm is the generic mapping runtime platform in the Linux kernel (since 2.5)
- can be used by multiple applications which require block device mapping (e.g. dmraid, LVM2, kpartx or EVMS)
- doesn't “know” about any application concept such as Logical Volumes etc.
- manages Mapped-Devices aka mds (create, remove, ...) with their device nodes and mappings (e.g. sector N/device A -> sector M/device B)
- defines the mapping of mds via text-formatted Mapping-Tables (load, unload, reload); format: <start> <length> <target> [<target_parameter>...]
- pluggable Mapping-Targets do the actual mapping (e.g. linear, mirror, striped, snapshot, multipath, zero, error, replicator, ...) with ctr, dtr, map, status, ... interface functions
- maps to arbitrary block devices (e.g. (i)SCSI)

Device-Mapper Architecture (2)
- Mapping-Targets (e.g. dm-replicator) are dynamically loadable and register with the dm core
- mds can be stacked in order to build complex mappings (e.g. replication on top of a mirror; see the sketch below)
- the dm kernel provides dirty logging (i.e. keeping state about pending writes on regions or about out-of-sync regions; dm-dirty-log), low-level io routines (dm-io) and a copy service (dm-kcopyd)
- a user space library (libdevmapper) to be interfaced by Device/Volume Management applications, and a test tool, dmsetup
- the library creates device nodes to access mds in /dev/mapper/
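A minimal stacking sketch using dmsetup (device names and sizes are made up for illustration; the mirror table is the one shown on the next slide):

# echo "0 83886080 mirror core 1 1024 2 /dev/sda1 64 /dev/sdb1 64" | dmsetup create m0
# echo "0 1048576 linear /dev/mapper/m0 0" | dmsetup create small0
# dmsetup ls --tree
(shows small0 stacked on top of m0, which in turn maps to sda1 and sdb1)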

Device-Mapper Architecture (3)
Examples of Mapping-Tables which can be activated using the dmsetup tool (an activation sketch follows below):

“0 1024 linear /dev/sda1 40
 1024 2048 striped 2 64 /dev/sda1 1064 /dev/sdb1 0”
maps sectors 0-1023 of the md to /dev/sda1 at offset sector 40 using the “linear” target, and sectors 1024-3071 onto 2 stripes with a stride size of 64, onto device /dev/sda1, sector offset 1064 and onto device /dev/sdb1, sector offset 0

“0 1024 zero
 1024 1000 error”
creates a “zero” mapping for sectors 0-1023 and an “error” mapping for sectors 1024-2023

“0 83886080 mirror core 1 1024 2 /dev/sda1 64 /dev/sdb1 64”
creates a “mirror” mapping for sectors 0-83886079 using the core dirty log with a region size of 1024 sectors and 2 mirror legs on device /dev/sda1, sector offset 64 and device /dev/sdb1, same offset
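As a minimal activation sketch (the mapped-device name zero_err is arbitrary), the zero/error table can be fed to dmsetup on standard input:

# dmsetup create zero_err << EOF
0 1024 zero
1024 1000 error
EOF
# blockdev --getsz /dev/mapper/zero_err
(reports 2024 sectors)
# dmsetup remove zero_err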

Device-Mapper Architecture Overview (diagram): dmsetup, lvm2, ... sit on top of libdm in userspace; below the userspace/kernelspace boundary, dm-core dispatches to the loaded targets, which map to the low-level devices.

dm-replicator Architecture (1)
- falls apart into 3 modules: the replicator core, log and slink modules
- the replicator core module satisfies the device-mapper core interfaces, e.g. to construct/destruct a mapping, report status, and the actual map function to carry ios out
- a log interface abstracts storing the written data together with its replication state; different replication log plugins are possible, and a “default” type with ring buffer logic has been implemented
- in case the ring buffer runs full, old entries are dropped and state is kept to resynchronize the data using the dm dirty log subsystem; during resynchronization there are no write-ordering guarantees
- an slink interface abstracts accessing devices on site links; different slink plugins are possible, and a “local” type has been implemented to allow for device access via device nodes; other slink modules for different transports can follow
- the replicator core doesn't know how to store data in the replication log; it hands that task to the replication log plugin
- the replicator log doesn't need to know how to access devices; it hands such accesses to the slink plugin

dm-replicator Architecture (2)
- sync/async and fallbehind parameters are kept by the slink module and can be requested from it
- each of the replicator modules (core, log and slink) runs one or more kernel threads for various purposes
- the core has one to handle the queue of incoming ios and to throttle depending on log contention
- the log has one to process incoming ios, redirecting them as entries into the ring buffer together with metadata keeping state about which site links they still need to be copied to, hence calling the slink plugin to do the copy
- the slink has 2: one to carry out copies to devices as requested by the log module, and one to test (temporarily) failed devices in order to inform the log module when it can resubmit any failed ios
- there are additional kernel threads in the dm-kcopyd and dm-io modules, which are called by the slink module to do the low-level copies and ios

Device-Mapper Replicator Architecture Overview (diagram): dmsetup and libdm in userspace; below the userspace/kernelspace boundary, dm-core calls into the replicator core, which uses the replicator log and replicator slink modules to reach the low-level devices.

Mapping-Table Syntax (1)
<start> <length> replicator \
  replog_type #replog_params <replog_params>... \
  \
  slink_type #slink_params <slink_params>... \
  #dev_params <dev_param>... \
  dirtylog_type #dirtylog_params <dirtylog_params>...

Mapping-Table Syntax (2)
replog_type = "default" selects the default ring buffer log
#replog_params = 2-4
replog_params = dev_path dev_start [auto/create/open [size]]
  dev_path  = device path of the replication log backing store
  dev_start = offset to the REPLOG header
  create    = create a new replication log or overwrite a given one
  open      = open a given replication log or fail
  auto      = open any given replication log or create a new one
  size      = size to use on create

Mapping-Table Syntax (3)
slink_type = "local" to handle local device nodes
#slink_params = 1-3
<slink_params> = slink# [slink_policy [fall_behind]]
  slink#       = used to tie the host+dev_path to a particular SLINK; 0 is used for the local link+local devices (LDs) and 1-M are for remote links+remote devices (RDs)
  slink_policy = policy to set on the slink (async/sync)
  fall_behind  = threshold to switch from asynchronous to synchronous mode (ios=N, size=N[kmgpt], timeout=N[tsmhd]); see the example below
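For instance, an illustrative slink parameter string (not taken from the slides) for remote site link 1 with the 100MB fallbehind threshold mentioned earlier might read:

  local 3 1 async size=100m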

Mapping-Table Syntax (4)
#dev_params = 2
<dev_params> = dev# dev_path
  dev#     = unsigned int stored in the REPLOG to associate to a dev_path
  dev_path = device path of the device to replicate to
dirtylog_type = "-"/"core"/"disk"
#dirtylog_params = 0-3 (1-2 for the core dirty log type, 3 for the disk dirty log only)
dirtylog_params = [dirty_log_path] region_size [[no]sync]

Mapping-Table Example (1 LD + 1 RD, async)
0 209715200 replicator \
  default 4 /dev/local_vg/replog_store 32 create 2097152 \
  \
  local 2 0 async ios=0 \
  2 0 /dev/local_vg/lvol0 \
  - 0 local 2 1 async ios=100 \
  2 0 /dev/remote_vg/lvol0 \
  disk 3 32768 /dev/local_vg/slink1_lvol0_dirty_log sync

Mapping-Table Example (2nd LD + RD, async)
0 104857600 replicator \
  default 4 /dev/local_vg/replog_store 32 open \
  \
  local 2 0 async ios=0 \
  2 0 /dev/local_vg/lvol1 \
  - 0 local 2 1 async ios=100 \
  2 0 /dev/remote_vg/lvol1 \
  disk 3 16384 /dev/local_vg/slink1_lvol1_dirty_log sync

dmsetup tool
Syntax:
# dmsetup create mapped_device file
# dmsetup status mapped_device
# dmsetup remove mapped_device
...

(say we filed the 1LD+1RD example in r1_def with local_vg and remote_vg configured):
# dmsetup create r1 r1_def
# mount /dev/mapper/r1 /mnt/r1
...do_some_fs_io...
# umount /mnt/r1
# fsck -fy /dev/mapper/r1
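A few further, generally available dmsetup commands that are useful here as well (a sketch; these are standard dmsetup operations, not specific to the replicator target):

# dmsetup table r1     (print the active mapping table)
# dmsetup info r1      (show state, open count, ...)
# dmsetup suspend r1   (quiesce I/O, e.g. before reloading a changed table)
# dmsetup resume r1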

Status and Future Directions
- dm-replicator in code review and testing, to go upstream ASAP
- LVM2 design to support replication is being worked on; aims to replicate groups of Logical Volumes

URLs
http://sources.redhat.com/dm (Device-Mapper tool + library, ...)
http://people.redhat.com/heinzm/sw/dm/dm-replicator
http://www.redhat.com/mailman/listinfo/dm-devel to subscribe to dm-devel@redhat.com
http://www.redhat.com/mailman/listinfo/lvm-devel to subscribe to lvm-devel@redhat.com

Q&A