Disk Hot Swap
Zhilin Huang zhilhuan@cisco.com


Motivation
Traffic Server is designed to be tolerant of disk failures, but there is no corresponding disk recovery mechanism that works without restarting the service. In production systems, restarting the Traffic Server service on a cache node is service impacting.
Design goals: simplicity of implementation, low risk.
The feature is required by our customer (a Service Provider). We support Cisco Video CDN products for SPs; there are some feature gaps, and our next generation of CDN product will be built on top of TC and ATS.
Thanks for the support from the open source community; we are willing to contribute back to the community.

Prerequisites
The ATS startup logic will not be changed. As in the current design, “storage.config” and “volume.config” are loaded only at ATS startup.
Only raw-disk recovery is supported.
For a disk to be a candidate for hot swap, the recovered disk must be listed in “storage.config” and have been in use by ATS as an operational disk after process startup completed.
The replacement disk must be of equal or larger size than the old (failed) disk, and recovery will use only the same storage size as the old disk.
We will reuse only the data structures already built at startup.

Disk Hot Swap Procedure
1. Offline the bad disk: traffic_ctl storage offline <path-of-the-disk>
2. Replace the disk hardware, making sure that the path of the disk is persistent.
3. Online the replacement disk: traffic_ctl storage online <path-of-the-disk>
4. Check the status of all disks: traffic_ctl storage status
Notes: Linux disk names are not stable, so this relies on the persistent device naming feature. If “traffic_ctl storage online” is never entered, nothing is impacted.

Cache Architecture Recap
A URL is hashed and assigned to a stripe.
Raw disk: no (OS) file system; ATS manages the storage itself.
Disk header and stripe metadata are cleared (for a replacement disk).
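The URL-to-stripe assignment above can be sketched as follows. This is an illustrative model, not the real ATS code: the names (assign_table, stripe_for_url) and the use of std::hash as a stand-in for the cache-key hash are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: the cache hashes the URL (cache key) and uses the
// hash to index an assignment table that maps buckets to stripe indices.
static std::vector<int> assign_table(size_t buckets, int num_stripes) {
    std::vector<int> table(buckets);
    for (size_t i = 0; i < buckets; ++i)
        table[i] = static_cast<int>(i % num_stripes); // simple round-robin fill
    return table;
}

static int stripe_for_url(const std::string &url, const std::vector<int> &table) {
    uint64_t h = std::hash<std::string>{}(url); // stand-in for the cache-key hash
    return table[h % table.size()];
}
```

Rebuilding this assignment table is exactly what happens when a disk goes away or comes back: the buckets that pointed at the lost stripes are redistributed over the remaining ones.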

Cache Initialization
1. Load and parse “storage.config” (Store::read_config) and “volume.config” (ConfigVolumes::read_config_file).
2. Open the raw disk file and read the disk header.
3. Calculate the size of each volume and stripe from the configuration and storage size, then create the stripes for each disk (CacheDisk::create_volume).
4. Read the stripe metadata (Vol::init).
5. After the cache is initialized, the RAM cache is created (CacheProcessor::cacheInitialized).
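Step 3 above — carving the raw disk into stripes from the configured volume percentages — can be sketched as below. This is a simplified model under stated assumptions, not the real CacheDisk::create_volume: the 8 KB block granularity and the carve_stripes/StripeSpan names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed block granularity for alignment; illustrative only.
constexpr uint64_t STORE_BLOCK = 8192;

struct StripeSpan {
    uint64_t offset; // byte offset of the stripe on the raw disk
    uint64_t len;    // stripe length in bytes, block-aligned
};

// Given the usable disk size and the volume.config percentages, compute
// the byte span of each stripe, rounding each down to a block multiple.
static std::vector<StripeSpan> carve_stripes(uint64_t disk_bytes,
                                             const std::vector<int> &percents) {
    std::vector<StripeSpan> out;
    uint64_t offset = 0;
    for (int p : percents) {
        uint64_t len = disk_bytes / 100 * static_cast<uint64_t>(p);
        len -= len % STORE_BLOCK; // align down to the block size
        out.push_back({offset, len});
        offset += len;
    }
    return out;
}
```

Because the stripe geometry is derived from the storage size recorded at startup, reusing the old disk's size for a larger replacement (as the prerequisites require) keeps these spans valid without rebuilding them.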

Disk Failure

Disk Recovery
Flow: traffic_ctl storage online <path> → check that the disk is a candidate (path matches in gdisks, disk is marked bad) → DiskHotSwapper::handle_disk_hotswap (open, fstat, propagate the new fd) → clear the disk and stripe headers (initialize metadata) → DiskHotSwapper::mark_storage_online (reset cache stats, SET_DISK_OKAY, rebuild_host_table, mark good).
A new class, DiskHotSwapper, will be defined. It is a subclass of Continuation, to support asynchronous I/O, and encapsulates all the operations associated with disk hot swap.
A new command, “traffic_ctl storage online <path-of-the-disk>”, will be provided. When it is triggered, gdisks is scanned for a disk that matches the path and is marked as bad. If one is found, a DiskHotSwapper instance is created and scheduled immediately, with its default handler set to DiskHotSwapper::handle_disk_hotswap().
The handler calls open() and fstat(). If both succeed, we have a new operational disk. Get the geometry of the new disk; if it is smaller than the old one, reject it. Only the same size as the old disk is used.
Close the old fd, and propagate the new fd to the various data structures (CacheDisk, Vol, aio_reqs).
Initiate a series of asynchronous calls to perform disk I/O on the new disk: write the disk header to the start of the disk, and write the Vol headers for each stripe of the disk.
After the I/O completes, call DiskHotSwapper::mark_storage_online. It is similar to mark_storage_offline: it resets a couple of global cache stats and, more importantly, rebuilds the assignment table.
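The open()/fstat()/size-check step can be sketched as a small helper. This is a hypothetical simplification: the function name try_open_replacement is not from ATS, and the real code obtains block-device geometry via device ioctls rather than st_size (which is only meaningful here for regular files). Clearing the headers, swapping the fd into CacheDisk/Vol/aio_reqs, and rebuilding the assignment table are done asynchronously by the DiskHotSwapper continuation and are not shown.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Open the replacement device and verify it is at least as large as the
// failed disk. Returns the new fd on success, or -1 if the device is
// missing, unreadable, or too small.
static int try_open_replacement(const std::string &path, off_t old_size) {
    int fd = open(path.c_str(), O_RDWR);
    if (fd < 0)
        return -1; // device not present or not accessible
    struct stat sb;
    if (fstat(fd, &sb) != 0 || sb.st_size < old_size) {
        close(fd); // smaller than the failed disk: reject the replacement
        return -1;
    }
    return fd; // caller propagates this fd and clears the on-disk headers
}
```

On success, the caller would close the old fd and hand this one to the existing CacheDisk structures, which is what lets the design reuse everything built at startup.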

Limitations
To avoid implementing the complicated logic of recovering disk content, a replacement disk will always be cleared; therefore, no previously cached content is reused.
Only the same storage size as the old disk will be used for cache. This lets us reuse the data structures already built for the old (failed) disk and avoids the complexity of rebuilding them.
If the replacement disk is larger, then after an ATS process restart the full new disk may be used for cache. This could cause the cached content to become invalid.

Disk Status Inspection
traffic_ctl storage status will print to “diags.log”. Example:
[May 12 22:15:44.138] Server {0x2b1c38391700} NOTE: /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0: [good]
[May 12 22:15:44.138] Server {0x2b1c38391700} NOTE: /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0: [bad]

Thanks!
Status: the basic feature is working, but needs more testing. The work was done on 5.3.2 and needs to be merged back to master before a PR to open source.