SCSI Mid-layer Eric Youngdale 2nd Annual Linux Storage Management Workshop October 2000.


Introduction Main points of this talk: –Historical evolution of Linux SCSI. –Explain the state of the art in Linux 2.2. –Discuss changes for 2.4. –Discuss pending changes in the 2.5 kernel.

Block devices and Linux Linux has a generic block device layer with which all filesystems interact. SCSI is no different in this regard – it registers itself with the block device layer so it can receive requests. SCSI also handles character device requests and ioctls that do not originate in the block device layer.

What is the “Mid-Layer”? Linux SCSI support can be viewed as 3 levels. The upper level is device management, such as tape, cdrom, disk, etc. The lower level talks to host adapters. The middle layer is essentially a traffic cop, handling requests from the rest of the kernel and dispatching them to the rest of SCSI.

State of the art in Linux-2.2 Error handling is done better by drivers that make use of the new error handling code, which was introduced in 2.2. Queue management is fundamentally unchanged since the Linux 1.x days – “The Code that Time Forgot”. Lots of dinosaurs running around in the code. The rest of the mid-level is largely stagnant.

What was wrong in 2.2? The elevator algorithms in 2.2 allowed requests to grow regardless of the capabilities of the underlying device. All SCSI disks were handled in a single queue. The disk driver had to split requests that had become too large. There was no single set of common logic for verifying that requests had not become too large.

What was wrong in 2.2 (cont) Character device requests were not in the queue. SMP safety was clumsily handled, leading to race conditions and poor performance. Poor scalability. Many drivers continued to use the old error handling code.

Queue handling in 2.2 (diagram: a single shared queue – head followed by requests for Disk1, Disk2, Disk1, Disk3, Disk1 interleaved)

Changes for Linux-2.4 The block device layer was generalized to support a “request_queue_t” abstract datatype that represents a queue. It contains function pointers that drivers can use for managing the size of requests inserted into queues. Requests can no longer grow too large to be handled at one time.

Changes for 2.4 (cont) No longer any need for splitting requests. No need for ugly logic to scan a queue for a queueable request. SMP locking in mid-layer cleaned up to provide finer granularity.

Changes for 2.4 (cont) A SCSI queuing library was created – a set of functions for queue management that are tailored to different sets of requirements. SCSI was modified to use a single queue for each physical device. Character device requests and ioctls are inserted into the same queue at the tail, and handled the same as other requests.

Queuing library Maintainability is a problem if multiple instances of code can perform a similar function.

__inline static int __scsi_merge_requests_fn(request_queue_t * q,
                                             struct request * req,
                                             struct request * next,
                                             int use_clustering,
                                             int dma_host)
{
    /*
     * Appropriate contents
     */
}

Queueing Library (Cont).

#define MERGEREQFCT(_FUNCTION, _CLUSTER, _DMA)                          \
static int _FUNCTION(request_queue_t * q,                               \
                     struct request * req,                              \
                     struct request * next)                             \
{                                                                       \
    return __scsi_merge_requests_fn(q, req, next, _CLUSTER, _DMA);      \
}

MERGEREQFCT(scsi_merge_requests_fn_, 0, 0)
MERGEREQFCT(scsi_merge_requests_fn_d, 0, 1)
MERGEREQFCT(scsi_merge_requests_fn_c, 1, 0)
MERGEREQFCT(scsi_merge_requests_fn_dc, 1, 1)

Changes for 2.4 (cont) In 2.2, there were separate functions and code paths for initializing SCSI when compiled into the kernel and when loaded via modules. In 2.4 this was cleaned up – redundant code was removed, and the same code initializes SCSI whether it is a module or compiled into the kernel.

Upcoming changes for 2.5 All drivers will be forced to use the new error handling code. The disk driver will be updated to handle a larger number of disks. SMP locking will be cleaned up some more to improve scalability.

Old error handling code Essentially a bad state machine. Has tons of SMP problems that are not easily fixed. Tries to resolve errors while allowing new requests to be queued. Many kernel reliability problems stem from the old error handling code. Needs to be discarded in the worst way.

New error handling code The new error handling code has been available since the 2.2 kernel. To force driver authors to update their drivers, the old error handling code will simply be removed. Drivers that have not been updated will fail to compile. Orphaned drivers will be handled on a case-by-case basis.

Further SMP cleanups All low-level drivers currently use io_request_lock for SMP safety. This lock is also used by all other block devices on the system to protect their queues. Plans are in the works to switch the block device layer to use a per-queue lock, thereby isolating SCSI from other devices.

SMP Cleanups (cont). Low-level drivers don’t need to protect queue – they don’t have access to it. Each low-level driver should have a separate lock – ideally one per instance of host, but could be a driver-wide lock initially. This should be up to the low-level driver.

SMP Cleanups (cont) The block device layer has a number of arrays, indexed by major/minor: blksize_size[MAJOR(dev)][MINOR(dev)]. Access is not protected by any locks, so it is impossible for block drivers to resize these without introducing a race condition.

Large numbers of disks The current disk driver allocates 8 majors, allowing for only 128 disks. Plans are in the works to allow the disk driver to dynamically allocate major numbers. This would support up to about 4000 disks before major numbers are exhausted. It is possible to go beyond this by using fewer bits for partitions.

Wish list. Implement some SCSI-3 features (larger commands, sense buffers). Improve support for shared busses. Support target-mode. Check module add/remove code for SMP safety, implement locks. Improvements related to high-availability.

Conclusions The major goal of a rewrite of SCSI queuing has been accomplished. A number of architectural problems were resolved at the same time. There are still some interesting tasks to be addressed for 2.5. See the website for more info and for the “todo” list.

Contacts Web: The notes for this talk are on the website.