Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

2005 Carsten Griwodz & Pål Halvorsen INF5070 – media storage and distribution systems

Overview
- Resources, real-time, “continuous” media streams, …
- (CPU) Scheduling
- Memory management

Resources and Real-Time

Resources

Resource: “A resource is a system entity required by a task for manipulating data” [Steinmetz & Nahrstedt 95]

Characteristics:
- active: provides a service, e.g., CPU, disk or network adapter
- passive: system capabilities required by active resources, e.g., memory
- exclusive: only one process at a time can use it, e.g., CPU
- shared: can be used by several concurrent processes, e.g., memory
- single: exists only once in the system, e.g., loudspeaker
- multiple: several exist within a system, e.g., CPUs in a multi-processor system

Real-Time

Real-time process: “A process which delivers the results of the processing in a given time-span”

Real-time system: “A system in which the correctness of a computation depends not only on obtaining the result, but also upon providing the result on time”

Many real-time applications, e.g.:
- temperature control in a nuclear/chemical plant: driven by interrupts from an external device, and these interrupts occur irregularly
- defense system on a navy boat: driven by interrupts from an external device, and these interrupts occur irregularly
- control of a flight simulator: execution at periodic intervals, scheduled by timer services which the application requests from the OS
- ...

Real-Time

Deadline: “A deadline represents the latest acceptable time for the presentation of the processing result”

Hard deadlines:
- must never be violated: a violation means system failure
- too-late results
  - have no value, e.g., processing weather forecasts
  - mean severe (catastrophic) system failure, e.g., processing of an incoming torpedo signal in a navy boat scenario

Soft deadlines:
- in some cases, the deadline might be missed
  - not too frequently
  - not by much time
- the result may still have some (but decreasing) value, e.g., a late I-frame in MPEG

Real-Time and Multimedia

Multimedia systems
- have periodic processing requirements (e.g., every 33 ms in a 30 fps video)
- require large bandwidths (e.g., an average of 3.5 Mbps for DVD video only)
- typically have soft deadlines (may miss a frame)
- are non-critical (the user may be annoyed, but …)
- need predictability (guarantees)
- adapt real-time mechanisms to continuous media
- priority-based schemes are of special importance

Admission and Reservation

To prevent overload, admission control may be performed:
- schedulability test:
  - “are there enough resources available for a new stream?”
  - “can we find a schedule for the new task without disturbing the existing workload?”
  - a task is allowed if the utilization remains < 1
- yes: allow the new task, allocate/reserve resources
- no: reject

Resource reservation is analogous to booking (asking for resources):
- pessimistic
  - avoids resource conflicts by making worst-case reservations
  - potentially under-utilized resources
  - guaranteed QoS
- optimistic
  - reserves according to average load
  - high utilization
  - overload may occur
- perfect
  - must have detailed knowledge about the resource requirements of all processes
  - too expensive to make/takes much time
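A minimal Python sketch of utilization-based admission, assuming each stream is a periodic task described by a period and a processing time per period (the class and field names are hypothetical). The pessimistic controller reserves with the worst-case processing time and the optimistic one with the average, so the optimistic controller admits more streams but can be overloaded whenever actual demand exceeds the average. The test uses `< 1` as stated on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class StreamRequest:
    """Hypothetical description of a periodic stream (times in ms)."""
    name: str
    period_ms: float      # one unit of work (e.g., a frame) per period
    worst_case_ms: float  # worst-case processing time per period
    average_ms: float     # average processing time per period


@dataclass
class AdmissionController:
    mode: str = "pessimistic"   # "pessimistic" (worst case) or "optimistic" (average)
    admitted: list = field(default_factory=list)

    def demand(self, s: StreamRequest) -> float:
        # Fraction of the CPU this stream is assumed to need.
        e = s.worst_case_ms if self.mode == "pessimistic" else s.average_ms
        return e / s.period_ms

    def utilization(self) -> float:
        return sum(self.demand(s) for s in self.admitted)

    def request(self, s: StreamRequest) -> bool:
        # Schedulability test: admit only if the utilization stays < 1.
        if self.utilization() + self.demand(s) < 1.0:
            self.admitted.append(s)   # reserve resources for the new stream
            return True
        return False                  # reject


if __name__ == "__main__":
    for mode in ("pessimistic", "optimistic"):
        ctrl = AdmissionController(mode)
        results = [ctrl.request(StreamRequest(f"video{i}", 40, 12, 6)) for i in range(10)]
        print(mode, sum(results), round(ctrl.utilization(), 2))
```

With 40 ms periods, 12 ms worst case and 6 ms average, the pessimistic controller admits 3 of 10 requests and the optimistic one 6.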

Real-Time and Operating Systems

The operating system manages local resources (CPU, memory, disk, network card, buses, ...)

In a real-time, multimedia scenario, support is needed for:
- real-time processing
- efficient memory management

This also means support for proper …
- scheduling: high priorities for time-restrictive multimedia tasks
- timer support: a clock with fine granularity and event scheduling with high accuracy
- kernel preemption: avoid long periods where low-priority processes cannot be interrupted
- memory replacement: prevent code for real-time programs from being paged out
- fast switching: both interrupts and context switching should be fast
- ...

Continuous Media Streams

Streaming Data

- Start playback at t1
- Consumed bytes (offset): variable rate or constant rate
- Must start retrieving data earlier:
  - data must arrive before consumption time
  - data must be sent before arrival time
  - data must be read from disk before sending time

[Figure: data offset vs. time with the read, send, arrive and consume functions; playback starts at t1]

Streaming Data

Need buffers to hold data between the functions, e.g., at the client: $B(t) = A(t) - C(t)$, i.e., $\forall t: A(t) \ge C(t)$, where A is the arrive function and C the consume function.

The latest start of data arrival is given by $\min\left[B(t, t_0, t_1) ;\ \forall t: B(t, t_0, t_1) \ge 0\right]$, i.e., the buffer must at all times t have more data to consume.

[Figure: arrive and consume functions (data offset vs. time), with arrival starting at t0 and playback at t1]
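A minimal sketch of the buffer condition, assuming data arrives at a constant rate from t0 and frame k (0-based) of a variable-size sequence must be completely buffered at its consumption time t1 + k·period. Under these assumptions A(t) ≥ C(t) only has to be checked at the consumption instants, and the latest admissible t0 is the minimum over k of t1 + k·period − (bytes of frames 0..k)/rate. The function names are hypothetical.

```python
from itertools import accumulate


def latest_arrival_start(frame_sizes, rate, t1, period=1.0):
    """Latest t0 such that A(t) >= C(t) at every consumption instant.

    Assumptions: A(t) = rate * (t - t0) for t >= t0 (constant-rate arrival),
    and frame k must be fully buffered at time t1 + k * period.
    """
    return min(
        t1 + k * period - consumed / rate
        for k, consumed in enumerate(accumulate(frame_sizes))
    )


def buffer_never_empty(frame_sizes, rate, t1, t0, period=1.0):
    """Check B(t) = A(t) - C(t) >= 0 at each consumption instant."""
    for k, consumed in enumerate(accumulate(frame_sizes)):
        t = t1 + k * period
        arrived = rate * max(0.0, t - t0)
        if arrived < consumed:
            return False
    return True


if __name__ == "__main__":
    frames = [10, 30, 10]            # variable-rate stream (bytes per frame)
    t0 = latest_arrival_start(frames, rate=10, t1=5)
    print(t0)                                                  # 2.0
    print(buffer_never_empty(frames, 10, t1=5, t0=t0))         # True
    print(buffer_never_empty(frames, 10, t1=5, t0=t0 + 0.5))   # False
```

The large second frame forces arrival to start earlier (t0 = 2) than a constant-rate stream of 10-byte frames would need (t0 = 4).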

Streaming Data

“Continuous Media” and “continuous streams” are ILLUSIONS
- retrieve data in blocks from disk
- transfer blocks from file system to application
- send packets to communication system
- split packets into appropriate MTUs
- ... (intermediate nodes)
- ... (client)

Consequences:
- different optimal sizes
- pseudo-parallel processes (run in time slices)
- need for scheduling (to have timing and appropriate resource allocation)

[Figure: data path from the file system through the application to the communication system]

(CPU) Scheduling

Scheduling

A task is a schedulable entity (a process/thread executing a job, e.g., a packet going through the communication system or a disk request going through the file system).

In a multi-tasking system, several tasks may wish to use a resource simultaneously.

A scheduler decides which task may use the resource, i.e., it determines the order in which requests are serviced, using a scheduling algorithm.

Each active resource (CPU, disk, NIC) needs a scheduler (passive resources are also “scheduled”, but in a slightly different way).

[Figure: resource requests → scheduler → resource]

Scheduling

Scheduling algorithm classification:
- dynamic
  - make scheduling decisions at run-time
  - flexible to adapt
  - consider only actual task requests and execution time parameters
  - large run-time overhead for finding a schedule
- static
  - make scheduling decisions off-line (also called pre-run-time)
  - generate a dispatching table for the run-time dispatcher at compile time
  - need complete knowledge of the tasks before compiling
  - small run-time overhead
- preemptive
  - the currently executing task may be interrupted (preempted) by higher-priority processes
  - the preempted process continues later in the same state
  - potentially frequent context switching
  - (almost!?) useless for disks and network cards
- non-preemptive
  - a running task is allowed to finish its time slot (higher-priority processes must wait)
  - reasonable for short tasks like sending a packet (used by disks and network cards)
  - less frequent switches

Scheduling

Preemption:
- tasks wait for processing
- the scheduler assigns priorities
- the task with the highest priority is scheduled first
- current execution is preempted if a higher-priority (more urgent) task arrives
- real-time and best-effort priorities (real-time processes have higher priority; if any exist, they run)
- two kinds of preemption:
  - preemption points
    - predictable overhead
    - simplified scheduler accounting
  - immediate preemption
    - needed for hard real-time systems
    - needs special timers and fast interrupt and context switch handling

[Figure: resource requests → scheduler (with preemption) → resource]

Scheduling

Scheduling is difficult and takes time:

[Figure: a real-time (RT) process request arrives while processes 1 … N share the CPU. Round-robin: the RT process waits (delay) until processes 1 … N have had their turns. Priority, non-preemptive: the RT process waits (delay) until the running process has finished. Priority, preemptive: process 1 (p1) is interrupted at once, so the only delay is switching and interrupts.]
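The three cases can be put into numbers with a rough, illustrative model (not from the slides): N time-sharing processes with equal slices of length q, an RT request arriving after the running process has used `elapsed` of its slice, and only the final context switch into the RT process charged (cost `switch`).

```python
def rt_request_delay(policy: str, n: int, q: float, elapsed: float, switch: float) -> float:
    """Time until a newly arrived real-time (RT) process starts running.

    Illustrative model: n time-sharing processes with slices of length q,
    the running one has used `elapsed` of its slice, and only the final
    context switch into the RT process is charged (cost `switch`).
    """
    remaining = q - elapsed
    if policy == "round-robin":
        # RT process joins the tail of the queue behind the other n - 1 processes.
        return remaining + (n - 1) * q + switch
    if policy == "priority-non-preemptive":
        # RT process is first in line but must wait for the current slice to end.
        return remaining + switch
    if policy == "priority-preemptive":
        # The running process is interrupted at once; only switching remains.
        return switch
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for p in ("round-robin", "priority-non-preemptive", "priority-preemptive"):
        print(p, rt_request_delay(p, n=4, q=10, elapsed=3, switch=1))
```

With 4 processes, 10-unit slices and a request 3 units into a slice, the delays are 38, 8 and 1 time units.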

Priorities and Multimedia

Multimedia streams need predictable access to resources, i.e., high priorities, e.g.:
1. multimedia traffic with guaranteed QoS
2. multimedia traffic with predictive QoS
3. other requests

(figure annotations: “may not exist”, “must not starve”)

Within each class one could have a second-level scheduler:
- 1 and 2: real-time scheduling and fine-grained priorities
- 3: may use traditional approaches such as round-robin

Scheduling in Windows 2000

- Preemptive kernel
- Schedules threads individually
- Time slices given in quantums
  - 3 quantums = 1 clock interval (length of interval may vary)
  - defaults:
    - Win2000 server: 36 quantums
    - Win2000 workstation (professional): 6 quantums
  - may manually be increased between threads (1x, 2x, 4x, 6x)
  - foreground quantum boost (add 0x, 1x, 2x): the active window can get longer time slices (assumed to need fast response)

Scheduling in Windows

Priority levels, with round robin (RR) within each level.

Interactive and throughput-oriented:
- “Real time” (system threads): 16 system levels
  - fixed priority
  - may run forever
- Variable (user threads): 15 user levels
  - priority may change: thread priority = process priority ± 2
  - using much CPU → priority drops
  - user interactions, I/O completions → priority increases
- Idle/zero-page thread (system thread): 1 system level
  - runs whenever there are no other processes to run
  - e.g., clearing memory pages for the memory manager

Scheduling in Linux

- Preemptive kernel
- Threads and processes used to be equal, but Linux (in 2.6) uses thread scheduling
- SCHED_FIFO
  - may run forever, no time slices
  - may use its own scheduling algorithm
- SCHED_RR
  - each priority in RR
  - time slices of 10 ms (quantums)
- SCHED_OTHER
  - ordinary user processes
  - uses “nice” values: 1 ≤ priority ≤ 40 (default 20)
  - time slices of 10 ms (quantums)
- Threads with the highest goodness are selected first:
  - real-time (FIFO and RR): goodness = priority
  - time-sharing (OTHER): goodness = (quantum > 0 ? quantum + priority : 0)
- Quantums are reset when no ready process has quantums left (end of epoch): quantum = (quantum/2) + priority

[Figure: priority ranges of SCHED_FIFO, SCHED_RR and SCHED_OTHER (nice)]
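The goodness rule and the epoch reset above can be sketched in Python as follows. The task fields are hypothetical and quantums are counted in ticks. The slide writes real-time goodness simply as the priority; so that real-time threads always outrank time-sharing ones, this sketch adds a constant offset of 1000 (an assumption modelled on the 2.4-era kernel). It also only recalculates the tasks passed in, which simplifies the real epoch reset.

```python
from dataclasses import dataclass

RT_OFFSET = 1000  # assumption: lifts every real-time thread above all SCHED_OTHER ones


@dataclass
class Task:
    name: str
    policy: str       # "FIFO", "RR" or "OTHER"
    priority: int     # RT priority, or 1..40 for OTHER (nice-based)
    quantum: int = 0  # remaining ticks in the current epoch (OTHER only)


def goodness(t: Task) -> int:
    if t.policy in ("FIFO", "RR"):
        return RT_OFFSET + t.priority
    # Time-sharing: no goodness once the epoch's quantum is used up.
    return t.quantum + t.priority if t.quantum > 0 else 0


def new_epoch(tasks):
    # End of epoch: remaining quantum is halved and the priority is added.
    for t in tasks:
        if t.policy == "OTHER":
            t.quantum = t.quantum // 2 + t.priority


def pick_next(ready):
    """Select the ready task with the highest goodness, starting a new epoch if needed."""
    if all(goodness(t) == 0 for t in ready):
        new_epoch(ready)
    return max(ready, key=goodness)


if __name__ == "__main__":
    a, b = Task("A", "OTHER", 20), Task("B", "OTHER", 25, 3)
    print(pick_next([a, b, Task("C", "RR", 1)]).name)  # C (real-time wins)
    print(pick_next([a, b]).name)                      # B (3 + 25 > 0)
```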

Scheduling in AIX

Similar to Linux, but has always used only thread scheduling:
- SCHED_FIFO
- SCHED_RR
- SCHED_OTHER

BUT, SCHED_OTHER may change “nice” values:
- running long (whole time slices) → penalty: the nice value increases
- being interrupted (e.g., by I/O) → the initial “nice” value is given back

[Figure: priority ranges of SCHED_FIFO, SCHED_RR and SCHED_OTHER (nice), with the default level]

Streaming on Commodity Operating Systems

- Scheduling is based on CPU usage only (higher priority if using less CPU)
- Video playout might require much CPU and will thus be down-prioritized (only important on the client desktop)
- A server must deliver data according to deadlines
- Need for other solutions!!

Real-Time Scheduling

Multimedia streams are usually periodic (fixed frame rates and audio sample frequencies).

Time constraints for a periodic task:
- s: starting point (the first time the task requires processing)
- e: processing time
- d: deadline
- p: period (r: rate, r = 1/p)
- 0 ≤ e ≤ d (often d ≤ p: we will use d = p, i.e., the end of the period, but Σd ≤ Σp is enough)
- the kth processing of the task
  - is ready at time s + (k − 1)p
  - must be finished at time s + (k − 1)p + d
- the scheduling algorithm must account for these properties

[Figure: timeline of a periodic task showing s, e, d and p]
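A small Python sketch of these time constraints, assuming all parameters are numbers in one common time unit. The class name and the 10 ms decode time in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    s: float  # starting point: first time the task requires processing
    e: float  # processing time per period
    d: float  # relative deadline (here d = p, the end of the period)
    p: float  # period; the rate is r = 1 / p

    def __post_init__(self):
        if not 0 <= self.e <= self.d:
            raise ValueError("need 0 <= e <= d")

    @property
    def rate(self) -> float:
        return 1 / self.p

    def window(self, k: int):
        """(ready time, finish-by time) of the k-th processing, k = 1, 2, ..."""
        ready = self.s + (k - 1) * self.p
        return ready, ready + self.d


if __name__ == "__main__":
    # 30 fps video: one frame every 33 ms, assumed 10 ms of decoding per frame.
    video = PeriodicTask(s=0, e=10, d=33, p=33)
    print([video.window(k) for k in (1, 2, 3)])  # [(0, 33), (33, 66), (66, 99)]
```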

Real-Time Scheduling

Resource reservation
- QoS can be guaranteed
- relies on knowledge of tasks
- no fairness
- origin: time-sharing operating systems
- e.g., earliest deadline first (EDF) and rate monotonic (RM) (AQUA, HeiTS, RT Upcalls, ...)

Proportional share resource allocation
- no guarantees
- requirements are specified by a relative share
- allocation in proportion to competing shares
- size of a share depends on system state and time
- origin: packet-switched networks
- e.g., Scheduler for Multimedia And Real-Time (SMART) (Lottery, Stride, Move-to-Rear List, ...)

Earliest Deadline First (EDF)

Preemptive scheduling based on dynamic task priorities:
- the task with the closest deadline has the highest priority, so stream priorities vary with time
- the dispatcher selects the highest-priority task

Assumptions:
- requests for all tasks with deadlines are periodic
- the deadline of a task is equal to the end of its period (the start of the next)
- independent tasks (no precedence)
- run-time for each task is known and constant
- context switches can be ignored
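The dispatcher rule can be sketched with a priority queue keyed on absolute deadlines. This minimal Python illustration (class and method names are hypothetical) shows selecting the earliest-deadline task and the preemption check when a new job arrives. It does not model execution time.

```python
import heapq
import itertools


class EDFReadyQueue:
    """Ready queue ordered by absolute deadline (earliest first)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: equal deadlines in FIFO order

    def add(self, task, deadline):
        heapq.heappush(self._heap, (deadline, next(self._order), task))

    def dispatch(self):
        """Remove and return (task, deadline) with the earliest deadline."""
        deadline, _, task = heapq.heappop(self._heap)
        return task, deadline

    def should_preempt(self, running_deadline):
        """True if a waiting task has an earlier deadline than the running one."""
        return bool(self._heap) and self._heap[0][0] < running_deadline


if __name__ == "__main__":
    q = EDFReadyQueue()
    q.add("A", deadline=10)
    q.add("B", deadline=5)
    running, dl = q.dispatch()              # B runs first (deadline 5)
    q.add("C", deadline=3)                  # a more urgent job arrives
    print(running, q.should_preempt(dl))    # B True
```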

Earliest Deadline First (EDF)

Example:

[Figure: EDF dispatching of Task A and Task B on a time line with their deadlines; the relative priority switches between priority A > priority B and priority A < priority B depending on whose deadline is closer]

Rate Monotonic (RM) Scheduling

Classic algorithm for hard real-time systems with one CPU [Liu & Layland ‘73]

Preemptive scheduling based on static task priorities

Optimal: no other algorithm with static task priorities can schedule a task set that cannot be scheduled by RM

Assumptions:
- requests for all tasks with deadlines are periodic
- the deadline of a task is equal to the end of its period (the start of the next)
- independent tasks (no precedence)
- run-time for each task is known and constant
- context switches can be ignored
- any non-periodic task has no deadline

Rate Monotonic (RM) Scheduling

Process priority based on task periods
- the task with the shortest period gets the highest static priority
- the task with the longest period gets the lowest static priority
- the dispatcher always selects the task request with the highest priority

Example:

[Figure: priority vs. period length, from shortest period/highest priority to longest period/lowest priority; dispatching of Task 1 (period p1) and Task 2 (period p2) with p1 < p2, so Task 1 has the highest priority]
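Static RM priority assignment amounts to sorting tasks by period. A minimal sketch, assuming tasks are given as a mapping from a (hypothetical) task name to its period, with a larger number meaning a higher priority:

```python
def rm_priorities(periods: dict) -> dict:
    """Assign static rate-monotonic priorities: shorter period -> higher priority.

    Returns task name -> priority, where larger numbers mean higher priority.
    """
    longest_first = sorted(periods, key=periods.get, reverse=True)
    return {name: prio for prio, name in enumerate(longest_first, start=1)}


def dispatch(ready, priorities):
    """The dispatcher always selects the ready task with the highest priority."""
    return max(ready, key=priorities.__getitem__)


if __name__ == "__main__":
    prios = rm_priorities({"audio": 20, "video": 33, "backup": 1000})  # periods in ms
    print(prios)                                 # {'backup': 1, 'video': 2, 'audio': 3}
    print(dispatch(["video", "audio"], prios))   # audio
```

Because the priorities never change at run-time, they can be mapped once onto fixed OS priority levels.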

EDF Versus RM

It might be impossible to prevent deadline misses in a strict, fixed-priority system:

[Figure: Task A and Task B scheduled with fixed priorities (A has priority or B has priority, each with and without dropping late jobs), with earliest deadline first, and with rate monotonic (same as the first fixed-priority case); the fixed-priority schedules show deadline misses or wasted time, while EDF meets all deadlines]

RM may give some deadline violations, which are avoided by EDF.
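To make the claim concrete, here is a small discrete-time simulation. It assumes integer time units, free context switches, deadlines at the end of each period, and that a job still unfinished at its deadline is dropped. The task set A (e = 2, p = 5) and B (e = 4, p = 7) is an illustrative choice, not from the slides. Its utilization is 2/5 + 4/7 ≈ 0.97: RM misses a deadline of B, while EDF meets every deadline over the hyperperiod of 35 time units.

```python
def simulate(tasks, horizon, policy):
    """Preemptive uniprocessor simulation; returns a list of (task, deadline) misses.

    tasks: name -> (e, p) with integer execution time e and period p (deadline = p).
    policy: "EDF" (earliest absolute deadline) or "RM" (shortest period first).
    """
    remaining, deadline, misses = {}, {}, []
    for t in range(horizon):
        # Release a new job of every task whose period starts at time t.
        for name, (e, p) in tasks.items():
            if t % p == 0:
                if remaining.get(name, 0) > 0:
                    misses.append((name, t))  # previous job missed deadline t: drop it
                remaining[name] = e
                deadline[name] = t + p
        ready = [n for n in tasks if remaining.get(n, 0) > 0]
        if not ready:
            continue  # idle time unit
        if policy == "EDF":
            run = min(ready, key=lambda n: deadline[n])
        elif policy == "RM":
            run = min(ready, key=lambda n: tasks[n][1])
        else:
            raise ValueError(policy)
        remaining[run] -= 1  # run the chosen task for one time unit
    # Unfinished jobs whose deadline falls at the end of the horizon.
    misses += [(n, deadline[n]) for n in tasks
               if remaining.get(n, 0) > 0 and deadline[n] <= horizon]
    return misses


if __name__ == "__main__":
    tasks = {"A": (2, 5), "B": (4, 7)}
    print("RM: ", simulate(tasks, 35, "RM"))    # RM:  [('B', 7)]
    print("EDF:", simulate(tasks, 35, "EDF"))   # EDF: []
```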

EDF Versus RM

EDF
- dynamic priorities changing in time
- overhead in priority switching
- QoS calculation (maximal throughput): $\sum_{i} R_i \cdot e_i \le 1$ over all streams i, where R is the rate and e the processing time

RM
- static priorities based on periods
- may map priority onto fixed OS priorities (like Linux)
- QoS calculation: $\sum_{i} R_i \cdot e_i \le \ln(2)$ over all streams i, where R is the rate and e the processing time

NOTE: this means that EDF is usually more efficient than RM: if context switches are free, EDF can schedule any workload whose total utilization is ≤ 1, whereas RM is only guaranteed to schedule workloads with utilization ≤ ln(2) ≈ 0.69 (the Liu & Layland bound n(2^{1/n} − 1) for n streams, which decreases toward ln(2) as n grows).
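The two QoS calculations can be written as schedulability tests. This sketch uses the exact per-n Liu & Layland bound n(2^{1/n} − 1) for RM rather than its limit ln(2), which makes it slightly less conservative for small task sets. The RM test is sufficient but not necessary, so failing it does not by itself prove that a deadline will be missed. Tasks are given as hypothetical (e, p) pairs.

```python
import math


def utilization(tasks):
    """Sum of R_i * e_i over all streams, with rate R_i = 1 / p_i."""
    return sum(e / p for e, p in tasks)


def edf_schedulable(tasks):
    # Exact test for EDF under the stated assumptions (deadline = period).
    return utilization(tasks) <= 1.0


def rm_bound(n):
    # Liu & Layland bound for n tasks; decreases toward ln(2) as n grows.
    return n * (2 ** (1 / n) - 1)


def rm_guaranteed(tasks):
    # Sufficient (not necessary) test for RM.
    return utilization(tasks) <= rm_bound(len(tasks))


if __name__ == "__main__":
    tasks = [(2, 5), (4, 7)]                              # (e, p) pairs
    print(round(utilization(tasks), 3))                   # 0.971
    print(edf_schedulable(tasks), rm_guaranteed(tasks))   # True False
    print(round(rm_bound(2), 3), round(math.log(2), 3))   # 0.828 0.693
```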

SMART (Scheduler for Multimedia And Real-Time applications)

Designed for multimedia and real-time applications

Principles
- priority: high-priority tasks should not suffer degradation due to the presence of low-priority tasks
- proportional sharing: allocate resources proportionally and distribute unused resources (work conserving)
- tradeoff immediate fairness: real-time and less competitive processes (short-lived, interactive, I/O-bound, ...) get instantaneous higher shares
- graceful transitions: adapt smoothly to resource demand changes
- notification: notify applications of resource changes

Proportional shares: no admission control

SMART (Scheduler for Multimedia And Real-Time applications)

Tasks have importance and urgency
- urgency: an immediate real-time constraint, short deadline (determines when a task will get resources)
- importance: a priority measure expressed by a tuple [priority p, biased virtual finishing time bvft]
  - p is static: supplied by the user or assigned a default value
  - bvft is dynamic:
    - virtual finishing time: degree to which the share was consumed
    - bias: bonus for interactive tasks

Best-effort schedule based on urgency and importance
- find the most important tasks by comparing tuples: $T_1 > T_2 \Leftrightarrow (p_1 > p_2) \lor (p_1 = p_2 \land bvft_1 > bvft_2)$
- sort by urgency (EDF-based sorting)
- iteratively select tasks from the candidate set as long as the schedule is feasible
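The last two steps (adding tasks in order of importance while checking that the EDF-ordered schedule stays feasible) can be sketched as below. This is a simplified illustration, not the SMART implementation. It assumes the tasks arrive already sorted from most to least important, each with a known remaining processing time and an absolute deadline, all starting at time `now`. The names are hypothetical.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    name: str
    exec_time: float   # remaining processing time
    deadline: float    # absolute deadline (urgency)


def feasible(schedule: List[Task], now: float = 0.0) -> bool:
    """Run tasks back to back in the given order; all must finish by their deadlines."""
    finish = now
    for task in schedule:
        finish += task.exec_time
        if finish > task.deadline:
            return False
    return True


def best_effort_schedule(by_importance: List[Task], now: float = 0.0) -> List[Task]:
    """Add tasks in decreasing importance, keeping each only if the EDF order stays feasible."""
    schedule: List[Task] = []
    for task in by_importance:
        candidate = sorted(schedule + [task], key=lambda t: t.deadline)  # urgency (EDF) order
        if feasible(candidate, now):
            schedule = candidate
    return schedule


if __name__ == "__main__":
    tasks = [Task("audio", 2, 5), Task("video", 4, 6), Task("batch", 3, 7)]
    print([t.name for t in best_effort_schedule(tasks)])  # ['audio', 'video']
```

The least important task ("batch") is left out because adding it would make the more important tasks miss their deadlines.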

Evaluation of Real-Time Scheduling

Tests performed
- by IBM (1993)
- executing tasks with and without EDF
- on a 57 MHz, 32 MB RAM, AIX Power 1

Video playback program:
- one real-time process
  - reads compressed data
  - decompresses data
  - presents video frames via the X server to the user
- the process requires 15 timeslots of 28 ms each per second, i.e., 42 % of the CPU time

Evaluation of Real-Time Scheduling

[Figure: laxity (remaining time to deadline) per task number, with 3 load processes competing with the video playback]
- several deadline violations by the non-real-time scheduler
- the real-time scheduler meets all its deadlines

Evaluation of Real-Time Scheduling

[Figure: laxity (remaining time to deadline) per task number for the video process only, with 4 other processes and with 16 other processes]
Varied the number of load processes competing with the video playback.
NB! The EDF scheduler kept its deadlines.

Evaluation of Real-Time Scheduling

Tests again performed
- by IBM (1993)
- on a 57 MHz, 32 MB RAM, AIX Power 1

“Stupid” end system program:
- 3 real-time processes only requesting CPU cycles
- each process requires 15 timeslots of 21 ms each per second, i.e., 31.5 % of the CPU time each
- 94.5 % of the CPU time is required for real-time tasks
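The CPU shares in the two test setups follow directly from the slot counts, with one second = 1000 ms:

```latex
U_{\text{video}} = \frac{15 \times 28\,\text{ms}}{1000\,\text{ms}} = \frac{420\,\text{ms}}{1000\,\text{ms}} = 42\,\%
\qquad
U_{\text{RT}} = 3 \times \frac{15 \times 21\,\text{ms}}{1000\,\text{ms}} = 3 \times 31.5\,\% = 94.5\,\%
```

In the second setup, this leaves 5.5 % of the CPU for the competing load processes.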

Evaluation of Real-Time Scheduling

[Figure: laxity (remaining time to deadline) per task number, with 1 load process competing with the real-time processes]
The real-time scheduler meets all its deadlines.

Evaluation of Real-Time Scheduling

[Figure: laxity (remaining time to deadline) per task number for processes 1, 2 and 3, with 16 load processes competing with the real-time processes]
Regardless of other load, the EDF scheduler meets its deadlines (laxity almost equal to the scenario with 1 load process).
NOTE: Processes are scheduled in the same order.

Memory Management

Delivery Systems

[Figure: a delivery system connected to the network, with internal bus(es)]

Delivery Systems

[Figure: data path through the file system and communication system (kernel space) and the application (user space), across the bus(es)]
- several disk-to-memory transfers
- several in-memory data movements and context switches

Memory Caching

Memory Caching

[Figure: data path from disk through the file system and application to the communication system and network card; disk access is expensive, and caching is possible in the file system cache]

How do we manage a cache?
- how much memory to use?
- how much data to prefetch?
- which data item to replace?
- …

Is Caching Useful in a Multimedia Scenario?

- High-rate data may need lots of memory for caching…
- Tradeoff: amount of memory, algorithm complexity, gain, …
- Cache only frequently used data: how? (e.g., first (small) parts of a broadcast partitioning scheme, allow “top-ten” only, …)

Buffer vs. Rate (playback time of a stream that fits into the given amount of memory):

Memory | 160 Kbps (e.g., MP3) | 1.4 Mbps (e.g., uncompressed CD) | 3.5 Mbps (e.g., average DVD video) | 100 Mbps (e.g., uncompressed HDTV)
100 MB | 85 min 20 s | 9 min 31 s | 3 min 49 s | 8 s
1 GB | 14 hr 33 min 49 s | 1 hr 37 min 31 s | 39 min 01 s | 1 min 22 s
16 GB | 233 hr 01 min 01 s | 26 hr 00 min 23 s | 10 hr 24 min 09 s | 21 min 51 s
32 GB | 466 hr 02 min 02 s | 52 hr 00 min 46 s | 20 hr 48 min 18 s | 43 min 41 s

32 GB: the maximum amount of memory (in total) that a Dell server could manage in 2004, and NOT all of it is used for caching.

Need For Special “Multimedia Algorithms”?

Most existing systems use an LRU variant:
- keep a sorted list
- replace the first element in the list
- insert new data elements at the end
- if a data element is re-accessed (e.g., new client or rewind), move it back to the end of the list

Extreme example, video frame playout:

[Figure: LRU buffer ordered from longest to shortest time since access; a video of 7 frames is played, then rewound, and playout restarts at frame 1, followed by playout of frames 2, 3 and 4]

In this case, LRU replaces the next needed frame. So the answer is in many cases YES…
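The rewind example can be replayed with a small cache simulation. Assumptions not from the slides: the buffer holds 4 frames, and the 7-frame video is played once and then again from frame 1. The comparison policy, most-recently-used (MRU) eviction, is a swapped-in illustration of an order-aware alternative, not one of the mechanisms discussed here.

```python
from collections import OrderedDict


def count_misses(accesses, capacity, evict_most_recent=False):
    """Count buffer misses; LRU by default, MRU eviction if evict_most_recent is True."""
    buf = OrderedDict()          # keys ordered from least to most recently used
    misses = 0
    for frame in accesses:
        if frame in buf:
            buf.move_to_end(frame)  # re-access: move back to the end of the list
            continue
        misses += 1
        if len(buf) >= capacity:
            buf.popitem(last=evict_most_recent)  # LRU evicts the first in the list
        buf[frame] = True        # insert new data element at the end
    return misses


if __name__ == "__main__":
    frames = list(range(1, 8))
    play_rewind_play = frames + frames  # play 7 frames, rewind, play again
    lru = count_misses(play_rewind_play, capacity=4)
    mru = count_misses(play_rewind_play, capacity=4, evict_most_recent=True)
    print(lru, mru)  # 14 10
```

LRU misses every frame of the second playout (7 of 7), because each eviction removes exactly the frame needed next. MRU keeps frames 1 to 3 and misses only 3.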

“Classification” of Mechanisms

Block-level caching considers a (possibly unrelated) set of blocks
- each data element is viewed as an independent item
- usually used in “traditional” systems
- e.g., FIFO, LRU, CLOCK, …
- multimedia (video) approaches:
  - Least/Most Relevant for Presentation (L/MRP)
  - …

Stream-dependent caching considers a stream object as a whole
- related data elements are treated in the same way
- research prototypes in multimedia systems
- e.g.,
  - BASIC
  - DISTANCE
  - Interval Caching (IC)
  - Generalized Interval Caching (GIC)
  - Split and Merge (SAM)
  - SHR

Least/Most Relevant for Presentation (L/MRP)

L/MRP is a buffer management mechanism for a single interactive, continuous data stream
- adaptable to individual multimedia applications
- preloads the units most relevant for presentation from disk
- replaces the units least relevant for presentation
- client pull-based architecture

[Moser et al. 95]

[Figure: the client requests a homogeneous stream (e.g., MJPEG video) from the server; the client buffer holds continuous presentation units (COPUs), e.g., MJPEG video frames]

Least/Most Relevant for Presentation (L/MRP)

Relevance values are calculated with respect to the current playout of the multimedia stream:
- presentation point (current position in file)
- mode / speed (forward, backward, FF, FB, jump)
- relevance functions are configurable

[Moser et al. 95]

[Figure: relevance value vs. COPU number (COPUs: continuous object presentation units) around the current presentation point, with referenced, history and skipped COPUs along the playback direction]

Least/Most Relevant for Presentation (L/MRP)

Global relevance value
- each COPU can have more than one relevance value
  - bookmark sets (known interaction points)
  - several viewers (clients) of the same stream
- global relevance value = maximum relevance for each COPU

[Moser et al. 95]

[Figure: relevance of loaded frames from the Bookmark-Set, Referenced-Set and History-Set around two current presentation points S1 and S2, combined into a global relevance value]
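A toy version of the relevance computation, using made-up relevance functions (the slides only say they are configurable). COPUs just ahead of a presentation point in the playback direction get the highest values (referenced set), recently presented ones get lower values (history set), and everything else gets zero. The global value is the maximum over all viewers. The replacement victim is the loaded COPU with the lowest global value, and the next prefetch is the unloaded one with the highest. All names and window sizes are hypothetical.

```python
def relevance(copu, point, direction=1, ahead=4, behind=2):
    """Hypothetical relevance of COPU `copu` for one viewer at `point`.

    direction is +1 for forward and -1 for backward playback.
    """
    offset = (copu - point) * direction      # > 0: still to be presented
    if 0 <= offset < ahead:                  # referenced set
        return 1.0 - offset / ahead
    if -behind <= offset < 0:                # history set (may be re-viewed)
        return 0.5 * (1 + offset / (behind + 1))
    return 0.0                               # skipped / far away


def global_relevance(copu, viewers):
    """Maximum relevance over all (point, direction) viewers of the same stream."""
    return max(relevance(copu, p, d) for p, d in viewers)


def replacement_victim(loaded, viewers):
    """Least relevant loaded COPU: replaced first."""
    return min(loaded, key=lambda c: global_relevance(c, viewers))


def next_prefetch(candidates, loaded, viewers):
    """Most relevant COPU that is not yet loaded: preloaded from disk first."""
    missing = [c for c in candidates if c not in loaded]
    return max(missing, key=lambda c: global_relevance(c, viewers))


if __name__ == "__main__":
    viewers = [(10, 1), (20, -1)]            # one forward viewer, one rewinding
    loaded = {8, 11, 19, 25}
    print(replacement_victim(loaded, viewers))             # 25
    print(next_prefetch(range(0, 30), loaded, viewers))    # 10
```

Recomputing these values for every COPU on every round is what makes L/MRP expensive to execute.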

Least/Most Relevant for Presentation (L/MRP)

L/MRP …
+ … gives “few” disk accesses (compared to other schemes)
+ … supports interactivity
+ … supports prefetching
− … is targeted at single streams (users)
− … is expensive (!) to execute (calculates relevance values for all COPUs each round)

Variations:
- Q-L/MRP: extends L/MRP with multiple streams and changes the prefetching mechanism (reduces overhead) [Halvorsen et al. 98]
- MPEG-L/MRP: gives different relevance values for different MPEG frames [Boll et al. 00]

Interval Caching (IC)

Interval caching (IC) is a caching strategy for streaming servers
- caches data between requests for the same video stream, based on playout intervals between requests
- following requests are thus served from the cache filled by the preceding stream
- intervals are sorted by length; the buffer requirement is the data size of the interval
- to maximize the cache hit ratio (minimize disk accesses), the shortest intervals are cached first

[Figure: streams S11, S12, S13 on video clip 1, S21, S22 on video clip 2 and S31 to S34 on video clip 3, with intervals I11, I12, I21, I31, I32, I33 between consecutive streams; sorted by length: I32, I33, I21, I11, I31, I12]
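A minimal sketch of the interval selection, assuming every clip has the same data rate, stream positions are playout offsets in seconds, and the cache size is given in MB. The stream ids and numbers are hypothetical. One interval is formed per pair of consecutive streams on the same clip, and the shortest intervals are cached first.

```python
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    clip: str
    leader: str       # preceding stream that fills the cache
    follower: str     # following stream that is served from the cache
    size_mb: float    # buffer requirement = data between the two streams


def build_intervals(streams: Dict[str, List[Tuple[str, float]]], rate_mb_s: float) -> List[Interval]:
    """streams: clip -> [(stream id, playout position in s)]; one interval per consecutive pair."""
    intervals = []
    for clip, members in streams.items():
        ahead_first = sorted(members, key=lambda m: m[1], reverse=True)
        for (lead, lead_pos), (fol, fol_pos) in zip(ahead_first, ahead_first[1:]):
            intervals.append(Interval(clip, lead, fol, (lead_pos - fol_pos) * rate_mb_s))
    return intervals


def choose_cached(intervals: List[Interval], cache_mb: float) -> List[Interval]:
    """Cache the shortest intervals first until the cache is full."""
    cached, used = [], 0.0
    for iv in sorted(intervals, key=lambda i: i.size_mb):
        if used + iv.size_mb > cache_mb:
            break  # all remaining intervals are at least as large
        cached.append(iv)
        used += iv.size_mb
    return cached


if __name__ == "__main__":
    streams = {"clip1": [("S11", 100), ("S12", 60), ("S13", 50)],
               "clip2": [("S21", 200), ("S22", 150)]}
    cached = choose_cached(build_intervals(streams, rate_mb_s=1.0), cache_mb=55)
    print([iv.follower for iv in cached])  # ['S13', 'S12']
```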

Generalized Interval Caching (GIC)
Interval caching (IC) does not work for short clips:
- a frequently accessed short clip will not be cached
GIC generalizes the IC strategy:
- manages intervals for long video objects as IC does
- extends the interval definition for short clips:
  - keeps track of a finished stream for a while after its termination
  - defines the interval for a short clip as the distance between the new stream and the position the old stream would have had if it had been a longer video object
  - the cache requirement is, however, only the real requirement
- caches the shortest intervals, as in IC
[Figure: short video clip 1 with streams S11 and S12, interval I11 and its real cache requirement C11; video clip 2 with streams S21 and S22 and interval I21]

Generalized Interval Caching (GIC)
Open function:
  form, if possible, a new interval with the previous stream;
  if (NO) { exit }                      /* don't cache */
  compute interval size and cache requirement;
  reorder interval list;                /* smallest first */
  if (not already in a cached interval) {
      if (space available) { cache interval }
      else if (larger cached intervals exist and sufficient memory can be released) {
          release memory from larger intervals;
          cache new interval;
      }
  }
Close function:
  if (not following another stream) { exit }   /* not served from cache */
  delete interval with preceding stream;
  free memory;
  if (next interval can be cached in released memory) { cache next interval }
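A minimal C sketch of the open function's admission decision, assuming the caller has already formed the interval with the previous stream and computed its real cache requirement. Instead of keeping the interval list sorted, this sketch scans for the largest cached interval whenever memory must be released. The `gic_t` type and `gic_open` are hypothetical, and the close function and the "already in a cached interval" check are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INTERVALS 64

/* Hypothetical bookkeeping for a GIC-style cache. */
typedef struct {
    long req[MAX_INTERVALS];   /* real cache requirement per interval */
    int cached[MAX_INTERVALS]; /* 1 if the interval is currently cached */
    size_t n;                  /* number of intervals */
    long free_mem;             /* unallocated cache memory */
} gic_t;

/* Register a new interval with requirement `req` and try to cache it.
 * Returns its index, or -1 if the table is full. */
int gic_open(gic_t *g, long req)
{
    if (g->n == MAX_INTERVALS)
        return -1;
    size_t me = g->n++;
    g->req[me] = req;
    g->cached[me] = 0;

    if (req <= g->free_mem) {               /* space available */
        g->cached[me] = 1;
        g->free_mem -= req;
        return (int)me;
    }
    /* Can sufficient memory be released from larger cached intervals? */
    long releasable = g->free_mem;
    for (size_t i = 0; i < me; i++)
        if (g->cached[i] && g->req[i] > req)
            releasable += g->req[i];
    if (releasable < req)
        return (int)me;                     /* not cached: served from disk */

    /* Release the largest cached intervals until the new one fits. */
    while (g->free_mem < req) {
        size_t big = me;                    /* "none found" marker */
        for (size_t i = 0; i < me; i++)
            if (g->cached[i] && g->req[i] > req &&
                (big == me || g->req[i] > g->req[big]))
                big = i;
        assert(big != me);                  /* guaranteed by the check above */
        g->cached[big] = 0;
        g->free_mem += g->req[big];
    }
    g->cached[me] = 1;
    g->free_mem -= req;
    return (int)me;
}
```

Only intervals larger than the new one are candidates for release, so the cache keeps converging towards holding the shortest intervals, as in IC.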

LRU vs. L/MRP vs. IC Caching
What kind of caching strategy is best (VoD streaming)?
- caching effect, for five streams S1–S5 of movie X (intervals I1–I4 between them):
  - LRU: 4 streams from disk, 1 from cache
  - L/MRP (loaded page frames chosen by global relevance values): 4 streams from disk, 1 from cache
  - IC: 2 streams from disk, 3 from cache
[Figure: memory contents under LRU, L/MRP and IC for streams S1–S5 of movie X; parts of the buffered data are marked as wasted buffering]

LRU vs. L/MRP vs. IC Caching
What kind of caching strategy is best (VoD streaming)?
- caching effect (IC best)
- CPU requirement:
  LRU:   for each I/O request
             reorder LRU chain
  L/MRP: for each I/O request
             for each COPU
                 RV = 0
                 for each stream
                     tmp = r(COPU, p, mode)
                     RV = max(RV, tmp)
  IC:    for each block consumed
             if last part of interval
                 release memory element

In-Memory Copy Operations

In-Memory Copy Operations
[Figure: data path disk → file system → application → communication system → network card; the copy operations between the kernel subsystems and the application are marked as expensive]

Cost of Data Transfers
Data copy operations are expensive:
- they consume CPU, memory, hub, bus and interface resources (proportional to size)
- profiling shows that ~40% of CPU time is consumed by copying data
- the speed gap between memory and CPU increases
- different banks have different access times
System calls make a lot of switches between user and kernel space:
- ~450 ns in 2000 on a 933 MHz Pentium III
- ~920 ns in 2005 on a 1.7 GHz Pentium IV
[Figure: memcpy() performance on a 1.7 GHz Pentium IV]

Cost of Data Transfers – Example I
First generation router built with a 133 MHz Intel Pentium:
- mean packet size 500 B
- interrupt time of 10 µs, word access 50 ns
- per-packet processing of 200 instructions (1.504 µs)
- copy loop: 4 instructions + 2 memory accesses ≈ 130 ns (per 4 bytes):
    register ← memory[read_ptr]
    memory[write_ptr] ← register
    read_ptr ← read_ptr + 4
    write_ptr ← write_ptr + 4
    counter ← counter – 1
    if (counter not 0) goto top of loop
- per packet: processing + copy + interrupt = 1.504 µs + [(500/4) × 130 ns] + 10 µs = 27.754 µs ⇒ 500 × 8 bit / 27.754 µs ≈ 144 Mbps
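The per-packet arithmetic can be reproduced with a small C cost model, assuming one instruction per clock cycle (which is what 200 instructions = 1.504 µs at 133 MHz implies). The `router_t` type and the function names are illustrative.

```c
#include <assert.h>

/* Per-packet cost model of the router example (illustrative names). */
typedef struct {
    double clock_hz;      /* CPU clock; 1 instruction per cycle assumed */
    double mem_access_s;  /* one word (4 B) memory access */
    double interrupt_s;   /* interrupt time per packet */
    int proc_instr;       /* per-packet processing instructions */
    int copy_instr;       /* non-memory instructions per copied word */
    int copy_mem;         /* memory accesses per copied word */
    int packet_bytes;     /* mean packet size */
} router_t;

/* Seconds per packet: processing + copy + interrupt. */
double per_packet_s(const router_t *r)
{
    double cycle = 1.0 / r->clock_hz;
    double processing = r->proc_instr * cycle;
    double per_word = r->copy_instr * cycle + r->copy_mem * r->mem_access_s;
    double copy = (r->packet_bytes / 4) * per_word;
    return processing + copy + r->interrupt_s;
}

/* Achievable throughput in bit/s if the CPU does nothing else. */
double throughput_bps(const router_t *r)
{
    return r->packet_bytes * 8 / per_packet_s(r);
}
```

With the slide's parameters (133 MHz, 50 ns, 10 µs, 200 instructions, 4 + 2 per word, 500 B) this gives about 27.76 µs per packet and about 144 Mbps; the small difference from 27.754 µs comes from using the unrounded 130.08 ns per word.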

Cost of Data Transfers – Example II
Copying in NetBSD v1.5, measured by UniK/IFI (2000):
- copyin(), copyout(), and memcpy()
- 933 MHz P3 CPU
- theoretical max.: 25.6 Gbps
- INTEL: larger is better
- BUT: maximum at 2–8 KB, decrease at larger sizes ⇒ caching effects

Cost of Data Transfers – Example II (cont.)
Assume sending 1 GB of data:
- the whole operation, reading from disk and sending to the network, takes about 10 s
- reading 64 KB blocks from disk: … µs per copyout()
- sending 4 KB packets: 1.65 µs per copyin()
- in total: read + send = (16384 × … µs) + (262144 × 1.65 µs) = … s for copying only
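The block and packet counts follow directly from the sizes: 1 GB / 64 KB = 16384 blocks and 1 GB / 4 KB = 262144 packets. The sketch below computes the copy-only time; since the per-copyout() time is not recoverable from the slide, it is left as a parameter, and `copy_cost` is a hypothetical helper.

```c
#include <assert.h>

/* Copy-only cost of moving `total` bytes through user space: one
 * copyout() per disk block read and one copyin() per packet sent. */
typedef struct {
    unsigned long blocks;   /* copyout() calls */
    unsigned long packets;  /* copyin() calls */
    double seconds;         /* total time spent copying */
} copy_cost_t;

copy_cost_t copy_cost(unsigned long total, unsigned long block, double copyout_s,
                      unsigned long packet, double copyin_s)
{
    assert(block > 0 && packet > 0);
    copy_cost_t c;
    c.blocks = total / block;
    c.packets = total / packet;
    c.seconds = c.blocks * copyout_s + c.packets * copyin_s;
    return c;
}
```

With the copyout() term set to zero, the copyin() part alone amounts to 262144 × 1.65 µs ≈ 0.43 s.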

Cost of Data Transfers
Thus, data movement costs should be kept small:
- careful management of contiguous media data
- avoid unnecessary physical copy operations
- apply appropriate buffer management schemes
- reduce overhead by removing physical in-memory copy operations, i.e., ZERO-COPY data paths

Basic Idea of Zero-Copy Data Paths
[Figure: data moves between the file system and the communication system inside kernel space and over the bus(es); the application in user space handles only a data_pointer]

Streaming Modes Using Copying
[Figure: traditional applications vs. streaming applications. Both read data from the HW device through the device driver and the independent abstraction layer(s) of the OS into user space and write it back down; only traditional applications perform application-specific data modifications in user space]

Streaming Modes NOT Using Copying
Application streaming using zero-copy:
- read data into a kernel buffer and send from there
- the application is responsible for timing
- send: explicit send or automatic send
Kernel streaming using zero-copy:
- thread per stream
- the thread performs the read and write operations
- the application specifies the timing, but it is ensured by the thread
- the stream is only created by the application and is then controlled by the kernel
[Figure: application streaming issues read & send calls whose data stays in the kernel (device driver, independent abstraction layer(s), HW device); kernel streaming issues a single create-stream call, after which a kernel thread performs read and write]

Zero-Copy (Streaming) Mechanisms
Linux: sendfile()
- between two descriptors (file and TCP socket)
- bi-directional: disk-network and network-disk
- needs TCP_CORK
AIX: send_file()
- only TCP
- uni-directional: disk-network
INSTANCE (MMBUF-based, in NetBSD v1.5):
- by UniK/IFI (2000)
- uni-directional: disk-network (network-disk is ongoing work)
- stream_read() and stream_send() (zero-copy 1)
- stream_rdsnd() (zero-copy 2)
Others: splice(), stream(), IO-Lite, MMBUF, …
[Figure: mechanisms grouped into application streaming using zero-copy and kernel streaming using zero-copy]

INSTANCE CPU Time
Transfer of 1 GB, using disk blocks of 64 KB and UDP packets of 1–8 KB.
[Figure: time in seconds vs. packet size in KB (1–8 KB), "removing copy" compared with "measured"]
Gain larger than expected:
- other operations were removed as well, like the buffer cache look-up (simplified the chain of functions)
- some packet drop at the server saved about 0.2 s

INSTANCE Zero-Copy Transfer Rate
- throughput increase of ~2.7 times per stream (can at least double the number of streams)
- zero-copy transfer rate limited by the network card and the storage system: saturated a 1 Gbps NIC and a 32-bit, 33 MHz PCI bus
- reduced processing time by approximately 50 %
- huge improvement in the number of concurrent streams
[Figure: transfer rate for read/write with copy, read/write without copy, and read with automatic write without copy; annotated approx. 6 Mbps and approx. 12 Mbps]

Existing Linux Data Paths
A lot of research has been performed in this area! BUT, what is the status of commodity operating systems today?

Content Download
[Figure: data path from the file system to the communication system in kernel space, with the application in user space, over the bus(es)]

Content Download: read / send
[Figure: DMA transfer from disk into the page cache; read() copies into the application buffer; send() copies into the socket buffer; DMA transfer to the network card]
⇒ 2n copy operations
⇒ 2n system calls
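A minimal Linux C sketch of the read / send path for one connected stream socket, assuming a 4 KB block size (an illustrative choice) and a hypothetical function name `download_read_send`. Each iteration issues one read() and normally one send(), and each copies the block once, which gives the 2n copies and 2n calls counted above.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 4096   /* illustrative block size */

/* read()/send() download: page cache -> buf (copy 1, read),
 * buf -> socket buffer (copy 2, send). Returns bytes sent or -1. */
long download_read_send(int file_fd, int sock_fd)
{
    char buf[BLOCK];
    long total = 0;
    ssize_t n;
    while ((n = read(file_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* send() may accept less than asked */
            ssize_t s = send(sock_fd, buf + off, (size_t)(n - off), 0);
            if (s < 0)
                return -1;
            off += s;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```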

Content Download: mmap / send
[Figure: DMA transfer from disk into the page cache; mmap() maps the page cache into the application; send() copies into the socket buffer; DMA transfer to the network card]
⇒ n copy operations
⇒ 1 + n system calls
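The mmap / send variant, sketched for Linux with a hypothetical `download_mmap_send` and an illustrative 4 KB send size: the file is mapped once, so only the copy into the socket buffer remains per send(). The open(), fstat(), munmap() and close() calls are bookkeeping that the slide does not count.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096   /* illustrative send size */

/* mmap()/send() download: one mapping, then one copy per send().
 * Returns bytes sent, or -1 on error or empty file. */
long download_mmap_send(const char *path, int sock_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    char *map = MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    long off = 0;
    while (off < st.st_size) {
        size_t len = st.st_size - off < CHUNK ? (size_t)(st.st_size - off) : CHUNK;
        ssize_t s = send(sock_fd, map + off, len, 0);   /* the only copy */
        if (s < 0) {
            off = -1;
            break;
        }
        off += s;
    }
    munmap(map, (size_t)st.st_size);
    return off;
}
```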

Content Download: sendfile
[Figure: DMA transfer from disk into the page cache; sendfile() appends a descriptor to the socket buffer; gather DMA transfer from the page cache to the network card]
⇒ 0 copy operations
⇒ 1 system call
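The sendfile path in the same style, using Linux `sendfile(out_fd, in_fd, &offset, count)`. The kernel advances `offset`, and the loop only matters if a single call transfers less than requested. Whether the path is really copy-free depends on the network card supporting gather DMA. `download_sendfile` is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* sendfile() download: the data never enters user space.
 * Returns bytes sent, or -1 on error. */
long download_sendfile(const char *path, int sock_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    off_t off = 0;                      /* advanced by sendfile() */
    while (off < st.st_size) {
        ssize_t s = sendfile(sock_fd, fd, &off, (size_t)(st.st_size - off));
        if (s <= 0) {                   /* error or unexpected end of file */
            close(fd);
            return -1;
        }
    }
    close(fd);
    return (long)off;
}
```

Passing a connected TCP socket as `sock_fd` corresponds to the content download case measured below.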

Content Download: Results
Tested transfer of a 1 GB file on Linux 2.6, with both UDP (with enhancements) and TCP.
[Figure: results for UDP and TCP]

Streaming
[Figure: data path from the file system to the communication system in kernel space, with the application in user space, over the bus(es)]

Streaming: read / send
[Figure: DMA transfer from disk into the page cache; read() copies into the application buffer; send() copies into the socket buffer; DMA transfer to the network card]
⇒ 2n copy operations
⇒ 2n system calls

Streaming: read / writev
[Figure: DMA transfer into the page cache; read() copies the payload into the application buffer; writev() copies the RTP header and the payload into the socket buffer; DMA transfer to the network card]
⇒ 3n copy operations
⇒ 2n system calls
⇒ the previous solution (read / send) needs one copy less per packet
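For streaming, every packet needs an RTP header in front of the payload. A minimal sketch of the read / writev path, assuming a 12-byte fixed RTP header in the RFC 3550 layout (payload type 96, a timestamp step of 3000 per packet and a 1000-byte payload are illustrative values) and a connected datagram socket. read() copies the payload into user space, and writev() copies header and payload into the socket buffer as one datagram. `rtp_fill` and `stream_read_writev` are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAYLOAD 1000   /* illustrative payload bytes per packet */

typedef struct { uint8_t b[12]; } rtp_hdr_t;   /* fixed header, no CSRCs */

/* Fill a fixed RTP header (version 2, no padding/extension/marker). */
void rtp_fill(rtp_hdr_t *h, uint8_t pt, uint16_t seq, uint32_t ts, uint32_t ssrc)
{
    uint16_t s = htons(seq);
    uint32_t t = htonl(ts), id = htonl(ssrc);
    h->b[0] = 0x80;
    h->b[1] = pt & 0x7f;
    memcpy(h->b + 2, &s, 2);
    memcpy(h->b + 4, &t, 4);
    memcpy(h->b + 8, &id, 4);
}

/* read()/writev() streaming: 2 calls per packet.
 * Returns packets sent, or -1 on error. */
long stream_read_writev(int file_fd, int sock_fd, uint32_t ssrc)
{
    char payload[PAYLOAD];
    rtp_hdr_t h;
    struct iovec iov[2] = { { h.b, sizeof h.b }, { payload, 0 } };
    long pkts = 0;
    ssize_t n;
    while ((n = read(file_fd, payload, sizeof payload)) > 0) {
        rtp_fill(&h, 96, (uint16_t)pkts, (uint32_t)(pkts * 3000), ssrc);
        iov[1].iov_len = (size_t)n;
        if (writev(sock_fd, iov, 2) != (ssize_t)(sizeof h.b + (size_t)n))
            return -1;
        pkts++;
    }
    return n < 0 ? -1 : pkts;
}
```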

Streaming: mmap / send
[Figure: mmap() maps the page cache into the application; per packet: cork, send() of the RTP header, send() of the payload, uncork; both send() calls copy into the socket buffer; DMA transfer to the network card]
⇒ 2n copy operations
⇒ 1 + 4n system calls
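A sketch of the mmap / send path with Linux UDP corking, assuming a connected UDP socket. While the `UDP_CORK` socket option is set, the kernel appends the header and payload sends to one pending datagram and transmits it on uncork, hence 4 calls and 2 copies per packet. The 12-byte header only sets version, payload type and sequence number as placeholders, the 1000-byte payload is illustrative, and `stream_mmap_send` is a hypothetical name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef UDP_CORK
#define UDP_CORK 1     /* Linux value of the UDP_CORK socket option */
#endif
#define PAYLOAD 1000   /* illustrative payload bytes per packet */

static int udp_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_UDP, UDP_CORK, &on, sizeof on);
}

/* Per packet: cork, send(header), send(payload), uncork.
 * Returns packets sent, or -1 on error or empty file. */
long stream_mmap_send(const char *path, int udp_sock)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    char *map = MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                               /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    long pkts = 0;
    for (off_t off = 0; off < st.st_size; off += PAYLOAD, pkts++) {
        size_t len = st.st_size - off < PAYLOAD ? (size_t)(st.st_size - off) : PAYLOAD;
        /* placeholder header: version 2, PT 96, 16-bit sequence number */
        uint8_t hdr[12] = { 0x80, 96, (uint8_t)(pkts >> 8), (uint8_t)pkts };
        if (udp_cork(udp_sock, 1) < 0 ||
            send(udp_sock, hdr, sizeof hdr, 0) < 0 ||       /* copy 1 */
            send(udp_sock, map + off, len, 0) < 0 ||        /* copy 2 */
            udp_cork(udp_sock, 0) < 0) {                    /* datagram leaves */
            pkts = -1;
            break;
        }
    }
    munmap(map, (size_t)st.st_size);
    return pkts;
}
```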

Streaming: mmap / writev
[Figure: mmap() maps the page cache into the application; writev() copies the RTP header and the payload into the socket buffer; DMA transfer to the network card]
⇒ 2n copy operations
⇒ 1 + n system calls
⇒ the previous solution (mmap / send) needs three more calls per packet
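A minimal Linux sketch of the mmap / writev path, assuming a connected datagram socket, an illustrative 1000-byte payload and a placeholder 12-byte header (version, payload type and sequence number only). The file is mapped once; then one writev() per packet gathers the header and a slice of the mapping into one datagram, so both are still copied into the socket buffer, but with a single call. `stream_mmap_writev` is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAYLOAD 1000   /* illustrative payload bytes per packet */

/* One writev() per packet: header + payload slice of the mapping.
 * Returns packets sent, or -1 on error or empty file. */
long stream_mmap_writev(const char *path, int sock_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    char *map = MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                               /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    long pkts = 0;
    for (off_t off = 0; off < st.st_size; off += PAYLOAD, pkts++) {
        size_t len = st.st_size - off < PAYLOAD ? (size_t)(st.st_size - off) : PAYLOAD;
        /* placeholder header: version 2, PT 96, 16-bit sequence number */
        uint8_t hdr[12] = { 0x80, 96, (uint8_t)(pkts >> 8), (uint8_t)pkts };
        struct iovec iov[2] = { { hdr, sizeof hdr }, { map + off, len } };
        if (writev(sock_fd, iov, 2) != (ssize_t)(sizeof hdr + len)) {
            pkts = -1;
            break;
        }
    }
    munmap(map, (size_t)st.st_size);
    return pkts;
}
```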

Streaming: sendfile
[Figure: per packet: cork, send() of the RTP header (copied into the socket buffer), sendfile() of the payload (descriptor appended, gather DMA transfer from the page cache), uncork]
⇒ n copy operations
⇒ 4n system calls
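The sendfile-based streaming path combines corking with sendfile(): the RTP header is copied in by send(), the payload is appended by sendfile() without a CPU copy, and uncorking releases the datagram, giving 4 calls and one copy per packet. A minimal Linux sketch, assuming a connected UDP socket, an illustrative 1000-byte payload and a placeholder 12-byte header; `stream_sendfile` is a hypothetical name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef UDP_CORK
#define UDP_CORK 1     /* Linux value of the UDP_CORK socket option */
#endif
#define PAYLOAD 1000   /* illustrative payload bytes per packet */

static int udp_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_UDP, UDP_CORK, &on, sizeof on);
}

/* Per packet: cork, send(header), sendfile(payload), uncork.
 * Returns packets sent, or -1 on error. */
long stream_sendfile(const char *path, int udp_sock)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    long pkts = 0;
    off_t off = 0;                           /* advanced by sendfile() */
    while (off < st.st_size) {
        size_t len = st.st_size - off < PAYLOAD ? (size_t)(st.st_size - off) : PAYLOAD;
        /* placeholder header: version 2, PT 96, 16-bit sequence number */
        uint8_t hdr[12] = { 0x80, 96, (uint8_t)(pkts >> 8), (uint8_t)pkts };
        if (udp_cork(udp_sock, 1) < 0 ||
            send(udp_sock, hdr, sizeof hdr, 0) < 0 ||           /* the one copy */
            sendfile(udp_sock, fd, &off, len) != (ssize_t)len ||
            udp_cork(udp_sock, 0) < 0) {
            pkts = -1;
            break;
        }
        pkts++;
    }
    close(fd);
    return pkts;
}
```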

Streaming: Results
Tested streaming of a 1 GB file on Linux 2.6.
[Figure: RTP over UDP compared with TCP sendfile (content download)]
- compared to not sending an RTP header over UDP, we get an increase of 29% (additional send call)
- more copy operations and system calls are required ⇒ potential for improvements

Enhanced Streaming Data Paths

Enhanced Streaming: mmap / msend
msend allows sending data from an mmap'ed file without a copy.
[Figure: mmap(); per packet: cork, send() of the RTP header (copied into the socket buffer), msend() of the payload (descriptor appended, gather DMA transfer from the page cache), uncork]
⇒ n copy operations
⇒ 1 + 4n system calls
⇒ the previous solution (mmap / send) needs one more copy per packet

Enhanced Streaming: mmap / rtpmsend
The RTP header copy is integrated into the msend system call (rtpmsend).
[Figure: mmap(); per packet, rtpmsend() copies the RTP header into the socket buffer and appends a descriptor for the payload; gather DMA transfer from the page cache]
⇒ n copy operations
⇒ 1 + n system calls
⇒ the previous solution (cork, send, msend, uncork) requires three more calls per packet

Enhanced Streaming: mmap / krtpmsend
An RTP engine in the kernel adds the RTP headers.
[Figure: mmap() and a single krtpmsend() call; the in-kernel RTP engine generates the headers, and descriptors for the payload are appended for gather DMA transfer from the page cache]
⇒ 0 copy operations
⇒ 1 system call
⇒ the previous solution (rtpmsend) requires one more call and one more copy per packet

Enhanced Streaming: rtpsendfile
The RTP header copy is integrated into the sendfile system call.
[Figure: per packet, rtpsendfile() copies the RTP header into the socket buffer and appends a descriptor for the payload; gather DMA transfer from the page cache]
⇒ n copy operations
⇒ n system calls
⇒ the existing solution (cork, send, sendfile, uncork) requires three more calls per packet

Enhanced Streaming: krtpsendfile
An RTP engine in the kernel adds the RTP headers.
[Figure: a single krtpsendfile() call; the in-kernel RTP engine generates the headers, and descriptors for the payload are appended for gather DMA transfer from the page cache]
⇒ 0 copy operations
⇒ 1 system call
⇒ the previous solution (rtpsendfile) requires one more call and one more copy per packet

Enhanced Streaming: Results
Tested streaming of a 1 GB file on Linux 2.6, using RTP over UDP, with TCP sendfile (content download) as reference.
[Figure: existing mechanism (streaming) vs. mmap-based mechanisms and sendfile-based mechanisms; improvements of ~27% and ~25% are marked]

The End: Summary

Summary
All resources need to be scheduled.
Scheduling algorithms for multimedia tasks have to…
- … consider real-time requirements
- … provide good resource utilization
- (… be implementable)
Memory management is an important issue:
- caching
- copying is expensive
Rule of thumb: watch out for bottlenecks:
- copying
- data-touching operations
- frequent context switches (system calls)
- scheduling of slow devices (disk)
- ...

Some References
1. Halvorsen, P.: "Improving I/O Performance of Multimedia Servers", Thesis for the Dr. Scient. degree at University of Oslo, Unipub forlag, ISSN …, No. 161, Oslo, Norway, August …
2. Halvorsen, P., Dalseng, T.A., Griwodz, C.: "Assessment of Data Path Implementations for Content Download and Streaming", Proc. of 11th Int. Conf. on Distributed Multimedia Systems, Banff, Canada, September …
3. Liu, C.L., Layland, J.W.: "Scheduling Algorithms for Multi-Programming in a Hard Real-Time Environment", Journal of the Association for Computing Machinery, Vol. 20, No. 1, January 1973, pp. …
4. Nieh, J., Lam, M.S.: "The Design, Implementation and Evaluation of SMART: A Scheduler for Multimedia Applications", Proc. of 16th ACM Symp. on Operating System Principles (SOSP'97), St. Malo, France, October 1997, pp. …
5. Plagemann, T., Goebel, V., Halvorsen, P., Anshus, O.: "Operating System Support for Multimedia Systems", The Computer Communications Journal, Elsevier, Vol. 23, No. 3, February 2000, pp. …
6. Solomon, D.A., Russinovich, M.E.: "Inside Microsoft Windows 2000", 3rd edition, Microsoft Press, …
7. Steinmetz, R., Nahrstedt, K.: "Multimedia: Computing, Communications & Applications", Prentice Hall, …
8. Tanenbaum, A.S.: "Modern Operating Systems" (2nd ed.), Prentice Hall, …
9. Wolf, L.C., Burke, W., Vogt, C.: "Evaluation of a CPU Scheduling Mechanism for Multimedia Systems", Software - Practice and Experience, Vol. 26, No. 4, 1996, pp. 375–…
10. Boll, S., Heinlein, C., Klas, W., Wandel, J.: "MPEG-L/MRP: Adaptive Streaming of MPEG Videos for Interactive Internet Applications", Proceedings of the 6th International Workshop on Multimedia Information System (MIS'00), Chicago, USA, October 2000, pp. …
11. Halvorsen, P., Goebel, V., Plagemann, T.: "Q-L/MRP: A Buffer Management Mechanism for QoS Support in a Multimedia DBMS", Proceedings of 1998 IEEE International Workshop on Multimedia Database Management Systems (IW-MMDBMS'98), Dayton, Ohio, USA, August 1998, pp. 162–…
12. Moser, F., Kraiss, A., Klas, W.: "L/MRP: a Buffer Management Strategy for Interactive Continuous Data Flows in a Multimedia DBMS", Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 1995