+ Advances in the Parallelization of Music and Audio Applications Eric Battenberg, David Wessel & Juan Colmenares.


+ Overview
- Parallelism today in the popular interactive music languages
- Parallel partitioned convolution
- Accelerating Non-Negative Matrix Factorization (NMF) for use in audio source separation and music information retrieval, and the importance of Selective, Embedded Just-In-Time Specialization (SEJITS)
- Real-time in the Tessellation OS
- A plea for more flexible I/O with GPUs

+ Current Support for Parallelism is Copy-Based
- The widely used languages for music and audio applications are fundamentally sequential in character; this includes Max/MSP, PD, SuperCollider, and ChucK, among others.
- Multithreading support is limited.
- One approach to exploiting multi-core processors is to run copies of the application on separate cores.
- Max/MSP provides a useful multi-threading mechanism called poly~.
- PD provides pd~, each instance of which runs in a separate thread inside a PD patch.

+ Partitioned Convolution
- First real-time app in the Par Lab.
- Partitioned convolution: an efficient way to do low-latency filtering with a long (> 1 sec) impulse response.
- Important in real-time reverb processing for environment simulation.
- [Sound examples, omitted from transcript: an acoustic guitar in a giant mausoleum; the impulse response, obtained by convolving with a sine sweep]

+ Partitioned Convolution
- Convolution: a way to do linear filtering with a finite impulse response (FIR) filter.
- Direct convolution: for a length-L filter, O(L) ops per output point, zero delay. L can be greater than 100,000 samples (> 3 sec of audio).
- Block FFT convolution: only O(log L) ops per output point, but a delay of L samples.
- How can we trade off between complexity and latency?
- [Diagram: x -> FFT -> complex multiply by H = FFT(h) -> IFFT -> y]
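The two endpoints of the trade-off can be sanity-checked with a small NumPy sketch (illustrative code, not the talk's implementation): direct convolution and FFT convolution produce identical output, but the FFT version must buffer an entire block before any output sample is available.

```python
import numpy as np

def direct_convolve(x, h):
    """Direct FIR filtering: O(L) multiply-adds per output sample, zero delay."""
    return np.convolve(x, h)

def fft_convolve(x, h):
    """Block FFT convolution: multiply spectra, O(log L) amortized ops per
    output sample, but the whole input block must be buffered first."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()   # next power of two >= n
    X = np.fft.rfft(x, nfft)
    H = np.fft.rfft(h, nfft)
    return np.fft.irfft(X * H, nfft)[:n]

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)   # input signal
h = rng.standard_normal(1024)   # impulse response
assert np.allclose(direct_convolve(x, h), fft_convolve(x, h))
```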

+ Uniform Partitioned Convolution
- We would like the latency to be less than 10 ms (512 samples).
- Cut the impulse response up into equal-sized blocks of length N.
- Then we can use a parallel layout of block FFT convolvers with delays to implement the filter.
- The latency is now N, and we still get complexity savings.
- [Diagram: length-L impulse response split into length-N partitions, each feeding a delay(N) stage and a block FFT convolver; outputs summed to y]
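A minimal offline sketch of the idea (illustrative NumPy code, not the Par Lab implementation): the impulse response is cut into length-N partitions, each partition is convolved with the input via FFT and delayed by its offset, and the results are summed. By linearity this reproduces full convolution exactly, while each FFT only ever covers one length-N partition.

```python
import numpy as np

def partitioned_convolve(x, h, N):
    """Uniform partitioned convolution (offline sketch).

    h is cut into length-N partitions; each partition is FFT-convolved
    with x and delayed by its offset p*N, so a streaming version only
    needs to wait N samples (not len(h)) before producing output."""
    P = int(np.ceil(len(h) / N))               # number of partitions
    out = np.zeros(len(x) + len(h) - 1)
    for p in range(P):
        part = h[p * N:(p + 1) * N]
        n = len(x) + len(part) - 1
        yp = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(part, n), n)
        out[p * N:p * N + n] += yp             # delay by p*N and sum
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
h = rng.standard_normal(300)                   # impulse response
assert np.allclose(partitioned_convolve(x, h, 64), np.convolve(x, h))
```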

+ Frequency Delay Line Convolution
- We can also exploit the linearity of the FFT so that only one FFT/IFFT pair is required.
- The parallel block FFT convolvers above become a Frequency Delay Line (FDL) convolver: one FFT feeds a delay line of spectra, each stage is complex-multiplied by its partition spectrum (H1, H2, H3, ...), the products are summed, and a single IFFT produces the output.
- [Diagrams: a block FFT convolver vs. a frequency delay line convolver]

+ Multiple FDL Convolution
- If L is big (e.g. > 100,000) and N is small (e.g. < 1000), our FDL will have 100's of partitions to handle.
- We can connect multiple FDLs in parallel to get the best of both worlds.
- [Diagram: input feeding FDL 1 directly and, after delays, FDLs with larger block sizes; outputs summed to y]
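A rough back-of-the-envelope cost model (all constants here are illustrative assumptions, not measurements from the talk) shows why a single small-block FDL is expensive and why larger block sizes cut per-sample work, at the price of latency:

```python
import math

def fdl_cost_per_sample(L, N):
    """Approximate ops per output sample for one FDL with block size N
    covering a length-L impulse response. Assumed costs: a size-2N
    FFT/IFFT pair ~ 3 * 2N * log2(2N) ops per block, plus ~2N complex
    MACs for each of the P partitions. All constants are illustrative."""
    P = math.ceil(L / N)                       # number of partitions
    fft_ops = 3 * 2 * N * math.log2(2 * N)     # FFT + IFFT per block
    mac_ops = P * 2 * N                        # spectral MACs per block
    return (fft_ops + mac_ops) / N             # amortize over N samples

L = 100_000                                    # ~3 s impulse response
for N in (512, 4096, 32768):
    print(N, round(fdl_cost_per_sample(L, N)))
```

Small N keeps latency low but multiplies the MAC term by hundreds of partitions; combining a small-N FDL for the head of the response with large-N FDLs for the tail gets low latency and low cost at once.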

+ Scheduling Multiple FDLs
- The FDLs run in separate threads.
- Each is allowed to compute for a length of time corresponding to its block size.
- Synchronization is performed at the block boundaries (the vertical lines in the timing diagram, omitted from the transcript).

+ Auto-Tuning for Real-Time
- We are not trying only to maximize throughput; we are trying to improve our ability to make real-time guarantees.
- For now, we estimate a worst-case execution time (WCET) for each size of FDL, then combine the FDLs that are most likely to meet their scheduling deadlines.
- In the future, we will use a notion of predictability along with more robust scheduling.
- We are finishing development on a Max/MSP object, an Audio Unit plugin, and a portable standalone version.

+ Accelerating Non-Negative Matrix Factorization (NMF)
- NMF is widely used in audio source separation.
- The idea is to factor the time/frequency representation (spectrogram) into coupled per-source spectral (W) and gain (H) matrices.
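For reference, here is a compact sketch of the standard Lee-Seung multiplicative-update NMF in NumPy. This is the generic algorithm only; the talk's accelerated GPU version is not shown, and the rank and iteration count below are arbitrary illustrative choices.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Multiplicative-update NMF minimizing ||V - W @ H||_F.

    V: nonnegative (freq x time) magnitude spectrogram,
    W: spectral basis (freq x k), H: gains/activations (k x time)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + 1e-3          # nonnegative init
    H = rng.random((k, T)) + 1e-3
    for _ in range(iters):
        # Updates preserve nonnegativity; 1e-12 guards against div-by-zero.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Recover an exact rank-3 nonnegative factorization:
rng = np.random.default_rng(2)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf(V, 3, iters=500)
assert np.linalg.norm(V - W @ H) / np.linalg.norm(V) < 0.1
```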

+ The Importance of SEJITS in Developing a Music Information Retrieval (MIR) Application
- Rather than using a domain-restricted language, developers write in a full-blown scripting language such as Python or Ruby.
- Performance-critical functions are selected by annotation.
- If efficiency-layer implementations of these functions are available, appropriate code is generated and JIT-compiled; if not, the selected function is executed in the scripting language itself.
- The scripted implementation remains the portable reference implementation.

+ A Real-Time Application in Tessellation
- With this simple music computer application we expect to initially show that Tessellation can provide acceptable performance and time predictability (in cooperation with the OS group).
- The music program and the audio processing & synthesis engine run in separate cells (A and B), each with its own 2nd-level real-time scheduler; channels connect them to the sound card and shell, with an end-to-end deadline decomposed into intermediate deadlines.
- Most of the engine's functionality is the filter: a parallel version of a partition-based convolution algorithm applied to the audio input.
- [Diagram: initial cell, cells A and B with 2nd-level RT schedulers, sound card I/O, shell, and additional cells]

A) Cell and Space Partitioning
- A spatial partition (or cell) comprises a group of processors acting within a hardware boundary; a 2nd-level scheduler runs inside each cell, above the Tessellation kernel's partition support.
- Each cell receives a vector of basic resources: some number of processors, a portion of physical memory, a portion of shared cache memory, and potentially a fraction of memory bandwidth.
- A cell may also receive: exclusive access to other resources (e.g., certain hardware devices and a raw storage partition); guaranteed fractional services (i.e., QoS guarantees) from other partitions (e.g., network service and file service).
- [Hardware diagram: CPUs with private L1 caches connected via an interconnect to L2 banks, DRAM, and a DRAM & I/O interconnect]
(*) Bottom part of the diagram was adapted from Liu and Asanovic, "Mitosys: ParLab Manycore OS Architecture," Jan.

+ Preliminary Example of Music Application
- [Diagram: a music program spanning partitions: input device (pinned/TT partition), graphical interface (GUI partition) backed by a GUI subsystem, audio-processing/synthesis engine (pinned/TT partition), output device (pinned/TT partition), and a network service (net partition) over a time-sensitive network subsystem for communication with other audio-processing nodes]

+ A plea for more flexible GPU I/O

+ Thanks for your attention.

+ Reserve Slides

+ Tessellation in Server Environment (November 12th)
- [Diagram, repeated four times in the transcript: each node runs a large compute-bound application and a large I/O-bound application over disk I/O drivers, other devices, and the network, with a QoS monitor-and-adapt loop and a persistent storage & parallel file system; QoS guarantees (e.g., cloud storage bandwidth QoS) connect the nodes]