MPEG 4 Structured Audio: Algorithmic Sound for the Internet and Beyond CS Division University of California at Berkeley www.cs.berkeley.edu/~johnw John.


MPEG 4 Structured Audio: Algorithmic Sound for the Internet and Beyond CS Division University of California at Berkeley John Lazzaro John Wawrzynek Sep 1, 1999

MPEG 4 Structured Audio
Outline:
- Motivation for structured audio
- Introduction to MP4-SA
- Example encoding
- C translator
- Physical Instrument Modeling
- Hardware Architectures
- Future directions

Digital Audio Basics
Traditional compression:
[Diagram: waveform (amp vs. time), 16-bit samples at a 44.1 kHz sample rate, passing through an encoder and a decoder.]
How well does this work?
- True lossless: 2.5X reduction
  - Shorten, T. Robinson (Cambridge University)
- "Perceptually lossless": 10X-20X reduction
  - MP3, Dolby AC3, ...
  - mono: kbps
- Cell-phone network: 5-10 kbps
- Dialup modems: 50 kbps
- xDSL: 128 to 1000 kbps
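The arithmetic behind these reduction factors can be checked with a short sketch. It assumes a single mono channel at the 16-bit, 44.1 kHz format named on the slide; the function name is illustrative and not part of any standard.

```python
def pcm_bitrate_kbps(bits_per_sample, sample_rate_hz, channels=1):
    """Uncompressed PCM bit rate in kilobits per second."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0

# Assumed input: one channel of 16-bit samples at 44.1 kHz.
mono_kbps = pcm_bitrate_kbps(16, 44_100)

# Apply the reduction factors quoted on the slide.
lossless_kbps = mono_kbps / 2.5          # true lossless, 2.5X
perceptual_high_kbps = mono_kbps / 10    # perceptually lossless, 10X
perceptual_low_kbps = mono_kbps / 20     # perceptually lossless, 20X

print(f"uncompressed mono: {mono_kbps:.1f} kbps")
print(f"lossless (2.5X):   {lossless_kbps:.1f} kbps")
print(f"perceptual:        {perceptual_low_kbps:.1f} to {perceptual_high_kbps:.1f} kbps")
```

Under these assumptions, even a 20X reduction leaves a mono stream above the cell-phone range listed on the slide, which is the gap the rest of the talk addresses.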

The Kolmogorov alternative:
- Write a computer program that generates the desired audio stream.
- Transmit the computer program.
- To decode, execute the program.
MPEG-4 Structured Audio (MP4-SA) uses this approach.
- Final draft standard: Nov 15,
- Eric Scheirer, Editor (MIT Media Lab).
- Similar to PostScript!

MP4-SA Encoding
- may be a creative act: writing a program,
  - directly (emacs), or
  - indirectly (GUI, webpage).
  - In this case, MP4-SA is a lossless compressor.
- may be automatic: given a sound, an encoder writes a program that generates the sound.
  - Automatic encoding is a hard problem in the general case.
MP4-SA Decoders
- are interpreters or compilers.

Key Application: Music Production
- Modern music production is computer based.
- Musicians enter performances into computers as control information, not audio waveforms.
- Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control.
MP4-SA Maps to Modern Music Production
[Diagram: "The Program" (synthesis algorithms, effects "boxes", mixers) and the control information (musical performance, mix-down) travel over a network to "The Decoder" (sound rendering). Network: premium on low bandwidth.]

Key Application: Music Production (cont.)
MP4-SA Maps to Modern Music Production
[Diagram: the same program, control information, and decoder, with a file system in place of the network. File system: standard framework; ideal format for collaborative productions, remixes, ...]

MPEG 4 Structured Audio:
- A binary file format that encodes:
  - The programming language SAOL (say: sail).
  - The musical score language SASL.
  - Legacy support for MIDI.
  - Audio sample data.
- Result is normative: an MP4-SA file will sound identical on all compliant decoders.
  - Different from MIDI files.

MPEG 4 Standard
[Diagram: MPEG 4 comprises audio, system, and video. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
- Structured Audio (SA): one "component" in the MPEG audio standard (ISO/IEC sec5).
- Advanced Audio Coding (AAC): successor to MP3; delivers the highest quality audio, at the highest bit rate.
- Time-Frequency Coding (T/F): meant for a moderate bit/sec range, with moderate quality.
- Code Excited Linear Prediction (CELP): low bit rate coder; works best as a speech coder.
- Parametric coders: very low bit rate coders; work best as speech coders.
- Text-to-Speech (TTS): takes phonetic and prosodic control information and produces synthesized speech.
- "System" level: includes mechanisms for composing and synchronizing audio (and video) components.

Why SAOL and MP4-SA? Why not Java?
- Musical performances have temporal structure that changes over several timescales:
  - Sample-by-sample: tens of microseconds
  - Amplitude and timbre envelopes: tens of milliseconds
  - Note-by-note: hundreds of milliseconds
- Writing sound generation code in a conventional language results in code dominated by time-scale management.
- Such code is hard to maintain and hard to optimize.

Time management is built into SAOL.
- A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.
- Work is scheduled to happen:
  - at the a-rate (the audio sample rate)
  - at the k-rate (envelope control rate)
  - at the i-rate (rate for new notes)
- Language variables are typed as a/k/i-rate.
- A language statement is scheduled based on the rate of the variables it contains.
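The last point can be illustrated with a deliberately simplified sketch: treat a statement's rate as the fastest rate among the variables it mentions. This is an assumption made for illustration; the actual SAOL rules are more detailed (they also cover opcodes and assignments), and the variable names and declarations below are hypothetical.

```python
# Rates ordered from slowest (i-rate) to fastest (a-rate).
RATE_ORDER = {"i": 0, "k": 1, "a": 2}

def statement_rate(names, declared_rates):
    """Return the fastest declared rate among the variables in a statement.

    Simplified model: a statement with no variables defaults to i-rate.
    """
    rates = [declared_rates[name] for name in names]
    return max(rates, key=RATE_ORDER.get, default="i")

# Hypothetical declarations for a small instrument.
declared = {"coeff": "i", "note": "i", "env": "k", "x": "a", "y": "a"}

print(statement_rate(["coeff", "note"], declared))   # coeff = f(note)
print(statement_rate(["env"], declared))             # env = envelope()
print(statement_rate(["y", "coeff", "x"], declared)) # y = y + coeff*x
```

In this model the first statement would run once per note, the second once per k-pass, and the third once per audio sample.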

SAOL, SASL, and Scheduling:
- Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.
- A SAOL subprogram (called an instr or instrument) serves as the instrument.
- SASL commands (called score lines) act to play notes on SAOL instruments.
- Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

Single Note Execution Trace
SAOL instruments contain all the instructions for playing a note:
- Code that runs at note launch (once per i-pass).
- Code that models timbre evolution at the k-rate (once per k-pass).
- Code to generate audio samples at the a-rate (once per a-pass).
Executing a note (k-rate: 4 kHz, a-rate: 40 kHz):

time (us)   pass
0           i-pass
0           k-pass
0           a-pass
25          a-pass
50          a-pass
...         a-pass
250         k-pass
250         a-pass
275         a-pass
300         a-pass
...         a-pass
500         k-pass
500         a-pass
525         a-pass
...
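The trace follows directly from the two rates. The sketch below generates the same sequence of passes; it assumes both periods are a whole number of microseconds and that the a-rate is an integer multiple of the k-rate. The function name is illustrative.

```python
def execution_trace(k_rate_hz, a_rate_hz, end_us):
    """List (time_us, pass_name) pairs for one note, from launch to end_us."""
    k_period_us = 1_000_000 // k_rate_hz   # 250 us at 4 kHz
    a_period_us = 1_000_000 // a_rate_hz   # 25 us at 40 kHz
    trace = [(0, "i-pass")]                # note launch happens once
    for t in range(0, end_us + 1, a_period_us):
        if t % k_period_us == 0:
            trace.append((t, "k-pass"))    # k-pass precedes the a-pass at time t
        trace.append((t, "a-pass"))
    return trace

for time_us, pass_name in execution_trace(4_000, 40_000, 525):
    print(f"{time_us:>5}  {pass_name}")
```

With the slide's rates, ten a-passes run for every k-pass, which is why a-rate code dominates the cost of rendering.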

An example:
- SAOL instrument tone, which plays a gated sine wave. (SAOL code on the next slide.)
- This SASL file plays a melody on tone:
  0.5 tone tone tone tone tone tone tone end
- Annotations on the score lines:
  - When the instance is launched
  - How long the instrument runs
  - Instance parameters (note number, loudness)

SAOL code for tone

    instr tone (note, loudness)
    {
      ivar a;         // sets osc f
      ksig env;       // env output
      asig x, y;      // osc state
      asig init;

      a = 2*sin(3.14159*cpsmidi(note)/s_rate);
      env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);
      if (init == 0)  // first a-pass only
      {
        x = loudness;
        init = 1;
      }
      x = x - a*y;    // the FLOPS happen in
      y = y + a*x;    // these 3 statements
      output(y*env);  // creates audio output
    }                 // end of instr tone

Statement rates:
- i-rate: the assignment to a
- k-rate: the assignment to env
- a-rate: the if block, the updates of x and y, and the output statement
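To see what the instrument computes, here is a Python sketch of the same three rate levels. It is an illustration, not the output of any SAOL tool: cpsmidi is assumed to follow the usual equal-tempered mapping (note 69 = 440 Hz), the envelope function is a hand-written stand-in for kline, and math.pi is used for the constant inside sin(), which is the value the recurrence needs for the oscillator to run at cpsmidi(note) Hz.

```python
import math

def cpsmidi(note):
    """MIDI note number to Hz, assuming equal temperament with A4 = 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def envelope(t, dur):
    """Stand-in for kline: 0 -> 0.5 over 0.1 s, hold, then 0.5 -> 0 over 0.1 s."""
    if t < 0.1:
        return 0.5 * t / 0.1
    if t < dur - 0.1:
        return 0.5
    return max(0.0, 0.5 * (dur - t) / 0.1)

def render_tone(note, loudness, dur, s_rate=40000, k_rate=4000):
    # i-rate: computed once at note launch.
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)
    # Oscillator state; x is seeded with loudness, as on the first a-pass.
    x, y = loudness, 0.0
    samples_per_k = s_rate // k_rate
    out = []
    for k in range(int(dur * k_rate)):
        env = envelope(k / k_rate, dur)      # k-rate: once per k-pass
        for _ in range(samples_per_k):       # a-rate: once per sample
            x = x - a * y
            y = y + a * x
            out.append(y * env)
    return out

samples = render_tone(note=69, loudness=1.0, dur=0.5)
print(len(samples), max(abs(s) for s in samples))
```

The two-multiply recurrence rotates the (x, y) state by a fixed angle each sample, so no sin() call is needed inside the a-rate loop.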

SAOL: Unique Features
- Rate semantics:
  - i/k/a-rate execution
- Vector arithmetic:
  - ex: A=B+C is equivalent to: for i=1,n A[i]=B[i]+C[i]
- All floating-point arithmetic.
- Extensive built-in audio function library:
  - signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...

SAOL: Unique Features (cont.)
- Instrument communication through bus structures.
  [Diagram: blocks A, B, C, and D connected by a bus.]
- Dynamic instrument creation and control.
- Scheduler and language support for MIDI and SASL scores.

Sfront - a SAOL-to-C translator
[Diagram: foo.mp4 (SAOL, SASL, MIDI, uncompressed samples) -> sfront -> sa.c]
- Converts MP4-SA files to a C program that, when executed, produces audio.
- Handles SAOL, SASL, MIDI, uncompressed samples.
- Runs on UNIX, Win98/NT.
- Licensed under the GNU General Public License (GPL).

Sfront Benchmarks
- Sfront version 0.36
- Machine: 450 MHz Pentium III, 128 MB, gcc version egcs, -O3 optimizer
- Audio sample rate: 44.1 kHz for all examples
- MP3 compression ratio = 11

Sfront Performance Summary:
- Rendering (file decoding):
  - Current performance: a benchmark suite of moderately complex MP4-SA streams computes in a time equivalent to the audio it generates, on a 400 MHz UltraSPARC and a 450 MHz Pentium.
- Real-time interaction:
  - Works with a MIDI keyboard with acceptable latency (~20 ms) and with microphone input.

Interesting Issues:
- MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.
  - Physical modeling: good
  - Sampling natural instruments: bad
- If models are chosen carefully, compression ratios of 100 to 10,000 are possible.
- Physical modeling is relatively immature, but holds much promise.

Struck/Plucked Instrument Model
[Diagram: striker, attack section, linear modes (resonances) M1, M2, M3, ..., Mn, output; plot of amplitude vs. frequency.]
Digital resonator: $Y_n = \alpha Y_{n-1} + \beta Y_{n-2} + X_n$
Aluminum bar sounds: single strike, multiple strikes
Examples: struck bars, bells, drums, plucked strings
Parameters: striker characteristics, resonator constants
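A bank of such resonators can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the authors' model: it uses the standard two-pole parameterization (feedback coefficients 2r cos(theta) and -r^2), a single unit impulse as the striker, and made-up mode frequencies, decay times, and gains.

```python
import math

def resonator(excitation, freq_hz, decay_s, s_rate=44100):
    """Two-pole resonator: y[n] = b1*y[n-1] + b2*y[n-2] + x[n]."""
    r = math.exp(-1.0 / (decay_s * s_rate))        # pole radius from decay time
    theta = 2.0 * math.pi * freq_hz / s_rate       # pole angle from frequency
    b1 = 2.0 * r * math.cos(theta)
    b2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in excitation:
        y = b1 * y1 + b2 * y2 + x
        out.append(y)
        y2, y1 = y1, y
    return out

def struck_bar(modes, n_samples, s_rate=44100):
    """Sum the responses of several modes to a single strike (unit impulse)."""
    strike = [1.0] + [0.0] * (n_samples - 1)
    mix = [0.0] * n_samples
    for freq_hz, decay_s, gain in modes:
        for i, y in enumerate(resonator(strike, freq_hz, decay_s, s_rate)):
            mix[i] += gain * y
    return mix

# Hypothetical modes: (frequency in Hz, decay time in s, gain).
modes = [(440.0, 0.3, 1.0), (1210.0, 0.2, 0.5), (2380.0, 0.1, 0.25)]
bar = struck_bar(modes, n_samples=44100)
print(len(bar), max(abs(v) for v in bar))
```

The whole instrument is described by three numbers per mode plus the striker, which is the compactness the talk is pointing at.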

Blown Instrument Model
[Diagram: jet excitation; non-linear element (x in, y out) coupled to a linear element (resonant modes) representing the tube; plot of amplitude vs. frequency.]
Blown pipe sounds: brass pipe, overblown
Examples: pipes, flutes, etc.
Parameters: shape of non-linear function, resonator constants

Physical Modeling Summary
- Models the instrument, not the sound.
- Advantages over traditional synthesis techniques (FM, sample-based):
  - Compact descriptions.
  - Physical parameterization leads to:
    - more intuitive control
    - lower control bandwidth
  - State-accurate simulation leads to:
    - efficiency in re-excitation
    - emulation of otherwise missing effects
- Ultimately: more realistic sounds.

Physical Modeling Summary (cont.)
- Disadvantages:
  - potential for high computational complexity
- Approaches:
  - PDE (partial differential equation) approach would be nice, but is probably not practical.
  - ODE (ordinary differential equation, lumped circuit models) approach is practical and very general. It captures the essential physics.
  - Wave-guide filters provide a more efficient alternative in some cases.

Interesting Issues (cont.):
- MP4-SA specifies that a decoder produces audio that "sounds identical" to computing the program accurately.
- A new role for psychophysics: instead of using psychophysics to squeeze bits out of a sound representation, MP4-SA decoders will use psychophysics to squeeze FLOPS out of sound computations.
- Leverage spectral and temporal masking.

Interesting Issues (cont.):
- MP4-SA can be used in a way similar to traditional compression, except that the compression method can be ad hoc:
  - Framework for experimentation in encoding.
- Hope for automatic encoding, if done in a voice-specific way:
  - vocals
  - guitar
  - sax
  - and other hard-to-synthesize sounds.

Running SAOL on Conventional Architectures
- Lessons learned from SAOL development:
  - Temporal typing of variables has the nice side effect of marking the inner loops.
  - Typically, a-rate = 10X to 100X k-rate.
  - A-rate code optimization: moving subexpressions into k-rate or i-rate code.
  - SAOL semantics support a static heap.
  - No recursion, all variables are single-precision floats, no pointers... this simplifies optimization.
- Other researchers (Giorgio Zoia - ETH) are focusing on blocking all a-passes for an instance, reducing overhead.
- Processors with SIMD FP support (Intel SSE, AMD 3DNow!) will be a good match.
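The a-rate optimization mentioned above amounts to hoisting loop-invariant work out of the per-sample loop. The sketch below shows the idea in Python under simple assumptions: both functions and their parameters are hypothetical, and a real translator would perform this motion on SAOL code, not by hand.

```python
import math

def render_naive(freq_hz, gain_db, n, s_rate=44100):
    """Recomputes note-constant values on every a-pass."""
    out = []
    for i in range(n):
        gain = 10.0 ** (gain_db / 20.0)              # invariant, yet inside the loop
        step = 2.0 * math.pi * freq_hz / s_rate      # invariant, yet inside the loop
        out.append(gain * math.sin(step * i))
    return out

def render_hoisted(freq_hz, gain_db, n, s_rate=44100):
    """Computes note-constant values once, as i-rate code would."""
    gain = 10.0 ** (gain_db / 20.0)                  # moved to note launch
    step = 2.0 * math.pi * freq_hz / s_rate          # moved to note launch
    return [gain * math.sin(step * i) for i in range(n)]

naive = render_naive(440.0, -6.0, 1000)
hoisted = render_hoisted(440.0, -6.0, 1000)
print(naive == hoisted)
```

Because the rate types already mark which values can change per sample, a compiler can find such subexpressions without a separate loop-invariance analysis.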

Fixed-Function Hardware for SAOL Accelerators
- Unlike MPEG-2 chips, DVD chips, etc., it's not clear how MP4-SA can be accelerated by rolling an ASIC, since every MP4-SA file is a new algorithm.
- Common opcodes can be hardwired, and the general characteristics of typical MP4-SA files could be leveraged to specialize a conventional processor design.
  - But the language is only six months old; execution frequencies are not known.
- Reconfigurable computing architectures might hold promise (however, MP4-SA is all floating point).

Directions / Research Opportunities
- Compiler optimizations for:
  - SAOL and other languages with rate semantics
  - high-performance SIMD architectures
  - runtime code specialization
- Runtime scheduling under limited compute resources.
- SAOL programming environments.
- Physical modeling.
- Automatic encoding.