
MPEG 4 Structured Audio: Algorithmic Sound for the Internet and Beyond
John Lazzaro, John Wawrzynek
CS Division, University of California at Berkeley
www.cs.berkeley.edu/~johnw
Sep 1, 1999

MPEG 4 Structured Audio
Outline:
• Motivation for structured audio
• Introduction to MP4-SA
• Example encoding
• C translator
• Physical Instrument Modeling
• Hardware Architectures
• Future directions

Digital Audio Basics
• 16-bit samples, 44.1 kHz sample rate
    • mono: 705.6 kbps
• Cell-phone network: 5-10 kbps
• Dialup modems: 50 kbps
• xDSL: 128 to 1000 kbps
Traditional Compression (encoder, decoder): How well does this work?
• True lossless: 2.5X reduction
    • Shorten, T. Robinson (Cambridge University)
• “Perceptually lossless”: 10X-20X reduction
    • MP3, Dolby AC3, …
[Figure: sampled waveform, amplitude vs. time]
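The bit rates on this slide follow from simple arithmetic. The Python sketch below reproduces them; the function names are illustrative and not part of the talk.

```python
# Raw PCM bit rate and the rates implied by the slide's reduction factors.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16


def pcm_bitrate_kbps(sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE, channels=1):
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * bits * channels / 1000.0


def compressed_kbps(raw_kbps, reduction):
    """Bit rate after an N-fold reduction."""
    return raw_kbps / reduction


raw = pcm_bitrate_kbps()                      # mono CD-quality audio
lossless = compressed_kbps(raw, 2.5)          # "true lossless" figure
perceptual_low = compressed_kbps(raw, 20.0)   # best case, 20X
perceptual_high = compressed_kbps(raw, 10.0)  # worst case, 10X
```

Under these figures, perceptually lossless mono audio needs roughly 35 to 71 kbps, which straddles the 50 kbps dialup rate and sits well above the 5-10 kbps cell-phone rate.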

The Kolmogorov alternative:
• Write a computer program that generates the desired audio stream.
• Transmit the computer program.
• To decode, execute the program.
• MPEG-4 Structured Audio (MP4-SA) uses this approach.
    • Final draft standard: Nov 15, 1998.
    • Eric Scheirer, Editor (MIT Media Lab).
    • http://sound.media.mit.edu/~eds/mpeg4/
Similar to PostScript!

MP4-SA Encoding
• may be a creative act: writing a program,
    • directly (emacs), or
    • indirectly (GUI, webpage).
    • In this case, MP4-SA is a lossless compressor.
• may be automatic -- given a sound, an encoder writes a program that generates the sound.
    • Automatic encoding is a hard problem in the general case.
MP4-SA Decoders
• are interpreters or compilers.

Key Application: Music Production
• Modern music production is computer based.
• Musicians enter performances into computers as control information, not audio waveforms.
• Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control.
[Diagram: MP4-SA maps to modern music production. “The Program”: synthesis algorithms, effects “boxes”, mixers. Control information: musical performance, mix-down. “The Decoder”: sound rendering. Network: premium on low bandwidth.]

Key Application: Music Production (cont.)
[Diagram: MP4-SA maps to modern music production, as on the previous slide. File system: standard framework; ideal format for collaborative productions, remixes, ...]

MPEG 4 Structured Audio:
• A binary file format that encodes:
    • The programming language SAOL (say: sail).
    • The musical score language SASL.
    • Legacy support for MIDI.
    • Audio sample data.
• Result is normative: an MP4-SA file will sound identical on all compliant decoders.
    → Different from MIDI files.

MPEG 4 Standard
[Diagram: MPEG 4 comprises audio, system, and video. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
• Structured Audio (SA): one “component” in the MPEG audio standard (ISO/IEC 14496-3 sec5).
• Advanced Audio Coding (AAC): successor to MP3; delivers the highest quality audio, at the highest bit rate.
• Time-Frequency Coding (T/F): meant for a moderate bit/sec range, with moderate quality.
• Code Excited Linear Prediction (CELP): low bit rate coder; works best as a speech coder.
• Parametric coders: very low bit rate coders; work best as speech coders.
• Text-to-Speech (TTS): takes phonetic and prosodic control information, produces synthesized speech.
• “System” level: includes mechanisms for composing and synchronizing audio (& video) components.

Why SAOL and MP4-SA? Why not Java?
• Musical performances have temporal structure that changes over several timescales:
    • Sample-by-sample: 10's of µs
    • Amplitude & timbre envelopes: 10's of ms
    • Note-by-note: 100's of ms
• Writing sound generation code in a conventional language results in code dominated by time-scale management.
    • Hard to maintain, hard to optimize.

Time management is built into SAOL.
• A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.
• Work is scheduled to happen:
    • at the a-rate (the audio sample rate)
    • at the k-rate (envelope control rate)
    • at the i-rate (rate for new notes)
• Language variables are typed as a/k/i-rate.
• A language statement is scheduled based on the rate of the variables it contains.

SAOL, SASL, and Scheduling:
• Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.
• A SAOL subprogram (called an instr or instrument) serves as the instrument.
• SASL commands (called score lines) act to play notes on SAOL instruments.
• Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

Single Note Execution Trace
SAOL instruments contain all the instructions for playing a note:
-- Code that runs at note launch (once per i-pass).
-- Code that models timbre evolution at the k-rate (once per k-pass).
-- Code to generate audio samples at the a-rate (once per a-pass).

Executing a note (k-rate: 4 kHz, a-rate: 40 kHz):

    time (us)   pass
    0           i-pass
    0           k-pass
    0           a-pass
    25          a-pass
    50          a-pass
    ...
    225         a-pass
    250         k-pass
    250         a-pass
    275         a-pass
    300         a-pass
    ...
    475         a-pass
    500         k-pass
    500         a-pass
    525         a-pass
    ...
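The trace above can be generated mechanically from the two rates. The following Python sketch is a toy model of the pass ordering shown on the slide, not the scheduler of any real decoder; pass_trace and its parameters are hypothetical names.

```python
def pass_trace(a_rate_hz, k_rate_hz, n_a_passes):
    """Return (time_us, pass_name) tuples for one note instance.

    Assumes the a-rate is an integer multiple of the k-rate, as in the
    slide's 40 kHz / 4 kHz example.
    """
    if a_rate_hz % k_rate_hz != 0:
        raise ValueError("a-rate must be an integer multiple of k-rate")
    a_per_k = a_rate_hz // k_rate_hz          # a-passes per k-pass
    trace = []
    for n in range(n_a_passes):
        t_us = n * 1_000_000 // a_rate_hz     # time of this audio sample
        if n == 0:
            trace.append((t_us, "i-pass"))    # once, at note launch
        if n % a_per_k == 0:
            trace.append((t_us, "k-pass"))    # once per control period
        trace.append((t_us, "a-pass"))        # once per audio sample
    return trace


trace = pass_trace(40_000, 4_000, 21)
for t_us, name in trace[:5]:
    print(t_us, name)
```

With these rates there are ten a-passes per k-pass, so a k-pass appears every 250 us.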

An example:
• SAOL instrument tone, which plays a gated sine wave. (SAOL code in next slide.)
• This SASL file plays a melody on tone:

    0.5 tone 0.75 52 0.25
    1.5 tone 0.75 64 0.25
    2.5 tone 0.5 63 0.25
    3 tone 0.25 59 0.2
    3.25 tone 0.25 61 0.225
    3.5 tone 0.5 63 0.225
    4 tone 0.5 64 0.25
    5 end

• Each score line gives, in order: when the instance is launched, the instrument name, how long the instrument runs, and the instance parameters (note number, loudness).
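To make the score-line layout concrete, here is a small Python sketch that parses the score above into note events. It handles only the two kinds of line that appear on the slide (instrument lines and end); NoteEvent and parse_score are illustrative names, not part of any MP4-SA tool.

```python
from dataclasses import dataclass

SCORE = """\
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
"""


@dataclass
class NoteEvent:
    start: float     # when the instance is launched (score time)
    instr: str       # instrument name
    dur: float       # how long the instrument runs
    params: tuple    # instance parameters, here (note number, loudness)


def parse_score(text):
    """Return (list of NoteEvent, end time or None)."""
    events, end_time = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[1] == "end":
            end_time = float(fields[0])
            continue
        params = tuple(float(f) for f in fields[3:])
        events.append(NoteEvent(float(fields[0]), fields[1], float(fields[2]), params))
    return events, end_time


events, end_time = parse_score(SCORE)
```

Each NoteEvent corresponds to one instance of tone; the scheduler would launch it at start and retire it after dur.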

SAOL code for tone

    instr tone (note, loudness)
    {
      ivar a;         // sets osc f
      ksig env;       // env output
      asig x, y;      // osc state
      asig init;

      a = 2*sin(3.141593*cpsmidi(note)/s_rate);
      env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

      if (init == 0)  // first a-pass only
      {
        x = loudness;
        init = 1;
      }

      x = x - a*y;    // the FLOPS happen in
      y = y + a*x;    // these 3 statements
      output(y*env);  // creates audio output
    } // end of instr tone

SAOL code for tone: i-rate
• The assignment to a (an ivar) runs at the i-rate: once, at note launch.

SAOL code for tone: k-rate
• The assignment to env (a ksig) runs at the k-rate: kline() updates the envelope once per k-pass.

SAOL code for tone: a-rate
• The if block that initializes x, the updates of x and y, and the output() call all involve asig variables, so they run at the a-rate: once per a-pass.
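For readers without a SAOL decoder at hand, the following is a rough Python port of tone. It is a sketch under stated assumptions, not what sfront generates: cpsmidi is taken to be the equal-tempered mapping with note 69 at 440 Hz, kline is hard-coded to the envelope shape used in the instrument (so dur should be at least 0.2 s), and the default k_rate of 441 Hz is an arbitrary choice.

```python
import math


def cpsmidi(note):
    """MIDI note number to frequency in Hz (assumes A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def kline(t, dur):
    """Envelope of the instr: 0 -> 0.5 over 0.1 s, hold, 0.5 -> 0 over 0.1 s."""
    if t < 0.1:
        return 0.5 * t / 0.1
    if t < dur - 0.1:
        return 0.5
    if t < dur:
        return 0.5 * (dur - t) / 0.1
    return 0.0


def tone(note, loudness, dur, s_rate=44100, k_rate=441):
    """Render one note as a list of samples."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)  # i-rate: once per note
    x, y = loudness, 0.0            # oscillator state, set on the first a-pass
    a_per_k = s_rate // k_rate      # a-passes per k-pass
    env = 0.0
    out = []
    for n in range(int(round(dur * s_rate))):
        if n % a_per_k == 0:        # k-rate: update the envelope
            env = kline(n / s_rate, dur)
        x = x - a * y               # a-rate: two-multiply sine oscillator
        y = y + a * x
        out.append(y * env)
    return out


samples = tone(69, 0.25, 0.5)       # half a second of A4
```

The three rates show up as code placement: a is computed before the loop, env changes only every a_per_k samples, and the oscillator update runs for every sample.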

SAOL: Unique Features
• Rate semantics:
    • i/k/a-rate execution
• Vector arithmetic:
    • ex: A=B+C is equivalent to: for i=1,n A[i]=B[i]+C[i]
• All floating-point arithmetic.
• Extensive built-in audio function library:
    • signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects,...

SAOL: Unique Features
• Instrument communication through bus structures.
• Dynamic instrument creation and control.
• Scheduler and language support for MIDI and SASL scores.
[Diagram: instruments A, B, C, D and a bus]

Sfront - a SAOL-to-C translator
• Converts MP4-SA files to a C program that, when executed, produces audio.
• Handles SAOL, SASL, MIDI, uncompressed samples.
• Runs on UNIX, Win98/NT.
• Licensed under the GNU General Public License (GPL).
• www.cs.berkeley.edu/~lazzaro/sa
[Diagram: foo.mp4 (SAOL, SASL, MIDI, uncompressed samples) → sfront → sa.c]

Sfront Benchmarks
• Sfront version 0.36
• Machine: 450 MHz Pentium III, 128 MB, gcc version egcs-2.91.66, -O3 optimizer
• Audio sample rate: 44.1 kHz for all examples
• MP3 compression ratio = 11

Sfront Performance Summary:
• Rendering (file decoding):
    • Current performance: a benchmark suite of moderately complex MP4-SA streams computes in a time equivalent to the audio it generates, on a 400 MHz UltraSPARC & 450 MHz Pentium.
• Real-time interaction:
    • with a MIDI keyboard, at acceptable latency (~20 ms), and with microphone input.

Interesting Issues:
• MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.
    • Physical modeling: good
    • Sampling natural instruments: bad
• If models are chosen carefully, compression ratios of 100 to 10,000 are possible.
• Physical modeling is relatively immature, but holds much promise.

Struck/Plucked Instrument Model
• Examples: struck bars, bells, drums, plucked strings
• Parameters: striker characteristics, resonator constants
• Digital resonator: $Y_n = \alpha Y_{n-1} + \beta Y_{n-2} + X_n$
[Diagram: striker, attack section, linear modes (resonances) M1, M2, M3, ..., Mn, output; mode spectrum plotted as amplitude vs. frequency]
[Audio: aluminum bar sounds, single strike and multiple strikes]
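A bank of such resonators can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' model: the two feedback coefficients are set by the standard two-pole resonator relations (first coefficient 2 r cos(theta), second coefficient -r^2), the striker and attack section are reduced to a single unit impulse, and the mode frequencies, decay times, and gains are made-up values.

```python
import math


def resonator_coeffs(freq_hz, decay_s, s_rate):
    """Coefficients (c1, c2) for y[n] = c1*y[n-1] + c2*y[n-2] + x[n].

    decay_s is the time for the mode's amplitude to fall by a factor of e.
    """
    r = math.exp(-1.0 / (decay_s * s_rate))       # pole radius per sample
    theta = 2.0 * math.pi * freq_hz / s_rate      # pole angle per sample
    return 2.0 * r * math.cos(theta), -r * r


def struck_modes(modes, n_samples, s_rate=44100):
    """Sum of resonators excited by one unit impulse (a crude 'strike').

    modes is a list of (freq_hz, decay_s, gain) tuples.
    """
    out = [0.0] * n_samples
    for freq_hz, decay_s, gain in modes:
        c1, c2 = resonator_coeffs(freq_hz, decay_s, s_rate)
        y1 = y2 = 0.0                             # y[n-1], y[n-2]
        for n in range(n_samples):
            x = 1.0 if n == 0 else 0.0            # impulse excitation
            y = c1 * y1 + c2 * y2 + x
            out[n] += gain * y
            y2, y1 = y1, y
    return out


# Illustrative modes only; not measured from the aluminum bar on the slide.
bar = struck_modes([(440.0, 0.8, 1.0), (1213.0, 0.4, 0.5), (2378.0, 0.2, 0.25)], 4410)
```

The whole instrument is described by three numbers per mode, which is the compactness argument behind the compression ratios quoted earlier.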

Blown Instrument Model
• Examples: pipes, flutes, etc.
• Parameters: shape of non-linear function, resonator constants
[Diagram: excitation (jet), non-linear element (y vs. x), linear element (resonant modes, amplitude vs. frequency), tube]
[Audio: blown pipe sounds: brass pipe, overblown]

Physical Modeling Summary
• Models instrument, not sound.
• Advantages over traditional synthesis techniques (FM, sample-based):
    • Compact descriptions.
    • Physical parameterization leads to:
        • more intuitive control
        • lower control bandwidth
    • State-accurate simulation leads to:
        • efficiency in re-excitation
        • emulation of otherwise missing effects
    • Ultimately: more realistic sounds.

Physical Modeling Summary (cont.)
• Disadvantages:
    • potential for high computational complexity
• Approaches:
    • PDE (partial differential equation) approach would be nice, but is probably not practical.
    • ODE (ordinary differential equation, lumped circuit models) approach is practical and very general. Captures essential physics.
    • Wave-guide filters provide a more efficient alternative in some cases.

Interesting Issues (cont.):
• MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
• A new role for psychophysics: instead of using psychophysics to squeeze bits out of a sound representation, MP4-SA decoders will use psychophysics to squeeze FLOPS out of sound computations.
    • Leverage spectral and temporal masking.

Interesting Issues (cont.):
• MP4-SA can be used in a way similar to traditional compression, except that the compression method can be ad hoc:
    • Framework for experimentation in encoding.
• Hope for automatic encoding, if done in a voice-specific way:
    • vocals
    • guitar
    • sax
    • and other hard-to-synthesize sounds.

Running SAOL on Conventional Architectures
• Lessons learned from SAOL development:
    • Temporal typing of variables has the nice side effect of marking the inner loops.
    • Typically, a-rate = 10X to 100X k-rate.
    • A-rate code optimization: moving subexpressions into k-rate or i-rate.
    • SAOL semantics support a static heap.
    • No recursion, all variables single-precision floats, no pointers... simplifies optimization.
• Other researchers (Giorgio Zoia - ETH) are focusing on blocking all a-passes for an instance, reducing overhead.
• Processors with SIMD FP support (Intel SSE, AMD 3DNow!) will be a good match.
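The a-rate optimization mentioned above amounts to hoisting: a subexpression that depends only on note parameters can move to the i-rate, and one that depends only on control data can move to the k-rate. The Python sketch below illustrates the idea by hand; it is not sfront's implementation, and the function names and the gain-in-dB control signal are invented for the example.

```python
import math


def render_naive(freq_hz, gain_db_per_k, a_per_k, s_rate=44100):
    """Recompute every subexpression on every a-pass."""
    out = []
    for k, gain_db in enumerate(gain_db_per_k):
        for j in range(a_per_k):
            step = 2.0 * math.pi * freq_hz / s_rate   # depends only on the note
            gain = 10.0 ** (gain_db / 20.0)           # depends only on k-rate data
            out.append(gain * math.sin(step * (k * a_per_k + j)))
    return out


def render_hoisted(freq_hz, gain_db_per_k, a_per_k, s_rate=44100):
    """Same output, with invariant work moved out of the inner loop."""
    step = 2.0 * math.pi * freq_hz / s_rate           # i-rate: once per note
    out = []
    for k, gain_db in enumerate(gain_db_per_k):
        gain = 10.0 ** (gain_db / 20.0)               # k-rate: once per k-pass
        for j in range(a_per_k):                      # a-rate: only what must be
            out.append(gain * math.sin(step * (k * a_per_k + j)))
    return out


controls = [-6.0, -3.0, 0.0]                          # one gain value per k-pass
naive = render_naive(440.0, controls, 10)
hoisted = render_hoisted(440.0, controls, 10)
```

Because SAOL variables carry their rate in their type, a compiler can find these hoisting opportunities without the loop analysis a conventional language would need.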

Fixed-Function Hardware for SAOL Accelerators
• Unlike MPEG-2 chips, DVD chips, etc., it's not clear how MP4-SA can be accelerated by rolling an ASIC, since every MP4-SA file is a new algorithm.
• Common opcodes can be hardwired, and the general characteristics of typical MP4-SA files could be leveraged to specialize a conventional processor design.
    • But the language is only six months old; execution frequencies are not known.
• Reconfigurable computing architectures might hold promise (however, MP4-SA is all floating point).

Directions / Research Opportunities
• Compiler optimizations for:
    • SAOL and other languages with rate semantics
    • high-performance SIMD architectures
    • runtime code specialization
• Runtime scheduling under limited compute resources.
• SAOL programming environments.
• Physical modeling.
• Automatic encoding.
