MPEG-4 Structured Audio
John Lazzaro, John Wawrzynek
CS Division, University of California at Berkeley
www.cs.berkeley.edu/~johnw
June 18, 2001
Modified by Francois Thibault, January 20, 2003

MPEG-4 Standard
- Structured Audio: one “component” in the MPEG audio standard.
- [Diagram: MPEG-4 comprises audio, system, and video. Audio splits into natural coding (AAC T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
- ISO/IEC 14496-3 sec 5

Audio Compression Basics
- Traditional technique for music:
  - [Diagram: encoder and decoder; an amplitude-vs-time waveform passes through the encoder stages Filter into Critical Bands, Compute Masking, Allocate Bits, Format Bitstream.]
- How well does this work?
  - “Perceptually Lossless”: 10X-20X reduction (MP3, Dolby AC3, …)
  - True Lossless: 2.5X reduction (Shorten, T. Robinson, Cambridge University)

The Kolmogorov alternative:
- Write a computer program that generates the desired audio stream.
- Transmit the computer program.
- To decode, execute the program.
- Similar to PostScript!
- MPEG-4 Structured Audio (MP4-SA) uses this approach.
  - Eric Scheirer, Editor (MIT Media Lab).
  - http://sound.media.mit.edu/~eds/mpeg4/
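The transmit-a-program idea can be illustrated with a toy example. This is only a sketch in Python, not MP4-SA: the `PROGRAM` text, the `render` function, and the `decode` helper are all made up for illustration. The "encoder" ships a few lines of source text, and the "decoder" executes them to produce one second of 16-bit audio.

```python
import math
import struct

# Hypothetical "program" to transmit: source text for a 1-second, 440 Hz sine tone.
PROGRAM = """
def render(sample_rate=44100, seconds=1.0, freq=440.0):
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
"""

def decode(program_text):
    """Decode by executing the received program and packing its output as 16-bit PCM."""
    namespace = {"math": math}
    # exec() is used only for illustration; never run untrusted text this way.
    exec(program_text, namespace)
    samples = namespace["render"]()
    return struct.pack("<%dh" % len(samples), *(int(32767 * s) for s in samples))

pcm = decode(PROGRAM)
program_bytes = len(PROGRAM.encode("utf-8"))
ratio = len(pcm) / program_bytes
print(f"program: {program_bytes} bytes, audio: {len(pcm)} bytes, ratio: {ratio:.0f}x")
```

The ratio printed here only reflects this trivially simple sound; it says nothing about what a real MP4-SA encoder achieves on real music.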

MP4-SA Encoding
- may be a creative act: writing a program,
  - directly (emacs), or
  - indirectly (GUI, webpage).
  - In this case, MP4-SA is a lossless compressor.
- may be automatic: given a sound, an encoder writes a program that generates the sound.
  - Automatic encoding is hard in the general case.
MP4-SA Decoders
- are interpreters or compilers.

Key Application: Music Production
- Modern music production is computer-based.
- Musicians enter performances into computers as control information, not audio waveforms.
- Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control.
MP4-SA Maps to Modern Music Production
- “The Program”: synthesis algorithms, effects “boxes”, mixers
- Control information: musical performance, mix-down
- “The Decoder”: sound rendering
- Network: premium on low bandwidth

Key Application: Music Production (cont.)
- Ideal for collaborative productions, remixes, and...
- File System: standard framework

Key Application: Music Performance
- Music performance requires dynamic control.
- True interactivity requires parameterized sounds.
- Musicians control instruments and effects with interactive controllers.
- Control could be indirect and remote (ex: games).
MP4-SA Enables Networked Music Performance
- [Diagram: Network (premium on low bandwidth) with two “The Decoder” sound rendering endpoints.]

MPEG-4 Structured Audio:
- A binary file format that encodes:
  - The programming language SAOL (pronounced: sail).
  - The musical score language SASL.
  - Legacy support for MIDI.
  - Audio sample data.
- Result is normative: an MP4-SA file will sound identical on all compliant decoders.
  → Different from MIDI files.

Why SAOL and MP4-SA? Why not Java?
- Musical performances have temporal structure that changes over several timescales:
  - Sample-by-sample: 10s of µs
  - Amplitude & timbre envelopes: 10s of ms
  - Note-by-note: 100s of ms
- Writing sound generation code in a conventional language results in code dominated by time-scale management.
  - Hard to maintain, hard to optimize.

Time management is built into SAOL.
- A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.
- Work is scheduled to happen:
  - at the a-rate (the audio sample rate)
  - at the k-rate (envelope control rate)
  - at the i-rate (rate for new notes)
- Language variables are typed as a/k/i-rate.
- A language statement is scheduled based on the rate of the variables it contains.
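The three rates can be sketched as a hand-written scheduler. This is Python, not SAOL, and it shows roughly the bookkeeping a conventional language would force on the programmer. The `Tone` class, its linear-decay envelope, and the two rate constants are assumptions made up for this illustration.

```python
import math

SAMPLE_RATE = 8000    # a-rate in Hz (assumed value for this sketch)
CONTROL_RATE = 100    # k-rate in Hz (assumed value for this sketch)
SAMPLES_PER_K = SAMPLE_RATE // CONTROL_RATE

class Tone:
    """Hypothetical instrument with one pass per rate."""

    def __init__(self, freq, dur):
        # i-rate work: runs once, when the note is created
        self.step = 2 * math.pi * freq / SAMPLE_RATE
        self.k_total = max(1, round(dur * CONTROL_RATE))
        self.k_count = 0
        self.phase = 0.0
        self.env = 0.0

    def k_pass(self):
        # k-rate work: runs once per control period (linear decay envelope)
        self.env = 1.0 - self.k_count / self.k_total
        self.k_count += 1

    def a_pass(self):
        # a-rate work: runs once per audio sample
        out = self.env * math.sin(self.phase)
        self.phase += self.step
        return out

    def done(self):
        return self.k_count >= self.k_total

def run(instances):
    """Advance a simulated clock one control period at a time until all notes end."""
    audio = []
    active = list(instances)
    while active:
        for inst in active:
            inst.k_pass()
        for _ in range(SAMPLES_PER_K):
            audio.append(sum(inst.a_pass() for inst in active))
        active = [inst for inst in active if not inst.done()]
    return audio

audio = run([Tone(440.0, 0.5), Tone(660.0, 0.25)])
print(len(audio), "samples")
```

In SAOL the nesting of these loops is implied by the rate types of the variables, so the instrument author does not write `run` at all.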

SAOL, SASL, and Scheduling:
- Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.
- A SAOL subprogram (called an instr or instrument) serves as the instrument.
- SASL commands (called score lines) act to play notes on SAOL instruments.
- Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

An example:
- SAOL instrument tone, which plays a gated sine wave. (SAOL code in next slide.)
- This SASL file plays a melody on tone:

    0.5 tone 0.75 52 0.25
    1.5 tone 0.75 64 0.25
    2.5 tone 0.5 63 0.25
    3 tone 0.25 59 0.2
    3.25 tone 0.25 61 0.225
    3.5 tone 0.5 63 0.225
    4 tone 0.5 64 0.25
    5 end

- Fields of each score line, in order: when the instance is launched, the instrument name, how long the instrument runs, and the instance parameters (note number, loudness).
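To make the score-line layout concrete, here is a small Python sketch that reads the score above into note events. It is not a SASL parser: it handles only the two line shapes shown on this slide, treats times as plain numbers without modelling tempo, and the helper names `parse_score` and `note_to_hz` are hypothetical. The pitch conversion assumes the note numbers are MIDI note numbers in equal temperament with A4 (note 69) at 440 Hz.

```python
SCORE = """
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
"""

def parse_score(text):
    """Parse score lines into note events and an optional end time."""
    notes, end_time = [], None
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[1] == "end":
            end_time = float(fields[0])
            continue
        notes.append({
            "start": float(fields[0]),               # when the instance is launched
            "instr": fields[1],                      # instrument name
            "dur": float(fields[2]),                 # how long the instrument runs
            "params": [float(p) for p in fields[3:]],  # note number, loudness
        })
    return notes, end_time

def note_to_hz(note):
    # Assumption: MIDI note numbers, equal temperament, note 69 = 440 Hz
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

notes, end_time = parse_score(SCORE)
for n in notes:
    pitch, loudness = n["params"]
    print(f'{n["start"]:5.2f}  {n["instr"]}  {n["dur"]:.2f}  {note_to_hz(pitch):7.2f} Hz  {loudness}')
print("end at", end_time)
```

Each parsed event corresponds to one instance of the tone instrument; overlapping start times and durations would yield several instances active at once.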

SAOL Features
- Rate semantics:
  - i/k/a-rate execution
- Vector arithmetic:
  - ex: A = B + C means: for i = 1..n, A[i] = B[i] + C[i]
- All floating-point arithmetic.
- Extensive built-in audio function library:
  - signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...

Spectrum of implementations
- [Diagram axes: startup delay, execution performance]
- Directly interpret: ISO/IEC 14496-3 sec 5, reference implementation
- Translate to VM, interpret VM code: Zoia & Alberti, EPFL, ICASSP 2001
- Compile to machine code: significant development & maintenance complexity
- Translate to C, compile C code

Sfront - a SAOL-to-C translator
- Converts MP4-SA files to an ANSI C program that, when executed, produces audio.
- Handles SAOL, SASL, MIDI, uncompressed samples.
- Runs on UNIX, Windows, MacOS.
- Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP.
- www.cs.berkeley.edu/~lazzaro/sa
- [Diagram: foo.mp4 (SAOL, SASL, MIDI, uncompressed samples) → sfront → sa.c]

Generator Techniques
- Much of the SA standard describes a library:
  - 104 core opcodes (ex: pow(), allpass(), reverb())
  - 16 wave table generators (ex: harm, spline, random)
- Sfront optimizes the code produced for each library element instance based on the invocation attributes:
  - rate, width, size, constancy, integral nature of the parameters, number of parameters
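As an illustration of specializing on constancy and integral nature, the following Python sketch emits a C expression for a pow() call. It is a made-up example of the general technique and does not reproduce sfront's actual code or output. The function name `emit_pow`, the cutoff of 4 for unrolling, and the assumption that the base expression has no side effects are all choices made for this sketch.

```python
def emit_pow(base_expr, exponent):
    """Return a C expression string computing pow(base, exponent).

    exponent is either a Python number (constant known at translation time)
    or a string naming a C variable (value known only at run time).
    Assumes base_expr has no side effects, since it may be repeated.
    """
    if isinstance(exponent, str):
        # Not constant: fall back to the general library call
        return f"pow({base_expr}, {exponent})"
    if float(exponent).is_integer() and 0 <= exponent <= 4:
        # Constant small non-negative integer: unroll into multiplications
        n = int(exponent)
        if n == 0:
            return "1.0F"
        return "(" + " * ".join([f"({base_expr})"] * n) + ")"
    # Constant but not a small integer: call pow() with a literal exponent
    return f"pow({base_expr}, {float(exponent)!r}F)"

print(emit_pow("x", "y"))
print(emit_pow("x", 2))
print(emit_pow("x", 0.5))
```

The same pattern extends to the other attributes on the slide: each known attribute removes a run-time branch or loop from the generated C.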

Interesting Issues:
- MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.
  - Physical Modeling: good
  - Sampling Natural Instruments: bad
- If models are chosen carefully, compression ratios of 100 to 10,000 are possible.
- Physical Modeling is relatively immature, but holds much promise.

Interesting Issues (cont.):
- MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
- A new role for psychophysics: instead of using psychophysics to squeeze bits out of a sound representation, MP4-SA decoders will use psychophysics to squeeze FLOPS out of sound computations.
- Leverage spectral and temporal masking.
