MPEG-4 CS Division University of California at Berkeley www.cs.berkeley.edu/~johnw John Lazzaro John Wawrzynek June 18, 2001 Modified by Francois Thibault.

MPEG-4 CS Division University of California at Berkeley www.cs.berkeley.edu/~johnw John Lazzaro John Wawrzynek June 18, 2001 Modified by Francois Thibault January 20, 2003 Further modified by Ichiro Fujinaga January 20, 2005

MPEG 4 Standard  Finalized its standardization process in 1999 (Vancouver)  Designed to integrate visual and audio content  Includes "natural" (recorded) and "synthetic" (synthesized) coding of audio and video

MPEG 4 Scope  Provides a set of technologies to satisfy the needs of  authors  network service providers  end users  Enables the production of content that has far greater reusability in  digital television  animated graphics  web pages

MPEG 4 Features  MPEG-4 provides standardized ways to:  represent units of aural, visual or audiovisual content, called “media objects”, which can be of natural origin (recorded with a camera or microphone) or synthetic origin (generated with a computer)  describe the composition of these objects to create compound media objects that form audiovisual scenes  multiplex and synchronize the data associated with media objects, so that they can be transported over networks providing a QoS (Quality of Service)  interact with the audiovisual scene generated at the receiver’s end

MPEG 4 Standard (audio)  [Diagram: MPEG 4 comprises audio, system, and video parts. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS). SA: ISO/IEC 14496-3 sec5.]

MPEG 4 Audio: Natural (recorded)  AAC: Advanced Audio Coding  Originally created as an extension to MPEG-2  Provides better quality at 64 kbit/sec/channel than MP3 does at 128 kbit/sec/channel  CELP: A codebook-excited linear prediction scheme optimized for telephone-quality transmission of speech in the range 8-32 kbps  Parametric: A novel "harmonic vector + noise" method that allows lossy but extremely low-bitrate coding of wideband sounds down to 2 kbps/channel
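To put the bitrates on this slide in perspective, the following Python sketch converts a per-channel bitrate into storage size. The 3-minute stereo track is an illustrative assumption, not a figure from the slides, and the function name is hypothetical.

```python
def coded_size_bytes(kbit_per_sec_per_channel, channels, seconds):
    """Bytes needed for a constant-bitrate stream (1 kbit = 1000 bits)."""
    return kbit_per_sec_per_channel * 1000 * channels * seconds / 8

SECONDS = 180   # assumed track length: 3 minutes
CHANNELS = 2    # assumed stereo

sizes = {
    "MP3 at 128 kbit/sec/channel": coded_size_bytes(128, CHANNELS, SECONDS),
    "AAC at 64 kbit/sec/channel": coded_size_bytes(64, CHANNELS, SECONDS),
    "Parametric at 2 kbit/sec/channel": coded_size_bytes(2, CHANNELS, SECONDS),
}

for name, size in sizes.items():
    print(f"{name}: {size / 1000:.0f} kB")
```

Under these assumptions the AAC stream takes half the space of the MP3 stream, which the slide claims it does at better quality.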

MPEG 4 Audio: Synthetic (synthesized)  Structured Audio:  A downloadable synthesis method that allows producers to describe new synthesis methods as part of the bitstream  the receiver implements a reconfigurable synthesis engine and synthesizes the sound on-the-fly as the instructions are received  Text-to-Speech:  An interface to standalone TTS systems is provided, so that synthetic speech can be synchronized in multimedia presentations  No "method" of creating synthetic speech is standardized by MPEG

MPEG 4 Standard - Structured Audio  Structured Audio: One “component” in the MPEG audio standard.  [Diagram: the MPEG 4 audio hierarchy, with SA highlighted under synthetic coding. SA: ISO/IEC 14496-3 sec5.]

Audio Compression Basics  Traditional Technique for Music  [Diagram: an encoder and a decoder; the encoder takes an amplitude-versus-time signal through blocks labeled Filter into Critical Bands, Compute Masking, Allocate Bits, and Format Bitstream.]

The Kolmogorov alternative:  Write a computer program that generates the desired audio stream.  Transmit the computer program.  To decode, execute the program.  MPEG-4 Structured Audio (MP4-SA) uses this approach. Similar to PostScript!  Eric Scheirer, Editor (MIT Media Lab).  http://sound.media.mit.edu/~eds/mpeg4/
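The three steps above (write a program, transmit it, execute it to decode) can be illustrated with a toy Python sketch. This is not how MP4-SA is encoded; the `PROGRAM` text, the `render` function, and the 8 kHz rate are all assumptions made for illustration.

```python
import math

# The "bitstream": a tiny program, sent as text instead of audio samples.
PROGRAM = """
def render(sample_rate, seconds):
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(n)]
"""

def decode(program_text, sample_rate=8000, seconds=1.0):
    """Decode by executing the transmitted program (toy interpreter)."""
    namespace = {"math": math}
    exec(program_text, namespace)  # illustration only; unsafe for untrusted input
    return namespace["render"](sample_rate, seconds)

samples = decode(PROGRAM)
program_bytes = len(PROGRAM.encode("utf-8"))
pcm_bytes = len(samples) * 2       # size if sent as 16-bit PCM instead

print(f"program: {program_bytes} bytes, PCM: {pcm_bytes} bytes")
```

The program text stays the same size however long the rendered sound is, which is the source of the compression.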

MP4-SA Encoding  may be a creative act: writing a program,  directly (emacs), or  indirectly (GUI, webpage).  In this case, MP4-SA is a lossless compressor.  may be automatic: given a sound, an encoder writes a program that generates the sound.  Automatic encoding is hard in the general case.  MP4-SA Decoders  are interpreters or compilers.

Key Application: Music Production  Modern music production is computer-based.  Musicians enter performances into computers as control information, not audio waveforms.  Digital synthesizers, effects, and mixes create the final audio, under engineer/producer control.  MP4-SA Maps to Modern Music Production  [Diagram: the musical performance and mix-down control information, together with “The Program” (synthesis algorithms, effects “boxes”, mixers), travel over a network, with a premium on low bandwidth, to “The Decoder” for sound rendering.]

Key Application: Music Production (continued)  MP4-SA Maps to Modern Music Production  [Diagram: the same program, control information, and decoder, now connected through a file system instead of a network. Standard framework; ideal for collaborative productions, remixes, and...]

Key Application: Music Performance  Music performance requires dynamic control.  True interactivity requires parameterized sounds.  Musicians control instruments and effects with interactive controllers.  Control could be indirect and remote (ex: games).  MP4-SA Enables Networked Music Performance  [Diagram: two “The Decoder” (sound rendering) nodes connected by a network, with a premium on low bandwidth.]

MPEG 4 Structured Audio:  A binary file format that encodes:  The programming language SAOL (pronounced: sail).  The musical score language SASL.  Legacy support for MIDI.  Audio sample data.  Result is normative: an MP4-SA file will sound identical on all compliant decoders.  → Different from MIDI files.

Why SAOL and MP4-SA? Why not Java?  Musical performances have temporal structure that changes over several timescales:  Sample-by-sample: tens of µs  Amplitude & timbre envelopes: tens of ms  Note-by-note: hundreds of ms  Writing sound generation code in a conventional language results in code dominated by time-scale management.  Hard to maintain, hard to optimize.

Time management is built into SAOL.  A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.  Work is scheduled to happen:  at the a-rate (the audio sample rate)  at the k-rate (envelope control rate)  at the i-rate (rate for new notes)  Language variables are typed as a/k/i-rate.  A language statement is scheduled based on the rate of the variables it contains.
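The scheduling described on this slide can be sketched as nested loops in Python. This is a simplified model, not the normative SAOL scheduler; the rates and the function name are assumptions chosen for illustration.

```python
S_RATE = 8000   # a-rate in samples per second (assumed value)
K_RATE = 100    # k-rate in control passes per second (assumed value)

def run_instance(duration_s):
    """Count how often code at each rate runs for one note instance."""
    counts = {"i": 0, "k": 0, "a": 0}
    samples_per_k = S_RATE // K_RATE
    k_passes = int(round(duration_s * K_RATE))

    counts["i"] += 1                  # i-rate: once, when the note starts
    for _ in range(k_passes):
        counts["k"] += 1              # k-rate: once per control period
        for _ in range(samples_per_k):
            counts["a"] += 1          # a-rate: once per audio sample
    return counts

counts = run_instance(0.5)
print(counts)
```

A statement's rate decides which of the three loop levels it lands in, so envelope code runs far less often than per-sample code without the programmer writing any of this bookkeeping.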

SAOL, SASL, and Scheduling:  Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.  A SAOL subprogram (called an instr or instrument) serves as the instrument.  SASL commands (called score lines) act to play notes on SAOL instruments.  Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

An example:  A SAOL instrument, tone, plays a gated sine wave. (SAOL code in next slide.)  This SASL file plays a melody on tone:

0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end

In each score line, the first field is when the instance is launched, the field after the instrument name is how long the instrument runs, and the remaining fields are the instance parameters (note number, loudness).
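As a rough illustration of how a decoder might read this score, the Python sketch below parses the score lines into note events. It handles only the two line types shown on this slide and treats score times as seconds, which is an assumption; the function names are hypothetical.

```python
SCORE = """\
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
"""

def parse_score(text):
    """Return (events, end_time) for 'time instr dur params...' lines."""
    events, end_time = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start = float(fields[0])
        if fields[1] == "end":
            end_time = start
            continue
        events.append({
            "start": start,                 # when the instance is launched
            "instr": fields[1],             # which instr to instantiate
            "dur": float(fields[2]),        # how long the instrument runs
            "params": [float(f) for f in fields[3:]],  # note number, loudness
        })
    return events, end_time

def active_at(events, t):
    """Instances sounding at time t."""
    return [e for e in events if e["start"] <= t < e["start"] + e["dur"]]

events, end_time = parse_score(SCORE)
print(len(events), "notes, score ends at", end_time)
```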

SAOL code for tone

instr tone (note, loudness)
{
  ivar a;          // sets osc f
  ksig env;        // env output
  asig x, y;       // osc state
  asig init;

  a = 2*sin(3.141593*cpsmidi(note)/s_rate);
  env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

  if (init == 0)   // first a-pass only
  {
    x = loudness;
    init = 1;
  }

  x = x - a*y;     // the FLOPS happen in
  y = y + a*x;     // these 3 statements
  output(y*env);   // creates audio output
}                  // end of instr tone
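For readers without a SAOL decoder at hand, here is a rough Python port of the tone instrument. It is a sketch under assumptions: `cpsmidi` is reimplemented as equal temperament with note 69 at 440 Hz, the envelope is evaluated per sample rather than at the k-rate, and the 44100 Hz default sample rate is not taken from the slides.

```python
import math

def cpsmidi(note):
    """MIDI note number to Hz, assuming note 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def envelope(t, dur):
    """Piecewise-linear stand-in for kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0).
    Assumes dur >= 0.2 seconds."""
    if t < 0.1:
        return 0.5 * t / 0.1              # 0.1 s attack up to 0.5
    if t < dur - 0.1:
        return 0.5                        # sustain
    return max(0.0, 0.5 * (dur - t) / 0.1)  # 0.1 s release down to 0

def tone(note, loudness, dur, s_rate=44100):
    """Render one note as a list of samples."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)  # sets osc frequency
    x, y = loudness, 0.0                                  # osc state
    out = []
    for n in range(int(dur * s_rate)):
        x = x - a * y          # same update order as the SAOL code:
        y = y + a * x          # y uses the freshly updated x
        out.append(y * envelope(n / s_rate, dur))
    return out

samples = tone(69, 0.25, 1.0)
print(len(samples), "samples, peak", max(abs(s) for s in samples))
```

The two-multiply update is a recursive sine oscillator, so no per-sample call to `sin` is needed after the coefficient `a` is computed once at note start.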

SAOL Features  Rate semantics: i/k/a-rate execution  Vector arithmetic: ex: A=B+C means for i=1,n A[i]=B[i]+C[i]  All arithmetic is floating-point.  Extensive built-in audio function library: signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects,...
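The vector-arithmetic bullet says that an array expression expands to an element-by-element loop. A minimal Python sketch of that expansion follows; `vec_add` is a hypothetical helper, and requiring equal widths is an assumption of this sketch.

```python
def vec_add(b, c):
    """Element-wise sum, the loop that A = B + C stands for."""
    if len(b) != len(c):
        raise ValueError("arrays must have the same width")
    return [b_i + c_i for b_i, c_i in zip(b, c)]

B = [0.5, 1.0, 1.5]
C = [0.25, 0.25, 0.25]
A = vec_add(B, C)
print(A)
```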

Sfront - a SAOL-to-C translator  Converts MP4-SA files to an ANSI C program that, when executed, produces audio.  Runs on UNIX, Windows, MacOS.  Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP (Real-time Transport Protocol).  Handles SAOL, SASL, MIDI, uncompressed samples.  www.cs.berkeley.edu/~lazzaro/sa  [Diagram: sfront takes foo.mp4, or separate SAOL, SASL, MIDI, and uncompressed-sample inputs, and produces sa.c.]

Generator Techniques  Much of the SA standard describes a library  104 core opcodes (ex: pow(), allpass(), reverb() )  16 wave table generators (ex: harm, spline, random)  Sfront optimizes the code produced for each library element instance based on the invocation attributes  rate, width, size, constancy, integral nature of the parameters, number of parameters

Conclusions  MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.  Physical modeling: good  Sampling natural instruments: bad  If models are chosen carefully, compression ratios of 100 to 10,000 are possible.  MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
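To give the quoted compression ratios a concrete scale, this Python sketch computes how small an MP4-SA file would have to be to reach them. The baseline of a 3-minute, 44.1 kHz, 16-bit stereo PCM track is an assumption for illustration, as are the function names.

```python
def pcm_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size of uncompressed PCM audio (defaults are assumed, CD-style)."""
    return seconds * sample_rate * channels * bytes_per_sample

def sa_bytes_for_ratio(uncompressed_bytes, ratio):
    """File size that would yield the given compression ratio."""
    return uncompressed_bytes / ratio

track = pcm_bytes(180)   # assumed 3-minute track
for ratio in (100, 10_000):
    size = sa_bytes_for_ratio(track, ratio)
    print(f"ratio {ratio}: {size / 1000:.1f} kB")
```

Under these assumptions, the upper end of the range corresponds to a file of only a few kilobytes, which is plausible for a short synthesis program plus a score but not for sampled instrument recordings.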
