An Introduction to the “Thor-like” Power of Ogg Vorbis! Robert W. Ferguson III January 30, 2003.

Xiphophorus
- Xiphophorus is a genus of freshwater fish comprising 23 species.
- Since the 1920s it has been known that the different species hybridize easily. In some cases, simply placing one Xiphophorus species in an aquarium with another is enough for them to reproduce.

XIPH.COM
- Xiphophorus is a non-profit organization responsible for the Ogg project.
- Xiphophorus releases its software under open-source licenses such as the GPL.
- All cool companies have an X to start their name.

What Is Ogg Vorbis
- The Ogg project is an open-source alternative to proprietary and patented codecs for digital media (both audio and video).
- The Vorbis project is responsible for creating a perceptual audio encoder similar to the famous, inherently evil, proprietary codecs popularized by global, illegal file sharing.

It Is Not MP3
- Vorbis is in the same category as
  - MPEG-4 (AAC)
- And similar to, but higher performance than
  - MPEG-1/2 audio layer 3
  - MPEG-4 audio (TwinVQ)
  - WMA - Windows Media Audio
  - PAC

Classification
- Vorbis I
  - Vorbis I is a forward-adaptive monolithic transform CODEC based on the Modified Discrete Cosine Transform. The codec is structured to allow addition of a hybrid wavelet filter bank in Vorbis II to offer better transient response and reproduction using a transform better suited to localized time

Packets
- Vorbis uses free-form packets that have no minimum size, maximum size, or fixed/expected size.
- Packets are designed so that they may be truncated (or padded) and remain decodable.

Error Detection
- Vorbis provides none of its own protection against errors.
- It is solely a method of accepting input audio, dividing it into individual frames and compressing these frames into raw, unformatted 'packets'.

ATH – Absolute Threshold of Hearing
- Most codecs assume volume is fixed during playback.
- Vorbis assumes that volume can be adjusted.

Tone Masking
- Tone masking is when louder frequencies mask out adjacent quieter ones.
- Most codecs use a psychoacoustic model to decide, as well as possible within a given bit-rate limit, what to keep.
- Vorbis approximates the same thing using as many bits as it takes.

Coupling
- Most sounds consist of many channels and have redundancy between these channels. This is exploited to lower the bit-rate if the channels are encoded in some joint representation.
- The simplest example is to encode the average and the difference between channels (for a stereo sound). This is called mid/side representation, and it requires fewer bits for sections that are close to mono.
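
The mid/side idea on this slide can be sketched in a few lines of Python. This is only the generic average/difference transform described above, not the coupling scheme Vorbis itself uses; the function names are made up for illustration.

```python
def encode_mid_side(left, right):
    """Turn two equal-length channels into (mid, side) lists."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]    # average of the channels
    side = [(l - r) / 2 for l, r in zip(left, right)]   # half the difference
    return mid, side


def decode_mid_side(mid, side):
    """Invert encode_mid_side: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A nearly mono passage: the side channel is mostly zeros, which is cheap to code.
left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = encode_mid_side(left, right)
```

When the two channels are close to identical, almost all of the signal ends up in `mid`, and `side` carries only small corrections.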

Channel Support
- Vorbis supports up to 255 channels.
- At the moment the encoder applies coupling to 2-channel files only, but eventually it will scale to more channels.

Vector Quantization
- Vector Quantization (VQ) is a lossy data compression method where vectors are rounded off into encoding regions.
- Basically, if you group together numbers describing different channels, your channels become automatically coupled (normally a group would be picked from data describing a single channel, so channels would be approximated independently).

Vector Quantization…
- The process of VQ introduces some vector quantization noise: the difference between the approximation (only a limited number of these can be chosen) and the original group of numbers.
- All codecs suffer from quantization problems. VQ should suffer less.
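
A minimal sketch of the rounding step, assuming a tiny hand-written codebook and a squared-distance measure (both are illustrative choices, not taken from any Vorbis codebook): each input vector is replaced by the index of its nearest codeword, and the quantization noise is whatever is left over.

```python
def quantize(vector, codebook):
    """Return (index, codeword) for the codebook entry nearest to vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


def quantization_noise(vector, codeword):
    """Difference between the original vector and its approximation."""
    return [x - y for x, y in zip(vector, codeword)]


# Hypothetical 2-dimensional codebook; each entry could pair one value from
# the left channel with one from the right, which couples the channels.
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 1.0)]
index, codeword = quantize((0.9, 0.8), codebook)
noise = quantization_noise((0.9, 0.8), codeword)
```

Only `index` needs to be stored; the decoder looks the codeword up in the same codebook.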

Memory Usage
- The vector codebooks used in the first stage of decoding are packed in their entirety into the Vorbis bit-stream headers.
- In packed form, these codebooks occupy only a few kilobytes; the extent to which they are pre-decoded into a cache is the dominant factor in decoder memory usage.

Following the Standard
- Only decoding is standardized: any file that decodes correctly under the decoding standard follows the standard, regardless of how it was encoded.

Headers
- Identification Header
  - The identification header identifies the bitstream as Vorbis, gives the Vorbis version, and lists the simple audio characteristics of the stream such as sample rate and number of channels.
- Comment Header
  - The comment header includes user text comments ["tags"] and a vendor string for the application/library that produced the bitstream.
- Setup Header
  - The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.
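
To make the identification header concrete, here is a sketch of a parser. The byte layout is an assumption based on the Vorbis I specification as commonly described (packet type 1, the string "vorbis", then little-endian fields and two 4-bit block-size exponents); the function name and dictionary keys are made up, and the sample packet is hand-built for the example.

```python
import struct


def parse_identification_header(packet: bytes) -> dict:
    """Parse a 30-byte Vorbis identification header (assumed Vorbis I layout)."""
    if len(packet) < 30:
        raise ValueError("identification header is too short")
    if packet[0] != 1 or packet[1:7] != b"vorbis":
        raise ValueError("not a Vorbis identification header")
    # Version, channels, sample rate, then maximum/nominal/minimum bitrate.
    version, channels, sample_rate, br_max, br_nominal, br_min = struct.unpack_from(
        "<IBIiii", packet, 7
    )
    exponents = packet[28]
    return {
        "version": version,
        "channels": channels,
        "sample_rate": sample_rate,
        "bitrate_nominal": br_nominal,
        "blocksize_0": 1 << (exponents & 0x0F),  # short frame size in samples
        "blocksize_1": 1 << (exponents >> 4),    # long frame size in samples
        "framing_ok": bool(packet[29] & 1),
    }


# Hand-built sample packet: stereo, 44.1 kHz, frame sizes 256 and 2048.
sample = (
    b"\x01vorbis"
    + struct.pack("<IBIiii", 0, 2, 44100, 0, 128000, 0)
    + bytes([0xB8, 0x01])
)
info = parse_identification_header(sample)
```

In a real file this packet sits inside an Ogg page, so the Ogg framing has to be removed before a parser like this can be applied.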

Decoding Procedure
- The decoding and synthesis procedure for all audio packets is fundamentally the same.
  1. decode packet type flag
  2. decode mode number
  3. decode window shape [long windows only]
  4. decode floor
  5. decode residue into residue vectors
  6. inverse channel coupling of residue vectors
  7. generate floor curve from decoded floor data

Decoding Procedure...
  8. compute dot product of floor and residue (an element-by-element multiplication), producing audio spectrum vector
  9. inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
  10. overlap/add left-hand output of transform with right-hand output of previous frame
  11. store right-hand data from transform of current frame for future lapping
  12. if not first frame, return results of overlap/add as audio result of current frame
- Rearrangement of the synthesis arithmetic is possible.
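
Steps 8 and 10 to 12 are simple enough to sketch directly. The code below assumes equal-sized frames that have already been through the inverse transform and windowing; `apply_floor` and `OverlapAdder` are hypothetical names for this illustration, not part of any Vorbis library.

```python
def apply_floor(floor_curve, residue):
    """Step 8: multiply floor and residue element by element."""
    return [f * r for f, r in zip(floor_curve, residue)]


class OverlapAdder:
    """Steps 10 to 12 for equal-sized frames."""

    def __init__(self):
        self._saved_right = None  # right-hand half of the previous frame

    def push(self, frame):
        half = len(frame) // 2
        left, right = frame[:half], frame[half:]
        if self._saved_right is None:
            output = []  # the first frame produces no audio on its own
        else:
            # Step 10: add this frame's left half to the stored right half.
            output = [a + b for a, b in zip(self._saved_right, left)]
        self._saved_right = right  # step 11: keep the right half for lapping
        return output  # step 12


spectrum = apply_floor([2.0, 0.5], [1.0, 4.0])
adder = OverlapAdder()
first = adder.push([1, 2, 3, 4])
second = adder.push([10, 20, 30, 40])
```

Each call after the first returns half a frame of finished audio, which is why a stream of N frames yields N - 1 half-frame outputs.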

Controversy
- The entire probability model of the codec, the Huffman and VQ codebooks, is packed into the bitstream header along with extensive CODEC setup parameters (often several hundred fields).
- It's impossible to embed a simple frame type flag in each audio packet, or begin decode at any frame in the stream without having previously fetched the codec setup header.
- Vorbis can initiate decode at any arbitrary packet within a bitstream so long as the codec has been initialized/setup with the setup headers.

Window Shape Decode
- Vorbis frames use one of two PCM sample sizes specified during codec setup. In Vorbis I, legal frame sizes are powers of two from 64 to 8192 samples. Aside from coupling, Vorbis handles channels as independent vectors, and these frame sizes are in samples per channel.

Overlapping Windows
- Vorbis uses an overlapping transform, namely the MDCT, to blend one frame into the next, avoiding most inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added. The window shape assures seamless reconstruction.
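
The "seamless reconstruction" claim rests on a property of the window that can be checked numerically. The slide does not give the window formula; the sketch below assumes the Vorbis I window as it is usually stated, sin(π/2 · sin²(π(n + 0.5)/N)), and verifies that the squares of two samples half a frame apart add up to one.

```python
import math


def vorbis_window(n):
    """Window of length n, assuming the commonly cited Vorbis I formula."""
    return [
        math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
        for i in range(n)
    ]


n = 64
w = vorbis_window(n)
half = n // 2

# With 50% overlap each output sample is weighted by w[i]^2 from one frame
# and w[i + half]^2 from its neighbour; the largest deviation of that sum
# from 1 measures how seamless the blend is.
max_deviation = max(abs(w[i] ** 2 + w[i + half] ** 2 - 1.0) for i in range(half))
```

Because the window is applied once before the forward transform and once after the inverse, it is the squared window that has to sum to one across the overlap.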

Dealing with Windows
- Overlapping windows of unequal size are slightly more complex. (The slide's illustration is not included in this transcript.)

Inverse Monolithic Transform
- The audio spectrum is converted back into time domain PCM audio via an inverse modified discrete cosine transform (MDCT). A detailed description of the MDCT is available in the paper "The use of multirate filter banks for coding of high quality digital audio", by T. Sporer, K. Brandenburg and B. Edler.
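
As a rough illustration of the transform pair, here is a direct (slow, O(M²)) MDCT and inverse MDCT with a windowed overlap-add round trip. The window formula and the 2/M scaling of the inverse are assumptions chosen so that this toy reconstructs its input exactly; a real Vorbis decoder uses a fast algorithm and may scale differently.

```python
import math


def vorbis_window(n):
    """Assumed Vorbis I window of length n."""
    return [
        math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
        for i in range(n)
    ]


def mdct(block):
    """Forward MDCT: 2M time samples in, M coefficients out."""
    m = len(block) // 2
    return [
        sum(x * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(m)
    ]


def imdct(coeffs):
    """Inverse MDCT: M coefficients in, 2M (time-aliased) samples out."""
    m = len(coeffs)
    return [
        2.0 / m * sum(c * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                      for k, c in enumerate(coeffs))
        for n in range(2 * m)
    ]


def roundtrip(signal, m):
    """Window, MDCT, inverse MDCT, window again, overlap-add with hop m."""
    w = vorbis_window(2 * m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        frame = [w[i] * signal[start + i] for i in range(2 * m)]
        rebuilt = imdct(mdct(frame))
        for i in range(2 * m):
            out[start + i] += w[i] * rebuilt[i]
    return out


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
out = roundtrip(signal, 8)
# Samples 8..23 are covered by two frames, so their aliasing cancels.
error = max(abs(out[i] - signal[i]) for i in range(8, 24))
```

A single inverse transform returns aliased samples; only the sum of two overlapping frames gives back the original audio, which is why the first frame of a stream produces no output by itself.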
