Sound in Multimedia and HCI

Name: Sound in Multimedia and HCI
Uploaded: 2017-08-09T02:55:34+00:00
Duration: PTM25S38
Channel: Sherilyn Fields
Description: Sound in Multimedia and HCI

Sound in Multimedia and HCI
• The physical characteristics of sound
• The psychological characteristics of sound
• Quality
• Sound file formats
• Sound on the Internet

Sound
Sound is a continuous wave that travels through the air.
The wave is made up of pressure differences. Sound is detected by measuring the pressure level at a location.
Sound waves have normal wave properties (reflection, refraction, diffraction, etc.).

Physical characteristics of sound (I)
• Sound
  – is a pressure wave which travels in air at about 330 m/s
  – with a frequency between 20 and 20,000 Hz (variations/second)
• Sound is a perceptual effect caused by a pressure wave of between 20 Hz and 20 kHz being detected at the ear.

Physical characteristics of sound (II)
• The pressure wave has two physical characteristics:
  – Amplitude: the measure of displacement of the air pressure wave
  – Frequency: the number of periods in a second, measured in hertz (Hz) or cycles per second

Devices for sound generation and transduction
• For input to a computer, the pressure wave is
  – converted to an analogue electrical signal by the microphone (transduced)
  – converted to a digital signal (digitized) by an Analogue to Digital Converter (ADC)
• For output from a computer, the digitized signal is
  – converted to an analogue signal by a Digital to Analogue Converter (DAC)
  – converted to a pressure wave by a loudspeaker

Psychological characteristics of sound (I)
• Sound has three defining characteristics:
  1- loudness: how loud (intense) the sound appears
  2- pitch: to a first approximation, determined by the frequency of the sound
  3- timbre: the nature of the sound (e.g. the distinctive timbre of different instruments)
• All sounds have a loudness, but many are unpitched
• Timbre is often used as a catch-all term to describe those aspects of the sound not captured by loudness and pitch.

Measuring loudness
• Our ears have (essentially) a logarithmic response with respect to sound amplitude
  – loudness depends on power, which is proportional to amplitude * amplitude
  – actual (perceptual) loudness is difficult to compute
• Decibels
  – the ratio of the power of two signals is measured in decibels (dB)
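As a rough illustration of the decibel definition above, the sketch below computes the level difference between two signals from their power ratio and, equivalently, from their amplitude ratio (using the fact that power is proportional to amplitude squared). The function names are hypothetical, chosen only for this example.

```python
import math

def power_ratio_db(p1, p2):
    """Level difference in decibels between two powers."""
    return 10.0 * math.log10(p1 / p2)

def amplitude_ratio_db(a1, a2):
    """Same quantity from amplitudes: power ~ amplitude**2, so the factor is 20."""
    return 20.0 * math.log10(a1 / a2)

# Doubling the amplitude quadruples the power; both routes give the same level.
print(round(amplitude_ratio_db(2.0, 1.0), 2))
print(round(power_ratio_db(4.0, 1.0), 2))
```

Because the scale is logarithmic, each further doubling of amplitude adds the same number of decibels rather than multiplying it.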

Psychological characteristics of sound (II)
• Direction
  – Sound is normally generated by some source
    • and normally there are lots of concurrent sources
  – each source has some location
  – so the sound from it is perceived to come from some direction

The brain identifies the source of a sound on the basis of differences in intensity and phase between the signals received by the left (L) and right (R) ears: the sound arrives earlier and louder at the ear nearer the source.

• Sound recorded using a synthetic head

Psychological characteristics of sound (III)
• Distance
  – we can often also tell (roughly) the distance of the sound source
  – this comes partly from the loudness of the sound, and partly from other characteristics of the sound
• Physical correlates of distance are
  – reflections and spectral shape

Psychological characteristics of sound (IV)
• Associations
  – many sounds have associations
  – these may be obvious (and usable)…
    • breaking glass
    • scream
    • door slamming
  – or may be personal (and different for each individual)
    • dog barking

Quality
• Whenever sound is transduced, digitized, or reconverted to analogue, the original signal is altered in some way.
• Digitizing sound
  – sound is digitized using an analogue to digital converter (ADC)
  – sound is converted back to analogue using a digital to analogue converter (DAC)
  – both forms of conversion can introduce alterations in the sound
    • but the ADC is the more problematic
• Analogue to digital conversion has two parameters:
  – sampling rate (determined by the sampling process)
  – sample size (determined by the quantization process)

Sampling and Quantizing
To include sound in a multimedia application, the sound waves must be converted from analogue to digital form. This conversion is called sampling: every fraction of a second, a sample of the sound is recorded in digital bits.
Sampling: the process of acquiring (measuring and holding) the analogue signal at regular intervals
Quantizing: the conversion of the held signal into a sequence of digital values

Sampling and Quantization
3-bit quantization
Sampling rate: number of samples per second (measured in Hz)
3-bit quantization gives 8 possible sample values
Here is a demonstration of the concepts of sampling and quantization. Again we consider a sinusoidal sound wave and sample it at discrete points. The sampled values could be of arbitrary precision, so we use 3-bit quantization to map them onto a finite set of values. The important observation is that quantization gives us the discrete, finite-precision values needed for representation on a computer. Another important question is how often we should sample the signal to achieve a faithful digital representation of the analogue signal. This question is answered by the Nyquist sampling theorem, which we will look at momentarily.
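The two steps described above can be sketched in a few lines. This is a minimal illustration, not a model of a real converter: it assumes a unit-amplitude sine wave, samples in the range -1.0 to 1.0, and evenly spaced (linear) levels. The function names and the 1 Hz / 8 Hz figures are hypothetical choices for the example.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Measure a unit-amplitude sine wave at each tick of the sample clock."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest level.
    level = round((sample + 1.0) / 2.0 * (levels - 1))
    return max(0, min(levels - 1, level))

# One cycle of a 1 Hz sine wave, sampled 8 times per second.
samples = sample_sine(freq_hz=1.0, sample_rate_hz=8.0, n_samples=8)
codes = [quantize(s, bits=3) for s in samples]
print(codes)  # every code is one of the 8 values 0..7
```

The sampled values are floating-point numbers of effectively arbitrary precision; after quantization each one is a 3-bit code.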

Sampling rate
• Sampling rate describes how frequently the analogue signal is converted (i.e. the analogue signal’s value is measured at discrete intervals)
  – normally measured in samples/second
    • conversion is done regularly, at a fixed number of samples/second
  – sampling rate must be at least twice the highest frequency of interest in the signal
    • Nyquist sampling theorem
    • otherwise aliasing can occur (see later)

Nyquist Theorem
The sampling frequency determines the limit of audio frequencies that can be reproduced digitally. One of the most important rules of sampling is the Nyquist theorem, which states that the highest frequency which can be accurately represented is less than one-half of the sampling rate. So, if we want a full 20 kHz audio bandwidth, we must sample at more than twice that rate, i.e. over 40 kHz. If we don't, the reconstructed signal is distorted by aliasing. [Figure: example sine wave]

[Figure: the dashed vertical lines are sample intervals, and the blue dots are the crossing points, i.e. the actual samples taken by the conversion process.] The sampling rate here is below the Nyquist rate (twice the frequency of the sine wave), so when we reconstruct the waveform we see the problem quite readily. For lossless digitization, the sampling rate should be at least twice the maximum frequency present in the signal.

What happens if the sampling rate is not high enough?
A high frequency signal sampled at too low a rate looks like a lower frequency signal.
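The effect can be checked numerically. The sketch below is an illustration with made-up figures (a 7 Hz sine wave sampled at 8 Hz), and the function names are hypothetical. It predicts the apparent (alias) frequency as the distance from the signal frequency to the nearest multiple of the sampling rate, then confirms that the samples are indistinguishable from those of the lower frequency.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency after sampling: distance to the nearest multiple of the rate."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Measure a unit-amplitude sine wave at each tick of the sample clock."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

rate = 8.0
high = sample_sine(7.0, rate, 16)   # 7 Hz is above rate / 2 = 4 Hz
low = sample_sine(1.0, rate, 16)    # 1 Hz is the predicted alias

print(alias_frequency(7.0, rate))
# The two sample sequences match apart from a sign flip (a phase inversion).
print(all(abs(h + l) < 1e-9 for h, l in zip(high, low)))
```

Once the samples have been taken, nothing in the data distinguishes the 7 Hz signal from the 1 Hz one, which is why frequencies above half the sampling rate are normally filtered out before conversion.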

Application of Nyquist Theorem
The Nyquist theorem is used to calculate the minimum sampling rate needed to obtain good audio quality. Digitally sampled audio has a bandwidth of 20 Hz to 20 kHz, so sampling at twice the maximum frequency (40 kHz) would already achieve good audio quality. The CD standard sampling rate of 44,100 Hz means that the waveform is sampled 44,100 times per second.

Sample size (II)
• Quantization may be linear or logarithmic
  – Linear: the levels to which a signal is quantized are linearly spaced.
  – Logarithmic: provides more resolution at lower levels. The idea is to use non-linearly spaced quantization levels, with the higher levels spaced further apart than the low ones, so that quieter sounds are represented in greater detail than louder ones.

3-bit Quantization
A 3-bit binary (base 2) number has 2^3 = 8 values.
[Figure: amplitude against time, with the amplitude measured at each tick of the sample clock. A rough approximation.]

4-bit Quantization
A 4-bit binary number has 2^4 = 16 values.
[Figure: amplitude against time, with the amplitude measured at each tick of the sample clock. A better approximation.]

16-bit Sample Word Length
A 16-bit integer can represent 2^16, or 65,536, values (amplitude points). We typically use signed 16-bit integers, and center the 65,536 values around 0, giving a range from -32,768 to 32,767.
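The figures quoted for 3-bit, 4-bit and 16-bit samples all follow from the same rule. The short sketch below tabulates them; it assumes signed values are stored in two's complement, which is what gives the asymmetric range, and the function names are hypothetical.

```python
def quantization_levels(bits):
    """Number of distinct values an integer of this width can hold."""
    return 2 ** bits

def signed_range(bits):
    """Smallest and largest value of a signed two's-complement integer."""
    half = 2 ** (bits - 1)
    return -half, half - 1

for bits in (3, 4, 8, 16):
    print(bits, quantization_levels(bits), signed_range(bits))
```

Each extra bit doubles the number of levels, so the step between adjacent levels halves.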

Audio Sampling Variables
Three main criteria:
• How many samples? (the “sampling rate”)
• How much data per sample? (the “bit depth”)
• How many channels are sampled?

Audio Quality
Factors involved:
– The quality of the original audio source
– The quality of the capture device and supporting hardware
– The characteristics used for capture: frequency, data rate (amplitude), number of channels
– The capability of the playback environment

How Many Samples? Audio “Sampling Rates”
Digital Video CD Quality Stereo FM Radio AM Radio Telephone

Other Sample Rates Sample Rate Less 8000 Quality Telephone 11000
AM Radio 16000 FM Radio 22050 per Stereo

How Much Data per Sample?
Common Sampling “Bit Depth”
• 8 bits of data per sample
• 16 bits of data per sample

How Many Channels Sampled
Number of Channels
• Stereo (2 channels)
• Mono (1 channel)
• Multiple tracks

Audio Sampling Variables
The record settings (sample rate, bit depth, number of channels) determine both sound quality and file size.

Audio Record Rate: more audio samples mean larger files.

Audio File Size
File size is determined by a combination of:
• Sample rate (various sample rates)
• Bit depth (8 bit or 16 bit)
• Number of channels (stereo or mono)
• Length in minutes

Audio File Size
CD characteristics:
- Sampling rate: 44,100 samples per second (44.1 kHz)
- Sample word length: 16 bits (i.e., 2 bytes) per sample
- Number of channels: 2 (stereo)

How big is a 5-minute CD-quality sound file?
44,100 samples per second * 2 bytes per sample * 2 channels = 176,400 bytes per second
5 minutes * 60 seconds per minute = 300 seconds
300 seconds * 176,400 bytes per second = 52,920,000 bytes ≈ 50.5 megabytes (MB), taking 1 MB = 1,048,576 bytes
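The same arithmetic works for any combination of record settings. The helper below is a small sketch (the function name is hypothetical) that reproduces the CD example; it counts raw sample data only and ignores any file header.

```python
def audio_size_bytes(sample_rate_hz, bytes_per_sample, channels, seconds):
    """Uncompressed size: samples per second * bytes per sample * channels * duration."""
    return sample_rate_hz * bytes_per_sample * channels * seconds

size = audio_size_bytes(44_100, 2, 2, 5 * 60)
print(size)                      # bytes
print(round(size / 2**20, 1))    # megabytes, taking 1 MB = 2**20 bytes
```

Halving any one setting (for example recording in mono, or at 22,050 samples per second) halves the file size.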

Sound compression (I)
Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are typically referred to as audio codecs. As with other specific forms of data compression, there exist many "lossless" and "lossy" algorithms to achieve the compression effect.

• Compression of sound data requires different techniques from those for graphical data
• Requirements are less stringent than for video
  – data rate for CD-quality audio is much less than for video, but still exceeds the capacity of dial-up Internet connections
    • Data rate is 44,100 * 2 * 2 bytes/s = 176,400 bytes/s ≈ 1.41 Mbit/s
  – a 3-minute song recorded in stereo occupies about 31 Mbytes

• Sound is difficult to compress using lossless methods
  – complex and unpredictable nature of sound waveforms
• Different requirements depending on the nature of the sound
  – speech
  – music
  – natural sounds
  – …and on the nature of the application

Sound compression (II)
• A simple lossless compression method is to record the length of a period of silence
  – no need to record 44,100 samples of value zero for each second of silence
  – a form of run-length encoding
  – in reality this is not lossless, as “silence” rarely corresponds to sample values of exactly zero; rather, some threshold value is applied
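A minimal sketch of the idea follows. The token format (a `("silence", run_length)` tuple) and the function names are assumptions made for this example, not part of any real codec. With a threshold of zero only exact zeros are collapsed and decoding restores the input exactly; with a larger threshold quiet samples are thrown away, which is the lossy behaviour noted above.

```python
def encode_silence(samples, threshold=0):
    """Replace runs of near-silent samples with ("silence", run_length) tokens."""
    encoded = []
    run = 0
    for s in samples:
        if abs(s) <= threshold:
            run += 1          # extend the current run of silence
            continue
        if run:
            encoded.append(("silence", run))
            run = 0
        encoded.append(s)     # audible samples are stored unchanged
    if run:
        encoded.append(("silence", run))
    return encoded

def decode_silence(encoded):
    """Expand each silence token back into zeros."""
    decoded = []
    for item in encoded:
        if isinstance(item, tuple):
            decoded.extend([0] * item[1])
        else:
            decoded.append(item)
    return decoded

signal = [0, 0, 0, 5, -7, 0, 0, 1, -1, 0]
print(encode_silence(signal))
print(decode_silence(encode_silence(signal)) == signal)   # exact zeros only: lossless
print(encode_silence(signal, threshold=1))                 # quiet samples discarded: lossy
```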

• Differences between how we perceive sounds and images result in different lossy compression techniques for the two media
  – high spatial frequencies can be discarded in images
  – high sound frequencies, however, are highly significant
• So what can we discard from sound data?

1- Companding
• Non-linear quantization developed by telephone companies
  – known as companding (compressing/expanding)
    • mu-law (µ-law)
    • A-law
• Telephone signals are sampled at 8 kHz. At this rate, µ-law compression is able to squeeze a dynamic range of 12 bits into just 8 bits, giving a one-third reduction in data rate.
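The sketch below shows the continuous µ-law curve, F(x) = sgn(x) · ln(1 + µ|x|) / ln(1 + µ), followed by ordinary linear quantization to 8 bits. It assumes samples in the range -1.0 to 1.0 and µ = 255, the value used in 8-bit µ-law telephony. It is an illustration of the principle only: real telephone codecs use a piecewise-linear approximation of this curve, and the function names here are hypothetical.

```python
import math

MU = 255  # assumed here; the value used by 8-bit mu-law telephony

def mu_law_compress(x, mu=MU):
    """Compress a sample in [-1.0, 1.0]; small amplitudes get more of the output range."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode_8bit(x):
    """Compress, then quantize linearly to an 8-bit code (0..255)."""
    return round((mu_law_compress(x) + 1.0) / 2.0 * 255)

def decode_8bit(code):
    """Undo the linear scaling, then expand back to a sample value."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)

for x in (0.001, 0.01, 0.1, 1.0):
    code = encode_8bit(x)
    print(x, code, decode_8bit(code))
```

Quiet samples that would share a single level under linear 8-bit quantization land on distinct codes here, at the cost of coarser steps for loud samples.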

2- Pulse-code Modulation (PCM)
PCM is a digital representation of an analogue signal in which the magnitude of the signal is sampled regularly at uniform intervals, then quantized to a series of symbols in a digital (usually binary) code. PCM has been used in digital telephone systems and is also the standard form for digital audio in computers and in the Compact Disc Red Book format.

3- Linear Predictive Coding
Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides extremely accurate estimates of speech parameters.

Resource Interchange File Format (RIFF)
The Resource Interchange File Format (RIFF), a tagged file structure, is a general specification upon which many file formats can be defined. The main advantage of RIFF is its extensibility: file formats based on RIFF can be future-proofed, as format changes can be ignored by existing applications.
The RIFF file format is suitable for the following multimedia tasks:
• Playing back multimedia data
• Recording multimedia data
• Exchanging multimedia data between applications and across platforms
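The tagged structure is what makes this extensibility possible: every chunk starts with a four-character identifier and a 32-bit little-endian length, so a reader can skip any chunk it does not recognise. The sketch below builds a tiny .wav file in memory with Python's standard `wave` module and then walks its top-level chunks. The function names `make_wav_bytes` and `list_chunks` are hypothetical, and the walker is a simplified illustration rather than a full RIFF parser.

```python
import io
import struct
import wave

def make_wav_bytes():
    """Build a tiny mono 16-bit PCM .wav file in memory (a RIFF file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(struct.pack("<4h", 0, 1000, 0, -1000))
    return buf.getvalue()

def list_chunks(data):
    """Return the RIFF form type and the (id, size) of each top-level chunk."""
    riff, _total, form = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF":
        raise ValueError("not a RIFF file")
    chunks, pos = [], 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id, size))
        # Skip the chunk body; chunk data is padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    return form, chunks

form, chunks = list_chunks(make_wav_bytes())
print(form, chunks)
```

The walker never interprets the chunk bodies, so a file containing additional chunk types would be listed in the same way.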

Sound files and formats
It is important to distinguish between a file format and a codec. Though most audio file formats support only one audio codec, a file format may support multiple codecs, as AVI does.
There are three major groups of audio file formats:
• common formats, such as WAV, AIFF and AU
• formats with lossless compression, such as Monkey's Audio (filename extension APE) and lossless Windows Media Audio (WMA)
• formats with lossy compression, such as MP3, lossy Windows Media Audio (WMA) and AAC

There are many sound file formats
- Windows PCM waveform (.wav), a form of the RIFF specification; basically uncompressed data.
- Windows ADPCM waveform (.wav), another form of RIFF file, but compressed to 4 bits/channel.
- CCITT mu-law (A-law) waveforms (.wav), another form using 8-bit logarithmic compression.
- NeXT/Sun file format (.snd or .au), actually many different varieties: a header followed by data, where the data may be in many forms (linear, mu-law, etc.).
- RealAudio (.ra), used for streaming audio; a compressed format.
- MPEG format (includes MP3), which has various different forms of compression.
- QuickTime, AVI and Shockwave Flash, which can include audio as well as video.

Examples of sizes
• A second stretch of sound, digitized at samples/second, 16 bits, mono takes:
  – bytes as a .snd
  – bytes as an uncompressed .wav
  – bytes as A-law .wav
  – bytes as a compressed .wav
  – 5860 bytes as RealAudio
  – bytes as ASCII text
• …and if you listen hard you can hear the difference!

(Music)
• To store and transmit natural sounds it is generally necessary to use digitized recordings of the real sounds
• Music can also be specified symbolically: musical scores
• We can send a piece of music to someone as either
  – a recording of an actual performance
  – or some notation of the score, provided the receiver has some means of recreating the music from the score, e.g. they can play it on a piano
• This is akin to bitmapped versus vector-based graphics

MIDI (I)
• Musical Instrument Digital Interface (MIDI)
• Enables people to use multimedia computers and electronic musical instruments to create, enjoy and learn about music
• Resynthesis instead of reproduction
  – like a player piano, instead of a CD player
• Consists of commands which result in notes being played (synthesised)

MIDI: Data Format
• Information traveling through the hardware is encoded in MIDI data format.
• The encoding includes note information such as the beginning of a note, its frequency and its sound volume; up to 128 notes.
• The MIDI data format is digital.
• The data are grouped into MIDI messages.
• Each MIDI message communicates one musical event between machines. An event might be pressing keys or moving slider controls.
• 10 minutes of music encoded in MIDI data format is about 200 Kbytes of data (compare against CD audio!).
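To make the message format concrete, the sketch below builds the three-byte Note On and Note Off messages for one key press. The message carries a note number (0 to 127, hence 128 notes) and a velocity rather than a frequency in hertz; the conversion to frequency shown here assumes standard equal-tempered tuning with note 69 (the A above middle C) at 440 Hz. The function names are hypothetical.

```python
def note_on(channel, note, velocity):
    """Three-byte MIDI Note On message: status byte, note number, velocity."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note):
    """Three-byte MIDI Note Off message with release velocity 0."""
    if not (0 <= channel <= 15 and 0 <= note <= 127):
        raise ValueError("channel must be 0-15; note must be 0-127")
    return bytes([0x80 | channel, note, 0])

def note_frequency_hz(note):
    """Equal-tempered pitch of a note number, taking note 69 as 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

press = note_on(channel=0, note=60, velocity=100)   # middle C, moderately loud
release = note_off(channel=0, note=60)
print(press.hex(), release.hex(), round(note_frequency_hz(60), 2))
```

A whole note costs six bytes regardless of how long it sounds, whereas CD-quality audio needs 176,400 bytes for every second, which is why MIDI files are so much smaller.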

Table of common formats and software
Company / brand name; audio formats; comments:
• Real Audio (Real Audio Networks); .ram; audio/video streaming software, server/client programs. Developers must pay fee.
• Quicktime (Apple); .mov; Quicktime Player 6.5 is free. Quicktime Pro costs money and Quicktime Streaming Server is free with OS X.
• Windows Media Player (Microsoft); .wma; audio/video streaming software, server/client programs. Developers must pay fee.
• Shoutcast client and server (Nullsoft); .mp3; audio/video streaming, using WinAmp client. Open source (generally free).
• Icecast client and server (Icecast.org); .mp3, .ogg; audio/video streaming software, server/client programs. Open source (free). Linux only.

Sound on the Internet
• Your first consideration when using sound on the Internet is file size
• Uncompressed files can be very large
• A 10-second recording from an audio CD takes about 1.7 MB (10 seconds at 176,400 bytes per second)

Sound Tips for the Internet
Appropriate Use: Consider the appropriateness of using sound. Some sounds are content-related, such as hearing a foreign phrase pronounced. Other sounds are for effect, such as creating a mood or setting a scene. Avoid using sound when there is no compelling benefit.
Quality: Start with the highest-quality sound available and reduce the file size by converting the audio file to a compressed format. When possible, avoid using free sound clips available from the Internet. These are often of poor quality and overused.
Cost considerations: When recording audio files, it may be cost-prohibitive to contract with a recording studio and hire professional talent. Investing in reasonably high-end equipment (such as a sound card, microphone, and recording and editing software), however, will prove worthwhile.

Sound Tips for the Internet
Alternative Methods: Consider using sound and still images as an alternative to video to reduce file sizes. It may be just as effective to show a photograph of a speaker and play the sound file of the speech as it is to show a video of a “talking head.”
Streaming: Consider streaming the audio, especially for large files.
User Control: If appropriate, provide a way to give the user some control over the audio. Consider allowing the user to skip a sound clip or adjust the volume. This issue is especially important if a musical introduction is played when the user first enters a Web site. The second time visiting the site, the user may not want to hear the musical introduction.
