Week 8: Audio Processing
Light and sound are both transmitted as waves.
The human ear can hear frequencies between about 12 Hz and 20,000 Hz. The higher the frequency of the wave, the higher the pitch of the note. Note that the A an octave above concert A (A440) has twice the frequency: 880 Hz. Each half-step increases the frequency by a factor of about 1.06 (more precisely, the twelfth root of 2, about 1.0595, so twelve half-steps make exactly one octave).
Note   Frequency (Hz)
A      440
B
C
D
E
F
G
A      880
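The half-step rule can be turned into a calculation that fills in the rest of the table. This is a minimal sketch that assumes equal temperament, where each half-step multiplies the frequency by exactly 2^(1/12); the class name NoteFrequencies is hypothetical, and the half-step offsets assume the natural notes A, B, C, D, E, F, G counted upward from A440.

```java
// Hypothetical helper: frequencies of notes measured in half-steps from A440,
// assuming equal temperament (each half-step multiplies the frequency by 2^(1/12)).
class NoteFrequencies {
    static final double CONCERT_A = 440.0; // A440, the reference pitch in Hz

    // Frequency (Hz) of the note `halfSteps` above A440 (negative means below).
    static double frequency(int halfSteps) {
        return CONCERT_A * Math.pow(2.0, halfSteps / 12.0);
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C", "D", "E", "F", "G", "A"};
        // Half-steps above A440 for each natural note up to the next A.
        int[] steps = {0, 2, 3, 5, 7, 8, 10, 12};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s  %.2f Hz%n", names[i], frequency(steps[i]));
        }
    }
}
```

With this formula, frequency(12) returns exactly 880.0, matching the octave rule, and frequency(-12) gives 220 Hz, one octave below concert A.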
We can take a sound [figure] and reproduce it at double the frequency, speeding it up [figure]. Notice that for the faster sound to fill the same amount of time, it needs twice as many cycles of the wave, so we have to add twice as much information.
The amplitude of a wave is the distance from its center line to its peak (measured by its y-value). In sound, amplitude is a measure of volume: the larger the amplitude, the louder the sound.
We can take a sound [figure] and make the same sound with half the amplitude [figure]. The frequency is exactly the same, but the sound is quieter.
Something that looks like a smooth sine wave is called a pure tone. No real instrument plays anything like that: even the purest real sound has overtones and harmonics. Real sound is the result of many messy waves added together [figure].
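To see how harmonics enrich a pure tone, we can add sine waves together sample by sample. This is a minimal sketch under stated assumptions: the class name Harmonics is hypothetical, SAMPLE_RATE is hard-coded to 44,100 to match StdAudio, and the harmonic weights (0.6, 0.25, 0.15) are illustrative choices, not measurements of any real instrument. It only builds arrays and never calls StdAudio, so it runs on its own.

```java
// Hypothetical example: a pure tone versus the same note with two added harmonics.
class Harmonics {
    static final int SAMPLE_RATE = 44100; // samples per second, matching StdAudio

    // A pure tone: n samples of a single sine wave at `hz`.
    static double[] pureTone(double hz, int n) {
        double[] a = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = Math.sin(2 * Math.PI * i * hz / SAMPLE_RATE);
        }
        return a;
    }

    // Weighted sum of a fundamental and its 2nd and 3rd harmonics (2x and 3x the frequency).
    // The weights add up to 1.0, so the result stays within [-1.0, 1.0].
    static double[] withHarmonics(double hz, int n) {
        double[] fundamental = pureTone(hz, n);
        double[] second = pureTone(2 * hz, n);
        double[] third = pureTone(3 * hz, n);
        double[] mix = new double[n];
        for (int i = 0; i < n; i++) {
            mix[i] = 0.6 * fundamental[i] + 0.25 * second[i] + 0.15 * third[i];
        }
        return mix;
    }

    public static void main(String[] args) {
        double[] note = withHarmonics(440.0, SAMPLE_RATE); // one second of A440
        System.out.println("samples: " + note.length);
    }
}
```

Passing the result of withHarmonics(440.0, 44100) to StdAudio.play should sound fuller than a single pure sine wave at the same pitch.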
On a computer, we cannot record a waveform directly. As usual, we have to find a way to store a wave as a series of numbers. We will use these numbers to approximate the height of the wave at various points.
Hertz (Hz) is a unit meaning "times per second." We are going to break the wave down into lots of slices: 44,100 slices every second. Thus, we are slicing at 44,100 Hz.
We slice up a wave and record the height of the wave at each slice. Each height value is called a sample. By taking 44,100 samples per second, we get a pretty accurate picture of the wave.
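Because the sampling rate is fixed, converting between time and positions in the sample array is simple arithmetic. A minimal sketch, assuming 44,100 samples per second as above; the class SampleMath and its methods are hypothetical names for illustration.

```java
// Hypothetical helpers for converting between seconds and sample counts.
class SampleMath {
    static final int SAMPLE_RATE = 44100; // samples per second

    // Number of samples needed to hold `seconds` of sound (rounded to the nearest sample).
    static int samplesFor(double seconds) {
        return (int) Math.round(seconds * SAMPLE_RATE);
    }

    // Time in seconds at which sample number i is taken.
    static double timeOf(int i) {
        return (double) i / SAMPLE_RATE;
    }

    public static void main(String[] args) {
        System.out.println(samplesFor(3 * 60.0)); // a 3-minute song: prints 7938000
        System.out.println(timeOf(22050));       // halfway through the first second: prints 0.5
    }
}
```

At 8 bytes per double, those 7,938,000 samples take about 63.5 million bytes of memory.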
There are many different formats for sampling audio. In our system, each sample is recorded as a double. The minimum value of a sample is -1.0 and the maximum value is 1.0. A series of samples with value 0.0 represents silence. Our samples will be stored in an array.
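Two consequences of this format are easy to check in Java: a freshly allocated double[] is already filled with 0.0, which is silence, and a legal sound keeps every sample within [-1.0, 1.0]. This sketch uses hypothetical names (SampleRange, silence, inRange) and assumes 44,100 samples per second.

```java
// Hypothetical helpers for the sample format: doubles in [-1.0, 1.0], 0.0 is silence.
class SampleRange {
    static final int SAMPLE_RATE = 44100;

    // Java initializes every element of a new double[] to 0.0, which is exactly silence.
    static double[] silence(double seconds) {
        return new double[(int) Math.round(seconds * SAMPLE_RATE)];
    }

    // True if every sample lies within [-1.0, 1.0]; the negated form also rejects NaN.
    static boolean inRange(double[] samples) {
        for (double s : samples) {
            if (!(s >= -1.0 && s <= 1.0)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] pause = silence(2.0);
        System.out.println(pause.length + " samples, in range: " + inRange(pause));
    }
}
```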
Audio data on computers is often stored in a WAV file. A WAV file is much simpler than an MP3 because it has no compression. Even so, it can contain two channels (for stereo) and can use many different sample rates and formats for recording sound. The StdAudio class lets you read and write WAV files easily while always dealing with a single array of sound, sampled at 44,100 Hz.
StdAudio handles everything you need to get sound into and out of your program; to do interesting things, you have to manipulate the array of samples yourself. Make sure you have StdAudio.java in your directory before trying to use it.
Method: Use
static double[] read(String file): Read a WAV file into an array of doubles
static void save(String file, double[] input): Save an array of doubles (samples) into a WAV file
static void play(String file): Play a WAV file
static void play(double[] input): Play an array of doubles (samples)
Let's load a file into an array. If the song has these samples [figure], then sample might contain these values [figure]:
String file = "song.wav";
double[] sample = StdAudio.read(file);
With the audio samples loaded into the array named sample, we can play them as follows:
StdAudio.play(sample);
Or, we could generate sound from scratch with StdAudio. This example from the book creates 1 second of the pitch A440 (StdAudio.SAMPLE_RATE is 44,100):
double[] sound = new double[StdAudio.SAMPLE_RATE + 1]; // one second of samples, plus one
for( int i = 0; i < sound.length; i++ )
    sound[i] = Math.sin(2 * Math.PI * i * 440 / StdAudio.SAMPLE_RATE); // wave height at time i / SAMPLE_RATE
StdAudio.play(sound);
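The book's example can be generalized into a method that produces any pitch, length, and volume. This is a sketch, not the book's code: the class name ToneGenerator and the amplitude parameter are our own assumptions, and SAMPLE_RATE is hard-coded to 44,100 so the code runs without StdAudio.java.

```java
// Hypothetical generalization of the A440 example: any frequency, duration, and amplitude.
class ToneGenerator {
    static final int SAMPLE_RATE = 44100; // assumed to match StdAudio.SAMPLE_RATE

    // Returns `seconds` of a pure sine tone at `hz`, scaled by `amplitude` (0.0 to 1.0).
    // Like the A440 example, it allocates one extra sample beyond the requested length.
    static double[] tone(double hz, double seconds, double amplitude) {
        int n = (int) (SAMPLE_RATE * seconds);
        double[] a = new double[n + 1];
        for (int i = 0; i <= n; i++) {
            a[i] = amplitude * Math.sin(2 * Math.PI * i * hz / SAMPLE_RATE);
        }
        return a;
    }

    public static void main(String[] args) {
        double[] quietA = tone(440.0, 1.0, 0.5); // one second of A440 at half amplitude
        System.out.println(quietA.length + " samples");
    }
}
```

For instance, StdAudio.play(ToneGenerator.tone(880.0, 0.5, 0.3)) would play half a second of A880 at reduced amplitude, assuming StdAudio.java is in your directory.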
The book provides a short program called Play That Tune (pp. 150 and 205) that generates a sequence of notes according to an input file. It's a sort of software synthesizer. If you know the notes for a song you'd like to play and their durations, you can create a file and play it with PlayThatTune.java:
java PlayThatTune < song.txt
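The core of such a synthesizer fits in a few lines. This sketch is our own reconstruction, not the book's source: it assumes the input is whitespace-separated pairs of a pitch (half-steps above A440, negative for below) and a duration in seconds, and the class name TuneSketch is hypothetical. It returns the samples instead of playing them, so it runs without StdAudio.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Sketch of a Play That Tune-style synthesizer (not the book's source code).
// Assumed input format: pairs of "pitch duration", where pitch counts half-steps
// above A440 and duration is in seconds.
class TuneSketch {
    static final int SAMPLE_RATE = 44100; // assumed to match StdAudio.SAMPLE_RATE

    static double[] synthesize(String input) {
        Scanner in = new Scanner(input).useLocale(Locale.US); // parse "0.5" the same in every locale
        List<Double> samples = new ArrayList<>();
        while (in.hasNextInt()) {
            int pitch = in.nextInt();
            double duration = in.nextDouble();
            double hz = 440.0 * Math.pow(2.0, pitch / 12.0); // half-steps -> Hz
            int n = (int) (SAMPLE_RATE * duration);           // seconds -> samples
            for (int i = 0; i < n; i++) {
                samples.add(Math.sin(2 * Math.PI * i * hz / SAMPLE_RATE));
            }
        }
        double[] result = new double[samples.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = samples.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] tune = synthesize("0 0.5  2 0.5  4 0.5  12 1.0");
        System.out.println(tune.length + " samples");
    }
}
```

Under this format, an input of 0 0.5 12 0.5 describes half a second of A440 followed by half a second of A880.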
As you can see, the file is just a list of numbers giving pitch and duration. Although ugly, this form of bookkeeping takes up virtually no space compared to WAV files. Drawbacks: you can only play one note at a time; every note is a pure tone with only a little bit of harmonics; it takes forever to type out an input file; and there are no dynamics (volume control).
Remember, sound is a wave. We are now going to start playing with the waveform to get different effects. Phase is a property of a wave that is determined by its starting position. You can think of phase as the alignment of the wave.
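In a sine wave, phase appears as an offset added inside the sine. This sketch (hypothetical class Phase, 44,100 samples per second assumed) generates a tone with a chosen starting phase; an offset of π radians (180°) produces the same wave flipped upside down, because sin(x + π) = -sin(x).

```java
// Hypothetical example: the same tone generated with two different starting phases.
class Phase {
    static final int SAMPLE_RATE = 44100;

    // n samples of a sine wave at `hz`, shifted by `phase` radians.
    static double[] sine(double hz, double phase, int n) {
        double[] a = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = Math.sin(2 * Math.PI * i * hz / SAMPLE_RATE + phase);
        }
        return a;
    }

    public static void main(String[] args) {
        double[] inPhase = sine(440.0, 0.0, 10);
        double[] shifted = sine(440.0, Math.PI, 10); // 180 degrees out of phase
        for (int i = 0; i < inPhase.length; i++) {
            // sin(x + pi) = -sin(x), so each row shows (approximately) opposite values
            System.out.printf("%.4f  %.4f%n", inPhase[i], shifted[i]);
        }
    }
}
```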
What happens if we invert a sound wave, flipping it upside down? For example, we turn this wave [figure] into this wave [figure].
How would we invert the phase of a wave in code? And what does that crazy, upside-down sound sound like? Exactly the same! The human ear is not sensitive to this kind of phase flip.
double[] sound = StdAudio.read( file );
for( int i = 0; i < sound.length; i++ )
    sound[i] = -sound[i]; // flip each sample upside down
StdAudio.play(sound);
Phase is still worth knowing about. If you hear two copies of the same sound, but one is 180° out of phase, the sounds will cancel each other out. This can happen if you hook up one of your stereo speakers backwards.
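We can check the cancellation numerically by mixing a sound with an inverted copy of itself. In this sketch, mixing means adding two arrays sample by sample and clamping to [-1.0, 1.0]; the class and method names are hypothetical, and the test tone is an arbitrary 440 Hz sine at 0.8 amplitude.

```java
// Hypothetical example: a sound plus its 180-degree-inverted copy adds up to silence.
class Cancellation {
    // Returns a copy of the sound with every sample negated.
    static double[] invert(double[] sound) {
        double[] out = new double[sound.length];
        for (int i = 0; i < sound.length; i++) {
            out[i] = -sound[i];
        }
        return out;
    }

    // Adds two sounds sample by sample, clamping to [-1.0, 1.0]; stops at the shorter one.
    static double[] mix(double[] a, double[] b) {
        int n = Math.min(a.length, b.length);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = Math.max(-1.0, Math.min(1.0, a[i] + b[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] wave = new double[44100];
        for (int i = 0; i < wave.length; i++) {
            wave[i] = 0.8 * Math.sin(2 * Math.PI * i * 440 / 44100.0);
        }
        double[] sum = mix(wave, invert(wave));
        double loudest = 0.0;
        for (double s : sum) loudest = Math.max(loudest, Math.abs(s));
        System.out.println("loudest sample after mixing: " + loudest); // prints 0.0: silence
    }
}
```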
To reverse a sound, we simply play the waveform backwards. For example, we take this sound [figure] and turn it into this sound [figure].
How would we reverse a sound in code? What does it sound like? Sort of like a bizarre foreign language.
double[] sound = StdAudio.read( file );
int start = 0;
int end = sound.length - 1;
while( start < end ) { // swap the outermost pair, then move inward
    double temp = sound[start];
    sound[start] = sound[end];
    sound[end] = temp;
    start++;
    end--;
}
StdAudio.play(sound);
Recall that the amplitude of a wave is the distance from its center line to its peak (measured by its y-value). In sound, amplitude is a measure of volume: the larger the amplitude, the louder the sound.
We can take a sound [figure] and make the same sound with half the amplitude [figure]. The frequency is exactly the same, but the sound is quieter.
How would we halve the volume of a sound in code? This halves the amplitude. Is it exactly half the volume? No… this stuff gets complicated. Different frequencies with the same amplitude are actually perceived as different loudnesses. But it's close enough.
double[] sound = StdAudio.read( file );
for( int i = 0; i < sound.length; i++ )
    sound[i] = 0.5*sound[i]; // halve the amplitude of every sample
StdAudio.play(sound);
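One common way to quantify amplitude changes is the decibel (dB): scaling the amplitude by a factor f changes the level by 20 * log10(f) dB, so halving the amplitude is a drop of about 6 dB. A widely quoted rule of thumb (only an approximation of human hearing) says a sound is judged about half as loud after a drop of roughly 10 dB, which corresponds to an amplitude factor near 0.32 rather than 0.5. The class Decibels and its methods are hypothetical names for this sketch.

```java
// Hypothetical helpers for converting between amplitude factors and decibels.
class Decibels {
    // Level change in dB from multiplying the amplitude by `factor` (factor > 0).
    static double fromFactor(double factor) {
        return 20.0 * Math.log10(factor);
    }

    // Amplitude factor that produces a level change of `db` decibels.
    static double toFactor(double db) {
        return Math.pow(10.0, db / 20.0);
    }

    public static void main(String[] args) {
        System.out.printf("Halving amplitude: %.2f dB%n", fromFactor(0.5)); // about -6.02
        System.out.printf("Factor for -10 dB: %.3f%n", toFactor(-10.0));     // about 0.316
    }
}
```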
How would we double the volume of a sound in code? Be careful! Remember that the maximum is 1.0 and the minimum is -1.0. If a value goes outside that range, you should clamp it to the nearest limit.
double[] sound = StdAudio.read( file );
for( int i = 0; i < sound.length; i++ ) {
    sound[i] = 2*sound[i];
    // clamp to the legal range [-1.0, 1.0]
    if( sound[i] > 1.0 )
        sound[i] = 1.0;
    else if( sound[i] < -1.0 )
        sound[i] = -1.0;
}
StdAudio.play(sound);
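The same clamping idea works for any gain factor, and it can help to count how many samples were clipped, since heavy clipping is audible as distortion. This is a sketch with hypothetical names (Amplify, clamp, scale); Math.max and Math.min do the same limiting as the if/else chain in the slide's code.

```java
// Hypothetical generalization: scale by any factor, clamp, and report clipping.
class Amplify {
    // Clamps a sample to the legal range [-1.0, 1.0].
    static double clamp(double s) {
        return Math.max(-1.0, Math.min(1.0, s));
    }

    // Scales every sample by `factor` in place, clamping the results.
    // Returns how many samples had to be clipped.
    static int scale(double[] sound, double factor) {
        int clipped = 0;
        for (int i = 0; i < sound.length; i++) {
            double s = factor * sound[i];
            if (s > 1.0 || s < -1.0) clipped++;
            sound[i] = clamp(s);
        }
        return clipped;
    }

    public static void main(String[] args) {
        double[] sound = {0.1, 0.4, 0.7, -0.9};
        int clipped = scale(sound, 2.0);
        System.out.println(clipped + " of " + sound.length + " samples clipped");
    }
}
```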
We can take a sound [figure] and squeeze it into half the time, which also doubles its frequency [figure].
How would we double the speed of a sound in code? We just take every other sample, throwing out half the information.
double[] sound = StdAudio.read( file );
double[] speed = new double[sound.length/2];
for( int i = 0; i < speed.length; i++ )
    speed[i] = sound[2*i]; // keep samples 0, 2, 4, ...
StdAudio.play(speed);
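The same idea extends to any whole-number speed-up: keep every k-th sample. A minimal sketch with hypothetical names; with k = 2 it does exactly what the slide's code does.

```java
// Hypothetical generalization of "take every other sample" to every k-th sample.
class SpeedUp {
    // Returns a sound k times shorter (and k times higher in pitch).
    static double[] speedUp(double[] sound, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        double[] fast = new double[sound.length / k];
        for (int i = 0; i < fast.length; i++) {
            fast[i] = sound[k * i]; // largest index is k*(length/k - 1), always in bounds
        }
        return fast;
    }

    public static void main(String[] args) {
        double[] sound = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8};
        System.out.println(java.util.Arrays.toString(speedUp(sound, 3)));
    }
}
```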
We can take a sound [figure] and stretch it to twice its length, which also cuts its frequency in half [figure].
How would we stretch a sound to twice its length in code? Often, this can sound terrible: we are doubling the length of the sound, but we have no extra information. We are just filling in the gaps between samples with copies.
double[] sound = StdAudio.read( file );
double[] slow = new double[sound.length*2];
for( int i = 0; i < slow.length; i++ )
    slow[i] = sound[i/2]; // integer division: each original sample is used twice
StdAudio.play(slow);
Integer factors are easy. You can apply the same ideas to non-integer speed-ups and slow-downs. Smarter algorithms, like the ones professionals use, may do some averaging or other tricks to retain sound quality.
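One simple way to handle non-integer factors is linear interpolation: when the position we want falls between two original samples, take a weighted average of those two neighbors instead of copying one of them. This is a sketch of that technique (the class name Resample is hypothetical), not the algorithm professional tools use; like the integer versions, it changes pitch along with speed.

```java
// Hypothetical sketch: speed change by any positive factor using linear interpolation.
class Resample {
    // factor > 1 makes the sound faster and shorter; factor < 1 makes it slower and longer.
    static double[] resample(double[] sound, double factor) {
        if (factor <= 0) throw new IllegalArgumentException("factor must be positive");
        int n = (int) (sound.length / factor);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double pos = i * factor; // fractional position in the original sound
            int j = (int) pos;       // original sample just before pos
            double frac = pos - j;   // how far pos lies toward the next sample
            double next = (j + 1 < sound.length) ? sound[j + 1] : sound[j];
            out[i] = (1 - frac) * sound[j] + frac * next; // weighted average of neighbors
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sound = {0.0, 0.4, 0.8, 0.4, 0.0, -0.4, -0.8, -0.4};
        System.out.println(java.util.Arrays.toString(resample(sound, 0.75)));
    }
}
```

With a factor of 0.5, the new in-between samples are averages of their neighbors rather than repeated copies, which is often smoother than the doubling code above.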
To add noise to a waveform, add small random numbers to the wave. Make sure the random numbers are both positive and negative (with a mean of 0). You can turn this wave [figure] into this wave [figure].
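A minimal sketch of adding noise, assuming uniformly distributed noise between -level and +level (so the mean is 0) and clamping the result to [-1.0, 1.0]. The names Noise and addNoise are hypothetical; java.util.Random is standard Java, and the fixed seed only makes runs repeatable.

```java
import java.util.Random;

// Hypothetical example: add zero-mean uniform noise to every sample.
class Noise {
    static double[] addNoise(double[] sound, double level, long seed) {
        Random rng = new Random(seed);
        double[] noisy = new double[sound.length];
        for (int i = 0; i < sound.length; i++) {
            double r = (2 * rng.nextDouble() - 1) * level; // uniform in [-level, level)
            noisy[i] = Math.max(-1.0, Math.min(1.0, sound[i] + r)); // stay in legal range
        }
        return noisy;
    }

    public static void main(String[] args) {
        double[] wave = new double[44100];
        for (int i = 0; i < wave.length; i++) {
            wave[i] = 0.5 * Math.sin(2 * Math.PI * i * 440 / 44100.0);
        }
        double[] noisy = addNoise(wave, 0.05, 42L);
        System.out.println("first samples: " + noisy[0] + ", " + noisy[1]);
    }
}
```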
How can we go the other way, turning the noisy wave [figure] back into the clean wave [figure]? That is much harder!
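One crude approach, offered only as an illustration, is a moving average: replace each sample with the average of its nearby samples. Random wiggles tend to average out, but the filter also dulls the high-frequency parts of the real sound, which is part of why true noise removal is much harder. The class Smooth and the radius parameter are hypothetical.

```java
// Hypothetical sketch: a moving-average filter as a crude way to reduce noise.
class Smooth {
    // Each output sample is the average of the input samples within `radius` of it
    // (fewer neighbors are available near the ends of the array).
    static double[] movingAverage(double[] sound, int radius) {
        double[] out = new double[sound.length];
        for (int i = 0; i < sound.length; i++) {
            int lo = Math.max(0, i - radius);
            int hi = Math.min(sound.length - 1, i + radius);
            double sum = 0;
            for (int j = lo; j <= hi; j++) {
                sum += sound[j];
            }
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] jagged = {0.0, 0.3, 0.0, 0.3, 0.0, 0.3};
        System.out.println(java.util.Arrays.toString(movingAverage(jagged, 1)));
    }
}
```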