Working with Sounds Barb Ericson College of Computing Georgia Institute of Technology
Sound Processing Goals: Give you the basic understanding of audio processing, including psycho-acoustics, Identify some interesting examples to use.
How sound works: Acoustics, the physics of sound Sounds are waves of air pressure Sound comes in cycles The frequency of a wave is the number of cycles per second (cps), or Hertz (Complex sounds have more than one frequency in them.) The amplitude is the maximum height of the wave
Live demos here! Use the Squeak MediaTools to see real sound patterns. Try to bring in a few musical instruments
Volume and pitch: Psychoacoustics, the psychology of sound
Our perception of volume is related (logarithmically) to changes in amplitude.
If the intensity doubles, it’s about a 3 decibel (dB) change. Doubling the amplitude is about a 6 dB change, because intensity grows with the square of the amplitude.
A decibel is a ratio between two intensities: 10 * log10(I1/I2)
As an absolute measure, it’s in comparison to the threshold of audibility: 0 dB is the threshold itself. Normal speech is about 60 dB. A shout is about 80 dB.
Our perception of pitch is related (logarithmically) to changes in frequency.
Higher frequencies are perceived as higher pitches.
We can hear between about 20 Hz and 20,000 Hz (20 kHz).
A above middle C is 440 Hz.
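The decibel formula on this slide can be tried out directly. The following is a minimal sketch in plain Python 3 (not JES); the function names `decibels` and `amplitude_decibels` are made up for illustration. It assumes intensity is proportional to the square of amplitude, which is why the amplitude version uses a factor of 20 instead of 10.

```python
import math

def decibels(intensity, reference_intensity):
    """Level of `intensity` relative to `reference_intensity`, in dB."""
    return 10 * math.log10(intensity / reference_intensity)

def amplitude_decibels(amplitude, reference_amplitude):
    """Same ratio expressed with amplitudes (intensity ~ amplitude squared)."""
    return 20 * math.log10(amplitude / reference_amplitude)

# Doubling the intensity versus doubling the amplitude.
print(round(decibels(2.0, 1.0), 2))
print(round(amplitude_decibels(2.0, 1.0), 2))
```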
Digitizing Sound: How do we get that into numbers? in calculus you estimate the curve by creating rectangles We can do the same to estimate the sound curve Analog-to-digital conversion (ADC) will give us the amplitude at an instant as a number: a sample How many samples do we need?
Nyquist Theorem
To represent (and recreate, later) the original sound, we need to take at least twice as many samples per second as the maximum frequency in the sound.
The number of samples recorded per second is the sampling rate.
If we capture 8000 samples per second, the highest frequency we can capture is 4000 Hz. That’s how phones work.
If we capture more than 44,000 samples per second, we capture everything that we can hear (max 22,000 Hz).
CD quality is 44,100 samples per second.
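To make sampling and the Nyquist limit concrete, here is a small sketch in plain Python 3. The helper names `sample_sine` and `nyquist_limit` are hypothetical, and a list of integers stands in for a recorded sound. The last lines suggest what goes wrong above the limit: at 8000 samples per second, a 5000 Hz tone yields the same sample values as an inverted 3000 Hz tone, so the two cannot be told apart.

```python
import math

def nyquist_limit(sampling_rate):
    """Highest frequency that a given sampling rate can capture."""
    return sampling_rate / 2.0

def sample_sine(frequency_hz, sampling_rate, num_samples, amplitude=32767):
    """Sample a sine wave once every 1/sampling_rate seconds."""
    samples = []
    for n in range(num_samples):
        angle = 2 * math.pi * frequency_hz * n / sampling_rate
        samples.append(int(round(amplitude * math.sin(angle))))
    return samples

print(nyquist_limit(8000))    # phone-quality sampling
print(nyquist_limit(44100))   # CD-quality sampling

# A tone above the limit (5000 Hz) next to one below it (3000 Hz).
print(sample_sine(5000, 8000, 8))
print(sample_sine(3000, 8000, 8))
```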
Digitizing sound in the computer
Each sample is stored as a number (two bytes).
What’s the range of available combinations? 16 bits, 2^16 = 65,536
But we want both positive and negative values, to indicate compressions and rarefactions.
What if we use one bit to indicate positive (0) or negative (1)? That leaves us with 15 bits: 2^15 = 32,768
One of those combinations will stand for zero. We’ll use a “positive” one, so that’s one less pattern for positives.
Each sample can be between -32,768 and 32,767
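The range arithmetic above can be checked in a few lines. This is a sketch in plain Python 3; `clamp` is a hypothetical helper (not a JES function) that keeps a computed value inside the 16-bit range, and `struct.pack` with the `"<h"` format is used only to confirm that each limit fits in two bytes.

```python
import struct

BITS_PER_SAMPLE = 16
MIN_SAMPLE = -(2 ** (BITS_PER_SAMPLE - 1))    # lowest value a sample can hold
MAX_SAMPLE = 2 ** (BITS_PER_SAMPLE - 1) - 1   # highest value a sample can hold

def clamp(value):
    """Force a value into the range a 16-bit sample can store."""
    return max(MIN_SAMPLE, min(MAX_SAMPLE, int(value)))

print(MIN_SAMPLE, MAX_SAMPLE)
print(len(struct.pack("<h", MIN_SAMPLE)), len(struct.pack("<h", MAX_SAMPLE)))
print(clamp(40000), clamp(-40000))
```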
Basic Sound Functions makeSound(filename) creates and returns a sound object, from the WAV file at the filename play(sound) makes the sound play (but doesn’t wait until it’s done) blockingPlay(sound) waits for the sound to finish We’ll learn more later like getSample and setSample
Working with sounds
We’ll use pickAFile and makeSound as we have before. But now we want .wav files.
We’ll use getSamples to get all the sample objects out of a sound.
We can also get the value at any index with getSampleValueAt.
Sounds also know their length (getLength) and their sampling rate (getSamplingRate).
We can save sounds with writeSoundTo(sound,"file.wav")
Recipe to Increase the Volume
def increaseVolume(sound):
    for sample in getSamples(sound):
        value = getSample(sample)
        setSample(sample, value * 2)
Using it:
>>> f=getMediaPath("test.wav")
>>> s=makeSound(f)
>>> increaseVolume(s)
>>> play(s)
>>> writeSoundTo(s,getMediaPath("testLouder.wav"))
Decreasing the volume
def decreaseVolume(sound):
    for sample in getSamples(sound):
        value = getSample(sample)
        setSample(sample, value * 0.5)
This works just like increaseVolume, but we’re lowering each sample by 50% instead of doubling it.
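Both volume recipes are the same loop with a different multiplier. The sketch below is plain Python 3 with a list of integers standing in for a JES sound; `change_volume` is a hypothetical name. As an added assumption beyond the slides, it limits each result to the 16-bit range so that doubling a loud sample cannot overflow.

```python
def change_volume(samples, factor):
    """Return a new list with every sample multiplied by `factor`."""
    result = []
    for value in samples:
        scaled = int(value * factor)
        # Assumption: keep results inside the 16-bit sample range.
        scaled = max(-32768, min(32767, scaled))
        result.append(scaled)
    return result

quiet = [100, -200, 20000]
print(change_volume(quiet, 2))     # like increaseVolume
print(change_volume(quiet, 0.5))   # like decreaseVolume
```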
Maximizing volume
How do we get maximal volume? It’s a three-step process:
First, figure out the loudest sound (largest sample).
Next, figure out the multiplier needed to make that sound fill the available space. We want to solve for x where x * loudest = 32767. So, x = 32767/loudest
Finally, multiply every sample by the multiplier.
Maxing (normalizing) the sound
def normalize(sound):
    largest = 0
    for s in getSamples(sound):
        largest = max(largest, getSample(s))
    multiplier = 32767.0 / largest
    print "Largest sample value in original sound was", largest
    print "Multiplier is", multiplier
    for s in getSamples(sound):
        louder = multiplier * getSample(s)
        setSample(s, louder)
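The three steps can be sketched on a plain list in Python 3. This is a variation on the recipe, not the recipe itself: it looks at the absolute value of each sample (so a loud negative sample also counts as "largest") and it returns silence unchanged to avoid dividing by zero. Both choices are assumptions added here.

```python
def normalize(samples):
    """Scale samples so that the loudest one reaches 32767 in magnitude."""
    # Step 1: find the loudest sample (largest magnitude).
    largest = max(abs(value) for value in samples)
    if largest == 0:
        return list(samples)          # all silence: nothing to scale
    # Step 2: solve x * largest = 32767 for x.
    multiplier = 32767.0 / largest
    # Step 3: multiply every sample by the multiplier.
    return [int(round(value * multiplier)) for value in samples]

print(normalize([1, -4, 3]))
```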
Increasing volume by sample index
def increaseVolumeByRange(sound):
    for sampleIndex in range(getLength(sound)):
        value = getSampleValueAt(sound, sampleIndex)
        setSampleValueAt(sound, sampleIndex, value * 2)
This really is the same as:
def increaseVolume(sound):
    for sample in getSamples(sound):
        value = getSample(sample)
        setSample(sample, value * 2)
Recipe to play a sound backwards (Trace it!)
def playBackward(filename):
    source = makeSound(filename)
    dest = makeSound(filename)
    # Work backward: start at the last valid index
    srcSample = getLength(source)-1
    for destSample in range(0, getLength(dest)):
        srcVolume = getSampleValueAt(source, srcSample)
        setSampleValueAt(dest, destSample, srcVolume)
        srcSample = srcSample - 1
    # Return the processed sound for further use in the function that calls playBackward
    return dest
How does this work?
We make two copies of the sound.
The srcSample starts at the end, and the destSample goes from 0 to the end.
Each time through the loop, we copy the sample value from the srcSample to the destSample.
Note that the destSample is increasing by 1 each time through the loop, but srcSample is decreasing by 1 each time through the loop.
Tracing playBackward (3 samples here)
The slides step through the loop in this order: Starting out; Ready for the copy; Do the copy; Ready for the next one?; Moving them together; How we end up.
[Figures: the source and dest sample arrays with the srcSample and destSample markers, and a "You are here" pointer on the current line of playBackward.]
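The trace can be reproduced on a three-sample list. This is a sketch in plain Python 3 where a list stands in for the sound and `play_backward` is a hypothetical stand-in for the JES recipe; the `print` inside the loop shows the two indices moving in opposite directions.

```python
def play_backward(samples):
    """Return a reversed copy of `samples`, using the two-index loop."""
    dest = list(samples)               # second copy, same length as the source
    src_index = len(samples) - 1       # start at the last valid index
    for dest_index in range(len(dest)):
        dest[dest_index] = samples[src_index]
        print("copied source[%d] to dest[%d]" % (src_index, dest_index))
        src_index = src_index - 1
    return dest

print(play_backward([10, 20, 30]))
```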
Recipe for halving the frequency of a sound
def half(filename):
    source = makeSound(filename)
    dest = makeSound(filename)
    srcSample = 0
    for destSample in range(0, getLength(dest)):
        volume = getSampleValueAt(source, int(srcSample))
        setSampleValueAt(dest, destSample, volume)
        srcSample = srcSample + 0.5
    play(dest)
    return dest
This is how a sampling synthesizer works!
Here are the pieces that do it: int(srcSample) and srcSample = srcSample + 0.5
Changing pitch of sound vs. changing picture size
def half(filename):
    source = makeSound(filename)
    dest = makeSound(filename)
    srcSample = 0
    for destSample in range(0, getLength(dest)):
        vol = getSampleValueAt(source, int(srcSample))
        setSampleValueAt(dest, destSample, vol)
        srcSample = srcSample + 0.5
    play(dest)
    return dest

def copyBarbsFaceLarger():
    barbf = getMediaPath("barbara.jpg")
    barb = makePicture(barbf)
    canvasf = getMediaPath("7inX95in.jpg")
    canvas = makePicture(canvasf)
    sourceX = 45
    for targetX in range(100, 100+((200-45)*2)):
        sourceY = 25
        for targetY in range(100, 100+((200-25)*2)):
            px = getPixel(barb, int(sourceX), int(sourceY))
            color = getColor(px)
            setColor(getPixel(canvas, targetX, targetY), color)
            sourceY = sourceY + 0.5
        sourceX = sourceX + 0.5
    show(barb)
    show(canvas)
    return canvas
Both of them are sampling
Both of them have three parts:
1. A start where objects are set up
2. A loop where samples or pixels are copied from one place to another. To decrease the frequency or increase the size, we take each sample/pixel twice. In both cases, we do that by incrementing the index by 0.5 and taking the integer of the index.
3. Finishing up and returning the result
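The effect of the 0.5 increment is easy to see on a short list. The sketch below is plain Python 3, with a list standing in for the sound and `half` as a hypothetical stand-in for the JES recipe. As in the recipe, the destination keeps the source's length, so only the first half of the source fits.

```python
def half(samples):
    """Take each source sample twice by stepping the source index by 0.5."""
    dest = [0] * len(samples)          # same length as the source
    src_index = 0.0
    for dest_index in range(len(dest)):
        dest[dest_index] = samples[int(src_index)]
        src_index = src_index + 0.5    # advances one whole sample every two steps
    return dest

print(half([10, 20, 30, 40]))
```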
Recipe to double the frequency of a sound
def double(filename):
    source = makeSound(filename)
    target = makeSound(filename)
    targetIndex = 0
    # Here’s the critical piece: we skip every other sample in the source!
    for sourceIndex in range(0, getLength(source), 2):
        setSampleValueAt(target, targetIndex, getSampleValueAt(source, sourceIndex))
        targetIndex = targetIndex + 1
    #Clear out the rest of the target sound -- it's only half full!
    for secondHalf in range(getLength(target)/2, getLength(target)):
        setSampleValueAt(target, targetIndex, 0)
        targetIndex = targetIndex + 1
    play(target)
    return target
What happens if we don’t “clear out” the end? Try this out!
“Switch off” these lines of code by commenting them out.
def double(filename):
    source = makeSound(filename)
    target = makeSound(filename)
    targetIndex = 0
    for sourceIndex in range(0, getLength(source), 2):
        setSampleValueAt(target, targetIndex, getSampleValueAt(source, sourceIndex))
        targetIndex = targetIndex + 1
    #Clear out the rest of the target sound -- it's only half full!
    #for secondHalf in range(getLength(target)/2, getLength(target)):
    #    setSampleValueAt(target, targetIndex, 0)
    #    targetIndex = targetIndex + 1
    play(target)
    return target
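One way to explore the question without a sound file is to run both versions on a short list. This is a sketch in plain Python 3; `double` and its `clear_rest` flag are hypothetical stand-ins for the JES recipe with and without the commented-out lines. Unlike the recipe, the clearing loop here starts from wherever the copy stopped, which is an added assumption that also handles odd lengths.

```python
def double(samples, clear_rest=True):
    """Copy every other sample to the front of a same-length target."""
    target = list(samples)             # starts as a full copy of the source
    target_index = 0
    for source_index in range(0, len(samples), 2):
        target[target_index] = samples[source_index]
        target_index = target_index + 1
    if clear_rest:
        # The target is only half full: silence everything after the copy.
        for index in range(target_index, len(target)):
            target[index] = 0
    return target

print(double([1, 2, 3, 4, 5, 6]))                     # cleared
print(double([1, 2, 3, 4, 5, 6], clear_rest=False))   # leftovers remain
```

With the clearing switched off, the second half of the target still holds the original samples, so the sped-up sound would be followed by the tail of the original at normal speed.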
Splicing Sounds Splicing gets its name from literally cutting and pasting pieces of magnetic tape together Doing it digitally is easy (in principle), but painstaking Say we want to splice pieces of speech together: We find where the end points of words are We copy the samples into the right places to make the words come out as we want them (We can also change the volume of the words as we move them, to increase or decrease emphasis and make it sound more natural.)
Finding the word end-points
Using the MediaTools and its play before/after cursor controls, we can figure out the index numbers where each word ends.
Now, it’s all about copying
We have to keep track of the source and target indices, srcSample and destSample
destSample = Where-the-incoming-sound-should-start
for srcSample in range(startingPoint, endingPoint):
    sampleValue = getSampleValueAt(source, srcSample)
    setSampleValueAt(dest, destSample, sampleValue)
    destSample = destSample + 1
The Whole Splice
def splicePreamble():
    file = "/Users/guzdial/mediasources/preamble10.wav"
    source = makeSound(file)
    dest = makeSound(file)  # This will be the newly spliced sound
    destSample = 17408  # targetIndex starts at just after "We the" in the new sound
    for srcSample in range(33414, 40052):  # Where the word "United" is in the sound
        setSampleValueAt(dest, destSample, getSampleValueAt(source, srcSample))
        destSample = destSample + 1
    for srcSample in range(17408, 26726):  # Where the word "People" is in the sound
        setSampleValueAt(dest, destSample, getSampleValueAt(source, srcSample))
        destSample = destSample + 1
    for index in range(1, 1000):  #Stick some quiet space after that
        setSampleValueAt(dest, destSample, 0)
        destSample = destSample + 1
    play(dest)  #Let's hear and return the result
    return dest
What’s going on here?
First, set up a source and target.
Next, we copy “United” (samples 33414 to 40052) after “We the” (sample 17408).
That means that we end up at 17408 + (40052 - 33414) = 17408 + 6638 = 24046.
Where does “People” start?
Next, we copy “People” (17408 to 26726) immediately afterward.
Do we have to copy “of” too? Or is there a pause in there that we can make use of?
Finally, we insert a little space of 0’s (999 samples, which is about 1/44 of a second at 44,100 samples per second).
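The index bookkeeping can be checked with a small helper. This is a sketch in plain Python 3; `copy_range` is a hypothetical function, and the lists below are stand-ins filled with index numbers rather than real audio, so that each copied value shows where it came from.

```python
def copy_range(source, start, end, dest, dest_index):
    """Copy source[start:end] into dest starting at dest_index.

    Returns the next free index in dest, which is where the next word goes.
    """
    for src_index in range(start, end):
        dest[dest_index] = source[src_index]
        dest_index = dest_index + 1
    return dest_index

source = list(range(45000))     # stand-in data: each value is its own index
dest = [0] * 45000
next_index = copy_range(source, 33414, 40052, dest, 17408)   # "United"
print(next_index)               # where "People" would start
```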
What if we didn’t do that second copy? Or the pause?
def splicePreamble():
    file = "/Users/guzdial/mediasources/preamble10.wav"
    source = makeSound(file)
    dest = makeSound(file)  # This will be the newly spliced sound
    destSample = 17408  # targetIndex starts at just after "We the" in the new sound
    for srcSample in range(33414, 40052):  # Where the word "United" is in the sound
        setSampleValueAt(dest, destSample, getSampleValueAt(source, srcSample))
        destSample = destSample + 1
    #for srcSample in range(17408, 26726):  # Where the word "People" is in the sound
    #    setSampleValueAt(dest, destSample, getSampleValueAt(source, srcSample))
    #    destSample = destSample + 1
    #for index in range(1, 1000):  #Stick some quiet space after that
    #    setSampleValueAt(dest, destSample, 0)
    #    destSample = destSample + 1
    play(dest)  #Let's hear and return the result
    return dest
Changing the splice What if we wanted to increase or decrease the volume of an inserted word? Simple! Multiply each sample by something as it’s pulled from the source. Could we do something like slowly increase volume (emphasis) or normalize the sound? Sure! Just like we’ve done in past programs, but instead of working across all samples, we work across only the samples in that sound!
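As one possible illustration of working across only the samples in one word, the sketch below slowly increases the volume over a range of indices. It is plain Python 3 on a list; `fade_in_range` and its linear ramp from 0 up toward full volume are assumptions chosen for the example, not something the slides prescribe.

```python
def fade_in_range(samples, start, end):
    """Ramp the volume up linearly across samples[start:end], in place."""
    length = end - start
    for offset in range(length):
        factor = offset / float(length)        # 0.0 at start, approaching 1.0
        index = start + offset
        samples[index] = int(samples[index] * factor)
    return samples

word = [100, 100, 100, 100, 100, 100]
print(fade_in_range(word, 1, 5))   # only indices 1 through 4 are changed
```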
Making more complex sounds
We know that natural sounds are often the combination of multiple sounds.
Adding waves in physics or math is hard. In computer science, it’s easy! Simply add the samples at the same index in the two waves:
for srcSample in range(0, getLength(source)):
    destValue = getSampleValueAt(dest, srcSample)
    srcValue = getSampleValueAt(source, srcSample)
    setSampleValueAt(dest, srcSample, srcValue+destValue)
Adding sounds
The first two are sine waves generated in Excel. The third is just the sum of the first two columns.
[Figure: waves a and b, and their sum a + b = c]
Uses for adding sounds We can mix sounds We even know how to change the volumes of the two sounds, even over time (e.g., fading in or fading out) We can create echoes We can add sine (or other) waves together to create kinds of instruments/sounds that do not physically exist, but which sound interesting and complex
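An echo is one of the simpler uses: add a quieter, delayed copy of a sound to itself. The sketch below is plain Python 3 on a list of samples; `echo`, its `delay` (in samples), and its `decay` factor are hypothetical names chosen for this example, and results are limited to the 16-bit range as an added assumption.

```python
def echo(samples, delay, decay=0.6):
    """Mix each sample with a quieter copy of the sample `delay` steps earlier."""
    result = list(samples)
    for index in range(delay, len(samples)):
        mixed = samples[index] + int(decay * samples[index - delay])
        result[index] = max(-32768, min(32767, mixed))   # stay in 16-bit range
    return result

print(echo([100, 0, 0, 0], 2, 0.5))
```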
A function for adding two sounds
def addSoundInto(sound1, sound2):
    for sampleNmr in range(0, getLength(sound1)):
        sample1 = getSampleValueAt(sound1, sampleNmr)
        sample2 = getSampleValueAt(sound2, sampleNmr)
        setSampleValueAt(sound2, sampleNmr, sample1 + sample2)
Notice that this adds sound1 and sound2 by adding sound1 into sound2
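Two practical details come up when mixing real sounds: the sounds may differ in length, and a sum can exceed what a 16-bit sample holds. The sketch below is plain Python 3 with lists standing in for sounds; stopping at the shorter sound and limiting each sum to the 16-bit range are both assumptions added for this example.

```python
def add_sound_into(sound1, sound2):
    """Add sound1 into sound2 (in place), sample by sample."""
    shared_length = min(len(sound1), len(sound2))   # stop at the shorter sound
    for index in range(shared_length):
        total = sound1[index] + sound2[index]
        sound2[index] = max(-32768, min(32767, total))
    return sound2

print(add_sound_into([1, 2, 3], [10, 20]))
print(add_sound_into([30000], [30000]))
```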
Making a chord by mixing three notes
>>> setMediaFolder()
New media folder: C:\Documents and Settings\Mark Guzdial\My Documents\mediasources\
>>> getMediaPath("bassoon-c4.wav")
'C:\\Documents and Settings\\Mark Guzdial\\My Documents\\mediasources\\bassoon-c4.wav'
>>> c4=makeSound(getMediaPath("bassoon-c4.wav"))
>>> e4=makeSound(getMediaPath("bassoon-e4.wav"))
>>> g4=makeSound(getMediaPath("bassoon-g4.wav"))
>>> addSoundInto(e4,c4)
>>> play(c4)
>>> addSoundInto(g4,c4)
>>> play(c4)
Adding sounds with a delay
def makeChord(sound1, sound2, sound3):
    for index in range(0, getLength(sound1)):
        s1Sample = getSampleValueAt(sound1, index)
        # Add in sound2 after 1000 samples
        if index > 1000:
            s2Sample = getSampleValueAt(sound2, index)
            setSampleValueAt(sound1, index, s1Sample + s2Sample)
        # Add in sound3 after 2000 samples
        if index > 2000:
            s3Sample = getSampleValueAt(sound3, index)
            setSampleValueAt(sound1, index, s1Sample + s2Sample + s3Sample)
Note that in this version we’re adding into sound1!
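The staggered entry can be traced on tiny lists by shrinking the thresholds. This is a sketch in plain Python 3; `make_chord` mirrors the recipe's logic but returns a new list, takes the two thresholds as hypothetical parameters `start2` and `start3`, and skips indices past the end of a shorter sound, all of which are assumptions added for illustration.

```python
def make_chord(sound1, sound2, sound3, start2=1000, start3=2000):
    """Mix sound2 in after index start2 and sound3 in after index start3."""
    result = list(sound1)
    for index in range(len(sound1)):
        total = sound1[index]
        if index > start2 and index < len(sound2):
            total = total + sound2[index]
        if index > start3 and index < len(sound3):
            total = total + sound3[index]
        result[index] = max(-32768, min(32767, total))   # stay in 16-bit range
    return result

# Tiny thresholds so the three stages are visible in five samples.
print(make_chord([1] * 5, [10] * 5, [100] * 5, start2=1, start3=3))
```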
Challenge
Option #1: Create an audio collage where the same sound is spliced in at least three times:
Once in its original form
Then with any modification you want to make to it (reverse, scale up or down)
Option #2: Make music (it’s up to you what you do!)
Look in the MusicSounds folder in MediaSources: several instruments, different notes
Shift frequencies to get new tones
Crop to get shorter notes
Save sounds with writeSoundTo(sound,filename)