Time-Scale Modification of Speech Signals
Bill Floyd
ECE 5525 – Digital Speech Processing
December 14, 2004


Objectives
- Introduction
- Background
- Theory
- Methods
  - Short Time Fourier Transform
  - Short Time Fourier Transform Magnitude
- Examples
- Matlab Code
- Speech Samples
- Conclusion
- Questions
- References

Introduction
Goal: speed up or slow down a speech signal while approximately maintaining its pitch.
Applications:
- Changing voice-mail playback speed
- Court stenographers: playing back proceedings more quickly
- Sound effects
- Etc.

Introduction
Option 1 – Change the sample rate: modifying the playback sample rate changes the speed, but the pitch changes with it.
- Increasing the sample rate raises the pitch (the "chipmunk" sound).
- Decreasing the sample rate lowers the pitch (a drawn-out, echoing sound).
Option 2 – Decimate or interpolate the signal: changing the number of samples and playing the result at the original rate gives the same result as modifying the sample rate.
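To make the Option 2 claim concrete, here is a small Python/NumPy sketch (not part of the original Matlab code; the 10 kHz rate and the 100 Hz test tone are assumptions chosen for illustration). Keeping every second sample and playing the result at the original rate halves the duration and doubles the frequency, exactly as if the playback rate had been doubled.

```python
import numpy as np

def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest FFT magnitude bin, ignoring DC."""
    spectrum = np.abs(np.fft.rfft(x))
    spectrum[0] = 0.0
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

fs = 10_000                       # assumed sample rate (Hz)
t = np.arange(fs) / fs            # 1 second of sample times
x = np.sin(2 * np.pi * 100 * t)   # 100 Hz tone standing in for a voiced sound

# Option 2: keep every second sample, then play back at the ORIGINAL rate fs
y = x[::2]

print(len(x) / fs, dominant_frequency(x, fs))   # duration 1.0 s, peak at 100 Hz
print(len(y) / fs, dominant_frequency(y, fs))   # duration 0.5 s, peak at 200 Hz
```

Both the speed and the pitch change by the same factor, which is why the more complex methods below are needed.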

Introduction
Option 3 – Use more complex methods, which change the speed of the signal while preserving the pitch:
- Short Time Fourier Transform
- Short Time Fourier Transform Magnitude
- Sinusoidal Synthesis
- Linear Prediction Synthesis

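Terminology
- Window size: the length of each analysis segment, in samples (window_size_1 in the Matlab code).
- Frame rate: the spacing between the starts of successive windows, in samples (frame_rate_1 in the Matlab code). This is the factor L used below.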
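The two terms can be illustrated with a short Python/NumPy sketch. The helper `frame_signal` is hypothetical (it is not from the presentation). It uses the Matlab code's values of 200 and 100 samples and assumes the 10 kHz sample rate mentioned in the code comments.

```python
import numpy as np

def frame_signal(x, window_size, frame_rate):
    """Split x into overlapping frames, one row per frame.

    window_size: samples per frame; frame_rate: samples between frame starts.
    The tail is zero-padded so that the last frame is complete.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + max(0, int(np.ceil((len(x) - window_size) / frame_rate)))
    padded_len = (n_frames - 1) * frame_rate + window_size
    x = np.pad(x, (0, padded_len - len(x)))
    starts = np.arange(n_frames) * frame_rate
    return np.stack([x[s:s + window_size] for s in starts])

fs = 10_000
window_size, frame_rate = 200, 100                      # values from the Matlab code
print(1000 * window_size / fs, 1000 * frame_rate / fs)  # 20.0 ms window, 10.0 ms frame rate

frames = frame_signal(np.arange(1000.0), window_size, frame_rate)
print(frames.shape)   # (9, 200): each sample appears in two frames
```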

Theory
Short Time Fourier Transform methods are covered in Chapter 7 of our text (Discrete-Time Speech Signal Processing). Refer to the in-class notes for the mathematical theory of operation; I pick up where Dr. Kepuska stopped in his notes.

Short Time Fourier Transform
Also called the Fairbanks method: extract successive short-time segments and discard the ones that follow them.
Processing chain: STFT -> decimate (discard) frames -> IFFT -> overlap-add (OLA) -> output signal

Short Time Fourier Transform
- Frame rate factor L: after taking the STFT, the frequency-domain representation is X(nL, ω).
- Form a new signal by Y(nL, ω) = X(snL, ω), where s is the compression factor.
- Take the inverse Fourier transform of each frame.
- Use the overlap-add method to form the new signal.
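The steps above can be sketched in Python/NumPy as follows. This is an illustration, not the presentation's Matlab code. It assumes an integer speed s, a triangular window equal to Matlab's `triang(200)`, a frame rate L = 100, and the same L/W0 gain normalization. The function names are hypothetical. The test tone is chosen so that its 100-sample pitch period equals the frame rate, so the kept segments happen to line up. Real speech generally does not line up this way (see the pitch-synchronization problem).

```python
import numpy as np

def triang_even(n):
    """Triangular window matching Matlab triang(n) for even n."""
    assert n % 2 == 0, "sketch only handles even window lengths"
    half = (2 * np.arange(1, n // 2 + 1) - 1) / n
    return np.concatenate([half, half[::-1]])

def fairbanks_stft(x, speed, window_size=200, frame_rate=100, nfft=1024):
    """Speed x up by an integer factor: Y(nL, w) = X(s n L, w) with s = speed.

    Kept frames are inverse-transformed and overlap-added at the ORIGINAL
    frame rate L, so every discarded frame shortens the output by L samples.
    """
    assert isinstance(speed, int) and speed >= 1
    w = triang_even(window_size)
    starts = range(0, len(x) - window_size + 1, frame_rate)
    X = np.stack([np.fft.fft(x[s:s + window_size] * w, nfft) for s in starts])
    Y = X[::speed]                                      # discard the frames in between
    frames = np.real(np.fft.ifft(Y, axis=1))[:, :window_size]
    gain = frame_rate / w.sum()                         # L / W0 normalization
    out = np.zeros((len(frames) - 1) * frame_rate + window_size)
    for i, f in enumerate(frames):
        out[i * frame_rate:i * frame_rate + window_size] += gain * f
    return out

fs = 10_000
x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)   # 1 s of a 100 Hz tone
y = fairbanks_stft(x, speed=2)
print(len(x), len(y))    # 10000 5100: roughly half as long
```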

Short Time Fourier Transform
[Figure: original STFT frames X(nL, ω) and decimated frames Y(nL, ω) = X(2nL, ω), i.e. s = 2]

Short Time Fourier Transform
[Figure: original windowed sequence and the new sequence produced by discarding frames]

Short Time Fourier Transform
Problems: pitch synchronization. The pitch periods in the retained segments will very likely not line up where the segments overlap.

Short Time Fourier Transform Magnitude
Problems with the STFT method relate directly to the linear-phase component of the STFT: a time shift corresponds to a linear phase change. An alternative approach is to use only the magnitude of the STFT, the Short Time Fourier Transform Magnitude (STFTM).
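The "time shift = phase change" point can be checked numerically. In this Python/NumPy sketch a circular shift of one frame stands in for a time shift, which is a simplifying assumption. Shifting by n0 samples multiplies the DFT by the linear phase ramp exp(-j2πkn0/N) and leaves the magnitude untouched. This is why discarding the phase sidesteps the alignment problem.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.standard_normal(N)        # one arbitrary frame
n0 = 5                            # circular time shift in samples

X = np.fft.fft(x)
X_shift = np.fft.fft(np.roll(x, n0))              # DFT of x[(n - n0) mod N]
k = np.arange(N)
phase_ramp = np.exp(-2j * np.pi * k * n0 / N)     # linear phase in k

print(np.allclose(X_shift, X * phase_ramp))       # True: shift = linear phase
print(np.allclose(np.abs(X_shift), np.abs(X)))    # True: magnitude unchanged
```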

Short Time Fourier Transform Magnitude
Compression
- With the Fairbanks method, time slices were discarded; now we simply compress the time slices (move them closer together).
- Form a new signal by |Y(nM, ω)| = |X(nL, ω)|, where M = L / speed is the compressed frame rate; e.g., to speed up by a factor of two, M = L/2.

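Short Time Fourier Transform Magnitude
Compression (continued)
- Take the inverse Fourier transform of each frame. Because only the magnitude is kept, a phase has to be chosen; the Matlab code below uses zero phase.
- Use the overlap-add method to form the new signal.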
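A Python/NumPy sketch of STFTM compression follows. It makes the same choices as the Matlab code: zero-phase frames rebuilt from |X(nL, ω)|, the first `window_size` samples kept, and L/W0 scaling. It is not an optimized or phase-estimating implementation. The triangular window and the function names are assumptions for illustration. Unlike the Fairbanks sketch, every frame is kept. Only the output frame rate changes, to M = L / speed.

```python
import numpy as np

def triang_even(n):
    """Triangular window matching Matlab triang(n) for even n."""
    assert n % 2 == 0, "sketch only handles even window lengths"
    half = (2 * np.arange(1, n // 2 + 1) - 1) / n
    return np.concatenate([half, half[::-1]])

def stftm_compress(x, speed, window_size=200, frame_rate=100, nfft=1024):
    """Overlap-add magnitude-only frames at the new frame rate M = L / speed."""
    M = int(round(frame_rate / speed))           # compressed (or expanded) frame rate
    w = triang_even(window_size)
    starts = range(0, len(x) - window_size + 1, frame_rate)
    mags = np.stack([np.abs(np.fft.fft(x[s:s + window_size] * w, nfft)) for s in starts])
    # Zero phase: inverse FFT of the magnitude only, keep the first window_size samples
    frames = np.real(np.fft.ifft(mags, axis=1))[:, :window_size]
    gain = frame_rate / w.sum()                  # same L / W0 scaling as the Matlab code
    out = np.zeros((len(frames) - 1) * M + window_size)
    for i, f in enumerate(frames):
        out[i * M:i * M + window_size] += gain * f
    return out

fs = 10_000
x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)   # 1 s test tone
fast = stftm_compress(x, speed=2)                  # M = 50
slow = stftm_compress(x, speed=0.5)                # M = 200: windows just touch
print(len(fast), len(slow))
```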

Short Time Fourier Transform Magnitude
[Figure: original frames X(nL, ω) and compressed frames |Y(nM, ω)| = |X(nL, ω)| with M = L/2]

Short Time Fourier Transform Magnitude
[Figure: original windowed sequence and the new, compressed sequence]

Other Methods
Sinusoidal Synthesis (Chapter 9)
- Time-warp the sinewave frequency tracks and amplitude functions.
- This technique has been successful not only with speech but also with music, biological, and mechanical signals.
- Problems: it does not maintain the original phase relations, and the result suffers from reverberance.
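The time-warping idea can be sketched in Python/NumPy for a single sinusoidal track. This is a simplification: full sinusoidal synthesis tracks many spectral peaks, and the function name and test values here are hypothetical. The amplitude and frequency tracks are moved to the times t / speed, but the frequency values are unchanged. The phase is rebuilt by integrating frequency, so the original phase relations are not preserved.

```python
import numpy as np

def warp_sinusoid_track(frame_times, amps, freqs, fs, speed):
    """Resynthesize one sinusoidal track on a time axis compressed by `speed`.

    Amplitude and frequency tracks are moved to times t / speed, but the
    frequency values are unchanged, so the pitch is preserved.
    """
    warped = np.asarray(frame_times, dtype=float) / speed
    t = np.arange(int(round(warped[-1] * fs))) / fs
    a = np.interp(t, warped, amps)             # time-warped amplitude function
    f = np.interp(t, warped, freqs)            # time-warped frequency track (Hz)
    phase = 2 * np.pi * np.cumsum(f) / fs      # integrate frequency to get phase
    return a * np.cos(phase)

fs = 10_000
times = np.linspace(0.0, 1.0, 101)             # track sampled every 10 ms for 1 s
y = warp_sinusoid_track(times, np.ones(101), np.full(101, 120.0), fs, speed=2)
print(len(y) / fs)    # 0.5 s: twice as fast, still a 120 Hz tone
```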

Other Methods
Linear Prediction Synthesis
- Use homomorphic and linear prediction results to modify the time base.
- The book briefly mentions that this is possible, but I ran out of time before I could investigate the process further.

Other Methods
New techniques: an Internet search turned up several methods that try to improve on the current ones.
Software: several programs will change the speed for you; Adobe Audition is one of the most comprehensive at the moment.

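Matlab Code - Prepare the Workspace

    %%%%%%%%
    % Prepare Workspace
    %%%%%%%%
    close all;
    clear all;
    window_size_1 = 200;  % window size in samples; must equal 2*frame_rate_1
    frame_rate_1 = 100;   % frame rate L (samples between window starts)
    % Speed-up factor (> 1 faster, < 1 slower)
    speed = 2;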

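Matlab Code - Load the Speech Signal

    %%%%%%%%
    % Load Data File
    %%%%%%%%
    % The 's' option returns the typed name as text instead of evaluating it
    filename = input('Please enter the file name to be used. ', 's');
    % wavread has been removed from MATLAB; audioread and audioinfo replace it
    [sample_data, sample_rate] = audioread(filename);
    info = audioinfo(filename);
    nbits = info.BitsPerSample;
    loop_time = floor(length(sample_data)/frame_rate_1);
    % Zero-pad after the last sample (without overwriting it) so the final window is full
    sample_data(end+1:(loop_time+1)*frame_rate_1) = 0;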

Matlab Code - Develop the Window

    %%%%%%%%
    % Create Windows
    %%%%%%%%
    % The file is sampled at 10,000 samples/sec, so window_size_1 = 200
    % samples gives a 20 ms window (despite the variable name)
    triangle_30ms = triang(window_size_1);
    %triangle_30ms = hamming(window_size_1);
    % W0 = sum of the window samples, used to normalize the overlap-add gain
    W0 = sum(triangle_30ms);

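Matlab Code - Window the Entire Speech Signal

    %%%%%%%%
    % Window the speech
    %%%%%%%%
    window_data = zeros(window_size_1, loop_time);   % one windowed frame per column
    for i = 0:loop_time-1
        % Frame i starts at sample frame_rate_1*i+1 and is 2*frame_rate_1 samples long
        window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
    end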

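Matlab Code - Perform the Fast Fourier Transform

    %%%%%%%%
    % Create FFT
    %%%%%%%%
    window_data_fft = zeros(1024, loop_time);
    for i = 1:loop_time
        % 1024-point FFT of each frame (zero-padded from window_size_1 samples)
        window_data_fft(:,i) = fft(window_data(:,i),1024);
    end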

Matlab Code - Recreate the Modified Signal

    %%%%%%%%
    % Recreate Original Signal
    %%%%%%%%
    % Initialize the recreated signals (row vectors)
    reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
    real_reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
    % frame_rate_1/speed must be an integer here (e.g. speed = 0.5, 2 or 4)
    modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed)) = 0;
    modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed)) = 0;

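Matlab Code - Recreate the Modified Signal

    % Perform the ifft
    for i = 1:loop_time
        % Full inverse FFT (magnitude and phase): used for STFT synthesis
        recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
        % Inverse FFT of the magnitude only (zero phase): used for STFTM synthesis
        real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);
        % Keep the first window_size_1 samples and scale by frame_rate_1/W0 (OLA gain)
        truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
        real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
    end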

Matlab Code - Recreate the Modified Signal

    % Get back to the original signal: overlap-add each frame at its original position
    % (.' transposes without conjugating)
    for i = 0:loop_time-1
        idx = (frame_rate_1*i)+1:((i+2)*frame_rate_1);
        reconstructed_signal(idx) = reconstructed_signal(idx) + truncated_recreated_data_ifft(:,i+1).';
        real_reconstructed_signal(idx) = real_reconstructed_signal(idx) + real_truncated_recreated_data_ifft(:,i+1).';
    end
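As a sanity check on the unmodified round trip (window, FFT, inverse FFT, truncate, scale by L/W0, overlap-add at the same frame rate), here is a Python/NumPy sketch. It is an illustration under the same assumptions as the Matlab code (a 200-sample triangular window and a frame rate of 100), not a translation of it line by line. With these values the original signal is recovered everywhere except the first and last half-window, where only one frame contributes.

```python
import numpy as np

def triang_even(n):
    """Triangular window matching Matlab triang(n) for even n."""
    assert n % 2 == 0, "sketch only handles even window lengths"
    half = (2 * np.arange(1, n // 2 + 1) - 1) / n
    return np.concatenate([half, half[::-1]])

def stft_round_trip(x, window_size=200, frame_rate=100, nfft=1024):
    """Window, FFT, inverse FFT, truncate, scale by L/W0, overlap-add at the same hop."""
    w = triang_even(window_size)
    gain = frame_rate / w.sum()
    out = np.zeros(len(x))
    for s in range(0, len(x) - window_size + 1, frame_rate):
        frame = np.fft.ifft(np.fft.fft(x[s:s + window_size] * w, nfft))
        out[s:s + window_size] += gain * np.real(frame[:window_size])
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = stft_round_trip(x)
print(np.max(np.abs(y[100:-100] - x[100:-100])))   # on the order of floating-point error
```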

Matlab Code - Recreate the Modified Signal

    % Get a modified signal by deleting certain parts (STFT): keep every
    % speed-th frame, with its phase, and overlap-add at the original frame rate
    for i = 0:floor((loop_time-1)/speed)
        k = round(i*speed) + 1;   % analysis frame to keep (an integer for any speed)
        idx = (frame_rate_1*i)+1:((i+2)*frame_rate_1);
        modified_reconstructed_signal(idx) = modified_reconstructed_signal(idx) + truncated_recreated_data_ifft(:,k).';
    end

Matlab Code - Recreate the Modified Signal

    % Initialize the compressed sequence (STFTM); assumes speed > 1
    modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1) = ...
        truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1).';
    % Get a modified signal by compressing: overlap-add each frame at frame rate frame_rate_1/speed
    for i = 0:(loop_time-2)
        idx = (frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1;
        modified_reconstructed_signal_compressed(idx) = modified_reconstructed_signal_compressed(idx) + real_truncated_recreated_data_ifft(:,i+2).';
    end

Matlab Code - Plot Results

    %%%%%%%%
    % Plot Results
    %%%%%%%%
    figure;
    subplot(211)
    plot(sample_data)
    title('Original Speech');
    v1 = axis;
    hold on;
    subplot(212)
    plot(real(modified_reconstructed_signal))
    title(['STFT Synthesis w/ Speed = ',num2str(speed),'X']);
    v2 = axis;
    % Give both plots the axis limits of the longer signal
    if speed > 1
        subplot(211); axis(v1)
        subplot(212); axis(v1)
    else
        subplot(211); axis(v2)
        subplot(212); axis(v2)
    end

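Matlab Code - Write Sound Files

    %%%%%%%%
    % Write sound files
    %%%%%%%%
    % wavwrite has been removed from MATLAB; audiowrite expects one column per channel
    audiowrite('C:\Classes\ECE_5525\tea party fairbanks 2x.wav', real(modified_reconstructed_signal(:)), sample_rate, 'BitsPerSample', nbits);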

Examples: Baseline Samples
[Sound files: STFT; STFTM; original file; sample rate 2X; sample rate 0.5X]

Examples: STFT, Speed 0.5X
[Sound file]

Examples: STFT, Speed 2X
[Sound file]

Examples: STFT, Speed 4X
[Sound file]

Examples: STFTM, Speed 0.5X
[Sound file]

Examples: STFTM, Speed 2X
[Sound file]

Examples: STFTM, Speed 4X
[Sound file]

More Results
Change in window size: if the window becomes too small, the pitch changes. The window needs to be 2 to 3 pitch periods long; I generally used 20 to 30 ms windows.
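The 2-to-3-pitch-period rule converts to window lengths with simple arithmetic. The Python helper below is hypothetical, and the 100 Hz and 200 Hz pitch values are assumptions for a lower- and a higher-pitched voice. At 10,000 samples/sec, a 100 Hz pitch gives exactly the 20 to 30 ms range quoted above.

```python
def window_length_range(fs, pitch_hz, periods=(2, 3)):
    """(samples, milliseconds) for windows spanning the given numbers of pitch periods."""
    period_samples = fs / pitch_hz
    return [(round(p * period_samples), 1000.0 * p / pitch_hz) for p in periods]

print(window_length_range(10_000, 100))   # [(200, 20.0), (300, 30.0)]
print(window_length_range(10_000, 200))   # [(100, 10.0), (150, 15.0)]
```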

More Results
Change in frame rate: if the frame rate (the spacing between windows) decreases too much, too many segments overlap to give an intelligible signal.
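How much overlap is "too much" can be quantified as the average number of windows covering each output sample, N / M. Here N is the window size and M is the output frame rate (L / speed for the STFTM). The helper below is a hypothetical Python illustration. Note that a value below one means gaps between segments rather than overlap.

```python
def frames_overlapping(window_size, frame_rate, speed=1.0):
    """Average number of windows covering each output sample when frames of
    length window_size are overlap-added at output frame rate frame_rate / speed."""
    return window_size / (frame_rate / speed)

for hop in (100, 50, 25):
    print(hop, frames_overlapping(200, hop))    # 2.0, 4.0 and 8.0 windows per sample

print(frames_overlapping(200, 100, speed=4))    # 8.0: STFTM at 4X stacks 8 segments
print(frames_overlapping(200, 100, speed=0.25)) # 0.5: less than one, so gaps appear
```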

More Results
Change in window type: I tried a Hamming window, with not much perceptual difference. The window gain W0 (the sum of the window samples) becomes important here, because the frame rate divided by W0 is no longer equal to one.
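The Python/NumPy sketch below (not from the presentation) compares the steady-state overlap-add sum of the two windows at a frame rate of 100. For the 200-point triangular window, L/W0 is exactly one and the shifted windows already sum to one. For the symmetric Hamming window (`np.hamming`, the same definition as Matlab's default `hamming(N)`), L/W0 is about 0.93, and scaling by it brings the sum close to one.

```python
import numpy as np

def triang_even(n):
    """Triangular window matching Matlab triang(n) for even n."""
    assert n % 2 == 0, "sketch only handles even window lengths"
    half = (2 * np.arange(1, n // 2 + 1) - 1) / n
    return np.concatenate([half, half[::-1]])

def ola_sum(window, hop):
    """Steady-state overlap-add sum: sum over k of window[m + k*hop], m = 0..hop-1."""
    total = np.zeros(hop)
    for start in range(0, len(window), hop):
        seg = window[start:start + hop]
        total[:len(seg)] += seg
    return total

L = 100  # frame rate used in the Matlab code
for name, w in [("triangular", triang_even(200)), ("hamming", np.hamming(200))]:
    W0 = w.sum()
    normalized = ola_sum(w, L) * L / W0
    print(f"{name}: L/W0 = {L / W0:.3f}, normalized OLA sum in "
          f"[{normalized.min():.3f}, {normalized.max():.3f}]")
```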

Conclusion
Optimum area:
- The frame rate is one half of the window size.
- The window is 2 to 3 pitch periods long.
It is possible to change the time scale easily while maintaining the original pitch, although the result does not always sound natural.

Conclusion
Further investigation: what to do when slowing down to less than half speed. With the STFTM and a window twice the frame rate, the compressed frame rate M = L / speed then exceeds the window size, leaving gaps between the segments.

Conclusion
Further investigation (continued): to slow down to less than half speed, one option is to replicate the windowed segments.
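One way to plan the replication is sketched below in Python. It is a hypothetical sketch and not something the presentation implemented. The rule "keep at least 50% window overlap" is an assumption carried over from the optimum-area conclusion. Each windowed frame is laid down `copies` times in a row, at an output spacing small enough that consecutive segments still overlap by half a window.

```python
import math

def replication_plan(window_size, frame_rate, speed):
    """Hypothetical helper for STFTM slow-down below half speed.

    Returns (copies, hop): lay down `copies` consecutive copies of each windowed
    frame at output spacing `hop`, so the spacing never exceeds half the window.
    """
    M = frame_rate / speed                       # output spacing of distinct frames
    copies = max(1, math.ceil(M / (window_size / 2)))
    return copies, M / copies

def output_frame_order(n_frames, copies):
    """Analysis-frame index to place at each output slot, e.g. [0, 0, 1, 1, ...]."""
    return [k for k in range(n_frames) for _ in range(copies)]

print(replication_plan(200, 100, 0.25))   # (4, 100.0): each frame used 4 times
print(output_frame_order(3, 2))           # [0, 0, 1, 1, 2, 2]
```

Repeating identical segments may itself sound buzzy or stuttering; this plan only removes the gaps.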

Conclusion
Further investigation:
- Use the other methods to determine quality:
  - Implement sinusoidal synthesis.
  - Implement linear predictive synthesis using linear prediction and homomorphic methods.
- Work on synchronizing pitch periods:
  - Shift samples so that the peaks line up.
  - Scott and Gerber: Synchronized Overlap and Add (SOLA).
  - Cross-correlate two segments to find the peak.
  - Use the peaks to line up the segments.
  - Align each window at the same relative location within a pitch period.
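The cross-correlation step can be sketched in Python/NumPy. This is a SOLA-style lag search, and the function name, the normalization, and the search range are assumptions rather than the exact formulation of Scott and Gerber or the textbook. For each candidate lag, the tail of the output built so far is compared with the incoming frame starting at that lag. The lag with the highest normalized correlation tells how far to shift the frame so that the pitch periods line up.

```python
import numpy as np

def best_lag(output_tail, new_frame, max_lag):
    """Lag in [0, max_lag] that maximizes the normalized cross-correlation between
    the existing output tail and new_frame[lag:lag + len(output_tail)]."""
    n = len(output_tail)
    best, best_score = 0, -np.inf
    for lag in range(max_lag + 1):
        seg = new_frame[lag:lag + n]
        if len(seg) < n:
            break
        denom = np.linalg.norm(output_tail) * np.linalg.norm(seg)
        score = np.dot(output_tail, seg) / denom if denom > 0 else 0.0
        if score > best_score:
            best, best_score = lag, score
    return best

# A 100-sample pitch period; the new frame starts 30 samples later in the cycle
tail = np.sin(2 * np.pi * np.arange(100) / 100)
frame = np.sin(2 * np.pi * (np.arange(300) + 30) / 100)
print(best_lag(tail, frame, max_lag=99))   # 70: skipping 70 samples realigns the peaks
```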

Questions
Are there any questions?

References
- Quatieri, T. F. Discrete-Time Speech Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2002.
- Rabiner, L. R. and Schafer, R. W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ, 1978.
- Oppenheim, A. V. and Schafer, R. W. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1975.
- Scott, R. and Gerber, S. “Pitch Synchronous Time-Compression of Speech,” Proc. Conf. Speech Communications Processing, pp. 63-85, April 1972.
- Fairbanks, G., Everitt, W. L., and Jaeger, R. P. “Method for Time or Frequency Compression-Expansion of Speech,” IEEE Transaction Audio and Electroacoustics, vol. AU-2, pp. 7-12, Jan. 1954.
