Time-Scale Modification of Speech Signals Bill Floyd ECE 5525 – Digital Speech Processing December 14, 2004.


Slide 1: Time-Scale Modification of Speech Signals
Bill Floyd
ECE 5525 – Digital Speech Processing
December 14, 2004

Slide 2: Objectives
- Introduction
- Background
- Theory
- Methods
- Examples
- Matlab Code
- Short Time Fourier Transform
- Short Time Fourier Transform Magnitude
- Speech Samples
- Conclusion
- Questions
- References

Slide 3: Introduction
- Goal
  - To either speed up or slow down a speech signal while maintaining the approximate pitch
- Applications
  - Changing voice mail playback speed
  - Court stenographers: play proceedings back more quickly
  - Sound effects
  - Etc.

Slide 4: Introduction
- Option 1: Change the sample rate
  - If you modify the playback sample rate, you change the speed, but the pitch also changes
  - Increase sample rate = higher pitch (chipmunk sound)
  - Decrease sample rate = lower pitch (drawn-out echo sound)
- Option 2: Decimate or interpolate the signal
  - If you change the number of samples and play them at the original rate, the result is the same as modifying the sample rate
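A quick numerical check of why Options 1 and 2 shift the pitch, written as a small Python sketch. The 100 Hz tone, the 8 kHz rate, and the zero-crossing frequency estimate are illustrative assumptions, not taken from the slides.

```python
import math

def tone(freq_hz, rate_hz, n_samples):
    """Sine wave of freq_hz sampled at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

def estimate_freq(signal, rate_hz):
    """Rough frequency estimate: upward zero crossings per second of signal."""
    ups = sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)
    return ups * rate_hz / len(signal)

rate = 8000
original = tone(100, rate, 8000)   # one second of a 100 Hz tone
decimated = original[::2]          # Option 2: keep every second sample

# Both signals are "played back" at the same rate.
f_original = estimate_freq(original, rate)
f_decimated = estimate_freq(decimated, rate)
print(len(original), len(decimated), f_decimated / f_original)
```

The decimated signal is half as long, and its estimated frequency is about twice the original: faster playback and higher pitch arrive together.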

Slide 5: Introduction
- Option 3: Use more complex methods
  - These change the speed of the signal while preserving the pitch
  - Short Time Fourier Transform
  - Short Time Fourier Transform Magnitude
  - Sinusoidal Synthesis
  - Linear Prediction Synthesis

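Slide 6: Terminology
- Window Size: the length of each analysis window, in samples
- Frame Rate: the spacing between the starts of successive windows, in samples (written L on later slides)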

Slide 7: Theory
- Short Time Fourier Transform methods
  - Chapter 7 in our text (Discrete-Time Speech Signal Processing)
  - Refer to the in-class notes for the mathematical theory of operation
  - I will pick up from where Dr. Kepuska stopped in his notes

Slide 8: Short Time Fourier Transform
- Also called the Fairbanks method
- Extract successive short-time segments and then discard the following ones
- Processing chain: STFT -> Decimate Samples -> IFFT -> OLA -> Signal Output

Slide 9: Short Time Fourier Transform
- Frame rate factor L
- In the frequency domain, after taking the STFT, you get X(nL, ω)
- Form a new signal by Y(nL, ω) = X(snL, ω)
  - where s = compression factor
- Take the inverse Fourier transform
- Use the overlap-and-add (OLA) method to form the new signal
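The frame-discarding step Y(nL, ω) = X(snL, ω) can be sketched in Python as below. This is a minimal sketch under stated assumptions: s is an integer, the window is triangular with a hop of half its length, and the FFT/IFFT round trip is skipped because inverting an unmodified frame simply returns the same windowed segment. The function names are hypothetical.

```python
def triangular_window(n):
    """Symmetric triangular window for even n (intended to match Matlab's triang)."""
    half = [(2 * k - 1) / n for k in range(1, n // 2 + 1)]
    return half + half[::-1]

def fairbanks_compress(x, window_size, hop, s):
    """Keep every s-th windowed frame and overlap-add the kept frames at the original hop."""
    w = triangular_window(window_size)
    n_frames = (len(x) - window_size) // hop + 1
    kept = range(0, n_frames, s)              # frames 0, s, 2s, ... survive
    y = [0.0] * ((len(kept) - 1) * hop + window_size)
    for out_idx, frame_idx in enumerate(kept):
        start_in = frame_idx * hop            # where the frame came from
        start_out = out_idx * hop             # where it lands in the output
        for k in range(window_size):
            y[start_out + k] += x[start_in + k] * w[k]
    return y

# 2000 input samples, 200-sample window, hop of 100, every second frame kept.
compressed = fairbanks_compress([1.0] * 2000, 200, 100, 2)
print(len(compressed))   # 1100 samples, roughly half of the input
```

With a constant input the interior of the output stays at 1.0, because triangular windows at a hop of half the window length sum to one. The sketch does nothing about pitch alignment between the kept frames.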

Slide 10: Short Time Fourier Transform
[Figure: frames X(nL, ω) mapped to Y(nL, ω) = X(2nL, ω)]

Slide 11: Short Time Fourier Transform
[Figure: original windowed sequence and new sequence]

Slide 12: Short Time Fourier Transform
- Problems
  - Pitch synchronization: it is highly likely that the pitch periods will not line up properly

Slide 13: Short Time Fourier Transform Magnitude
- Problems with the STFT method relate directly to the linear phase component of the STFT
  - Time shift = phase change
- An alternate approach is to use only the magnitude portion of the STFT: the Short Time Fourier Transform Magnitude (STFTM)
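The "time shift = phase change" remark is the shift property of the Fourier transform. Stated for a generic sequence x[n] with transform X(ω) and a delay of n_0 samples (symbols chosen here for illustration, not taken from the slides):

```latex
x[n] \;\longleftrightarrow\; X(\omega)
\quad\Longrightarrow\quad
x[n - n_0] \;\longleftrightarrow\; e^{-j\omega n_0}\, X(\omega),
\qquad
\bigl| e^{-j\omega n_0}\, X(\omega) \bigr| = \bigl| X(\omega) \bigr|
```

A delay adds only a linear phase term and leaves the magnitude unchanged, which is why working with the magnitude alone removes the dependence on where a frame sits in time.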

Slide 14: Short Time Fourier Transform Magnitude
- Compression
  - With the Fairbanks method, time slices were discarded
  - Now we can just compress the time slices
  - Form a new signal by |Y(nM, ω)| = |X(nL, ω)|
    - where M = compression factor = L / speed (the new spacing between frames)
    - e.g. for speeding up by two, M = L/2

Slide 15: Short Time Fourier Transform Magnitude
- Compression
  - Take the inverse Fourier transform
  - Use the overlap-and-add (OLA) method to form the new signal
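The re-spacing step, where frames taken every L samples are laid down every M = L / speed samples, can be sketched in Python as below. This sketch works on time-domain windowed frames and leaves out the magnitude-only inverse transform, so it is overlap-add with a changed synthesis hop rather than a full STFTM synthesis. Scaling by M / W0 (with W0 the window sum) is an assumption made here so that a constant input stays at its level; the function names are hypothetical.

```python
def triangular_window(n):
    """Symmetric triangular window for even n (intended to match Matlab's triang)."""
    half = [(2 * k - 1) / n for k in range(1, n // 2 + 1)]
    return half + half[::-1]

def ola_respaced(x, window_size, hop_in, speed):
    """Window x every hop_in samples, then overlap-add the frames every hop_in/speed samples."""
    w = triangular_window(window_size)
    w0 = sum(w)                              # window sum, W0 on the slides
    hop_out = round(hop_in / speed)          # M = L / speed
    n_frames = (len(x) - window_size) // hop_in + 1
    y = [0.0] * ((n_frames - 1) * hop_out + window_size)
    for i in range(n_frames):
        for k in range(window_size):
            # Every frame is kept; only its position in the output changes.
            y[i * hop_out + k] += x[i * hop_in + k] * w[k] * hop_out / w0
    return y

# L = 100, speed = 2, so M = 50: every frame is used, packed twice as densely.
faster = ola_respaced([1.0] * 2000, 200, 100, 2)
print(len(faster))   # 1100 samples
```

Unlike the frame-discarding method, no frame is thrown away here. For speed below 0.5 the output hop exceeds the window length and the frames no longer overlap.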

Slide 16: Short Time Fourier Transform Magnitude
[Figure: frames X(nL, ω) mapped to Y(nM, ω) = X(nL, ω), with M = L/2]

Slide 17: Short Time Fourier Transform Magnitude
[Figure: original windowed sequence and new sequence]

Slide 18: Other Methods
- Sinusoidal Synthesis (Chapter 9)
  - Time-warp the sinewave frequency track and the amplitude function
  - This technique has been successful not only with speech but also with music, biological, and mechanical signals
  - Problems
    - Does not maintain the original phase relations
    - Suffers from reverberance

Slide 19: Other Methods
- Linear Prediction Synthesis
  - Use homomorphic and linear prediction results to modify the time base
  - The book briefly mentions that this is possible, but I ran out of time before I could investigate the process further

Slide 20: Other Methods
- New techniques
  - An Internet search showed several methods that try to improve on what is out there now
- Software
  - Different software programs will change the speed for you
  - Adobe Audition is one of the most all-encompassing right now

Slide 21: Matlab Code - Prepare the Workspace
    %%%%%%%%
    % Prepare Workspace
    %%%%%%%%
    close all;
    clear all;
    window_size_1 = 200;   % window length in samples
    frame_rate_1 = 100;    % spacing between successive windows in samples
    % Speed factor (2 = twice as fast, 0.5 = half speed)
    speed = 2;

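Slide 22: Matlab Code - Load the Speech Signal
    %%%%%%%%
    % Load Data File
    %%%%%%%%
    % The 's' flag returns the typed text as a string instead of evaluating it
    filename = input('Please enter the file name to be used. ', 's');
    % wavread was removed from later MATLAB releases; audioread is its replacement
    [sample_data,sample_rate,nbits] = wavread(filename);
    loop_time = floor(max(size(sample_data))/frame_rate_1);
    % Zero-pad after the last sample, up to a whole number of frames
    sample_data((max(size(sample_data))+1):(loop_time+1)*frame_rate_1)=0;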

Slide 23: Matlab Code - Develop the Window
    %%%%%%%%
    % Create Windows
    %%%%%%%%
    % Want windows of 25ms
    % File sampled at 10,000 samples/sec
    % Want a window of size 10000 * 25ms(10ms)
    % (window_size_1 = 200 samples is 20 ms at that rate; the variable name below is kept as-is)
    triangle_30ms = triang(window_size_1);
    %triangle_30ms = hamming(window_size_1);
    W0 = sum(triangle_30ms);   % window sum, used to scale the overlap-add

Slide 24: Matlab Code - Window the Entire Speech Signal
    %%%%%%%%
    % Window the speech
    %%%%%%%%
    % Each frame spans two frame intervals (window_size_1 = 2*frame_rate_1)
    for i = 0:loop_time-1
        window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
    end

Slide 25: Matlab Code - Perform the Fast Fourier Transform
    %%%%%%%%
    % Create FFT
    %%%%%%%%
    for i = 1:loop_time
        window_data_fft(:,i) = fft(window_data(:,i),1024);   % zero-padded 1024-point FFT
    end

Slide 26: Matlab Code - Recreate the Modified Signal
    %%%%%%%%
    % Recreate Original Signal
    %%%%%%%%
    % Initialize the recreated signals
    reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
    real_reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
    modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed))=0;
    modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed))=0;

Slide 27: Matlab Code - Recreate the Modified Signal
    % Perform the ifft
    for i = 1:loop_time
        % Full inverse (magnitude and phase)
        recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
        % Magnitude-only inverse (phase discarded)
        real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);
        % Keep the first window_size_1 samples and scale for overlap-add
        truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
        real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
    end
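The two inverse transforms in the loop above behave quite differently, which a small Python sketch can show. It uses a direct 8-point DFT written out by hand in place of the 1024-point FFT, and the sample values in frame are arbitrary illustrative numbers.

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform (O(N^2), fine for a tiny example)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    """Direct inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

frame = [0.0, 1.0, 3.0, 2.0, -1.0, -2.0, 0.5, 0.0]
spectrum = dft(frame)

full = idft(spectrum)                          # magnitude and phase: returns the frame
mag_only = idft([abs(v) for v in spectrum])    # magnitude only: phase set to zero

print([round(v.real, 6) for v in full])
print([round(v.real, 6) for v in mag_only])
```

The full inverse reproduces the frame, while the magnitude-only inverse is a real sequence that is symmetric about index 0 (sample k equals sample N-k), so its shape no longer depends on where the speech sat inside the window.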

Slide 28: Matlab Code - Recreate the Modified Signal
    % Get back to the original signal
    for i = 0:loop_time-1
        reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = ...
            reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + ...
            truncated_recreated_data_ifft(:,i+1)';
        real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = ...
            real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + ...
            real_truncated_recreated_data_ifft(:,i+1)';
    end

Slide 29: Matlab Code - Recreate the Modified Signal
    % Get a modified signal by deleting certain parts (STFT)
    for i = 0:(loop_time-1)/speed
        modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = ...
            modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + ...
            real_truncated_recreated_data_ifft(:,i*speed+1)';
    end

Slide 30: Matlab Code - Recreate the Modified Signal
    % Initialize the compressed sequence (STFTM)
    modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1) = ...
        truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';
    % Get a modified signal by compressing
    for i = 0:(loop_time-2)
        modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) = ...
            modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) + ...
            real_truncated_recreated_data_ifft(:,i+2)';
    end

Slide 31: Matlab Code - Plot Results
    %%%%%%%%
    % Plot Results
    %%%%%%%%
    figure;
    subplot(211)
    plot(sample_data)
    title('Original Speech');
    v1=axis;
    hold on;
    subplot(212)
    plot(real(modified_reconstructed_signal))
    title(['STFT Synthesis w/ Speed = ',num2str(speed),'X']);
    v2=axis;
    % Put both plots on the axes of the longer signal
    if speed > 1
        subplot(211); axis(v1)
        subplot(212); axis(v1)
    else
        subplot(211); axis(v2)
        subplot(212); axis(v2)
    end

Slide 32: Matlab Code - Write Sound Files
    %%%%%%%%
    % Write sound files
    %%%%%%%%
    wavwrite(modified_reconstructed_signal,sample_rate,nbits,'C:\Classes\ECE_5525\tea party fairbanks 2x.wav')

Slide 33: Examples
- Baseline samples: Original File, Sample Rate 2X, Sample Rate 0.5X
- [Sound files: STFT, STFTM]

Slide 34: Examples
- STFT, speed 0.5X [sound file]

Slide 35: Examples
- STFT, speed 2X [sound file]

Slide 36: Examples
- STFT, speed 4X [sound file]

Slide 37: Examples
- STFTM, speed 0.5X [sound file]

Slide 38: Examples
- STFTM, speed 2X [sound file]

Slide 39: Examples
- STFTM, speed 4X [sound file]

Slide 40: More Results
- Change in window size
  - If the window size becomes too small, a change in pitch will occur
  - The window needs to be 2 to 3 pitch periods long
  - I generally used 20 to 30 ms windows
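The 2-to-3-pitch-period rule converts to a window length with simple arithmetic. The Python sketch below assumes a 100 Hz pitch (an illustrative value, not from the slides) and the 10,000 samples/sec rate mentioned in the code comments; the function name is hypothetical.

```python
def window_length(pitch_hz, pitch_periods, sample_rate_hz):
    """Window length as (samples, milliseconds) for a given number of pitch periods."""
    seconds = pitch_periods / pitch_hz
    return round(seconds * sample_rate_hz), seconds * 1000.0

for periods in (2, 3):
    samples, ms = window_length(100.0, periods, 10000)
    print(periods, "pitch periods:", samples, "samples,", round(ms, 3), "ms")
```

For that assumed pitch the rule gives 200 to 300 samples, which is the 20 to 30 ms range quoted on the slide; a higher-pitched voice would allow shorter windows.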

Slide 41: More Results
- Change in frame rate
  - If the frame rate (the spacing between windows) decreases too much, too many samples overlap to get an intelligible signal

Slide 42: More Results
- Change in window (filter) type
  - Tried Hamming: not much perceptual difference
  - Scaling by the window sum W0 becomes important here
  - Frame Rate/W0 is not equal to one
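The Frame Rate/W0 factor can be checked numerically for both windows. This Python sketch uses the deck's sizes (200-sample window, frame rate of 100 samples); the window formulas are written out by hand and are intended to match Matlab's triang and hamming, which is an assumption of the sketch.

```python
import math

def triangular_window(n):
    """Symmetric triangular window for even n (intended to match Matlab's triang)."""
    half = [(2 * k - 1) / n for k in range(1, n // 2 + 1)]
    return half + half[::-1]

def hamming_window(n):
    """Symmetric Hamming window (intended to match Matlab's hamming)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def ola_scale(window, frame_rate):
    """Overlap-add scale factor: frame rate divided by the window sum W0."""
    return frame_rate / sum(window)

tri_scale = ola_scale(triangular_window(200), 100)   # 1.0 up to rounding
ham_scale = ola_scale(hamming_window(200), 100)      # about 0.93
print(tri_scale, ham_scale)
```

With the triangular window the factor is one, so leaving it out does no harm; with the Hamming window, skipping it would leave the reconstruction about 7.5% too loud.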

Slide 43: Conclusion
- Optimum area
  - Frame rate is one half of the window size
  - Window size needs to be 2 to 3 pitch periods long
- It is possible to easily change the time scale and still maintain the original pitch, although the result is not always natural sounding

Slide 44: Conclusion
- Further investigation
  - What to do when you want to slow down to below half speed
  - Using the STFTM means there will be gaps between the sequences

Slide 45: Conclusion
- Further investigation
  - What to do when you want to slow down to below half speed
  - Could replicate windowed segments

Slide 46: Conclusion
- Further investigation
  - Use the other methods to determine quality
    - Implement Sinusoidal Synthesis
    - Implement Linear Predictive Synthesis using linear prediction and homomorphic methods
  - Work on synchronizing pitch periods
    - Shift samples so that the peaks line up
      - Scott and Gerber: Synchronized Overlap and Add (SOLA)
      - Cross-correlation of two samples to find the peak
      - Use the peaks to line up samples
    - Align the window at the same relative location within a pitch period
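The cross-correlation step suggested for pitch synchronization might look like the Python sketch below. It only finds the shift at which the next frame best lines up with the end of the output so far; the overlap-add itself is omitted, and the function name, the normalized correlation measure, and the test tone are all illustrative assumptions rather than the published SOLA procedure.

```python
import math

def best_lag(tail, frame, max_lag):
    """Shift of `frame` (0..max_lag samples) whose start best matches `tail`."""
    n = len(tail)
    tail_energy = sum(a * a for a in tail)
    best, best_score = 0, -math.inf
    for lag in range(max_lag + 1):
        segment = frame[lag:lag + n]
        if len(segment) < n:
            break                                  # ran out of frame samples
        cross = sum(a * b for a, b in zip(tail, segment))
        norm = math.sqrt(tail_energy * sum(b * b for b in segment))
        score = cross / norm if norm > 0 else 0.0  # normalized cross-correlation
        if score > best_score:
            best, best_score = lag, score
    return best

# A tone with an 80-sample period stands in for voiced speech.
wave = [math.sin(2 * math.pi * n / 80) for n in range(400)]
tail = wave[0:100]        # end of the signal built so far (starts at phase 0)
frame = wave[50:350]      # next frame, starting 50 samples into a period
lag = best_lag(tail, frame, 79)
print(lag)                # 30: skipping 30 samples brings the frame back to phase 0
```

Starting the next frame at the returned lag makes its pitch periods line up with those already in the output, which is the alignment the slide is after.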

Slide 47: Questions
- Are there any questions?

Slide 48: References
- Quatieri, Thomas F. Discrete-Time Speech Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2002.
- Rabiner, L.R. and Schafer, R.W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ, 1978.
- Oppenheim, A.V. and Schafer, R.W. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1975.
- Scott, R. and Gerber, S. “Pitch Synchronous Time-Compression of Speech,” Proc. Conf. Speech Communications Processing, pp. 63-85, April 1972.

Slide 49: References
- Fairbanks, G., Everitt, W.L., and Jaeger, R.P. “Method for Time or Frequency Compression-Expansion of Speech,” IEEE Transaction Audio and Electroacoustics, vol. AU-2, pp. 7-12, Jan 1954.

