朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw.

朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw
Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學資訊工程系謝政勳

Outline Introduction GM(1,1) Model Grey Noise Estimation (GNE)
Grey Magnitude Spectral Subtraction (GMSS) Energy-Based Voice Activity Detection with GMSS (GMSS/EVAD) Grey Voice Activity Detection (GVAD) Conclusions 2018/12/4

1. Introduction (1/2) Grey system was introduced and developed by Deng since Basically, it can be divided into two categories in grey system: grey estimation and grey relational analysis. As for grey estimation, GM(1,1) model is a very popular model. In the GM(1,1) modeling, it requires as few as four data for estimation, such as fitting and forecasting. With its simplicity and effectiveness, GM(1,1) model has been applied in many fields like earthquake forecasting, energy management, integrated circuit forecasting, and so on. 2018/12/4

1. Introduction (2/2) In today’s presentation, a fundamental noise estimation scheme is first developed based on GM(1,1) model, which is called grey noise estimation (GNE). Based on GNE, the grey magnitude spectral subtraction (GMSS) to speech enhancement is introduced. Based on GMSS, a grey voice activity detection is proposed which is called GMSS/EVAD where GMSS is used as a preprocessing scheme to remove additive noise. Based on GNE, another time-domain VAD is proposed which is able to deal with non-stationary noise. 2018/12/4

2. GM(1,1) Model (1/8) The GM(1,1) modeling process is described in the following. Given non-negative data sequence , a new data sequence is found by the first-order accumulated generating operation (1-AGO) as (1) 2018/12/4

2. GM(1,1) Model (2/8) By sequences and , a grey difference equation is formed as (2) where (3) for and parameters a and b are called developing coefficient and grey input, respectively. 2018/12/4

2. GM(1,1) Model (3/8) From (2), parameters a and b can be obtained as
(4) where (5) and (6) 2018/12/4

2. GM(1,1) Model (4/8) It can be shown that the solution of is given as (7) where parameters a and b are found in (4). By 1-IAGO, the estimate of , ,is obtained as (8) where The estimation error for is given as (9) which will be used to estimate additive noise. 2018/12/4

Fig. 1 The block diagram of the GM(1,1) modeling
The minimum number of data required in the GM(1,1) modeling process is as few as four, i.e The GM(1,1) modeling is depicted in Figure 1. Fig. 1 The block diagram of the GM(1,1) modeling 2018/12/4

2. GM(1,1) Model (6/8) To justify the GM(1,1) model can estimate speech signal appropriately, the clean speech b.wav (male speech of letter “b”), is used as an example. Since b.wav is within the range (-1,1) , it is level-shifted by 5 to be non-negative and then put into the GM(1,1) model to find the estimate of b.wav. 2018/12/4

2. GM(1,1) Model (7/8) By shifting down by 5, the final estimate of b.wav is found. The original b.wav and its estimate from the GM(1,1) model are depicted in Figures 2(a) and 2(b), respectively. As shown in Figure 2, the GM(1,1) is able to estimate b.wav properly. 2018/12/4

2. GM(1,1) Model (8/8) Fig. 2 (a) Clean speech b.wav (b) Estimate of b.wav by the GM(1,1) model 2018/12/4

From GM(1,1) Model to Grey Noise Estimation
2018/12/4

3. Grey Noise Estimation (1/9)
The purpose of noise estimation is to recover signal component from noisy. noise estimation is required in many engineering applications. A grey noise estimation is developed for the following applications of speech enhancement and voice activity detection. 2018/12/4

Note that the estimation error of GM(1,1) model is nearly zero for a almost constant signal and approximately zero for random signal when additive noise is absent. When additive noise is present, both signals have non-zero estimation error. Consequently, there is a hope to estimate additive noise in noisy speech through estimation error of GM(1,1) model. 2018/12/4

Assume the available noise signal has the additive signal model where and are the clean signal and the additive noise in , respectively. Denote a segment of noisy signal as where is the total number of samples. Notation K is the number of samples used in the GM(1,1) modeling and is the number of subsets with one sample overlapped. 2018/12/4

The GNE implementation steps are given in the following: Step 1. Divide into subsets as The way to divide into subsets for the case is depicted in Figure where the square indicates the overlapped sample. Fig. 3 One sample overlapped subsets for the GNE (K=4) 2018/12/4

Step 2. For each subset i, find an estimate of , , by the GM(1,1) model as stated Section 2. Then consider as an estimate of , That is, Note that and therefore additive noise is not equal to estimation error of the GM(1,1) model, , in general. Step 3. Since but related to is estimated as where is a user-defined scaling factor and is determined by experiences. 2018/12/4

Step 4. Concatenate , as and Step 5. Estimate mean of additive noise as (10) Since is of one sample overlapped. Step 6. Estimate standard deviation of as (11) 2018/12/4

The block diagram of the GNE is depicted as below. Fig. 4 The block diagram of the GNE 2018/12/4

Finally, in order to demonstrate the GNE is able to estimate additive noise through estimation error of the GM(1,1) model. The speech b.wav used previously is injected by zero-mean additive Gaussian white noise whose SNR is 5 dB. The noisy b.wav is depicted in Figure 5(a) and true additive noise ( =0.091) in Figure 5(b), the estimated additive noise by the GNE ( =0.092) in Figure 5(c) where =1.7 . The result shown in Figure 5 indicate that the GNE is able to estimate additive noise appropriately. 2018/12/4

Fig. 5 (a) Noisy speech b.wav (b) Additive noise (c) Estimated additive noise 2018/12/4

From GNE to Speech Enhancement
2018/12/4

4 . Grey Magnitude Spectral Subtraction (1/15)
Speech enhancement is a very important preprocessing scheme in many speech processing systems, such as speech recognition system, since it affects the performance significantly. In general, better performance of speech enhancement results from appropriate noise estimation. Basically, the speech enhancement consists of two stages: noise estimation and noise removal. 2018/12/4

Note that the spectral subtraction approach to speech enhancement is heavily dependent on accuracy of statistics of additive noise and that the grey noise estimation (GNE) is able to estimate additive noise appropriately. In the presented approach, the GNE is applied to speech estimate additive noise and noise removal technique is based on magnitude spectral subtraction (MSS). The proposed approach is called grey magnitude spectral subtraction (GMSS). 2018/12/4

Assume that the additive signal model is appropriate for the noisy speech and that the noisy speech is stored in the wave file format whose range is within Given a noisy speech , the implementation steps are described in the following where additive noise is assumed known and the length of is assumed as a multiple of L. 2018/12/4

Step 1. Shift up the level of by an appropriate constant C, , such that is met . Step 2. Divide into M non-overlapped segments of length L and denote as speech segment of length L. Then Steps 3 to 9 are performed for each speech segme 2018/12/4

Step 3. With an rectangular window, obtain where denotes as L-point fast Fourier transform (FFT). Step 4. Estimate additive noise as by the GNE described in Section 2. Step 5. Perform L-point FFT on to find the magnitude of , , where is used. Step 6. Estimate the standard deviation of , 2018/12/4

Step7. Perform MSS as (12) where is an estimate of and is a user- defined scaling factor determined by experiences. Step 8. Find estimate of as , where is the inverse of L-point FFT. 2018/12/4

Step 9. Shift down the level of by the constant C . Step 10. Concatenate M segments to find estimate of , Step 11. Obtain residual noise 2018/12/4

Step 12. Calculate input, output, and improvement signal to noise ratio, SNRin, SNRout , and SNRimp as follows (13) (14) (15) 2018/12/4

Fig. 6 The block diagram for the proposed GMSS approach 2018/12/4

Simulation results Speech file: f0101.wav , =10 KHz and length of samples is 98,000. Reading the sentence: “one, two, …, ten.” Parameter Settings: GNE: K=4, C=5, α=1.7 GMSS: L=1000, M=98,000/1000=98 Additive white noise: Gaussian and uniform 2018/12/4

Fig. 7 (a) Clean speech. (b) Clean spectrum (f0101s.wav). 2018/12/4

Gaussian noise Fig. 8 (a) Noisy speech( ) Fig. 9 (a)Noisy spectrum( ) (b) Enhanced speech(f0101s.wav) (b)Enhanced spectrum(f0101s.wav) 2018/12/4

Fig. 10 SNRin, SNRout, and SNRimp, for Gaussian noise with different σ 2018/12/4

Uniform noise Fig.11 (a) Noisy speech( ) Fig. 12 (a)Noisy spectrum( ) (b) Enhanced speech(f0101s.wav) (b)Enhanced spectrum(f0101s.wav) 2018/12/4

Fig. 13 SNRin, SNRout, and SNRimp, for uniform noise with different γ 2018/12/4

From GMSS to Voice Activity Detection
2018/12/4

5 . Energy-Based Voice Activity Detection with GMSS (1/14)
Voice activity detection (VAD) is a very important preprocessing required in many speech systems, such as speech coding and wireless speech communications for better bit rate utilization, and width efficiency, and battery saving.. The objective of VAD is to determine if a segment is speech or non-speech. 2018/12/4

In most of VAD approaches, an additive signal model is assumed where a noisy speech results from a sum of clean speech and additive noise. The performance of VAD heavily depends on the estimation of noise. Appropriate noise estimation leads to good performance generally. Once the noise statistics being estimated, the determination of speech and non-speech portions is made. 2018/12/4

In order to improvement VAD performance in low SNR conditions. The approach called the grey magnitude spectral subtraction with energy voice activity detection (GMSS/EVAD) is presented here. The proposed GMSS/EVAD approach consists of two portions: First by the GMSS, the noisy speech is processed into an approximate clean speech. Second, to determine the frames is speech or non-speech by energy-based VAD. 2018/12/4

Since the GMSS is able to remove additive noise effectively, it is considered as a preprocessing scheme for the following energy-based VAD. Note that the speech processed by the GMSS is approximate to a clean speech and that the energy-based VAD has good performance for clean or little noisy speeches. Consequently, EVAD should be appropriate to the speech after the GMSS. Based on the ideas just described, the GMSS/EVAD approach is developed. 2018/12/4

Denote as the speech processed by the GMSS. The implementation steps are given in the following. Step 1. Divide into segments partly overlapped. Figure 15 shows adjacent 240-sample segments with 160 samples overlapped, where denotes the ith segment Fig. 15 The relation between of the first two adjacent frames 2018/12/4

Step 2. Calculate the energy of as (16) Step 3. By threshold , determine the overlapped portion in as a speech segment if and a non-speech segment otherwise. Step 4. Continue Steps 3 until all segments are processed. 2018/12/4

Fig. 16 The block diagram of the GMSS/EVAD approach 2018/12/4

The proposed GMSS/EVAD approach is justified through examples and is compared with VAD schemes in G.729 and GSM AMR. The examples used in the simulation are speech files f0101s.wav, f0125s.wav and f0126s.wav which are, respectively, female oral reading sentences: “one, two, …, ten.’’, “We were away a year ago.” and “Early one morning man and woman ambled along a one mile lane.” 2018/12/4

These files are considered as clean speeches with sampling rate 10 KHz and depicted in Figures 17(a), 18(a), and 19(a), respectively. The additive noise is white Gaussian noise that is artificially generated. In the simulation, all speech files are level-shifted by 5, i.e. C=5, and each segment length and overlapped length are set to 240 and 160, respectively. 2018/12/4

In addition, the number of samples used in the GM(1,1) modeling is 4, i.e. K=4, where the parameter =1.7 in the GNE. The parameter in the GMSS is set to 5. In the simulation, the SNR for all noisy speeches are set to 0 dB and depicted in Figures 17(b), 18(b), and 19(b), respectively. 2018/12/4

The speeches after the GMSS are depicted in Figures 17(c), 18(c), and 19(c), respectively. The comparison results for the proposed GMSS/EVAD, G.729 and GSM AMR are depicted in Figures 17(d)(e)(f) to 19(d)(e)(f), respectively. The simulation results indicate that the proposed GMSS/EVAD outperforms VAD schemes in G.729 and GSM AMR in low SNR examples. 2018/12/4

Figure 17 (a) Clean f0101s.wav (b) Noisy f0101s.wav (0 dB) (c) GMSS output (d) GMSS/EVAD output (e) G.729 VAD output (f) GSM AMR VAD output 2018/12/4

From GNE to Voice Activity Detection
2018/12/4

6 . Grey Voice Activity Detection (1/12)
General traditional statistical method needs extra statistics model and operates the frequently domain where the computational complexity is high . Consequently, a grey approach to VAD problem is proposed and called the grey VAD (GVAD) which is a time domain approach and requires no statistical model. 2018/12/4

The proposed VAD based on GNE is called grey VAD (GVAD). The implementation steps of GVAD are described as follows: Step 1. Shift up the level of in wave file format by a positive constant C, , such that Step 2. Divide into M non-overlapped segments of length L and denote as the ith speech segment of length L. Without loss of generality, the length of is assumed a multiple of L. 2018/12/4

Step 3. Estimate additive noise as based on GNE where is used. Step 4. Estimate the signal as and Step 5. Estimate the segmental standard deviation of , as in (11), where index i denotes the ith segment of 2018/12/4

Step 6. Estimate the segmental standard deviation of , , by (11) similarly.. Step 7. Calculate the ith segmental SNR, Step 8. Determine if is a speech or non-speech segment as follows. Given threshold for the ith speech segment, mark as a speech segment if and a non-speech segment otherwise. where and is a scaling factor. 2018/12/4

Step 9. Shift down the level of , Step 10. Continue Steps 3 to 10 until all M segmented speech are processed. 2018/12/4

Figure 20. The flowchart of the proposed GVAD algorithm 2018/12/4

The examples used in the simulation are speech files b.wav and f0125s.wav. These speech files are considered as clean speech whose sampling rate is 10 KHz. The non-stationary additive white Gaussian noise is generated artificially. 2018/12/4

In the simulation, all speech files are level-shifted by 5, i.e., C=5 and the segment length is set to 240. For adjacent segments, 160 samples are overlapped. The number of samples used in GM(1,1) modeling is 4, i.e., K=4. The parameter =1.7 is employed in GNE and the parameter in GVAD is set to 7.5. 2018/12/4

Fig 21. The clean b.wav, additive noise, estimated noise, and noisy b.wav with GVAD output 2018/12/4

Fig 22. The clean f0125s.wav, additive noise, estimated noise, and noisy f0125s.wav with GVAD output 2018/12/4

Fig 23. The clean b.wav, noisy b.wav, threshold, and GVAD, G.729, GSM AMR outputs 2018/12/4

Fig 24. The clean f0125s.wav, noisy f0125s.wav, threshold, and GVAD, G.729, GSM AMR outputs 2018/12/4

7 . Conclusions (1/3) This talk is about the application of GM(1,1) model to speech enhancement and voice activity detection. It consists of five parts: First, GM(1,1) model is briefly reviewed. Second, a noise estimation scheme based on GM(1,1) model is introduced which is called grey noise estimation (GNE). 2018/12/4

7 . Conclusions (2/3) Third, with GNE a magnitude spectral subtraction approach is given to improve the quality of noisy speech. The approach is called the grey magnitude spectral subtraction (GMSS). Fourth, the GMSS is used as a preprocessing scheme for voice activity detection (VAD) in low SNR conditions. By an energy-based VAD (EVAD), the enhanced speech by GMSS is segmented and determined as a speech or non-speech segment. The approach is called the GMSS/EVAD. 2018/12/4

7 . Conclusions (3/3) Finally, another VAD based on GNE is proposed where an adaptive threshold is developed to determine speech/non-speech segments. The proposed VAD is called GVAD which is performed in the time domain and free from statistical model assumption. 2018/12/4

Thanks for your attention.
2018/12/4

朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw.

Similar presentations

Presentation on theme: "朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

朝陽科技大學 資訊工程系 謝政勳 chhsieh@cyut.edu.tw Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學 資訊工程系 謝政勳 chhsieh@cyut.edu.tw.

Similar presentations

Presentation on theme: "朝陽科技大學 資訊工程系 謝政勳 chhsieh@cyut.edu.tw Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學 資訊工程系 謝政勳 chhsieh@cyut.edu.tw."— Presentation transcript:

Similar presentations

About project

Feedback

朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw.

Presentation on theme: "朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw Application of GM(1,1) Model to Speech Enhancement and Voice Activity Detection 朝陽科技大學資訊工程系謝政勳 chhsieh@cyut.edu.tw."— Presentation transcript: