Download presentation
Presentation is loading. Please wait.
Published byLuke Rose Modified over 8 years ago
1
Speaker Verification System Middle Term Presentation Performed by: Barak Benita & Daniel Adler Instructor: Erez Sabag
2
The Project Goal: Implementation of a speaker verification algorithm on a DSP
3
Introduction Speaker verification is the process of automatically authenticating the speaker on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, security control for confidential information areas, and more. Speaker verification is the process of accepting or rejecting the identity claim of a speaker. Speaker Verification System Result [0:1] Speaker’s Voice Segment Speaker’s Identity (Reference)
4
System Overview: BT Base Station Speaker Verification Unit BT Base Station Server My name is Bob! LAN
5
The System Architecture: DSP Bluetooth unit Bluetooth Base station Authorization Server “My name is Bob” Voice Channel (optional) Codec Verification Channel Enrollment Server (training phase – building A signature) Signature Parameters (Voice Reference) Bluetooth Radio Interface LAN In the project scope Optional implementations Legend Voice Channel (optional) Speaker Verification Unit
6
Pre-Processing Feature Extraction Pattern Matching Decision Analog Speech Reference Model Speaker Verification System – Block Diagram Result [0:1]
7
LPF A/D First Order FIR Frame Blocking Analog Speech Frame Windowing Band Limited Analog Speech Digital Speech Pre-emphasized Digital Speech (PDS) PDS Frames Windowed PDS Frames Analog to digital converter with frequency sampling (Fs) of [10,16]KHz Anti aliasing filter to avoid aliasing during sampling. LPF [0, Fs/2] Low order digital system to spectrally flatten the signal (in favor of vocal tract parameters), and make it less susceptible to later finite precision effects Frame blocking of the sampled signal. Each frame is of N samples overlapped with N-M samples of the previous frame. Frame rate ~ 100 Frames/Sec N values: [200,300], M values: [100,200] Using Hamming (or Hanning or Blackman) windowing in order to minimize the signal discontinuities at the beginning and end of each frame. Pre-Processing module
8
Feature Extraction Module Feature Extraction Windowed PDS Frame Feature Vector [1:K] In the project we will check two common methods for extracting the features from the speech signal: LPC (Linear Prediction Coefficient) and MFCC (Mel Frequency Cepsral Coefficients). The idea is to find the most suitable method that will comply both the DSP limitations and reasonable results. In both methods we are receiving a vector (K size) representing the features of the windowed PDS frame (N samples each frame). The size of the feature vector is [10,20].
9
Pattern Matching Modeling Module In the project we will check two pattern-matching and modeling techniques and eventually will choose one of them for the DSP implementation. The pattern matching modeling techniques is divided into two sections; the enrolment part, in which we build the reference model of the speaker and the verifications (matching) part where the users will be compared to this model.
10
Pattern Matching Modeling Module – Vector Quantization (VQ) In the enrolment part we build a codebook of the speaker according to the LBG (Linde, Buzo, Gray) algorithm, which creates an N size codebook from set of L feature vectors. In the verification stage, we are measuring the distortion of the given sequence of the feature vectors to the reference codebook. Pattern Matching = Distortion measure Reference Model = Codebook Distortion Rate Feature Vector
11
Pattern Matching Modeling Module – Hidden Markov Model (HMM) In the enrolment stage we build an HMM for the specific speaker (this procedure creates the following outputs: A and B matrix, vector). The building of the model is done by using the Baum-Welch algorithm. In the matching procedure, we compute the matching probability of the current speaker with the model. This is done by the Viterbi algorithm. Pattern Matching = Probability Calc Reference Model = HMM Probability Score Feature Vector
12
Decision Module In VQ the decision is based on checking if the distortion rate is higher than a preset threshold: if distortion rate > t, Output = Yes, else Output = No. In HMM the decision is based on checking if the probability score is higher than a preset threshold: if probability scores > t, Output = Yes, else Output = No.
13
Hardware Requirements The DSP family we are going to use is TI’s C5X family. The decision about the specific model will be determined after learning the chosen algorithm performance in MATLAB
14
Time Table – First Semester 14.11.01 – Project description presentation 15.12.01 – completion of phase A: literature review and algorithm selection 25.12.01 – Handing out the mid-term report 25.12.01 – Beginning of phase B: algorithm implementation in MATLAB 03.03.02 – Publishing the MATLAB results and selecting the algorithm that will be implemented on the DSP
15
Time Table – Second Semester 26.03.02 – Presenting the progress and planning of the project to the supervisor 27.03.02 – The beginning of the implementation on the DSP 03.03.03 – Project presentation and handing the project final report
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.