
1 NOLISP, Paris, May 23rd 2007
Audio-Visual Subspaces
 - Reduced Audiovisual Subspace: Principal Component & Linear Discriminant Analysis
 - Correlated Audio & Visual Subspaces: Co-inertia & Canonical Correlation Analysis

2 Voice conversion techniques
Definition: the process of making one person's voice (the source) sound like another person's voice (the target).
Diagram: source -> voice conversion -> target; example utterance: "My name is John"

3 Principle of ALISP
Block diagram (labels recovered from the slide):
 - Coder: Input speech; Spectral analysis; Prosodic analysis; Selection of segmental units, using the Dictionary of representative segments
 - Coder output: Segment index; Prosodic parameters
 - Decoder: Concatenative synthesis (HNM); Output speech

4 Details of encoding
Block diagram (labels recovered from the slide):
 - Input: speech
 - Spectral analysis, then HMM recognition against the dictionary of HMM models of ALISP classes
 - For the recognized class (HMM A), representative units of the class: Synth unit A 1 … Synth unit A 8; selection by DTW
 - Prosodic analysis, then prosodic encoding
 - Output: index of ALISP class; index of synth. unit; pitch, energy, duration
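The "Selection by DTW" box on this slide can be illustrated with a small sketch. The slide does not give the distance measure or the path constraints, so the Euclidean frame distance, the unconstrained step pattern, and the function names `dtw_distance` and `select_unit` are assumptions made for illustration only.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences of
    feature vectors (each vector is a list of floats).
    Assumption: Euclidean frame distance, no slope constraints."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def select_unit(segment, units):
    """Return the index of the representative unit closest to the
    input segment under DTW (hypothetical helper)."""
    distances = [dtw_distance(segment, u) for u in units]
    return min(range(len(units)), key=distances.__getitem__)
```

In the scheme on the slide, `units` would hold the eight representative units of the recognized class, and the returned value would play the role of the "index of synth. unit".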

5 Details of decoding
Block diagram (labels recovered from the slide):
 - Input: ALISP index; synth unit index within class; prosodic parameters
 - Loading of the synth unit (Synth unit A 1 … Synth unit A 8)
 - Concatenative synthesis
 - Output: speech
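The decoding path on this slide amounts to a table lookup followed by concatenation. The sketch below assumes the dictionary is a mapping from class index to a list of synth units, each stored as a list of samples; the name `decode` and the tuple layout are hypothetical. Prosodic parameters are carried but not applied, and HNM synthesis is not implemented here.

```python
def decode(stream, dictionary):
    """Concatenate the synth units named by the received indices.

    stream: iterable of (class_index, unit_index, prosody) tuples.
    dictionary: mapping class_index -> list of synth units,
                each unit being a list of samples (assumed layout).
    """
    samples = []
    for class_index, unit_index, _prosody in stream:
        unit = dictionary[class_index][unit_index]  # "Loading synth unit"
        samples.extend(unit)                        # naive concatenation
    return samples
```

A real decoder would use the prosodic parameters (pitch, energy, duration) to modify each unit before joining them.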

6 Principle of ALISP conversion
Learning step: one hour of target voice
 - Parametric analysis: MFCC
 - Segmentation based on temporal decomposition and vector quantization
 - Stochastic modelling based on HMM
 - Creation of representative units
Conversion step
 - Parametric analysis: MFCC
 - HMM recognition
 - Selection of representative segment by DTW
Synthesis step
 - Concatenation of representative segments
 - HNM synthesis
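As a rough illustration of the vector quantization part of the segmentation bullet, the sketch below labels each feature frame with its nearest codeword and marks a boundary wherever the label changes. This is a simplification: temporal decomposition is not implemented, the codebook is assumed to be already trained, and the names `quantize` and `segment_boundaries` are hypothetical.

```python
import math

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codeword
    (Euclidean distance; the codebook is assumed given)."""
    return [
        min(range(len(codebook)), key=lambda k: math.dist(f, codebook[k]))
        for f in frames
    ]

def segment_boundaries(labels):
    """Return the frame indices at which a new segment starts,
    i.e. where the codeword label differs from the previous frame."""
    return [i for i in range(len(labels))
            if i == 0 or labels[i] != labels[i - 1]]
```

For example, frames that quantize to the labels 0, 0, 1, 1, 0 would be split into three segments starting at frames 0, 2, and 4.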

7 Voice conversion using ALISP: results
Figures: Score distribution; DET curve
EER before forgery: 16 % (1729 impostors, 1320 clients)
EER after forgery: 26 % (1729 impostors, 1320 clients)
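For readers unfamiliar with the equal error rate (EER) quoted on this slide, the sketch below shows one common way to estimate it from two lists of verification scores. The slide does not say how its EER values were computed; the threshold sweep, the convention that higher scores mean "more likely a client", and the name `equal_error_rate` are assumptions.

```python
def equal_error_rate(client_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold (assumed convention).
    FRR = fraction of clients rejected, FAR = fraction of impostors
    accepted; the EER is read where the two rates are closest.
    """
    thresholds = sorted(set(client_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in client_scores) / len(client_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Under this reading, converted (forged) impostor speech that scores closer to the client distribution raises the FAR at every threshold, which is consistent with the EER increasing after forgery.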

8 Voice conversion using ALISP: results
Examples (labels recovered from the slide):
 - BREF database: Source, Target, Result
 - NIST database: Source, Target, Result
 - Voice labels: female, male

