Study on Deep Learning in Speaker Recognition Lantian Li CSLT / RIIT Tsinghua University May 26, 2016.

lilt@cslt.riit.tsinghua.edu.cn

Outline:
- Biometric Recognition
- Speaker Recognition
- Deep Learning in Speaker Recognition
- Prospects

Biometrics refers to metrics related to human characteristics [wiki]. The term "biometrics" is derived from the Greek words bio (life) and metric (to measure). Biometric recognition refers to automated technologies that measure and analyze an individual's physiological or behavioral characteristics; it can be used to verify or identify an individual.

 Fingerprint  Face  Palmprint  Iris  Retina Scan  DNA  Signatures  Gait/gesture  Keystroke  Voiceprint Physiological Characteristic Behavioral Characteristic

Recognition tasks on speech:
- Language recognition: What language was spoken?
- Accent recognition: Where is he/she from?
- Speech recognition: What was spoken?
- Gender recognition: Male or female?
- Emotion recognition: Positive? Negative? Happy? Sad?
- Speaker recognition: Who spoke?

Speaker recognition is the identification of a person from the characteristics of their voice (voice biometrics). It is also called voiceprint recognition [wiki].

Speaker Identification  Determining which identity in a specified speaker set is speaking during a given speech segment. Speaker Verification  Determining whether a claimed identity is speaking during a speech segment. It is a binary decision task. Speaker Detection  Determining whether a specified target speaker is speaking during a given speech segment. Speaker Tracking ( Speaker Diarization = Who Spoke When )  Performing speaker detection as a function of time, giving the timing index of the specified speaker.

Advantages of speaker recognition:
- The speech signal is easy to acquire.
- It is more readily accepted by users.
- It makes remote authentication more convenient.
Application scenarios:
- Access control (e.g., voice locks)
- Transaction authentication (e.g., remote payment)
- Forensic analysis (e.g., criminal investigation by the police)

Figure: History of speaker recognition (timeline from 1930 to 2010). Features: speech waveform, spectrogram; LPC, LPCC, PLAR; MFCC, PLP; phone information and deep speaker vectors (spk-vector). Models: template matching; DTW, VQ, HMM; GMM-UBM, GMM-SVM; JFA, i-vector; deep learning. Features and models keep pace with the times and develop together, as the field moves from small, clean speech data to big, practical speech data.

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence [wiki]. Deep learning is a branch of machine learning based on learning representations of data. Neural networks are a beautiful, biologically inspired programming paradigm that enables a computer to learn from observational data.

Lantian Li, Dong Wang, Zhiyong Zhang, Thomas Fang Zheng, “Deep Speaker Vectors for Semi Text-independent Speaker Verification”, arXiv:1505.06427, 2015.
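A minimal sketch of how a deep speaker vector is commonly extracted, assuming the usual recipe: a DNN is trained to classify speakers frame by frame, and the activations of its last hidden layer are averaged over all frames of an utterance and length-normalized. The single ReLU layer with fixed toy weights below is a hypothetical stand-in for that trained network; the function names are illustrative only and this is not the paper's code.

```python
import math

def relu_layer(frame, weights, bias):
    """One fully connected ReLU layer, standing in for the last hidden layer
    of a DNN trained to classify speakers (hypothetical toy weights)."""
    return [max(0.0, sum(w * x for w, x in zip(row, frame)) + b)
            for row, b in zip(weights, bias)]

def d_vector(frames, weights, bias):
    """Average the hidden-layer activations over all frames, then L2-normalize."""
    acts = [relu_layer(f, weights, bias) for f in frames]
    dim = len(acts[0])
    mean = [sum(a[i] for a in acts) / len(acts) for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0  # guard against all-zero activations
    return [v / norm for v in mean]

# Toy usage: two 2-dim "acoustic frames" and an identity layer.
frames = [[1.0, 2.0], [3.0, -1.0]]
vec = d_vector(frames, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print([round(v, 3) for v in vec])  # [0.894, 0.447]
```

Averaging turns a variable-length utterance into a fixed-length vector, so two utterances can then be compared with a simple cosine score.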

Segment pooling and dynamic time warping (DTW)
Lantian Li, Yiye Lin, Zhiyong Zhang, Dong Wang, “Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition”, APSIPA ASC 2015, pp. 426-429. IEEE, 2015.
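The two ideas named on this slide can be sketched as follows, under stated assumptions: segment pooling is taken to mean averaging consecutive frame-level deep features into fixed-length segments, and DTW aligns two such sequences so that a slower or faster utterance of the same text still matches. The cosine frame distance, the segment length and all function names are hypothetical choices, not details confirmed by the slide.

```python
import math

def segment_pool(frames, seg_len):
    """Average consecutive frame-level features into segments of seg_len frames
    (the last segment may be shorter)."""
    segments = []
    for start in range(0, len(frames), seg_len):
        chunk = frames[start:start + seg_len]
        segments.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return segments

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def dtw(seq_a, seq_b, dist=cosine_distance):
    """Classic DTW: minimum cumulative distance over monotonic alignments."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in seq_a only
                                 cost[i][j - 1],      # advance in seq_b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

frames_a = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
print(segment_pool(frames_a, 2))  # [[2.0, 0.0], [0.0, 2.0]]
ref = [[1.0, 0.0], [0.0, 1.0]]
slow = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # same content, spoken more slowly
print(dtw(ref, slow))  # 0.0
```

Unlike plain averaging, the DTW score keeps the temporal order of the features, which is useful in the text-dependent setting where enrollment and test share the same phrase.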

Max-margin metric learning
- Metric learning: learn a projection M.
- Distance metric:
- Goal: discriminate true speakers from impostors.
Lantian Li, Dong Wang, Chao Xing, Thomas Fang Zheng, “Max-margin Metric Learning for Speaker Recognition”, arXiv:1510.05940, 2015.
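A minimal sketch of max-margin metric learning with a linear projection M. The slide's distance formula is not recoverable, so this sketch assumes the squared Euclidean distance in the projected space, ||M(x - y)||^2, and a hinge loss on (anchor, target, impostor) triplets that asks the impostor to be at least a margin farther away than the true speaker. The paper's actual distance and objective may differ; all names, the margin and the learning rate are hypothetical.

```python
def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sq_dist(M, x, y):
    """Assumed distance: squared Euclidean distance after projection, ||M(x - y)||^2."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return sum(v * v for v in matvec(M, diff))

def hinge_loss(M, anchor, target, impostor, margin=1.0):
    """Zero once the impostor is at least `margin` farther than the true speaker."""
    return max(0.0, margin + sq_dist(M, anchor, target) - sq_dist(M, anchor, impostor))

def sgd_step(M, anchor, target, impostor, margin=1.0, lr=0.01):
    """One subgradient step on the hinge loss; returns an updated copy of M."""
    if hinge_loss(M, anchor, target, impostor, margin) == 0.0:
        return [row[:] for row in M]
    u = [a - t for a, t in zip(anchor, target)]
    v = [a - i for a, i in zip(anchor, impostor)]
    Mu, Mv = matvec(M, u), matvec(M, v)
    # d/dM ||M u||^2 = 2 (M u) u^T, so the loss gradient is 2 (Mu u^T - Mv v^T)
    return [[M[r][c] - lr * 2.0 * (Mu[r] * u[c] - Mv[r] * v[c])
             for c in range(len(u))] for r in range(len(M))]

M = [[1.0, 0.0], [0.0, 1.0]]
anchor, target, impostor = [0.0, 0.0], [0.0, 1.0], [1.0, 0.0]
print(hinge_loss(M, anchor, target, impostor))  # 1.0
for _ in range(5):
    M = sgd_step(M, anchor, target, impostor, lr=0.1)
print(hinge_loss(M, anchor, target, impostor))  # 0.0
```

In this toy case the update shrinks the dimension along which the same speaker varies and stretches the dimension that separates the impostor, which is exactly what a learned projection is meant to do.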

Multi-task recurrent model for speech and speaker recognition
Zhiyuan Tang+, Lantian Li+, Dong Wang, “Multi-task Recurrent Model for Speech and Speaker Recognition”, arXiv:1603.09643, 2016.
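A simplified sketch of one way to couple speech and speaker recognition in a recurrent model: two branches run side by side over the same frames, and at each frame each branch also receives the other branch's previous hidden state as an auxiliary input. This is our simplified reading of inter-task recurrence, using plain tanh cells instead of the LSTMs a real system would use; the parameter layout, the function names and the omitted softmax heads are hypothetical, and this is not the paper's architecture.

```python
import math

def rnn_cell(x, h_prev, aux, W_x, W_h, W_aux):
    """Elman-style cell: h_t = tanh(W_x x_t + W_h h_{t-1} + W_aux aux_{t-1})."""
    return [math.tanh(sum(w * v for w, v in zip(W_x[k], x))
                      + sum(w * v for w, v in zip(W_h[k], h_prev))
                      + sum(w * v for w, v in zip(W_aux[k], aux)))
            for k in range(len(W_h))]

def multitask_forward(frames, speech_params, speaker_params):
    """Run a speech branch and a speaker branch in parallel over the frames.
    At frame t each branch also gets the OTHER branch's hidden state from
    frame t-1 (simplified inter-task recurrence). Each *_params is a tuple
    (W_x, W_h, W_aux); both branches are assumed to share one hidden size."""
    hidden = len(speech_params[1])
    h_speech, h_speaker = [0.0] * hidden, [0.0] * hidden
    speech_states, speaker_states = [], []
    for x in frames:
        new_speech = rnn_cell(x, h_speech, h_speaker, *speech_params)
        new_speaker = rnn_cell(x, h_speaker, h_speech, *speaker_params)
        h_speech, h_speaker = new_speech, new_speaker
        speech_states.append(h_speech)
        speaker_states.append(h_speaker)
    # In a full model, speech_states would feed a phone softmax and
    # speaker_states a speaker softmax, trained with a joint loss.
    return speech_states, speaker_states

frames = [[1.0], [0.5]]
speaker = ([[0.3]], [[0.2]], [[0.4]])
_, spk_a = multitask_forward(frames, ([[0.5]], [[0.1]], [[0.0]]), speaker)
_, spk_b = multitask_forward(frames, ([[0.9]], [[0.1]], [[0.0]]), speaker)
print(spk_a[0] == spk_b[0], spk_a[1] == spk_b[1])  # True False
```

The usage shows the coupling: changing only the speech branch leaves the speaker state at the first frame untouched but changes it from the second frame on, because information flows across tasks with a one-frame delay.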

Prospects
- Sequence model -> deep speaker embedding
- Encoder-decoder + attention model
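A full encoder-decoder with attention is beyond a short example, so the sketch below shows a simpler relative instead: attention-weighted pooling, which scores each frame-level feature against a query vector, normalizes the scores with a softmax and returns the weighted average as a single utterance-level embedding. The query vector would normally be learned; here it is fixed by hand, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Score each frame by its dot product with a query vector (learned in practice,
    fixed here), normalize with softmax, and return the weighted average."""
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(dim)]
    return pooled, weights

frames = [[1.0, 0.0], [0.0, 1.0]]
pooled, w = attention_pool(frames, query=[0.0, 0.0])
print(pooled, w)  # [0.5, 0.5] [0.5, 0.5]
```

With a zero query the weights are uniform and the result reduces to the plain frame average used for d-vectors; a learned query lets the model emphasize the frames that carry the most speaker information.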

Thank you http://lilt.cslt.org/
