Designing Facial Animation for Spoken Persian
Hadi Rahimzadeh
June 2005
System Description
- Input: speech signal (a continuous speech stream)
- Output: facial animation of a generic 3D face in the MPEG-4 standard
Agenda
- MPEG-4 standard
- Speech processing
- Different approaches
- Learning phase
- Face feature extraction
- Training neural networks
- Experimental results
- Conclusion
MPEG-4 Standard
- Multimedia communication standard, 1999, by the Moving Picture Experts Group
- High quality at a low bit rate
- Interaction of users with media
- Object oriented: object properties, scalable quality
- SNHC (Synthetic/Natural Hybrid Coding): synthetic faces and bodies
Facial Animation in MPEG-4
- FDP (Face Definition Parameters): shape (84 feature points) and texture
- FAP (Face Animation Parameters): 68 parameters for animating the feature points; high-level and low-level, global and local; expressed in FAP units (FAPUs)
Face Definition Parameters
Face Animation Parameter Units
Speech Processing
Phases:
- Noise reduction (simple noise)
- Framing
- Feature extraction
Speech features: LPC, MFCC, Delta MFCC, Delta-Delta MFCC.
Each frame yields one feature vector (frame 1 -> feature vector X1, frame 2 -> feature vector X2, ...); a sketch follows.
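For illustration, a minimal sketch of the framing and MFCC-based feature-extraction step using librosa; the file name, hop size, and number of coefficients are assumptions, and the original work may have used entirely different tooling.

```python
import numpy as np
import librosa

# Load the speech track and extract MFCC-based features frame by frame.
y, sr = librosa.load("speech.wav", sr=None)   # "speech.wav" is a placeholder

# A 20 ms hop gives 50 feature vectors per second, matching the 50 fps
# speech rate mentioned in the training slides.
hop = int(0.020 * sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
d1 = librosa.feature.delta(mfcc)              # Delta MFCC
d2 = librosa.feature.delta(mfcc, order=2)     # Delta-Delta MFCC

# One feature vector X_k per frame; rows are frames, columns are features.
features = np.vstack([mfcc, d1, d2]).T
```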
Two Approaches
- Phoneme-viseme mapping: transitions among visemes; discrete phonetic units; highly stylized; language dependent
- Acoustic-visual mapping: learns the relation between speech features and facial movements as a function approximation; language independent; neural networks and HMMs serve as the learning machines for the mapping
Learning Phase
- From the speaker video, the speech stream goes through feature extraction
- From the same video, FAPs are extracted (and can be played back with a FAP player)
- The speech feature vectors and the FAPs are then used to train the neural network
Face Feature Extraction
- Deformable-template-based approach, semi-automatic
- Candide model: a wireframe model used for model-based coding; parameterized; 113 vertices, 168 faces
Candide Model
Parameters of the wireframe model (WFM):
- Global: 3D rotation, 2D translation, scale
- Shape units: lip width, eye distance, ...
- Action units: lip shape, eyebrow movement, ...
- Each parameter value is a real number
- Texture
A sketch of how these parameters deform the mesh follows.
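A minimal sketch of how such wireframe parameters typically combine, following the standard Candide formulation g = s * R * (g_bar + S*sigma + A*alpha) + t; the array names and shapes here are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def deform_candide(g_bar, S, A, sigma, alpha, R, t, scale):
    """Apply shape units (sigma), action units (alpha), and the global
    rotation/translation/scale to the base wireframe geometry.

    g_bar : (3N,) base vertex coordinates of the wireframe model
    S     : (3N, num_shape_units) shape-unit basis
    A     : (3N, num_action_units) action-unit basis
    R     : (3, 3) rotation matrix, t : (3,) translation, scale : float
    """
    g = g_bar + S @ sigma + A @ alpha          # static shape + dynamic actions
    g = g.reshape(-1, 3) @ R.T                 # global 3D rotation
    return scale * g + t                       # global scale and translation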
New Face Generation
Transformation
Three point correspondences between the source image and the target model: (a1, b1) <-> (x1, y1), (a2, b2) <-> (x2, y2), (a3, b3) <-> (x3, y3).
Transformation (cont.)
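The three correspondences shown above are enough to determine an affine warp from the source face image onto the model. A minimal sketch of solving for that warp; the sample coordinates and function names are illustrative only.

```python
import numpy as np

def affine_from_correspondences(src, dst):
    """Solve for the 2D affine map [x, y]^T = M[:, :2] @ [a, b]^T + M[:, 2]
    from three point correspondences src[i] -> dst[i]."""
    A, b = [], []
    for (sa, sb), (x, y) in zip(src, dst):
        # Each correspondence contributes two linear equations.
        A.append([sa, sb, 1, 0, 0, 0])
        A.append([0, 0, 0, sa, sb, 1])
        b.extend([x, y])
    p = np.linalg.solve(np.array(A, float), np.array(b, float))
    return p.reshape(2, 3)                      # [[m11, m12, tx], [m21, m22, ty]]

def apply_affine(M, pts):
    pts = np.asarray(pts, float)
    return pts @ M[:, :2].T + M[:, 2]

# Example: map three source-image points onto three model points.
src = [(0, 0), (1, 0), (0, 1)]
dst = [(10, 20), (30, 22), (12, 45)]
M = affine_from_correspondences(src, dst)
print(apply_affine(M, src))                     # reproduces dst
```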
New Face Generation
Model Adaptation
Selecting the optimal parameters:
- Global parameters: 3D rotation, 2D translation, scale
- Lip parameters: upper lip, jaw opening, lip width, vertical movement of the lip corners
Search strategy:
- A full search over the parameter space is expensive
- Instead, use the previous frame's parameters as a starting point (see the sketch below)
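The slides do not spell out how previous-frame information is exploited, so the following is only an illustrative sketch of a local, coordinate-wise search around the previous frame's parameter values; the fit_error function, the parameter dictionary, and the step sizes are assumptions.

```python
def adapt_parameters(prev_params, fit_error, deltas, steps=2):
    """Refine model parameters for the current frame by searching a small
    neighbourhood around the previous frame's values instead of a full search.

    prev_params : dict  parameter name -> value from the previous frame
    fit_error   : callable(dict) -> float, e.g. an image/model matching error
    deltas      : dict  parameter name -> step size to try in each direction
    """
    best = dict(prev_params)
    best_err = fit_error(best)
    for name, step in deltas.items():
        for k in range(-steps, steps + 1):
            cand = dict(best)
            cand[name] = prev_params[name] + k * step   # stay near last frame
            err = fit_error(cand)
            if err < best_err:
                best, best_err = cand, err
    return best
```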
Lip Reading
- Use colour data to locate the lip area
- Use the extracted lip area to estimate the lip model parameters: upper lip, jaw opening, mouth width, lip corners
- Use the related vertices of the Candide model
- Two regions are marked in the first frame: a lip region and a non-lip region
Lip Area Classification
- Fisher Linear Discriminant (FLD): simple and fast
- Two point sets X and Y in n dimensions
- m1 is the mean of the projection of X onto the unit vector alpha; m2 is the mean of the projection of Y onto alpha
- Find the alpha that maximizes the Fisher criterion J(alpha) = (m1 - m2)^2 / (s1^2 + s2^2), where s1^2 and s2^2 are the scatters of the two projected sets
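A minimal FLD sketch using the usual closed-form solution alpha proportional to Sw^-1 (m_X - m_Y); the feature vectors would be the per-pixel colour values described on the following slides, and the threshold choice is an assumption.

```python
import numpy as np

def fisher_direction(X, Y):
    """Fisher Linear Discriminant: direction maximizing the ratio of
    projected between-class scatter to within-class scatter.

    X : (n_x, d) samples of class 1 (e.g. lip pixels)
    Y : (n_y, d) samples of class 2 (e.g. non-lip pixels)
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Sw = (np.cov(X, rowvar=False) * (len(X) - 1)
          + np.cov(Y, rowvar=False) * (len(Y) - 1))   # within-class scatter
    alpha = np.linalg.solve(Sw, mx - my)              # Sw^-1 (m_x - m_y)
    return alpha / np.linalg.norm(alpha)

def classify(alpha, threshold, pixels):
    """Label a pixel as lip when its projection onto alpha exceeds threshold."""
    return pixels @ alpha > threshold
```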
Estimating Lip Parameters
- The FLD is trained on pixels from the first frame
- The colour values of the pixels serve as features; HSV works better than RGB
- HSV is more robust under varying brightness conditions
Lip Area Classification
- A simple approach for estimating the lip parameters from the classified lip pixels:
- Column scanning
- Row scanning
A sketch follows.
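A sketch of what column and row scanning over the binary classification result could look like; interpreting the horizontal extent as lip width and the vertical extent as mouth opening is an assumption, not stated on the slide.

```python
import numpy as np

def lip_extent(lip_mask):
    """Estimate mouth width and opening from a binary lip mask by scanning
    columns and rows for the extent of lip-labelled pixels."""
    cols = np.where(lip_mask.any(axis=0))[0]   # columns containing lip pixels
    rows = np.where(lip_mask.any(axis=1))[0]   # rows containing lip pixels
    if len(cols) == 0 or len(rows) == 0:
        return 0, 0
    width = cols[-1] - cols[0] + 1             # horizontal extent -> lip width
    height = rows[-1] - rows[0] + 1            # vertical extent -> mouth opening
    return width, height
```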
Generating FAPs from the Model
- Generate a FAP file from the model parameters
- The FAP file format was worked out by trial and error
- Open-source FAP players take a FAP file and a wave file as input
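A rough sketch of converting the estimated lip parameters into low-level FAP amplitudes; the FAP numbers and the FAPU scaling values are illustrative assumptions, and the real MPEG-4 FAP file additionally needs the header and per-frame mask lines required by the format (which the original work reverse-engineered by trial and error).

```python
def lip_params_to_faps(jaw_open, lip_width, upper_lip, corner_y,
                       mns=40.0, mw=60.0):
    """Map the four estimated lip parameters to low-level FAP amplitudes.
    FAP numbers and FAPU values (mns, mw) are illustrative assumptions."""
    return {
        3: jaw_open * mns,       # open_jaw
        4: upper_lip * mns,      # upper-lip movement
        6: lip_width * mw,       # stretch left lip corner
        7: lip_width * mw,       # stretch right lip corner
        12: corner_y * mns,      # vertical lip-corner movement
    }

def write_fap_frames(frames, path):
    """Write one line per frame: 'frame_no fap_no=value ...'.
    Header and mask lines of a real FAP file are omitted here."""
    with open(path, "w") as f:
        for k, faps in enumerate(frames):
            vals = " ".join(f"{n}={v:.1f}" for n, v in sorted(faps.items()))
            f.write(f"{k} {vals}\n")
```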
Training Neural Networks
- Data set: 60 videos; 45 sentences for training, 15 sentences for testing
- Multilayer perceptrons: one input layer, one hidden layer, one output layer
- Trained with the backpropagation algorithm
- Nine neurons in the output layer: five global parameters and four lip parameters
Training Neural Networks (cont.)
- Four speech features: LPC, MFCC, Delta MFCC, Delta-Delta MFCC
- Six networks per speech feature:
  - one feature vector as input, with 30, 60, or 90 neurons in the hidden layer
  - three feature vectors as input, with 90, 120, or 150 neurons in the hidden layer
- Frame rates: video 25 fps, speech 50 fps
A training sketch follows.
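A minimal sketch of training one such network, using scikit-learn's MLPRegressor as a stand-in for the hand-trained backpropagation network; the random arrays are placeholders for the aligned speech features and face parameters, and stacking three consecutive feature vectors corresponds to one of the tested input variants.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def stack_context(X, context=3):
    """Concatenate `context` consecutive feature vectors into one input row."""
    return np.hstack([X[i:len(X) - context + i + 1] for i in range(context)])

# Placeholders: speech feature vectors and the nine face parameters
# (5 global + 4 lip) extracted from the corresponding video frames.
X = np.random.randn(500, 13)
y = np.random.randn(498, 9)          # aligned with the 3-frame context windows

net = MLPRegressor(hidden_layer_sizes=(90,),   # one hidden layer, 90 neurons
                   max_iter=2000)
net.fit(stack_context(X, 3), y)
pred = net.predict(stack_context(X, 3))        # predicted face parameters
```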
Generating Results from the NNs
- The trained networks generate the four lip parameters for each frame
Assessment Criterion
- A performance metric to measure the prediction accuracy of the audio-visual mapping: correlation coefficients
- G equals one when the true and predicted trajectories are identical
- k is the frame index; N is the number of frames in the test set
A sketch of the computation follows.
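The slide does not show the exact formula, so the sketch below uses the standard Pearson correlation over the N test frames as an assumed but conventional instance of this metric; it equals one when the two trajectories coincide.

```python
import numpy as np

def correlation_coefficient(true_traj, pred_traj):
    """Correlation G between the true and predicted parameter trajectories
    over the N test frames (standard Pearson correlation)."""
    t = np.asarray(true_traj, float) - np.mean(true_traj)
    p = np.asarray(pred_traj, float) - np.mean(pred_traj)
    return float(np.sum(t * p) / np.sqrt(np.sum(t * t) * np.sum(p * p)))
```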
Results For LPC Networks
Results For MFCC Networks
Results For Delta MFCC Networks
Results For Delta Delta MFCC Networks
Conclusion
- Speech-driven facial animation is feasible
- Delta-Delta MFCC gives the best performance
- Using previous and next speech frames improves performance
- Combining different speech features is a promising direction
Future Work
- More training data
- Speaker-independent training data
- Multiple languages
- Other speech features and combinations of speech features
- Facial emotions
- HMMs for storing the mappings
Thanks…