Final Presentation. Lale Akarun, Oya Aran, Alexey Karpov, Milos Zeleny, Hasim Sak, Erinc Dikici, Alp Kindiroglu, Marek Hruz, Pavel Campr, Daniel Schorno, Alexander.


1 Final Presentation

2 Lale Akarun, Oya Aran, Alexey Karpov, Milos Zeleny, Hasim Sak, Erinc Dikici, Alp Kindiroglu, Marek Hruz, Pavel Campr, Daniel Schorno, Alexander Ronzhin, Zdenek Krnoul

3 • Fingerspelling ↔ Speech (F2S & S2F)
    ◦ Translation between Russian, English, Czech and Turkish

4 • Multilingual fingersign alphabet database
    ◦ Turkish alphabet (5 subjects)
    ◦ Czech alphabet (4 subjects)
    ◦ Russian alphabet (2 subjects)
    ◦ Numbers and special stop signs

5 • Semi-automatic annotation module:
    ◦ 11 videos, each 15-30 minutes long
    [Diagram: Filter Images → Select Keyframes → Crop Sign Space → Segment Hand Locations]

6 • Skin-color-based hand detection
    ◦ The skin model is initialized from the movement of the hands
    [Pipeline diagram: Video Input (Turkish or Czech) → Skin Color Detection → Tracking and Segmentation of Hands → Keyframe Selection → Feature Extraction & Classification → Text Output (UTF-8)]
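One plausible reading of "initialization of the model by movement of the hands" is sketched below: the chroma values of pixels that move at start-up are accumulated into a histogram, which is then used to score every pixel by lookup. The YCrCb chroma input, the 16-bin quantization and all function names are assumptions for illustration, not the project's code.

```python
from collections import Counter

BINS = 16  # assumed quantization: 256 levels per chroma channel -> 16 bins


def quantize(cr, cb):
    """Map an 8-bit (Cr, Cb) chroma pair to a 2-D histogram bin."""
    step = 256 // BINS
    return cr // step, cb // step


def init_skin_model(pixels, motion_mask):
    """Build a skin-color histogram from pixels flagged as moving.

    pixels: 2-D list of (Cr, Cb) tuples; motion_mask: 2-D list of bools of the
    same shape. Assumes the moving pixels at start-up are mostly hands.
    The histogram is scaled so that its most frequent bin equals 1.0.
    """
    counts = Counter()
    for row_px, row_mask in zip(pixels, motion_mask):
        for (cr, cb), moving in zip(row_px, row_mask):
            if moving:
                counts[quantize(cr, cb)] += 1
    peak = max(counts.values(), default=1)
    return {bin_: n / peak for bin_, n in counts.items()}


def skin_likelihood(model, cr, cb):
    """Look up a pixel's skin likelihood in the histogram (backprojection)."""
    return model.get(quantize(cr, cb), 0.0)
```

Colors never seen on a moving hand get likelihood 0.0, so a static, skin-colored background does not enter the initial model.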

7 • Tracking of the hands by CamShift
    ◦ Hierarchical hand and face redetection
    ◦ Hand segmentation
        ▪ Backprojection
        ▪ Double differencing
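Double differencing, as commonly defined, marks a pixel in frame t as moving only if it differs from both frame t-1 and frame t+1. This suppresses the "ghost" region that a single frame difference leaves behind. Below is a minimal sketch on grayscale frames stored as nested lists; the threshold value and the function name are illustrative assumptions.

```python
def double_difference(prev, curr, nxt, threshold=15):
    """Return a boolean motion mask for the middle frame `curr`.

    A pixel counts as moving only if it differs by more than `threshold`
    from BOTH the previous and the next frame. Frames are equally sized
    2-D lists of grayscale intensities (0-255).
    """
    return [
        [abs(c - p) > threshold and abs(n - c) > threshold
         for p, c, n in zip(row_p, row_c, row_n)]
        for row_p, row_c, row_n in zip(prev, curr, nxt)
    ]
```

The slide lists backprojection and double differencing together under hand segmentation, but it does not say how the two cues were combined. This sketch therefore shows only the motion cue.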

8 • Two-tier classification:
    ◦ Keyframe selection
    ◦ Gesture recognition
  • Detection of keyframes:
    ◦ Motion of the hands
        ▪ Displacement of tracked hand centers
        ▪ Changes in the external hand contour
    ◦ Image blur
        ▪ Strength of the gradient trace around hand contours
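The keyframe criteria can be sketched as a per-frame test. A frame is kept when the tracked hand center has barely moved since the previous frame and the gradient strength around the hand contour (the blur measure) is high. The contour-change cue is left out here. The thresholds, the normalized sharpness input and the function name are assumptions.

```python
import math


def select_keyframes(centers, sharpness, max_motion=2.0, min_sharpness=0.5):
    """Return indices of frames where the hand is nearly still and sharp.

    centers:   tracked hand center (x, y) per frame, in pixels.
    sharpness: per-frame gradient strength along the hand contour, assumed
               normalized to [0, 1] (higher means less motion blur).
    Frame 0 is never selected because it has no predecessor to compare with.
    The thresholds are illustrative, not tuned values.
    """
    keyframes = []
    for i in range(1, len(centers)):
        (x0, y0), (x1, y1) = centers[i - 1], centers[i]
        displacement = math.hypot(x1 - x0, y1 - y0)
        if displacement <= max_motion and sharpness[i] >= min_sharpness:
            keyframes.append(i)
    return keyframes
```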

9 • Hand gesture descriptors:
    ◦ Radial distance functions
    ◦ Elliptic Fourier descriptors
    ◦ Local binary patterns
    ◦ Hu moments
  • Each feature is classified separately by k-nearest neighbors (KNN).
    ◦ The per-feature results are fused by voting.
    ◦ Optional word-level fusion uses the Levenshtein distance.
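Below is a minimal sketch of the classification stage. It assumes each descriptor has already been turned into a numeric vector. The stage has three parts: one KNN classifier per descriptor, a majority vote across descriptors, and optional snapping of the spelled letter sequence to the closest lexicon word by Levenshtein distance. Reading "word level fusion" as lexicon matching is an interpretation. The function names and the value of k are hypothetical.

```python
import math
from collections import Counter


def majority_vote(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return majority_vote([label for _, label in nearest])


def fuse_descriptors(per_descriptor_labels):
    """Fuse the letters predicted from each descriptor (e.g. RDF, EFD, LBP, Hu)."""
    return majority_vote(per_descriptor_labels)


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def closest_word(letters, lexicon):
    """Word-level fusion: replace a spelled sequence by the nearest lexicon word."""
    return min(lexicon, key=lambda word: levenshtein(letters, word))
```

For example, a misrecognized spelling "MADRLD" is one edit away from "MADRID", so `closest_word` would map it back to the city name.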

10 • Continuous speech recognition:
    ◦ Speech decoder based on weighted finite-state transducers (WFSTs)
    ◦ 3-gram language model
    ◦ 100K-word vocabulary
        ▪ News-portal based
        ▪ 10,843 triphone HMM states
    ◦ 11 Gaussians for the acoustic model
    ◦ 188 hours of broadcast news speech data
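The slide gives only the order of the language model. A maximum-likelihood trigram estimate, P(w3 | w1, w2) = c(w1, w2, w3) / c(w1, w2), can be sketched as shown below. A 100K-word model would also need smoothing or back-off for unseen trigrams, and this sketch deliberately omits both. The boundary tokens and function names are assumptions.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories over tokenized sentences.

    Each sentence is padded with two <s> start tokens and one </s> end token.
    """
    trigrams, histories = Counter(), Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            histories[tuple(padded[i - 2:i])] += 1
    return trigrams, histories


def trigram_prob(trigrams, histories, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    count = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / count if count else 0.0
```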

11 • Voice activity detection (VAD)
    ◦ Preprocessing step for continuous ASR
    ◦ Identifies false voice triggers
    ◦ Methods employed:
        ▪ Rabiner's method: energy level and zero-crossing rate of the acoustic waveform
        ▪ Supervised learning: energy level of the signal modeled with GMMs
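Below is a simplified frame classifier in the spirit of Rabiner's energy/zero-crossing approach. It is not the full endpoint-detection algorithm. Noise statistics are estimated from the first frames, which are assumed to be silence. A frame counts as speech if its energy is well above the noise floor. It also counts as speech if its zero-crossing rate is high while its energy is still above the noise floor, which catches weak unvoiced sounds such as fricatives. The frame length (160 samples, i.e. 10 ms at an assumed 16 kHz), the thresholds and the function names are assumptions.

```python
def frame_features(samples, frame_len):
    """Split a signal into non-overlapping frames; return (energy, zcr) per frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def detect_voice(samples, frame_len=160, noise_frames=5,
                 energy_factor=4.0, zcr_margin=0.3):
    """Return one speech/non-speech decision per frame."""
    feats = frame_features(samples, frame_len)
    noise = feats[:noise_frames]  # assumed to contain only background noise
    noise_energy = sum(e for e, _ in noise) / len(noise)
    noise_zcr = sum(z for _, z in noise) / len(noise)
    decisions = []
    for energy, zcr in feats:
        loud = energy > energy_factor * noise_energy
        fricative_like = zcr > noise_zcr + zcr_margin and energy > noise_energy
        decisions.append(loud or fricative_like)
    return decisions
```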

12 • Isolated speech recognition:
    ◦ Phoneme-based speech recognition
    ◦ Phonemes represented by HMMs with GMM emission densities
    ◦ Used for out-of-vocabulary words
    ◦ Speech commands allow control of the modules
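Below is a minimal sketch of isolated-word recognition with HMMs whose states emit through GMMs. Each candidate word is scored with the forward algorithm in the log domain, and the best-scoring word wins. In the phoneme-based system on the slide, the word HMMs would be built by concatenating phoneme HMMs. For brevity, each word here gets a small hand-made model over 1-D features, and every name and parameter is hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_emission(x, gmm):
    """gmm: list of (weight, mean, var) mixture components for one state."""
    return logsumexp([math.log(w) + log_gauss(x, mu, var) for w, mu, var in gmm])


def forward_log_likelihood(obs, states, log_trans):
    """Forward algorithm for an HMM that starts in state 0.

    obs: sequence of 1-D features (real systems use MFCC vectors).
    states: one GMM per state; log_trans[i][j]: log P(state j | state i).
    Sums over all end states; a real recognizer would usually require the
    path to end in the final state.
    """
    n = len(states)
    alpha = [log_emission(obs[0], states[0]) if j == 0 else -math.inf
             for j in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
                 + log_emission(x, states[j]) for j in range(n)]
    return logsumexp(alpha)


def recognize(obs, word_models):
    """word_models: word -> (states, log_trans). Returns the best-scoring word."""
    return max(word_models,
               key=lambda w: forward_log_likelihood(obs, *word_models[w]))
```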

13 • Python-based web service
    ◦ Handles input/output from multiple modules
    ◦ Users communicate through sessions
    ◦ All messages are UTF-8 encoded or in transcribed form
    ◦ Sentence translation is handled by Google Translate
    ◦ Message types:
        ▪ Letter
        ▪ Word
        ▪ Sentence
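The session idea can be sketched without the HTTP layer as an in-memory hub. Modules join a session with their language, and each posted message is queued for every other member. Words and sentences are translated into each receiver's language. The class names, the rule that letters pass through untranslated, and the `translate` stub (standing in for the Google Translate call) are all assumptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from enum import Enum


class MessageType(Enum):
    LETTER = "letter"
    WORD = "word"
    SENTENCE = "sentence"


@dataclass
class Message:
    sender: str        # name of the posting module
    kind: MessageType
    text: str          # UTF-8 text or a transcription
    language: str      # language code, e.g. "tr", "cs", "ru", "en"


class SessionHub:
    """In-memory stand-in for the web service: routes messages within a session."""

    def __init__(self, translate=None):
        # translate(text, source_lang, target_lang) -> str; identity by default.
        # Stands in for the Google Translate call; not a real client.
        self.translate = translate or (lambda text, src, dst: text)
        self.members = defaultdict(dict)   # session id -> {module: language}
        self.inboxes = defaultdict(deque)  # module -> queued messages

    def join(self, session, module, language):
        self.members[session][module] = language

    def post(self, session, msg):
        """Queue msg for every other member, translated into its language."""
        for module, lang in self.members[session].items():
            if module == msg.sender:
                continue
            text = msg.text
            # Assumption: single letters pass through untranslated.
            if msg.kind is not MessageType.LETTER and lang != msg.language:
                text = self.translate(msg.text, msg.language, lang)
            self.inboxes[module].append(Message(msg.sender, msg.kind, text, lang))

    def poll(self, module):
        """Return and clear all messages queued for a module."""
        pending = list(self.inboxes[module])
        self.inboxes[module].clear()
        return pending
```

Queuing per module rather than pushing to clients fits the polling behavior described for the TTS module.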

14 • Computer speech synthesis from arbitrary input text
  • Two TTS systems are used:
    ◦ MARY TTS, developed by DFKI (Germany)
    ◦ A TTS engine developed by UIIP (Belarus) and SPIIRAS (Russia)
  • Web-based service
    ◦ Polls the web server for messages
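The polling behavior can be sketched as a loop that repeatedly asks the server for pending texts and hands each one to the synthesizer. Both `fetch` and `speak` are hypothetical callables. The real HTTP client and the MARY TTS or UIIP/SPIIRAS interfaces are not shown on the slide.

```python
import time
from typing import Callable, List, Optional


def poll_and_speak(fetch: Callable[[], List[str]],
                   speak: Callable[[str], None],
                   interval: float = 0.5,
                   max_rounds: Optional[int] = None) -> int:
    """Poll the server for new texts and pass each one to the TTS engine.

    fetch: hypothetical callable that asks the web server for pending
           messages (e.g. via an HTTP request) and returns a list of strings.
    speak: hypothetical wrapper around a TTS engine such as MARY TTS.
    max_rounds: stop after this many polls (None = poll forever).
    Returns how many texts were spoken.
    """
    spoken = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for text in fetch():
            speak(text)
            spoken += 1
        rounds += 1
        time.sleep(interval)  # wait before asking the server again
    return spoken
```

Polling keeps the TTS client simple because the server never needs to open a connection to it. The trade-off is a delay of up to `interval` seconds.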

15 • Visual fingersign output through a 3D avatar
  • Available for two languages:
    ◦ Czech sign alphabet
    ◦ American sign alphabet
  • Module composed of:
    ◦ 3D animation model
        ▪ 38 joints and segments (16 for the hand)
    ◦ Trajectory generator
        ▪ Rotations of body parts handled with inverse kinematics
        ▪ Head and lip motion provided by a talking-head system
        ▪ Input and output: words
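The slide does not describe the avatar's inverse-kinematics solver for its 38-joint model. As an illustration of the idea only, the sketch below solves the analytic IK of a planar two-segment chain, such as an upper arm and forearm, with the law of cosines. It returns the solution with a positive elbow angle. The function names and the planar simplification are assumptions.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Joint angles (shoulder, elbow) in radians that place the end of a
    planar chain with segment lengths l1, l2 at (x, y).

    Returns the solution with a non-negative elbow angle; raises ValueError
    if the target is out of reach.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)  # law of cosines
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2


def forward_kinematics(theta1, theta2, l1, l2):
    """End-point position for the given joint angles (used to check the IK)."""
    return (l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2),
            l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2))
```

For example, with two unit-length segments and target (1, 1), the solver returns a shoulder angle of 0 and an elbow angle of π/2.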


17 • City names game
    ◦ Module design:
    [Diagram: Visual Input (Turkish) → Finger Spelling Recognition; Audio Letter Input (Russian) → Isolated Speech Recognition; Server (Translator); Finger Spelling Synthesis → Visual Output (Czech); Speech Synthesis → Audio Output (English)]
    ◦ Fingerspell → Amsterdam; Speech → Madrid
    ◦ Fingerspell → Doha; Speech → Alta
    ◦ Fingerspell → Athens; Speech → Sukre
    ◦ Fingerspell → Eton; Speech → Nairobi
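The rounds shown form a single chain in which every city starts with the last letter of the previous one (Amsterdam → Madrid → Doha → Alta → ...). Assuming that is the game's rule, a move check can be sketched as follows. The function names are hypothetical, and repeated cities are not checked.

```python
def next_required_letter(city):
    """Letter the next city must start with (case-insensitive)."""
    return city[-1].upper()


def is_valid_move(prev_city, city):
    return city[:1].upper() == next_required_letter(prev_city)


def is_valid_chain(cities):
    """True if every consecutive pair of cities obeys the last-letter rule."""
    return all(is_valid_move(a, b) for a, b in zip(cities, cities[1:]))
```

In the demo the check would apply to recognized text. Fingerspelled and spoken inputs arrive in different languages, so the letters would first need to be mapped to a common alphabet, which this sketch does not attempt.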


19 • Casual continuous conversation
    [Diagram: Audio Sentence Input (Turkish) → Isolated Speech Recognition; Server (Translator); Finger Spelling Synthesis → Visual Output (Czech); Speech Synthesis → Audio Output (English)]

20 • Automated language detection for fingerspelling
  • Further testing
  • Increasing overall system speed
  • Adding the missing languages to the underlying modules


