
Final Presentation

Lale Akarun, Oya Aran, Alexey Karpov, Milos Zelezny, Hasim Sak, Erinc Dikici, Alp Kindiroglu, Marek Hruz, Pavel Campr, Daniel Schorno, Alexander Ronzhin, Zdenek Krnoul

 Fingerspelling ↔ Speech (F2S & S2F)
◦ Translation between Russian, English, Czech, and Turkish

 Multilingual fingersign alphabet database
◦ Turkish alphabet (5 subjects)
◦ Czech alphabet (4 subjects)
◦ Russian alphabet (2 subjects)
◦ Numbers and special stop signs

 Semi-automatic annotation module:
◦ 11 videos, each … minutes long
◦ Annotation pipeline: Filter Images → Select Keyframes → Crop Sign Space → Segment Hand Locations

 Skin-color-based hand detection
◦ Skin color model initialized from the movement of the hands
◦ Recognition pipeline: Video Input (Turkish or Czech) → Skin Color Detection → Tracking and Segmentation of Hands → Keyframe Selection → Feature Extraction & Classification → Text Output (UTF-8)
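The skin-color detection step can be sketched as a per-pixel classifier in YCrCb space. This is an illustrative stand-in, not the project's actual model (which is bootstrapped from hand movement); the Cr/Cb ranges below are common textbook values:

```python
def rgb_to_ycrcb(r, g, b):
    """Convert an 8-bit RGB pixel to YCrCb (BT.601, full range)."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return y, cr, cb

def is_skin(r, g, b):
    """Classify a pixel as skin using fixed Cr/Cb ranges (illustrative thresholds)."""
    _, cr, cb = rgb_to_ycrcb(r, g, b)
    return 133 <= cr <= 173 and 77 <= cb <= 127
```

In the actual module an adaptive color model, initialized from moving-hand pixels, takes the place of these fixed thresholds.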

 Tracking of the hands by Camshift
◦ Hierarchical hand and face re-detection
◦ Hand segmentation
 Backprojection
 Double differencing

 Two-tier classification:
◦ Keyframe selection
◦ Gesture recognition
 Detection of keyframes:
◦ Motion of the hands
 Displacement of tracked hand centers
 Changes in the hand's external contour
◦ Image blur
 Strength of the gradient trace around hand contours
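Keyframe detection from the displacement of tracked hand centers can be sketched as follows: declare a keyframe when the hand center has stayed nearly still for several consecutive frames (the function name and thresholds are hypothetical, not the project's):

```python
import math

def find_keyframes(centers, move_thresh=2.0, hold=3):
    """Return frame indices where the tracked hand center has moved less
    than `move_thresh` pixels for `hold` consecutive frames, i.e. the
    hand is being held still on a letter."""
    keyframes, run = [], 0
    for i in range(1, len(centers)):
        dx = centers[i][0] - centers[i - 1][0]
        dy = centers[i][1] - centers[i - 1][1]
        run = run + 1 if math.hypot(dx, dy) < move_thresh else 0
        if run == hold:
            keyframes.append(i)
    return keyframes
```

The blur cue (gradient strength along the hand contour) would then be used to pick the sharpest frame within each still period.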

 Hand gesture descriptors:
◦ Radial distance functions
◦ Elliptic Fourier descriptors
◦ Local binary patterns
◦ Hu moments
 Each descriptor is classified separately by k-NN.
◦ The per-feature results are fused by voting.
◦ Optional word-level fusion with Levenshtein distance.
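The two fusion steps can be sketched in a few lines: a majority vote over the labels from each feature's k-NN classifier, and Levenshtein distance for snapping a recognized letter sequence to a vocabulary word. A minimal illustration, not the project's code:

```python
from collections import Counter

def fuse_by_voting(per_feature_labels):
    """Majority vote over the labels produced by each feature's k-NN classifier."""
    return Counter(per_feature_labels).most_common(1)[0][0]

def levenshtein(a, b):
    """Edit distance between two letter sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def word_level_fuse(recognized, vocabulary):
    """Snap a recognized letter string to the closest vocabulary word."""
    return min(vocabulary, key=lambda w: levenshtein(recognized, w))
```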

 Continuous speech recognition:
◦ Weighted finite-state transducer (WFST) based speech decoder
◦ 3-gram language model
◦ 100K vocabulary size
 News portal based
◦ Tri-phone HMM states
◦ 11 Gaussians for the acoustic model
◦ 188 hours of broadcast-news speech data
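A 3-gram language model of the kind the decoder uses assigns P(w3 | w1, w2) from corpus counts. A toy maximum-likelihood version is shown below; the real model would add smoothing and be compiled into the WFST:

```python
from collections import defaultdict

def train_trigram(sentences):
    """Collect trigram and history (bigram) counts from tokenized sentences."""
    tri, hist = defaultdict(int), defaultdict(int)
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(len(toks) - 2):
            hist[(toks[i], toks[i + 1])] += 1
            tri[(toks[i], toks[i + 1], toks[i + 2])] += 1
    return tri, hist

def trigram_prob(tri, hist, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2)."""
    denom = hist[(w1, w2)]
    return tri[(w1, w2, w3)] / denom if denom else 0.0
```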

 Voice activity detection (VAD)
◦ Preprocessing step for continuous ASR
◦ Identifies false voice triggers
◦ Employed methods:
 Rabiner's method: energy level and zero-crossing rate of the acoustic waveform
 Supervised learning: the energy level of the signal modeled using GMMs
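Rabiner's method combines short-time energy with the zero-crossing rate. A simplified single-frame sketch is below; the thresholds are placeholders, and the full method also adapts its thresholds across frames:

```python
def short_time_energy(frame):
    """Mean squared amplitude of one analysis frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum(1 for i in range(1, len(frame))
               if frame[i - 1] * frame[i] < 0) / len(frame)

def is_speech(frame, e_high=1e-3, z_high=0.25):
    """Flag a frame as speech when its energy is high, or when moderate
    energy coincides with a high zero-crossing rate (unvoiced fricatives)."""
    e = short_time_energy(frame)
    return e > e_high or (e > e_high / 10 and zero_crossing_rate(frame) > z_high)
```

The GMM-based alternative mentioned above replaces the fixed energy threshold with two learned Gaussian mixtures, one for speech energy and one for background.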

 Isolated speech recognition:
◦ Phoneme-based speech recognition
◦ Phonemes modeled with HMMs using GMMs
◦ Used for out-of-vocabulary words
◦ Speech commands allow module control
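Decoding an isolated word against its phoneme HMM amounts to finding the best state path, typically with the Viterbi algorithm. A minimal log-domain sketch, assuming the per-frame observation log-likelihoods (from the GMMs) are precomputed:

```python
def viterbi(log_obs, log_trans, log_init):
    """Best state path through an HMM.
    log_obs[t][s]: log-likelihood of frame t under state s (e.g. from a GMM).
    log_trans[p][s]: log transition probability p -> s.
    log_init[s]: log initial-state probability."""
    T, S = len(log_obs), len(log_obs[0])
    dp = [[log_init[s] + log_obs[0][s] for s in range(S)]]
    back = []
    for t in range(1, T):
        row, ptr = [], []
        for s in range(S):
            best, arg = max((dp[-1][p] + log_trans[p][s], p) for p in range(S))
            row.append(best + log_obs[t][s])
            ptr.append(arg)
        dp.append(row)
        back.append(ptr)
    state = max(range(S), key=lambda s: dp[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```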

 Python-based web service
◦ Handles input/output from multiple modules
◦ Users communicate using sessions
◦ All messages in UTF-8 encoding or transcribed form
◦ Translation of sentences handled by Google Translate
◦ Message types:
 Letter
 Word
 Sentence
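The session-based message exchange can be illustrated with an in-memory store; the class and method names below are hypothetical stand-ins for the actual web service:

```python
class MessageStore:
    """Per-session message queues, mirroring the letter/word/sentence
    message types exchanged through the web service."""
    VALID_TYPES = {"letter", "word", "sentence"}

    def __init__(self):
        self._queues = {}

    def post(self, session_id, msg_type, text):
        """Append a UTF-8 text message to a session's queue."""
        if msg_type not in self.VALID_TYPES:
            raise ValueError(f"unknown message type: {msg_type}")
        self._queues.setdefault(session_id, []).append(
            {"type": msg_type, "text": text})

    def poll(self, session_id):
        """Return and clear all pending messages for a session."""
        return self._queues.pop(session_id, [])
```

Each recognition module posts into a session, and the synthesis modules poll that session for new letters, words, or sentences to render.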

 Computer speech synthesis given an arbitrary input text
 Two TTS systems are applied:
◦ MARY TTS, developed by DFKI (Germany)
◦ A TTS engine developed by UIIP (Belarus) and SPIIRAS (Russia)
 Web-based service
◦ Polls the web server for messages
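MARY TTS exposes an HTTP interface that a polling client can call; to my knowledge the server answers on a `/process` endpoint with the request parameters below, but treat the exact parameter names and default port as assumptions to check against the MARY documentation for your server version:

```python
from urllib.parse import urlencode

def mary_tts_request_url(text, locale="en_US",
                         host="localhost", port=59125):
    """Build a MARY TTS /process request URL asking for a WAVE rendering
    of `text` (parameter names per the MARY HTTP interface; verify
    against your server version)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE",
        "LOCALE": locale,
    }
    return f"http://{host}:{port}/process?{urlencode(params)}"
```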

 Visual fingersign output provided through a 3D avatar
 Available for two languages:
◦ Czech sign alphabet
◦ American sign alphabet
 Module composed of:
◦ 3D animation model
 38 joints and segments (16 for the hand)
◦ Trajectory generator
 Rotations of body parts handled with inverse kinematics
 Head and lip motion provided by a talking-head system
 Inputs and outputs words
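Inverse kinematics for the avatar's arm can be illustrated with the classic two-link planar case: given a wrist target, solve analytically for the shoulder and elbow angles. This is a textbook sketch, far simpler than the solver a 38-joint model needs:

```python
import math

def two_link_ik(x, y, l1, l2):
    """Analytic IK for a planar arm with segment lengths l1 (upper arm)
    and l2 (forearm); returns (shoulder, elbow) angles in radians that
    place the wrist at (x, y)."""
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow))
    return shoulder, elbow
```

For example, with unit-length segments a target at (1, 1) yields a straight shoulder (0 rad) and a right-angle elbow (π/2 rad).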

 City names game
◦ Module design:
 Visual Input (Turkish) → Finger Spelling Recognition → Server (Translator)
 Audio Letter Input (Russian) → Isolated Speech Recognition → Server (Translator)
 Server (Translator) → Finger Spelling Synthesis → Visual Output (Czech)
 Server (Translator) → Speech Synthesis → Audio Output (English)

 City names game
◦ Fingerspell → Amsterdam; Speech → Madrid
◦ Fingerspell → Doha; Speech → Alta
◦ Fingerspell → Athens; Speech → Sukre
◦ Fingerspell → Eton; Speech → Nairobi
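Judging from the example rounds, each answer begins with the last letter of the previous city (a word-chain game). Assuming that rule, a turn validator is one expression:

```python
def valid_city_chain(cities):
    """Check that every city begins with the final letter of the one before."""
    return all(nxt[0].lower() == prev[-1].lower()
               for prev, nxt in zip(cities, cities[1:]))
```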

 Casual continuous conversation
◦ Audio Sentence Input (Turkish) → Isolated Speech Recognition → Server (Translator)
◦ Server (Translator) → Finger Spelling Synthesis → Visual Output (Czech)
◦ Server (Translator) → Speech Synthesis → Audio Output (English)

 Automated language detection for fingerspelling
 Further testing
 Increasing overall system speed
 Addition of missing languages to the underlying modules