Correlational and Regressive Analysis of the Relationship between Tongue and Lips Motion - An EMA and Video Study of Selected Polish Speech Sounds

Robert Wielgat, Łukasz Mik
Polytechnic Institute, State Higher Vocational School in Tarnów, Tarnów, POLAND
Anita Lorenc
Department of Speech Pathology and Applied Linguistics, Maria Curie-Sklodowska University, Lublin, POLAND
Department of Speech and Language Therapy and Voice Production, Warsaw University, Warsaw, POLAND

Presentation Plan
The presentation is divided into four parts:
- About EMA and fundamentals of the acquisition system
- Correlational and regressive analysis
- Gathered data and their analysis
- Conclusion and future research

Electromagnetic Articulography (EMA)
Electromagnetic articulography (EMA) is a technique for 3D imaging of the movements of speech articulators (tongue, lips, palate, jaw). Articulator movement is tracked by means of small sensors (coils) fixed to the articulators, e.g. the tongue. The sensors move in an electromagnetic field produced by 6 transmitters, each of which produces an alternating magnetic field at a different frequency. The alternating magnetic field induces an alternating current in the sensors, which makes it possible to obtain the distance of each sensor from the six transmitters. From these, the XYZ coordinates and two angles of each sensor can be calculated in real time.
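The step from "distances to six transmitters" to "XYZ coordinates" can be illustrated with least-squares multilateration. This is a simplified sketch, not the AG 500's actual algorithm: the real system fits a magnetic field model to the induced amplitudes and also recovers two orientation angles, which are ignored here. The transmitter layout and the function name `multilaterate` are hypothetical.

```python
import numpy as np

def multilaterate(transmitters, distances):
    """Least-squares 3-D position from distances to known transmitter positions.

    Subtracting the first sphere equation |x - p_0|^2 = d_0^2 from the others
    cancels |x|^2 and leaves the linear system 2 (p_i - p_0) . x = b_i.
    """
    p = np.asarray(transmitters, dtype=float)  # (n, 3), n >= 4, not coplanar
    d = np.asarray(distances, dtype=float)     # (n,)
    A = 2.0 * (p[1:] - p[0])
    b = d[0] ** 2 - d[1:] ** 2 + np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Hypothetical layout of six transmitters (mm); not the real AG 500 geometry.
TX = np.array([[150, 0, 0], [-150, 0, 0], [0, 150, 0],
               [0, -150, 0], [0, 0, 150], [0, 0, -150]], dtype=float)
true_pos = np.array([12.0, -5.0, 30.0])
dists = np.linalg.norm(TX - true_pos, axis=1)
estimate = multilaterate(TX, dists)  # recovers true_pos for exact distances
```

With six transmitters the linear system is overdetermined (five equations, three unknowns), so measurement noise in the distances is averaged out by the least-squares solve.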

Block Diagram of the EMA System
Figure: block diagram of the acquisition system with EMA sensors, the Electromagnetic Articulograph AG 500, left, central and right front camcorders, frame grabbers, a synchronizer, and a computer with a screen.

Articulograph AG 500
The AG 500 EMA system was used in this research.
Figure: transmitter coils, 12 sensors (small coils), and the AG 500 EMA cube calibrator.
Sensor positions were recorded in x, y, z space every 5 ms (a 200 Hz sampling rate). The measurement accuracy is 0.5 mm.

Location of EMA sensors
Figure: EMA sensor placement on the tongue, lips and mandible (axes X and Z).

Vision System
Three camcorders recording images simultaneously will be used.
B/W camera parameters:
- model: Point Grey Gazelle GZL-CL-22C5M-C
- resolution: 2048 x 1088
- frame rate: 280 fps (200 fps)
- interface: Camera Link
- synchronization: via external trigger or software trigger
- good sensitivity in the IR range

Face Markers' Placement
To capture the motion of the face, white dots (markers), each 4 millimeters in diameter, were placed on the speaker's face:
- 7 reference markers: forehead, temples and nasal bridge
- 9 markers on the lower jaw (mandible)
- 3 markers on the beard
- 9 markers on the zygomatic bone and cheeks
- 8 markers on the lips
- 5 markers on and near the nose
- 1 marker on the larynx

Calculation of the Markers' Coordinates
For the present research, the markers' coordinates were calculated using only the front face image. The calculation consisted of three steps:
1. image thresholding;
2. a morphological opening to remove small objects that are not markers from the image;
3. finding the marker centers with the Circular Hough Transform (CHT).
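The three steps (thresholding, opening, CHT) can be sketched with NumPy alone. This is a minimal illustration, not the authors' implementation: it assumes bright markers on a darker background, a single known marker radius in pixels (the slides give 4 mm, but not the pixel scale), and a fixed-radius Hough accumulator. The threshold, radius and vote-ratio values are hypothetical. Library implementations such as OpenCV's `cv2.HoughCircles` or scikit-image's `skimage.transform.hough_circle` would normally be used instead.

```python
import numpy as np

def _windows(arr, r, fill):
    """Stack of all shifts of `arr` within a (2r+1) x (2r+1) square, padded with `fill`."""
    h, w = arr.shape
    p = np.pad(arr, r, mode="constant", constant_values=fill)
    return np.stack([p[r + dy:r + dy + h, r + dx:r + dx + w]
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)])

def opening(mask, r=1):
    """Binary opening: erosion followed by dilation with a square structuring element."""
    eroded = _windows(mask, r, False).all(axis=0)
    return _windows(eroded, r, False).any(axis=0)

def circular_hough(edges, radius, n_angles=64):
    """Fixed-radius CHT accumulator: each edge pixel votes for all points at `radius`."""
    h, w = edges.shape
    ys, xs = np.nonzero(edges)
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    cy = np.rint(ys[:, None] - radius * np.sin(theta)).astype(int).ravel()
    cx = np.rint(xs[:, None] - radius * np.cos(theta)).astype(int).ravel()
    inside = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
    acc = np.zeros((h, w), dtype=np.int64)
    np.add.at(acc, (cy[inside], cx[inside]), 1)
    # Pool votes over 3x3 cells to tolerate pixel discretisation of the circle.
    return _windows(acc, 1, 0).sum(axis=0)

def find_markers(img, threshold, radius, rel_votes=0.6, max_markers=50):
    """Threshold -> opening -> CHT; returns (row, col) marker centres."""
    mask = opening(img > threshold)                       # steps 1 and 2
    edges = mask & ~_windows(mask, 1, False).all(axis=0)  # boundary pixels
    acc = circular_hough(edges, radius).astype(float)     # step 3
    if acc.max() <= 0:
        return []
    floor = rel_votes * acc.max()
    yy, xx = np.mgrid[0:acc.shape[0], 0:acc.shape[1]]
    centres = []
    while len(centres) < max_markers:
        y, x = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[y, x] < floor:
            break
        centres.append((int(y), int(x)))
        # Suppress votes within one radius so each marker is reported once.
        acc[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 0
    return centres

# Synthetic test image: two disc markers of radius 6 px plus two isolated specks.
yy, xx = np.mgrid[0:40, 0:70]
img = np.full((40, 70), 0.1)
for cy, cx in [(20, 20), (20, 50)]:
    img[(yy - cy) ** 2 + (xx - cx) ** 2 <= 6 ** 2] = 0.9
img[5, 5] = img[35, 65] = 0.9  # single-pixel specks, removed by the opening
markers = sorted(find_markers(img, threshold=0.5, radius=6), key=lambda c: c[1])
```

The opening is what keeps single-pixel specks from producing edge pixels, so they never vote in the accumulator.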

EMA Sensor Placement in the Research
- 5 sensors on the tongue
- 2 sensors on the lips
- 1 sensor on the border of the lower incisors and gums
- 1 sensor for tracing the palate contour
- 3 control sensors (placed on the forehead and on the bones behind the ears)

Pearson Correlation Coefficient
To find optimal conditions for determining the correlation between EMA sensor signals and the signals representing video marker trajectories, two video markers were placed directly on two EMA sensors: one on the lower lip sensor (LL) and one on the upper lip sensor (UL). The Pearson correlation coefficient was calculated, defined as:
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where $x_i$ and $y_i$ are the $i$-th samples of the two signals, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of samples.
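A minimal sketch of the Pearson coefficient computed directly from its standard definition (covariance of the centred signals divided by the product of their root sums of squares). The function name and the example inputs are hypothetical. In practice `x` could be the z coordinate of an EMA sensor and `y` the z coordinate of the video marker placed on it.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # centred signals
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # -1.0: perfect negative linear relation
```

The result should agree with `np.corrcoef(x, y)[0, 1]`; writing it out makes the definition explicit.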

Regression Analysis
Figure: regressive dependency between the Z coordinates of the sensors placed on the lower lip and the tongue tip.
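The slides do not state the regression model. As an illustrative assumption, the sketch below fits a least-squares polynomial (linear by default) predicting the tongue tip Z coordinate (TTz) from the lower lip Z coordinate (LLz) and reports the coefficient of determination R². The function name and the `degree` parameter are hypothetical.

```python
import numpy as np

def fit_regression(llz, ttz, degree=1):
    """Least-squares fit ttz ~ poly(llz); returns coefficients and R^2."""
    llz = np.asarray(llz, dtype=float)
    ttz = np.asarray(ttz, dtype=float)
    coeffs = np.polyfit(llz, ttz, degree)   # highest power first
    pred = np.polyval(coeffs, llz)
    ss_res = np.sum((ttz - pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((ttz - ttz.mean()) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot

x = np.linspace(0.0, 10.0, 50)
coeffs, r2 = fit_regression(x, 2.0 * x + 1.0)  # slope 2, intercept 1, R^2 = 1
```

Predicting the invisible tongue sensor from a visible lip sensor reflects the study's motivation; swapping the arguments gives the reverse regression.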

Analysed Speech Signal
Figure: waveform and spectrogram of the utterance 'aca' [at͡sa] extracted from the word 'macanie'.

Influence of Acquisition Time Delay on Correlation
Figure: relationship between the Z coordinate of the LL video marker and the Z coordinate of the LL EMA sensor, without head movement correction, for the segment [at͡sa] from the word 'płacami'; panels: recorded simultaneously, and delayed by 5 ms.
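Because a 5 ms delay visibly changes the correlation, the offset between the EMA and video signals can be estimated from the data. The sketch below scans integer lags and keeps the one with the largest |r|; absolute values are used because the EMA and video z axes may be oppositely oriented. It assumes both signals are already sampled on a common 5 ms grid. `best_lag` and the synthetic signal are hypothetical, and this is one simple option, not the time-correction procedure used in the study.

```python
import numpy as np

def best_lag(ema, video, max_lag):
    """Return (lag, r) maximising |Pearson r| between ema[t] and video[t + lag].

    Lags are in samples; at a 5 ms sampling period one sample is 5 ms.
    """
    ema = np.asarray(ema, dtype=float)
    video = np.asarray(video, dtype=float)
    n = min(len(ema), len(video))
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ema[:n - lag], video[lag:n]
        else:
            a, b = ema[-lag:n], video[:n + lag]
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best[1]):
            best = (lag, float(r))
    return best

t = np.arange(403)
s = np.sin(2 * np.pi * t / 80) + 0.3 * np.sin(2 * np.pi * t / 23)
ema_z, video_z = s[3:], s[:-3]  # video lags the EMA by 3 samples (15 ms)
lag, r = best_lag(ema_z, video_z, max_lag=10)
```

A finer (sub-sample) correction would require interpolating one signal, for example with `np.interp`, before correlating.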

Correlation between LL video marker and LL EMA sensor - Results
Pearson correlation coefficients between the z coordinate of the EMA sensor position and the z coordinate of the video marker position under different measurement conditions:

Sensor  TC    Word 89  Word 97  Word 137  Word 261  Mean
LL      no    -0.961   -0.926   -0.947    -0.964    -0.950
LL      yes   -0.987   -0.962   -0.963    -0.986    -0.974
UL      no    -0.985   -0.938   -0.853    -0.892    -0.917
UL      yes   values -0.990, -0.868, -0.931, -0.934 (one of the five cells is not recoverable)

Abbreviations in the table: TC = time correction; LL = lower lip; UL = upper lip.

Jitter of Reference Points
The reference points are on the nasal bridge and on the mastoid processes behind the left and right ears.
Figure: Euclidean distance between 2 reference EMA sensors during the recording of a single word.
Figure: Euclidean distance between 2 reference video markers during the recording of a single word.
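Since the reference points sit on rigid skull structures, the distance between two of them should stay constant, so any variation over a recording is a measure of jitter. A minimal sketch, assuming each trajectory is an (n, 3) array of per-frame coordinates; the function name and the example values are hypothetical.

```python
import numpy as np

def reference_distance_jitter(p1, p2):
    """Per-frame Euclidean distance between two reference trajectories (n, 3),
    plus its standard deviation and peak-to-peak range as jitter measures."""
    d = np.linalg.norm(np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float), axis=1)
    return d, float(d.std()), float(d.max() - d.min())

p1 = np.zeros((5, 3))
p2 = np.array([[100.0, 0, 0], [100.5, 0, 0], [99.5, 0, 0], [100.0, 0, 0], [100.0, 0, 0]])
dist, jitter_std, jitter_ptp = reference_distance_jitter(p1, p2)  # ptp = 1.0 mm
```

The same function applies to EMA reference sensors and to reference video markers, which allows the two jitter levels to be compared directly.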

Sensor Tilt
Figure: changes in the distances between the EMA sensor and the video marker in the X and Z directions depending on sensor tilt; for the two tilt positions shown, dX1 > dX2 and dZ1 < dZ2.
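The inequalities dX1 > dX2 and dZ1 < dZ2 follow from simple geometry if the marker sits on top of the sensor at a fixed height along the sensor's normal. The sketch below assumes exactly that (a rigid offset rotated in the X-Z plane); the height and tilt values are hypothetical.

```python
import numpy as np

def marker_offset(height_mm, tilt_deg):
    """X and Z offsets of a marker lying height_mm above the sensor centre along
    the sensor normal, after the sensor is tilted by tilt_deg in the X-Z plane."""
    a = np.radians(tilt_deg)
    return height_mm * np.sin(a), height_mm * np.cos(a)

dx1, dz1 = marker_offset(2.0, 30.0)  # stronger tilt
dx2, dz2 = marker_offset(2.0, 10.0)  # weaker tilt
# More tilt moves the offset from Z into X: dx1 > dx2 and dz1 < dz2.
```

Under this assumption the tilt biases the marker position relative to the sensor, which is one reason the two signals need not coincide even when the marker is placed directly on the sensor.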

Dependency of LLz vs TTz
Figure: regressive dependency between the Z coordinates of the lower lip (LL) and tongue tip (TT) EMA sensors for the whole utterance 'aca'.

LLz vs TTz: Phonetic Segmentation
Figure: regressive relationships between the Z coordinates of the lower lip (LL) and tongue tip (TT) EMA sensors for phonemes extracted from the utterance 'aca' after phonetic segmentation.

LLz vs TTz: Articulatory Segmentation
Figure: regressive relationships between the Z coordinates of the lower lip (LL) and tongue tip (TT) EMA sensors for segments extracted from the utterance 'aca' after articulatory segmentation.
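Both segmentation slides fit a separate LLz vs TTz relationship within each segment instead of one fit over the whole utterance. A minimal sketch, assuming the segment boundaries are already known as sample indices (how they are found, phonetically or articulatorily, is outside the sketch) and assuming a straight-line model per segment. The function name, labels and data are hypothetical.

```python
import numpy as np

def segmentwise_regression(llz, ttz, boundaries, labels):
    """Fit ttz ~ slope * llz + intercept separately inside each segment.

    boundaries: sample indices [b0, b1, ..., bk] delimiting k segments.
    Returns {label: (slope, intercept, pearson_r)}.
    """
    out = {}
    for label, start, end in zip(labels, boundaries[:-1], boundaries[1:]):
        x = np.asarray(llz[start:end], dtype=float)
        y = np.asarray(ttz[start:end], dtype=float)
        slope, intercept = np.polyfit(x, y, 1)
        r = np.corrcoef(x, y)[0, 1]
        out[label] = (float(slope), float(intercept), float(r))
    return out

x = np.linspace(0.0, 1.0, 30)
y = np.concatenate([2.0 * x[:10], -x[10:20] + 3.0, 0.5 * x[20:]])
fits = segmentwise_regression(x, y, [0, 10, 20, 30], ["a1", "ts", "a2"])
```

A single line fitted to the whole synthetic signal would mix the three opposite trends; per-segment fits recover each one.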

Conclusion
- The delay time between EMA sensor signals and video recordings should be properly adjusted.
- Strong correlational or regressive relationships exist between the coordinates of visible and invisible EMA sensors.
Future research
- Finding a transformation of video marker coordinates into a reference system related to the skull bones.
- A more comprehensive study of correlational and regressive relationships between inner and external sensors (markers).
- Finding efficient methods of articulatory segmentation.
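One common way to approach the first future-research item is rigid registration of the reference markers, for example with the Kabsch (SVD) method. This is a sketch of that general technique, not the authors' method. It assumes 3-D marker coordinates are available (e.g. triangulated from the three camcorders, which the slides do not describe) and that the reference markers move rigidly with the skull. The function name and test geometry are hypothetical.

```python
import numpy as np

def rigid_transform(src, dst):
    """Kabsch: rotation R and translation t minimising sum ||R @ src_i + t - dst_i||^2
    for corresponding 3-D points given as (n, 3) arrays."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

# Hypothetical reference markers in a head-fixed frame, then rotated and shifted.
src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
c, s = np.cos(np.radians(30)), np.sin(np.radians(30))
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
t_true = np.array([5.0, -2.0, 1.0])
dst = src @ R_true.T + t_true
R, t = rigid_transform(src, dst)
```

To correct head movement, one would estimate `R, t` mapping the reference markers of each frame onto their positions in a chosen reference frame, then apply the same transform to all other markers of that frame.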

Acknowledgment
The research was supported by the Polish National Science Centre grant No. 2012/05/E/HS2/03770, titled "Polish Language Pronunciation. Analysis Using 3-dimensional Articulography".

Thank you for your attention