Correlational and Regressive Analysis of the Relationship between Tongue and Lips Motion - An EMA and Video Study of Selected Polish Speech Sounds
Robert Wielgat, Łukasz Mik, Polytechnic Institute, State Higher Vocational School in Tarnów, Tarnów, POLAND
Anita Lorenc, Department of Speech Pathology and Applied Linguistics, Maria Curie-Sklodowska University, Lublin, POLAND; Department of Speech and Language Therapy and Voice Production, Warsaw University, Warsaw, POLAND
Presentation Plan
The presentation is divided into four parts:
- About EMA and fundamentals of the acquisition system
- Correlational and regressive analysis
- Gathered data and their analysis
- Conclusion and future research
Sections: Introduction, Problem Statement, Results and Discussion, Conclusion
Electromagnetic Articulography (EMA)
Electromagnetic Articulography (EMA) is 3D imaging of the movements of the speech articulators (tongue, lips, palate, jaw).
- The movement of the articulators is visualised by means of small sensors (coils) fixed to the articulators, e.g. the tongue.
- The sensors move in an electromagnetic field produced by 6 transmitters.
- Each transmitter produces an alternating magnetic field at a different frequency.
- The alternating magnetic field induces an alternating current in the sensors, which makes it possible to obtain the distance of each sensor from the six transmitters.
- It is then possible to calculate in real time the XYZ coordinates as well as two angles of the sensors.
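The step from six sensor-to-transmitter distances to XYZ coordinates can be illustrated with a generic least-squares multilateration. This is only a minimal sketch of the geometric idea, not the algorithm used inside the articulograph: the transmitter layout, the function names `solve3` and `locate`, and the test position are all hypothetical, and the two orientation angles are ignored.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= f * a[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x

def locate(transmitters, distances):
    """Least-squares position from distances to known transmitter positions.

    Subtracting the sphere equation of the first transmitter from the others
    gives a linear system 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2.
    """
    p0, d0 = transmitters[0], distances[0]
    rows, rhs = [], []
    for p, d in zip(transmitters[1:], distances[1:]):
        rows.append([2.0 * (p[k] - p0[k]) for k in range(3)])
        rhs.append(sum(c * c for c in p) - sum(c * c for c in p0) - d * d + d0 * d0)
    # Normal equations (A^T A) x = A^T b of the overdetermined system.
    normal = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    g = [sum(r[j] * b for r, b in zip(rows, rhs)) for j in range(3)]
    return solve3(normal, g)

# Hypothetical transmitter layout in metres, chosen only for this demo.
TRANSMITTERS = [(0.3, 0, 0), (-0.3, 0, 0), (0, 0.3, 0),
                (0, -0.3, 0), (0, 0, 0.3), (0, 0, -0.3)]
true_pos = (0.05, -0.02, 0.10)
dists = [math.dist(true_pos, t) for t in TRANSMITTERS]
estimate = locate(TRANSMITTERS, dists)
print([round(c, 6) for c in estimate])
```

With noise-free distances the estimate reproduces the assumed sensor position; with real measurements the six equations are redundant, which is what makes a least-squares solution useful.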
Block Diagram of the EMA System
Figure: block diagram of the acquisition system. Components: EMA sensors, Electromagnetic Articulograph AG 500, synchronizer, left front camcorder, central front camcorder, right front camcorder, frame grabbers, computer, screen.
Articulograph AG 500
The AG 500 electromagnetic articulograph was used in this research.
Figure labels: transmitter coils, 12 sensors (small coils), AG 500 EMA cube, calibrator.
- Sensor positions were recorded every 5 ms (a 200 Hz sampling rate) in x, y, z space.
- The accuracy of the measurement is 0.5 mm.
Location of EMA sensors
Figure: EMA sensor placement on the tongue, lips and mandible, shown in the X-Z plane.
Vision System
Three camcorders recording the pictures simultaneously will be used.
B/W camera parameters:
- Model: Point Grey Gazelle GZL-CL-22C5M-C
- Resolution: 2048 x 1088
- Frame rate: 280 fps (200 fps)
- Interface: Camera Link
- Synchronization: via external trigger or software trigger
- Good sensitivity in the IR range
Face Markers’ Placement
In order to capture the motion of the face, white dots (markers) were placed on the face of the speaker. Each marker is 4 millimeters in diameter.
- 7 reference markers: forehead, temples and nasal bridge
- 9 markers on the lower jaw (mandible)
- 3 markers on the chin
- 9 markers on the zygomatic bones and cheeks
- 8 markers on the lips
- 5 markers on the nose and nearby
- 1 marker on the larynx
Calculation of the Markers’ Coordinates
For the present research, the markers’ coordinates were calculated using only the front face image. The calculation comprised three steps:
- Image thresholding
- An opening operation to remove small non-marker objects from the image
- Finding the centers of the markers using the Circular Hough Transform (CHT)
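The three steps can be sketched on a small synthetic image. This is a minimal pure-Python illustration under several assumptions that are not taken from the study: a fixed threshold level, a 3x3 square structuring element, a known marker radius in pixels, and a single marker in the image. A production pipeline would normally rely on an image-processing library instead.

```python
import math

def threshold(img, level):
    """Binarize a grayscale image (list of rows): 1 where pixel >= level."""
    return [[1 if p >= level else 0 for p in row] for row in img]

def _morph(img, rule):
    """Apply a rule (all/any) to each 3x3 neighbourhood; outside counts as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[y + dy][x + dx]
                      if 0 <= y + dy < h and 0 <= x + dx < w else 0
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = 1 if rule(window) else 0
    return out

def opening(img):
    """Morphological opening: erosion followed by dilation (3x3 square)."""
    return _morph(_morph(img, all), any)

def hough_circle_center(binary, radius, n_angles=72):
    """Circular Hough Transform for one circle of known radius.

    Every boundary pixel votes for all candidate centers lying `radius`
    away from it; the accumulator maximum is returned as (x, y).
    """
    h, w = len(binary), len(binary[0])
    acc = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            on_edge = any(not (0 <= ny < h and 0 <= nx < w) or not binary[ny][nx]
                          for ny, nx in neighbours)
            if not on_edge:
                continue
            for k in range(n_angles):
                a = 2 * math.pi * k / n_angles
                cy = round(y + radius * math.sin(a))
                cx = round(x + radius * math.cos(a))
                if 0 <= cy < h and 0 <= cx < w:
                    acc[cy][cx] += 1
    _, best_y, best_x = max((acc[y][x], y, x) for y in range(h) for x in range(w))
    return best_x, best_y

def make_test_image(h=30, w=30, cx=15, cy=12, r=5):
    """Synthetic frame: dark background, one bright disc, one bright noise pixel."""
    img = [[20] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                img[y][x] = 230
    img[2][2] = 255  # isolated bright pixel that the opening should remove
    return img

cleaned = opening(threshold(make_test_image(), 128))
print("estimated marker center (x, y):", hough_circle_center(cleaned, 5))
```

The opening removes the isolated bright pixel before voting, so it cannot contribute spurious votes to the accumulator. Handling many markers would additionally require finding several accumulator peaks rather than a single maximum.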
EMA Sensor Placement in the Research
- 5 sensors on the tongue
- 2 sensors on the lips
- 1 sensor on the border of the lower incisors and gums
- 1 sensor for tracing the palate contour
- 3 control sensors (placed on the forehead and on the bones behind the ears)
Pearson Correlation Coefficient
To find optimal conditions for determining the correlation between EMA sensor signals and the signals representing video markers’ trajectories, two video markers were placed directly on two EMA sensors: one on the lower lip (sensor LL) and one on the upper lip (sensor UL). The Pearson correlation coefficient was then calculated, which is defined as:
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the samples of the two signals and $\bar{x}$, $\bar{y}$ are their means.
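The coefficient can be computed directly from its standard definition. The sketch below uses made-up trajectories; the variable names and the sample values are hypothetical. The video coordinate is assumed, purely for illustration, to grow in the opposite direction to the EMA coordinate, which yields a coefficient close to -1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical z trajectories of the LL EMA sensor (mm) and LL video marker (px).
ema_z = [0.0, 0.8, 1.9, 3.1, 3.9, 3.0, 2.1, 1.0]
video_z = [500.0, 492.5, 481.0, 469.5, 461.0, 470.5, 479.0, 490.5]
print("r =", round(pearson_r(ema_z, video_z), 3))
```

Because the coefficient is invariant to offset and positive scaling, the two signals do not need to share units or an origin; only an axis reversal changes the sign.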
Regression Analysis
Figure: regressive dependency between Z coordinates of sensors placed on the lower lip and tongue tip.
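A regressive dependency of this kind can be estimated, in its simplest form, with an ordinary least-squares line. The slides do not state which regression model was used, so the linear model, the function name `fit_line` and the sample values below are assumptions made for illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x is constant; the slope is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared

# Hypothetical Z coordinates (mm): tongue tip as predictor, lower lip as response.
tt_z = [1.0, 2.0, 3.5, 5.0, 6.0, 7.5]
ll_z = [-11.8, -11.1, -10.2, -9.1, -8.6, -7.4]
slope, intercept, r2 = fit_line(tt_z, ll_z)
print(f"LLz = {slope:.3f} * TTz + {intercept:.3f}, R^2 = {r2:.3f}")
```

The coefficient of determination indicates how much of the lower lip variation the line explains; a low value would suggest that a single line is a poor description of the whole utterance.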
Analysed Speech Signal
Figure: waveform and spectrogram of the utterance ‘aca’ [at͡sa] extracted from the word ‘macanie’.
Influence of Delay of Acquisition Time on Correlation
Figure: the relationship between the Z coordinate of the LL video marker and the Z coordinate of the LL EMA sensor, without head movement correction, for the segment [at͡sa] from the word ‘płacami’. Panels: recorded simultaneously; delayed by 5 ms.
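One way to look for such a delay is to shift one signal against the other by whole samples and keep the shift with the largest absolute correlation. The sketch below assumes that both signals are sampled every 5 ms, so that one sample of lag corresponds to 5 ms; the function names and the synthetic signals are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(ref, sig, max_lag):
    """Return (lag, r) maximizing |r| between ref[i] and sig[i + lag].

    A positive lag means that sig is delayed with respect to ref.
    Both sequences must have the same length.
    """
    if len(ref) != len(sig):
        raise ValueError("signals must have equal length")
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ref[:len(ref) - lag], sig[lag:]
        else:
            a, b = ref[-lag:], sig[:len(sig) + lag]
        r = pearson_r(a, b)
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best

SAMPLE_PERIOD_MS = 5  # assumed common sampling period of both signals

# Synthetic demo: the "video" signal is the "EMA" signal delayed by one sample.
ema = [math.sin(0.3 * i) for i in range(100)]
video = [ema[0]] + ema[:-1]
lag, r = best_lag(ema, video, max_lag=3)
print("estimated delay:", lag * SAMPLE_PERIOD_MS, "ms")
```

The absolute value is used so that the search also works when the two coordinate axes point in opposite directions and the correlation is negative.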
Correlation between LL video marker and LL EMA sensor - Results
Pearson correlation coefficients between the z coordinate of the EMA sensor position and the z coordinate of the video marker position for different measurement conditions:

Sensor  TC   Word 89  Word 97  Word 137  Word 261  Mean
LL      no   -0.961   -0.926   -0.947    -0.964    -0.950
LL      yes  -0.987   -0.962   -0.963    -0.986    -0.974
UL      no   -0.985   -0.938   -0.853    -0.892    -0.917
UL      yes  -0.990, -0.868, -0.931, -0.934 (only four of the five values are recoverable)

Abbreviations in the table: TC = time correction; LL = lower lip; UL = upper lip
Jitter of Reference Points
Figure: Euclidean distance between 2 reference EMA sensors during the recording of a single word. The reference points are on the nose bridge and on the mastoid processes behind the left and right ears.
Figure (video reference markers): Euclidean distance between 2 reference video markers during the recording of a single word.
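Since the reference points sit on rigid parts of the head, the distance between any two of them should stay constant, and its variation over a recording gives a simple jitter measure. The following sketch computes that distance frame by frame; the function name, the summary statistics chosen and the coordinates are hypothetical.

```python
import math

def distance_jitter(track_a, track_b):
    """Frame-by-frame Euclidean distance between two tracked points.

    Each track is a sequence of (x, y, z) tuples. Returns the mean distance,
    its peak-to-peak range and its (population) standard deviation.
    """
    if len(track_a) != len(track_b) or not track_a:
        raise ValueError("tracks must be non-empty and of equal length")
    d = [math.dist(a, b) for a, b in zip(track_a, track_b)]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    return {"mean": mean, "peak_to_peak": max(d) - min(d), "std": std}

# Hypothetical positions (mm) of two reference sensors over four frames.
nose_bridge = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.1, 0.1, 0.0), (0.0, 0.1, 0.1)]
left_mastoid = [(-70.0, 90.0, -20.0), (-69.8, 90.1, -20.0),
                (-69.9, 90.0, -19.9), (-70.1, 90.2, -20.0)]
print(distance_jitter(nose_bridge, left_mastoid))
```

Comparing this statistic for EMA reference sensors and for video reference markers gives a like-for-like view of how stable each reference system is during a word.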
Sensor Tilt
Figure: changes in the distances between the EMA sensor and the video marker in the X and Z directions depending on the sensor tilt. For the two tilts shown, dX1 > dX2 and dZ1 < dZ2.
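The effect can be reproduced with a very simple geometric model. The sketch assumes, hypothetically, that the video marker sits at a fixed distance from the EMA sensor and that tilting the sensor rotates this offset in the X-Z plane; the offset length and the tilt angles are made-up values, and the model is not taken from the slides.

```python
import math

def marker_offset(distance, tilt_deg):
    """X and Z components of a fixed-length sensor-to-marker offset.

    The offset has length `distance` and makes the angle `tilt_deg`
    with the X axis in the X-Z plane.
    """
    t = math.radians(tilt_deg)
    return distance * math.cos(t), distance * math.sin(t)

# Hypothetical 2 mm offset observed at two different tilts.
dx1, dz1 = marker_offset(2.0, 20.0)
dx2, dz2 = marker_offset(2.0, 40.0)
print(f"tilt 20 deg: dX = {dx1:.2f} mm, dZ = {dz1:.2f} mm")
print(f"tilt 40 deg: dX = {dx2:.2f} mm, dZ = {dz2:.2f} mm")
```

Under this model a larger tilt shortens the X component and lengthens the Z component, so the marker and sensor coordinates differ by an amount that changes whenever the sensor tilts during articulation.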
Dependency of LLz vs TTz
Figure: regressive dependency between Z coordinates of the lower lip (LL) and tongue tip (TT) EMA sensors for the whole utterance ‘aca’.
LLz vs TTz – phonetic segmentation
Figure: regressive relationships between Z coordinates of the lower lip (LL) and tongue tip (TT) EMA sensors for phonemes extracted from the utterance ‘aca’ after phonetic segmentation.
LLz vs TTz – articulatory segmentation
Figure: regressive relationships between Z coordinates of the lower lip (LL) and tongue tip (TT) EMA sensors for segments extracted from the utterance ‘aca’ after articulatory segmentation.
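Fitting a separate relationship for each segment, rather than one for the whole utterance, can be sketched as follows. The linear model, the segment labels, the boundary indices and the sample values are all hypothetical; in practice the boundaries would come from the phonetic or articulatory segmentation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x is constant within the segment")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def fit_segments(tt_z, ll_z, segments):
    """Fit LLz = slope * TTz + intercept separately for each segment.

    `segments` is a list of (label, start, end) with `end` exclusive,
    where start and end are sample indices into both sequences.
    """
    fits = {}
    for label, start, end in segments:
        if end - start < 2:
            raise ValueError(f"segment {label!r} has fewer than 2 samples")
        fits[label] = fit_line(tt_z[start:end], ll_z[start:end])
    return fits

# Hypothetical Z trajectories (mm) with a different trend in each segment.
tt_z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ll_z = [0.0, 2.0, 4.0, 7.0, 6.0, 5.0, 3.0, 3.5, 4.0]
segments = [("a_1", 0, 3), ("ts", 3, 6), ("a_2", 6, 9)]
for label, (slope, intercept) in fit_segments(tt_z, ll_z, segments).items():
    print(f"{label}: LLz = {slope:+.2f} * TTz {intercept:+.2f}")
```

Segment-wise fits can reveal relationships of opposite sign in neighbouring segments that would partly cancel each other in a single fit over the whole utterance.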
Conclusion
- The delay time between EMA sensor signals and video recordings should be properly adjusted.
- Strong correlational or regressive relationships exist between coordinates of visible and invisible EMA sensors.
Future research
- Finding a transformation of video marker position coordinates into the reference system related to the skull bones.
- A more comprehensive study of correlational and regressive relationships between inner and external sensors (markers).
- Finding efficient methods of articulatory segmentation.
Acknowledgment
The research was supported by the Polish National Science Centre, grant No. 2012/05/E/HS2/03770, titled “Polish Language Pronunciation. Analysis Using 3-dimensional Articulography”.
Thank you for your attention