Data-Driven Machine Translation for Sign Languages. Sara Morrissey. PhD topic, NCLT/CNGL Workshop, 23rd July 2008
outline: background; main problems; data-driven MT for SLs; experiments and results; conclusions
background: communication; interpreters and technological aids; machine translation – automatic and confidential – native language of users; rule-based approaches (Veale et al., 1998; Marshall & Sáfár, 2002); data-driven approaches (Bauer et al., 1999; Stein et al., 2006; Wu et al., 2007)
main problems: representation (no formally adopted writing system); linguistic analysis (little research); appropriate data (difficult to find); evaluation (visual-spatial nature rules out automatic evaluation)
data-driven MT for SLs: initial prototype system using Dutch SL; MaTrEx system; Air Traffic Information System (ATIS) Corpus; 595 English sentences; multi-lingual – ISL parallel corpus creation; manual annotation with semantic glosses
data representation (Early morning flights between Cork and Belfast) EARLY MORNING BETWEEN be-CORK CORK FLY BELFAST BETWEEN ref-BELFAST ref-CORK
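A minimal sketch of how one such annotated sentence pair might be held in memory. The class names `GlossToken` and `ParallelPair` and the function `parse_glosses` are hypothetical, and the token pattern (an optional lowercase prefix, a hyphen, then an uppercase gloss) is an assumption read off the single example on this slide. The slide does not say what the `be-` and `ref-` prefixes mean, so the sketch only separates them from the gloss and does not interpret them.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed token shape: optional lowercase prefix + "-" + uppercase gloss.
GLOSS_TOKEN = re.compile(r"^(?:(?P<prefix>[a-z]+)-)?(?P<gloss>[A-Z]+)$")


@dataclass
class GlossToken:
    gloss: str                    # the uppercase gloss, e.g. "CORK"
    prefix: Optional[str] = None  # e.g. "be" or "ref"; meaning not given in the slides


@dataclass
class ParallelPair:
    english: str              # spoken-language side of the corpus entry
    isl: List[GlossToken]     # gloss annotation of the signed side


def parse_glosses(annotation: str) -> List[GlossToken]:
    """Split a whitespace-separated gloss string into GlossToken objects."""
    tokens = []
    for raw in annotation.split():
        match = GLOSS_TOKEN.match(raw)
        if match is None:
            raise ValueError(f"unrecognised gloss token: {raw!r}")
        tokens.append(GlossToken(match.group("gloss"), match.group("prefix")))
    return tokens


pair = ParallelPair(
    english="Early morning flights between Cork and Belfast",
    isl=parse_glosses(
        "EARLY MORNING BETWEEN be-CORK CORK FLY BELFAST BETWEEN ref-BELFAST ref-CORK"
    ),
)
```

Keeping the prefix as a separate field lets later steps decide whether to treat it as part of the token or as a feature, without committing to an interpretation here.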
MaTrEx: data-driven machine translation English ISL bilingual database
translation directions: SL Recognition; SL Generation; SL Annotation; Spoken Language Text
experiments and results: machine translation experiments; 2 segmentation methodologies; type 1 chunks use the Marker Hypothesis (Green, 1979); type 2 uses a dual segmentation method. 1. Early morning flights between Cork and Belfast 2. early morning flights between Cork and Belfast
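A rough sketch of Marker Hypothesis chunking, in which a new chunk opens at each closed-class "marker" word provided the current chunk already contains a content word. The marker list below is a small illustrative assumption, not the lexicon used in the experiments, and the function name `marker_chunks` is hypothetical. The chunk boundaries from the original slide were not preserved, so the output here is not claimed to match them.

```python
# Illustrative marker words (an assumption); real marker lexicons are much
# larger and cover determiners, prepositions, conjunctions, pronouns, etc.
MARKERS = {"the", "a", "an", "between", "and", "or", "from", "to", "on", "in", "at", "of"}


def marker_chunks(sentence):
    """Split a sentence into chunks that start at marker words.

    A marker word opens a new chunk only if the current chunk already holds
    at least one non-marker word, so consecutive markers stay together.
    """
    chunks, current, has_content = [], [], False
    for token in sentence.split():
        is_marker = token.lower() in MARKERS
        if is_marker and has_content:
            chunks.append(current)
            current, has_content = [], False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


print(marker_chunks("Early morning flights between Cork and Belfast"))
# ['Early morning flights', 'between Cork', 'and Belfast']
```

Chunks produced this way can be aligned across the two sides of the parallel corpus to supply sub-sentential translation examples alongside word- and phrase-level alignments.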
experiments and results: System | BLEU | WER | PER. EN→ISL: Baseline, + T1 chunks, + T2 chunks. ISL→EN: Baseline, + T1 chunks, + T2 chunks
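For orientation, two of the metrics named in the table header can be sketched as follows. WER is the word-level edit distance divided by the reference length. PER ignores word order, and the formulation used here (bag-of-words overlap, with a penalty for over-long hypotheses) is one common definition and an assumption about what was used in these experiments. The function names and the example tokens are hypothetical, and both functions assume a non-empty reference.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance normalised by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate (one common formulation)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


ref = "EARLY MORNING FLY BELFAST".split()
hyp = "MORNING EARLY FLY BELFAST".split()
print(wer(ref, hyp))  # 0.5
print(per(ref, hyp))  # 0.0
```

The example shows why both are reported: swapping two words costs two edits under WER but nothing under PER, which matters when source and target differ in word order.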
animation: real human signing preferred (Naqvi, 2007) but impractical; avatar animation criteria: realistic, consistent, functional, fluid; Poser Animation Software Version 6.0; 50 randomly selected sentences, 66 hand-crafted videos; problem of fluidity
animation ‘or’ ‘e’ how much flight
experiments and results: human evaluation experiments; 4 native Deaf human monitors; web-based evaluation of 50 ISL translations; evaluated intelligibility and fidelity; 82% of animations rated intelligible; 72% of animations rated good-to-excellent translations; HCI analysis using Nielsen’s approach
conclusion: MT methodology never before applied to SLs; multi-component, bidirectional system; practical, technological alternative to help alleviate communication and comprehension problems for the Deaf community; positive automatic and manual evaluation; scope for incorporating different SL representation methodologies and segmentation techniques
thank you questions?