Sign Language Representation for Machine Translation. Sara Morrissey, NCLT/CNGL Seminar Series, 1st April, 2009.


1 Sign Language Representation for Machine Translation. Sara Morrissey, NCLT/CNGL Seminar Series, 1st April, 2009

2 Why is there no writing system? Social reasons: variation and demographic spread. Political reasons: recognition. Linguistic reasons: visual-gestural-spatial languages, simultaneous phoneme production.

3 Implications of the lack of a writing system. …for Deaf people: forced to use a language that is not their native one. …for the languages: social acceptance, standardisation (Pizzuto, 2006). …for MT: limits availability of domain-specific corpora; no standards, so systems are difficult to compare; significance of results on small datasets; difficult to use NLP tools developed for spoken languages.

4 Sign Language Representation Formats. Linear: Stokoe Notation, HamNoSys. Multi-level: Gloss, Partition/Constitute, Movement-Hold, SiGML. Iconic: SignWriting.

5 Linear Symbolic Notations Stokoe Notation: “don’t know” HamNoSys Notation: “nineteen”

6 Multi-level Representations Movement-Hold Partition/Constitute Gloss Annotation SiGML

7 Iconic: SignWriting

8 But different groups, different requirements (Pizzuto et al., 2006): the aspect of a language chosen for its representation is largely dictated by the society and culture developing the writing system and what purpose and settings such communication is required for. Deaf, linguists, language processors…

9 Requirements for MT: large bilingual domain-specific corpus of good-quality digital data; gold standard reference; segmentation algorithms for separating words, phrases and sentences; alignment methodologies for these units; searching the source and target texts; acceptable capturing of the language for output.

10 Discussion of current methods Stokoe (Stokoe, 1960) –Difficult to capture classifiers and NMFs –Decontextualised signs only –ASCII version (Mandel, 1993) HamNoSys (Prillwitz, 1989) –NMFs included –Subsection of 150 symbols for handwriting purposes –Mac usage, Windows font

11 Discussion of current methods (2) Gloss Annotation: (Leeson et al., 2006, Neidle et al., 2002) –Most commonly used in MT and by linguists –No universal conventions –Extensible –Using one language to describe another –Allows for simultaneous timed logging of features –Tools widely available –SL and linguistic knowledge a requirement –No knowledge of supplementary symbolic system required
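To make the "simultaneous timed logging of features" point concrete, here is a minimal sketch of a multi-tier, time-aligned gloss annotation. It is not taken from the talk or from any annotation tool: the `Annotation` class, the `active_at` helper, the tier names and the gloss values are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One timed label on one tier (all names here are hypothetical)."""
    tier: str       # e.g. a manual gloss tier or a non-manual feature tier
    start_ms: int   # start time in the video, in milliseconds
    end_ms: int     # end time (exclusive), in milliseconds
    value: str      # the gloss or feature label


def active_at(annotations, t_ms):
    """Return {tier: value} for every annotation covering time t_ms."""
    return {
        a.tier: a.value
        for a in annotations
        if a.start_ms <= t_ms < a.end_ms
    }


# Invented example: a manual gloss overlapping a non-manual feature (NMF).
example = [
    Annotation("gloss", 0, 400, "FLIGHT"),
    Annotation("gloss", 400, 900, "ARRIVE"),
    Annotation("nmf_eyebrows", 300, 900, "raised"),
]

print(active_at(example, 500))
```

Because each tier is logged independently against the same clock, features that are produced at the same time (here a manual sign and an eyebrow movement) stay linked without forcing them into one linear string.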

12 Discussion of current methods (3) Partition/Constitute (Huenerfauth, 2005) –Captures movement, classifier and spatial info –Comprehensive, hierarchical representation –Implicit use of gloss terms Movement-Hold (Liddell & Johnson, 1989) –Numerically-encoded handshapes –Multi-layer –Used with recognition technology (Vogler & Metaxas, 2004)

13 Discussion of current methods (4) SiGML (Elliott et al., 2004) –Describes HamNoSys for animation (ViSiCAST) –Double representation SignWriting (Sutton, 1995) –Compact icons –Information displayed in one place –Advocated by SL linguists and growing Deaf –Not currently machine readable

14 Worked Example: “Data-driven Machine Translation for Sign Languages” (Morrissey, 2008). MaTrEx MT system. Glossed annotations of Irish Sign Language (ISL) and German Sign Language (DGS). Air Traffic Information System corpus of ~600 sentences, translated and signed by native Deaf signers.

15 Hand-crafted gloss annotation corpus

16 Translation Directions

17 MaTrEx Experiments ISL gloss-to-English text –Baseline –SMT –EBMT 1 –EBMT 2 –Distortion limit

18 ISL-EN MaTrEx Experiments
                     BLEU   WER    PER
Annotation Baseline  25.20  60.31  50.42
SMT                  51.63  39.32  29.79
EBMT 1               50.69  37.75  30.76
EBMT 2               49.76  39.92  32.44
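For readers unfamiliar with two of the metrics reported here, the sketch below shows one common way to compute word error rate (WER) and position-independent error rate (PER) for a single sentence pair. It is an illustration under assumptions, not the evaluation code used in the experiments: the function names `wer` and `per` are hypothetical, tokenisation is plain whitespace splitting, and the PER formula is one standard formulation among several. Lower is better for both, whereas higher is better for BLEU.

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate in percent (word order is ignored)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts words shared regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


# Invented sentence pair: same words, different order.
ref = "the flight leaves at nine"
hyp = "at nine the flight leaves"
print(wer(ref, hyp))  # 80.0
print(per(ref, hyp))  # 0.0
```

The invented pair shows why both are reported: reordering alone is penalised by WER but not by PER, so the gap between the two gives a rough indication of how many errors are due to word order.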

19 EN-ISL MaTrEx Experiments
                     BLEU   WER    PER
ISL-EN best scores   52.18  38.48  39.67
SMT                  38.85  46.02  34.33
EBMT 1               39.11  45.90  34.20
EBMT 2               39.05  46.02  34.21

20 Other experiments ISL→DE, DGS→DE, DGS→EN –ISL→EN best scores, by 6.38% BLEU –EBMT 1 chunks improve for ISL-DE only –EBMT 2 chunks improve for ISL-DE only DE→ISL, DE→DGS, EN→DGS –EN→DGS best scores, by 1.3% BLEU –EBMT 1 chunks improve for EN→DGS & EN→ISL –EBMT 2 chunks improve for all Comparison with RWTH system –We’re better! ~2-6% BLEU ISL video recognition Speech output

21 ISL Animation: Poser software; hand-crafted; 66 videos, 50 sentences; played in sequence; 4 Deaf evaluators; 2 x 4-point scale; 82% intelligibility, 72% fidelity; questionnaire; demo.

22 Thesis Conclusions Good results can be obtained Glossing most appropriate, but not going forward –Allowed linguistic-based alignment –Linear, easily accessible format –Lack of NMF detail, time-consuming, not considered adequate representation of language EBMT chunks show potential but require more development Development of animation module

23 Where do we go from here? (the words are coming out all weird…) What is the most appropriate SL representation for MT? –Adequately represents the language, –Animation production, –Facilitates the translation process.

24 Representation overview, redux. Glossing: machine readable, doesn’t adequately represent the language or facilitate animation. Stokoe: ASCII version, not adequate representation. Partition/Constitute: multi-layered, uses glosses. Movement-Hold: multi-layered, uses glosses. SignWriting: compact icons, accepted, potential readability, not machine readable at present. … HamNoSys & SiGML: machine readable, comprehensive description, adapted for animation, suited to SMT.

25 The Future… Explore HamNoSys in practice MT in medical domain, Health Ireland Partner GP work group questionnaire Human Factors Minority Language MT

26 Thank you for listening Yep, it’s the end! I hope it wasn’t too long Any questions?

