Sign Language Representation for Machine Translation
Sara Morrissey
NCLT/CNGL Seminar Series, 1st April, 2009

Presentation transcript:


Why is there no writing system?
Social reasons
–Variation and demographic spread
Political reasons
–Recognition
Linguistic reasons
–Visual-gestural-spatial languages, simultaneous phoneme production

Implications of the lack of a writing system
…for Deaf people
–Forced to use a language that is not their native one
…for the languages
–Social acceptance, standardisation (Pizzuto, 2006)
…for MT
–Limits availability of domain-specific corpora
–No standards, difficult to compare systems
–Significance of results on small datasets
–Difficult to use NLP tools developed for spoken languages

Sign Language Representation Formats
Linear
–Stokoe Notation, HamNoSys
Multi-level
–Gloss, Partition/Constitute, Movement-Hold, SiGML
Iconic
–SignWriting

Linear Symbolic Notations
Stokoe Notation: “don’t know”
HamNoSys Notation: “nineteen”

Multi-level Representations
Movement-Hold
Partition/Constitute
Gloss Annotation
SiGML

Iconic
SignWriting

But different groups, different requirements (Pizzuto et al., 2006):
The aspect of a language chosen for its representation is largely dictated by the society and culture developing the writing system, and by the purpose and settings for which such communication is required.
Deaf, linguists, language processors…

Requirements for MT
–Large bilingual domain-specific corpus of good quality digital data
–Gold standard reference
–Segmentation algorithms for separating words, phrases and sentences
–Alignment methodologies for these units
–Searching the source and target texts
–Acceptable capturing of the language for output
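To make the segmentation and alignment requirements concrete, here is a minimal sketch of a co-occurrence (Dice coefficient) association between glosses and words over a sentence-aligned corpus. It is an illustration only and not the alignment method used in the experiments; the three sentence pairs, the gloss labels and the function names `dice_table` and `best_matches` are all invented for this example, and segmentation is assumed to be simple whitespace splitting.

```python
from collections import Counter
from itertools import product

def dice_table(pairs):
    """Dice association between source glosses and target words.

    Counts are per sentence pair (presence, not frequency).
    """
    src_n, tgt_n, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Segmentation assumption: tokens are whitespace-separated.
        s_tokens, t_tokens = set(src.split()), set(tgt.split())
        src_n.update(s_tokens)
        tgt_n.update(t_tokens)
        joint.update(product(s_tokens, t_tokens))
    return {(s, t): 2 * c / (src_n[s] + tgt_n[t]) for (s, t), c in joint.items()}

def best_matches(table, gloss):
    """All target words that share the top Dice score for one gloss.

    Raises ValueError if the gloss never occurs in the corpus.
    """
    scores = {t: v for (s, t), v in table.items() if s == gloss}
    top = max(scores.values())
    return sorted(t for t, v in scores.items() if v == top)

# Invented toy corpus: (gloss sequence, English sentence).
corpus = [
    ("FLIGHT DUBLIN WHEN", "when is the flight to dublin"),
    ("FLIGHT CORK WHEN", "when is the flight to cork"),
    ("TICKET DUBLIN HOW-MUCH", "how much is a ticket to dublin"),
]

table = dice_table(corpus)
print(best_matches(table, "DUBLIN"))  # ['dublin']
print(best_matches(table, "FLIGHT"))  # ['flight', 'the', 'when']
```

With only three pairs, FLIGHT ties between three English words, which shows in miniature why a large corpus heads the list of requirements.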

Discussion of current methods
Stokoe (Stokoe, 1960)
–Difficult to capture classifiers and non-manual features (NMFs)
–Decontextualised signs only
–ASCII version (Mandel, 1993)
HamNoSys (Prillwitz, 1989)
–NMFs included
–Subsection of 150 symbols for handwriting purposes
–Mac usage, Windows font

Discussion of current methods (2)
Gloss Annotation (Leeson et al., 2006; Neidle et al., 2002)
–Most commonly used in MT and by linguists
–No universal conventions
–Extensible
–Using one language to describe another
–Allows for simultaneous timed logging of features
–Tools widely available
–SL and linguistic knowledge a requirement
–No knowledge of supplementary symbolic system required
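The "simultaneous timed logging of features" point can be pictured as a set of time-stamped annotations on parallel tiers. The following is a minimal sketch under that assumption; the tier names, gloss labels, timings and the helper names `active_at` and `gloss_sequence` are invented for illustration and are not taken from the ISL or DGS corpora.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    tier: str      # hypothetical tier name, e.g. "gloss" or "nmf"
    start_ms: int  # start time in milliseconds
    end_ms: int    # end time in milliseconds (exclusive)
    value: str

def active_at(annotations, t_ms):
    """Return {tier: value} for every annotation covering time t_ms."""
    return {a.tier: a.value for a in annotations if a.start_ms <= t_ms < a.end_ms}

def gloss_sequence(annotations, tier="gloss"):
    """Flatten one tier into a linear, space-separated gloss string."""
    on_tier = sorted((a for a in annotations if a.tier == tier),
                     key=lambda a: a.start_ms)
    return " ".join(a.value for a in on_tier)

# Invented example: a manual gloss tier plus a non-manual feature tier.
example = [
    Annotation("gloss", 0, 400, "FLIGHT"),
    Annotation("gloss", 400, 900, "DUBLIN"),
    Annotation("gloss", 900, 1300, "WHEN"),
    Annotation("nmf", 900, 1300, "brow-furrow"),
]

print(gloss_sequence(example))   # FLIGHT DUBLIN WHEN
print(active_at(example, 1000))  # {'gloss': 'WHEN', 'nmf': 'brow-furrow'}
```

Flattening to the gloss tier alone gives a linear string that is easy to process, but the non-manual tier is dropped in the process.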

Discussion of current methods (3)
Partition/Constitute (Huenerfauth, 2005)
–Captures movement, classifier and spatial info
–Comprehensive, hierarchical representation
–Implicit use of gloss terms
Movement-Hold (Liddell & Johnson, 1989)
–Numerically-encoded handshapes
–Multi-layer
–Used with recognition technology (Vogler & Metaxas, 2004)

Discussion of current methods (4)
SiGML (Elliott et al., 2004)
–Describes HamNoSys for animation (ViSiCAST)
–Double representation
SignWriting (Sutton, 1995)
–Compact icons
–Information displayed in one place
–Advocated by SL linguists and growing Deaf
–Not currently machine readable

Worked Example
“Data-driven Machine Translation for Sign Languages” (Morrissey, 2008)
MaTrEx MT system
Glossed annotations of Irish Sign Language (ISL) and German Sign Language (DGS)
Air Traffic Information System corpus of ~600 sentences
Translated and signed by native Deaf signers

Hand-crafted gloss annotation corpus

Translation Directions

MaTrEx Experiments
ISL gloss-to-English text
–Baseline
–SMT
–EBMT 1
–EBMT 2
–Distortion limit

ISL-EN MaTrEx Experiments
[Table: BLEU, WER and PER scores for Annotation Baseline, SMT, EBMT 1 and EBMT 2; values not preserved]

EN-ISL MaTrEx Experiments
[Table: BLEU, WER and PER scores for ISL-EN best scores, SMT, EBMT 1 and EBMT 2; values not preserved]
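The experiments are scored with BLEU, WER and PER. As a reminder of what the two error rates measure, here is a minimal sketch using common textbook formulations: WER as word-level edit distance divided by reference length, and PER as the same idea with word order ignored. The actual scoring scripts used in the experiments are not given in the slides, the example sentences are invented, and both functions assume a non-empty reference.

```python
from collections import Counter

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)

def per(reference, hypothesis):
    """Position-independent error rate: bag-of-words mismatch / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Multiset intersection counts words matched regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)

ref = "when does the flight to dublin leave"
hyp = "when does the flight leave to dublin"
print(round(wer(ref, hyp), 3))  # 0.286
print(per(ref, hyp))            # 0.0
```

The hypothesis contains exactly the reference words in a different order, so PER is zero while WER still charges two edits. Lower is better for both, unlike BLEU.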

Other experiments
ISL→DE, DGS→DE, DGS→EN
–ISL→EN best scores, by 6.38% BLEU
–EBMT 1 chunks improve scores for ISL→DE only
–EBMT 2 chunks improve scores for ISL→DE only
DE→ISL, DE→DGS, EN→DGS
–EN→DGS best scores, by 1.3% BLEU
–EBMT 1 chunks improve scores for EN→DGS & EN→ISL
–EBMT 2 chunks improve scores for all
Comparison with RWTH system
–We’re better! ~2-6% BLEU
ISL video recognition
Speech output

ISL Animation
Poser software
Hand-crafted 66 videos, 50 sentences
Played in sequence
4 Deaf evaluators
2 x 4-point scale
–82% intelligibility
–72% fidelity
Questionnaire
Demo

Thesis Conclusions
Good results can be obtained
Glossing most appropriate, but not going forward
–Allowed linguistic-based alignment
–Linear, easily accessible format
–Lack of NMF detail, time-consuming, not considered adequate representation of language
EBMT chunks show potential but require more development
Development of animation module

Where do we go from here? (the words are coming out all weird…)
What is the most appropriate SL representation for MT?
–Adequately represents the language,
–Supports animation production,
–Facilitates the translation process.

Representation overview, redux
Glossing: machine readable, doesn’t adequately represent the language or facilitate animation
Stokoe: ASCII version, not an adequate representation
Partition/Constitute: multi-layered, uses glosses
Movement-Hold: multi-layered, uses glosses
SignWriting: compact icons, accepted, potential readability, not machine readable at present
…
HamNoSys & SiGML: machine readable, comprehensive description, adapted for animation, suited to SMT

The Future…
Explore HamNoSys in practice
MT in medical domain, Health Ireland Partner
GP work group questionnaire
Human Factors
Minority Language MT

Thank you for listening
Yep, it’s the end! I hope it wasn’t too long
Any questions?