Computational support for minority languages using a typologically oriented questionnaire system
Lori Levin, Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Joint work with Jeff Good

Outline
– The AVENUE MT project
  – Including a list of languages we have worked on
– The elicitation tool
  – Including which kinds of fonts it works for
– The questionnaire
  – Including which languages it has been translated into
– Tools for building and revising questionnaires

MT Approaches
[Diagram: the MT pyramid, illustrated with the source "Mi chiamo Lori" and the target "My name is Lori". From shallowest to deepest: direct approaches (SMT, EBMT); transfer rules over syntactic parses (Pronoun-acc-1-sg chiamare-1sg N → [np poss-1sg "name"] BE-pres N); semantic analysis, sentence planning, and text generation; and an interlingua (introduce-self). AVENUE: automate rule learning.]

AVENUE Machine Translation System
Jaime Carbonell (PI), Alon Lavie (Co-PI), Lori Levin (Co-PI); rule learning: Katharina Probst

Transfer rules consist of type information, synchronous context-free rules, alignments, x-side constraints, y-side constraints, and xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)).

; SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)
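
To make the rule format concrete, here is a minimal Python sketch of how such a rule could be applied: check the x-side constraints, then build the y-side from the alignments. This is illustrative only, not the AVENUE implementation; apply_rule, translate, and the toy lexicon are all hypothetical.

# Hypothetical sketch of applying the NP::NP rule above; not AVENUE code.
RULE = {
    "lhs": ["DET", "ADJ", "N"],                  # x-side: the old man
    "rhs": ["DET", "N", "DET", "ADJ"],           # y-side: ha-ish ha-zaqen
    "align": [(1, 1), (1, 3), (2, 4), (3, 2)],   # (X_i, Y_j) alignment pairs
}

def translate(word, pos):
    # Stand-in for a bilingual lexicon lookup (hypothetical entries).
    lexicon = {("the", "DET"): "ha-", ("old", "ADJ"): "zaqen", ("man", "N"): "ish"}
    return lexicon[(word, pos)]

def apply_rule(rule, src):
    """src: list of (word, features) pairs matching the x-side categories."""
    # x-side constraints from the rule, e.g. ((X1 AGR) = *3-SING), ((X3 COUNT) = +)
    if src[0][1].get("AGR") != "3-SING" or src[2][1].get("COUNT") != "+":
        return None  # constraints fail; the rule does not apply
    tgt = [None] * len(rule["rhs"])
    for x, y in rule["align"]:
        tgt[y - 1] = translate(src[x - 1][0], rule["lhs"][x - 1])
    return tgt

src = [("the", {"AGR": "3-SING", "DEF": "DEF"}),
       ("old", {}),
       ("man", {"AGR": "3-SING", "COUNT": "+"})]
print(apply_rule(RULE, src))  # ['ha-', 'ish', 'ha-', 'zaqen']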

AVENUE
– Rules can be written by hand or learned automatically.
– Hybrid architecture:
  – Rule-based transfer
  – Statistical decoder
  – Multi-engine combinations with SMT and EBMT

AVENUE systems (small and experimental, but tested on unseen data)
– Hebrew-to-English
  – Alon Lavie, Shuly Wintner, Katharina Probst
  – Hand-written and automatically learned rules
  – Automatic rules trained on 120 sentences perform slightly better than about 20 hand-written rules.
– Hindi-to-English
  – Lavie, Peterson, Probst, Levin, Font, Cohen, Monson
  – Automatically learned rules
  – Performs better than SMT when training data is limited to 50K words.

AVENUE systems (small and experimental, but tested on unseen data)
– English-to-Spanish
  – Ariadna Font Llitjós
  – Hand-written, automatically corrected rules
– Mapudungun-to-Spanish
  – Roberto Aranovich and Christian Monson
  – Hand-written rules
– Dutch-to-English
  – Simon Zwarts
  – Hand-written rules

Outline
– The AVENUE MT project
→ The elicitation tool
– The questionnaire
– Tools for building questionnaires

Elicitation
Get data from someone who is:
– Bilingual
– Literate, with consistent spelling
– Not experienced with linguistics

English-Hindi Example
Elicitation tool: Erik Peterson

English-Chinese Example
Note: the translator has to insert spaces between words in Chinese.

English-Arabic Example

Outline
– The AVENUE MT project
– The elicitation tool
→ The questionnaire
– Tools for building questionnaires

Size of Questionnaire
– Around 3,200 sentences
– Around 20K words

Questionnaire Sample: clause level
– Mary is writing a book for John.
– Who let him eat the sandwich?
– Who had the machine crush the car?
– They did not make the policeman run.
– Mary had not blinked.
– The policewoman was willing to chase the boy.
– Our brothers did not destroy files.
– He said that there is not a manual.
– The teacher who wrote a textbook left.
– The policeman chased the man who was a thief.
– Mary began to work.
Phenomena probed: tense, aspect, transitivity, animacy; questions, causation, and permission; interaction of lexical and grammatical aspect; volitionality; embedded clauses and sequence of tense; relative clauses; phase aspect.

Questionnaire Sample: noun phrase level
– The man quit in November.
– The man works in the afternoon.
– The balloon floated over the library.
– The man walked over the platform.
– The man came out from among the group of boys.
– The long weekly meeting ended.
– The large bus to the post office broke down.
– The second man laughed.
– All five boys laughed.
Phenomena probed: temporal and locative meanings; quantifiers; numbers; combinations of different types of modifiers; possession and definiteness ("my book") vs. possession and indefiniteness ("a book of mine").

Organization into Minimal Pairs

srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

srcsent: Tú estás cayendo.
tgtsent: Eymi petu ütrünagimi.
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

srcsent: Tú caíste.
tgtsent: Eymi ütrunagimi.
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
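
The records above follow a simple key: value layout, so they are easy to process. A minimal sketch assuming that layout (parse_records is a hypothetical helper, not part of the AVENUE tools):

import ast

def parse_records(text):
    """Split blank-line-separated records into dicts of key: value fields."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:                 # a blank line ends a record
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records

sample = """srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell"""

rec = parse_records(sample)[0]
print(rec["srcsent"], "->", rec["tgtsent"])
# One-to-one alignments parse as Python tuples; multi-word alignments
# like ((1,1),(2 3,2 3)) would need a custom parser.
print(ast.literal_eval(rec["aligned"]))  # ((1, 1), (2, 2))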

Feature Detection: Spanish
– The girl saw a red book. → La niña vió un libro rojo ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
– A girl saw a red book. → Una niña vió un libro rojo ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
– I saw the red book. → Yo vi el libro rojo ((1,1)(2,2)(3,3)(4,5)(5,4))
– I saw a red book. → Yo vi un libro rojo ((1,1)(2,2)(3,3)(4,5)(5,4))
Detected:
Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no

Feature Detection: Chinese
– A girl saw a red book. → 有 一个 女人 看见 了 一本 红色 的 书 。 ((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))
– The girl saw a red book. → 女人 看见 了 一本 红色的 书 ((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))
Detected:
Feature: definiteness
Values: definite, indefinite
Function-of-*: subject
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: no

Feature Detection: Chinese
– I saw the red book. → 红色的 书, 我 看见 了 ((1,3)(2,4)(2,5)(4,1)(5,2))
– I saw a red book. → 我 看见 了 一本 红色的 书 。 ((1,1)(2,2)(2,3)(2,4)(4,5)(5,6))
Detected:
Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes

Feature Detection: Hebrew
– A girl saw a red book. → ילדה ראתה ספר אדום ((2,1)(3,2)(5,4)(6,3))
– The girl saw a red book. → הילדה ראתה ספר אדום ((1,1)(2,1)(3,2)(5,4)(6,3))
– I saw a red book. → ראיתי ספר אדום ((2,1)(4,3)(5,2))
– I saw the red book. → ראיתי את הספר האדום ((2,1)(3,3)(3,4)(4,4)(5,3))
Detected:
Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add/delete-word: no
Change-in-alignment: no
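
Two fields in these detection vectors fall straight out of the word alignments. A toy sketch of those checks, assuming each minimal pair is given as a (target words, alignment) tuple; the marked-on-* fields need syntactic information that this sketch ignores:

def detection_vector(pair_a, pair_b):
    """Each pair is (target_words, alignment) for one value of the feature."""
    words_a, align_a = pair_a
    words_b, align_b = pair_b
    return {
        # A target word appears or disappears when the feature value changes.
        "add/delete-word": len(words_a) != len(words_b),
        # Source words map to different target positions.
        "change-in-alignment": sorted(align_a) != sorted(align_b),
    }

# The Chinese object examples above:
definite = ("红色的 书, 我 看见 了".split(),
            [(1, 3), (2, 4), (2, 5), (4, 1), (5, 2)])
indefinite = ("我 看见 了 一本 红色的 书 。".split(),
              [(1, 1), (2, 2), (2, 3), (2, 4), (4, 5), (5, 6)])
print(detection_vector(definite, indefinite))
# {'add/delete-word': True, 'change-in-alignment': True}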

Feature Detection Feeds into…
– Corpus navigation: which minimal pairs to pursue next.
  – Don't pursue gender in Mapudungun.
  – Do pursue definiteness in Hebrew.
– Morphology learning:
  – The morphological learner identifies the forms of the morphemes.
  – Feature detection identifies the functions.
– Rule learning:
  – The rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered, e.g., adjectives and nouns agree in gender, number, and definiteness in Hebrew.

Languages
– The set of feature structures with English sentences has been delivered to the Linguistic Data Consortium as part of the Reflex program.
– Translated (by LDC) into:
  – Thai
  – Bengali
– Plans to translate into seven "strategic" languages per year for five years, as one small part of a language pack (BLARK) for each language.

Languages
– Spanish version in progress at New Mexico State University (Helmreich and Cowie)
  – Plans to translate into Guarani
– Portuguese version in progress in Brazil (Marcello Modesto)
  – Plans to translate into Karitiana (about 200 speakers)
– Plans to translate into Inupiaq (Kaplan and MacLean)

Previous Elicitation Work
– Pilot corpus
  – Around 900 sentences
  – No feature structures
– Mapudungun
  – Two partial translations
– Quechua
  – Three translations
– Aymara
  – Seven translations
– Hebrew
– Hindi
  – Several translations
– Dutch

Feature Structures
The questionnaire is actually a corpus of feature structures that happen to have English or Spanish sentences attached to them.

Bengali example with feature structure

srcsent: The large bus to the post office broke down.
context:
tgtsent:

((actor ((modifier ((mod-role mod-descriptor) (mod-role role-loc-general-to)))
         (np-identifiability identifiable) (np-specificity specific)
         (np-biological-gender bio-gender-n/a) (np-animacy anim-inanimate)
         (np-person person-third) (np-function fn-actor)
         (np-general-type common-noun-type) (np-number num-sg)
         (np-pronoun-exclusivity inclusivity-n/a) (np-pronoun-antecedent antecedent-n/a)
         (np-distance distance-neutral)))
 (c-general-type declarative-clause) (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a) (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a) (c-comparator-function comparator-n/a)
 (c-causee-control control-n/a) (c-our-situations situations-n/a)
 (c-comparand-type comparand-n/a) (c-causation-directness directness-n/a)
 (c-source source-neutral) (c-causee-volitionality volition-n/a)
 (c-assertiveness assertiveness-neutral) (c-solidarity solidarity-neutral)
 (c-polarity polarity-positive) (c-v-grammatical-aspect gram-aspect-neutral)
 (c-adjunct-clause-type adjunct-clause-type-n/a) (c-v-phase-aspect phase-aspect-neutral)
 (c-v-lexical-aspect activity-accomplishment) (c-secondary-type secondary-neutral)
 (c-event-modality event-modality-none) (c-function fn-main-clause)
 (c-minor-type minor-n/a) (c-copula-type copula-n/a)
 (c-v-absolute-tense past) (c-power-relationship power-peer)
 (c-our-shared-subject shared-subject-n/a) (c-question-gap gap-n/a))
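
The feature structures are written in a Lisp-style s-expression syntax. For illustration, a minimal reader that turns such an expression into nested Python lists (a hypothetical helper, not the project's own tooling):

def tokenize(text):
    """Break an s-expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Consume one expression from the token list, recursing on '('."""
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return lst
    return tok

fs = read(tokenize("((c-general-type declarative-clause) (c-polarity polarity-positive))"))
print(dict((k, v) for k, v in fs))
# {'c-general-type': 'declarative-clause', 'c-polarity': 'polarity-positive'}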

Why feature structures?
1. Decide what grammatical meaning to elicit.
2. Represent it in a feature structure.
3. Formulate an English or Spanish sentence that expresses that meaning.
   – We can use the same corpus of feature structures for several elicitation languages.
4. Have the informant translate it.

Grammatical meanings vs syntactic categories
Features and values are based on a collection of grammatical meanings, many of which are similar to the grammatemes of the Prague Treebanks.

Grammatical Meanings
YES:
– Semantic roles
– Identifiability
– Specificity
– Time (before, after, or during the time of speech)
– Modality
NO:
– Case
– Voice
– Determiners
– Auxiliary verbs

Grammatical Meanings
YES:
– How is identifiability expressed?
  – Determiner
  – Word order
  – Optional case marker
  – Optional verb agreement
– How is specificity expressed?
– How are generics expressed?
– How are predicate nominals marked?
NO:
– How are English determiners translated?
  – The boy cried.
  – The lion is a fierce beast.
  – I ate a sandwich.
  – He is a soldier. (French: Il est soldat.)

Argument Roles
– Actor
– Undergoer
– Predicate and predicatee
  – The woman is the manager.
– Recipient
  – I gave a book to the students.
– Beneficiary
  – I made a phone call for Sam.

Why not subject and object?
– Languages use their voice systems for different purposes.
– Mapudungun obligatorily uses an inverse-marked verb when a third person acts on a first or second person:
  – The verb agrees with the undergoer.
  – The undergoer exhibits other subjecthood properties.
  – The actor may be the object.
– Yes: How are actor and undergoer encoded in combination with other semantic features, like adversity (Japanese) and person (Mapudungun)?
– No: How is English voice translated into another language?

Argument Roles
– Accompaniment
  – With someone
  – With pleasure
– Material
  – (Out) of wood
– About 20 more roles
  – From the Lingua checklist (Comrie & Smith, 1977)
  – Many are also found in the tectogrammatical representations of the Prague Treebanks.
– Around 80 locative relations
  – From the Lingua checklist
– Many temporal relations

Noun Phrase Features
– Person
– Number
– Biological gender
– Animacy
– Distance (for deictics)
– Identifiability
– Specificity
– Possession
– Other semantic roles
  – Accompaniment, material, location, time, etc.
– Type
  – Proper, common, pronoun
– Cardinals
– Ordinals
– Quantifiers
– Given and new information
  – Not used yet because of the limited context in the elicitation tool.

Clause level features
– Tense
– Aspect
  – Lexical, grammatical, phase
– Type
  – Declarative, open-q, yes-no-q
– Function
  – Main, argument, adjunct, relative
– Source
  – Hearsay, first-hand, sensory, assumed
– Assertedness
  – Asserted, presupposed, wanted
– Modality
  – Permission, obligation
  – Internal, external

Other clause types (Constructions)
– Causative
  – Make/let/have someone do something
– Predication
  – May be expressed with or without an overt copula.
– Existential
  – There is a problem.
– Impersonal
  – One doesn't smoke in restaurants in the US.
– Lament
  – If only I had read the paper.
– Conditional
– Comparative
– Etc.

Outline
– The AVENUE MT project
– The elicitation tool
– The questionnaire
→ Tools for building questionnaires

The Process (Mar 1, 2006)
[Diagram of the corpus-building pipeline: the Feature Specification (a list of semantic features and values) feeds Feature Maps, which state which combinations of features and values are of interest (clause-level, noun-phrase, tense & aspect, modality, …). The feature maps yield Feature Structure Sets; reverse annotation adds English sentences to give Reverse-Annotated Feature Structure Sets, which make up The Corpus; sampling produces a Smaller Corpus.]

Feature Specification
– Defines features and their values
– Sets default values for features
– Specifies feature requirements and restrictions
– Written in XML

Feature Specification
Feature: c-copula-type (a copula is a verb like "be"; some languages do not have copulas)
Values:
– copula-n/a
  Restrictions: 1. ~(c-secondary-type secondary-copula)
– copula-role
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A role is something like a job or a function. "He is a teacher." "This is a vegetable peeler."
– copula-identity
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "Clark Kent is Superman." "Sam is the teacher."
– copula-location
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "The book is on the table." There is a long list of locative relations later in the feature specification.
– copula-description
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A description is an attribute. "The children are happy." "The books are long."
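
The slides render the specification informally; the actual XML schema is not shown. As a guess at what an entry might look like, here is a hypothetical encoding of part of c-copula-type, parsed with Python's standard library:

# The real schema is unknown; this XML layout is an illustrative assumption.
import xml.etree.ElementTree as ET

SPEC = """
<feature name="c-copula-type">
  <value name="copula-n/a">
    <restriction negated="yes">(c-secondary-type secondary-copula)</restriction>
  </value>
  <value name="copula-role">
    <restriction>(c-secondary-type secondary-copula)</restriction>
    <note>A role is something like a job or a function.</note>
  </value>
</feature>
"""

root = ET.fromstring(SPEC)
for value in root.findall("value"):
    print(value.get("name"), [r.text for r in value.findall("restriction")])
# copula-n/a ['(c-secondary-type secondary-copula)']
# copula-role ['(c-secondary-type secondary-copula)']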

Feature Maps
– Some features interact in the grammar:
  – English -s reflects the person and number of the subject and the tense of the verb.
  – In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement: He is running. / Is he running?
– We need to check many, but not all, combinations of features and values.
– Using unlimited feature combinations leads to an unmanageable number of sentences.

Feature Combination Template

((predicatee ((np-general-type pronoun-type common-noun-type)
              (np-person person-first person-second person-third)
              (np-number num-sg num-pl)
              (np-biological-gender bio-gender-male bio-gender-female)))
 {[(predicate ((np-general-type common-noun-type)
               (np-person person-third)))
   (c-copula-type role)]
  [(predicate ((adj-general-type quality-type)
               (c-copula-type attributive)))]
  [(predicate ((np-general-type common-noun-type)
               (np-person person-third)
               (c-copula-type identity)))]}
 (c-secondary-type secondary-copula)
 (c-polarity #all)
 (c-general-type declarative)
 (c-speech-act sp-act-state)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-v-lexical-aspect state)
 (c-v-absolute-tense past present future)
 (c-v-phase-aspect durative))

This template summarizes 288 feature structures, which are generated automatically.
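
The expansion itself amounts to a cross product over the multi-valued slots. A toy sketch (the full template above expands to 288 structures; this example crosses only four slots and ignores the restrictions in the feature specification):

from itertools import product

# Each key lists the values that the template allows for that slot.
template = {
    "np-person": ["person-first", "person-second", "person-third"],
    "np-number": ["num-sg", "num-pl"],
    "c-polarity": ["polarity-positive", "polarity-negative"],
    "c-v-absolute-tense": ["past", "present", "future"],
}

keys = list(template)
structures = [dict(zip(keys, combo)) for combo in product(*template.values())]
print(len(structures))   # 3 * 2 * 2 * 3 = 36
print(structures[0])     # one fully specified feature structure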

Adding Sentences to Feature Structures

srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker.

((actor ((np-function fn-actor) (np-animacy anim-human)
         (np-biological-gender bio-gender-female)
         (np-general-type proper-noun-type)
         (np-identifiability identifiable) (np-specificity specific) …))
 (pred ((np-function fn-predicate-nominal) (np-animacy anim-human)
        (np-biological-gender bio-gender-female)
        (np-general-type common-noun-type)
        (np-specificity specificity-neutral) …))
 (c-v-lexical-aspect state) (c-copula-type copula-role)
 (c-secondary-type secondary-copula) (c-solidarity solidarity-neutral)
 (c-v-grammatical-aspect gram-aspect-neutral) (c-v-absolute-tense past)
 (c-v-phase-aspect phase-aspect-neutral) (c-general-type declarative-clause)
 (c-polarity polarity-negative) (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a) (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a) …)

Difficult Issues in Adding Sentences
– Have to remember that the grammatical meanings don't correspond exactly to English morphemes.
  – Identifiability and specificity vs. "the" and "a"
  – Modality, tense, aspect vs. auxiliary verbs
– The meaning has to be clear to a translator.
  – If English is going to be the source language for translation, the clearest way to say something may not be the most common way it is said in real text or conversation.

Hard Problems
– Expressing meanings that are not grammaticalized in English.
  – Evidentiality: He stole the bread. Context: translate this as if you do not have first-hand knowledge.
  – In English, we might say, "They say that he stole the bread" or "I hear that he stole the bread."

Hard Problems
– Reverse annotating things that can be said in several ways in English.
  – Impersonals: One doesn't smoke here. / You don't smoke here. / They don't smoke here. / There's no smoking here. / Credit cards aren't accepted.
  – A problem in the Reflex corpus because space was limited.

Evaluation
– Current funding has not covered evaluation of the questionnaire.
  – Except for informal observations as it was translated into several languages.
– Does it elicit the meanings it was intended to elicit?
  – Informal observation: usually.
– Is it useful for machine translation?

Navigation
– Currently, feature combinations are specified by a human.
– Plan to work in active learning mode (sketched schematically below):
  – Build a seed questionnaire
  – Translate some data
  – Do some learning
  – Identify the most valuable pieces of information to get next
  – Generate an RTB for those pieces of information
  – Translate more, learn more, generate more, etc.
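
Schematically, the loop could look like this; every function is a stub standing in for a real component (informant, learner, generator), so this shows the shape of the loop only, not the planned implementation:

# All functions below are placeholders, not real AVENUE components.
def learn_rules(corpus):            return ["rule"] * len(corpus)
def most_valuable_features(rules):  return ["definiteness"]
def generate_minimal_pairs(feats):  return [f"sentence probing {f}" for f in feats]
def translate(batch):               return [(s, "translation") for s in batch]

corpus = translate(generate_minimal_pairs(["tense", "person"]))  # seed questionnaire
for iteration in range(3):
    rules = learn_rules(corpus)                     # do some learning
    targets = most_valuable_features(rules)         # decide what to elicit next
    corpus += translate(generate_minimal_pairs(targets))
print(len(corpus))  # the corpus grows each round: 2 + 3*1 = 5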

Summary
– Feature specification: lists features and values (grammatical meanings)
– Feature combinations
– Set of feature structures
– Add English or Spanish sentences
– Get a translation and word alignment from a bilingual, literate informant