Designing an Elicitation Corpus with Semantic Representations Simon Fung Advisor: Lori Levin November 2006.

Presentation transcript:

Designing an Elicitation Corpus with Semantic Representations Simon Fung Advisor: Lori Levin November 2006

Overview
  motivation
  elicitation corpus
  constraints
  issue: definiteness
  status

Corpus example
  Was there an apple?           那裡曾經有一個蘋果嗎?
  Wasn't there an apple?        那裡不是曾經有一個蘋果嗎?
  Will there be an apple?       那裡會有一個蘋果嗎?
  Won't there be an apple?      那裡不是會有一個蘋果嗎?
  There was an apple.           那裡曾經有一個蘋果。
  There was not an apple.       那裡曾經沒有一個蘋果。
  There will be an apple.       那裡會有一個蘋果。
  There will not be an apple.   那裡不會有一個蘋果。
  ...

Uses for parallel corpus
  statistical MT training data
  learning about grammar of new language

Motivation
  how do languages form various constructions (e.g. relative clauses)?
    1. The student whom I saw
    2. 我見過的學生。

Motivation
  what semantic distinctions are important in different languages?
    English                  Mandarin                French
    He is talking.           Tā zài jiǎng huà.       Il parle.
    They are talking.        Tāmen zài jiǎng huà.    Ils parlent.
    He talks. {habitually}   Tā jiǎng huà.           Il parle.

The MILE (MInor Language Elicitation) Corpus
  sentences covering various semantic categories/constructions
    e.g. number, gender, relative clauses
  to be translated into language under study
  semantic representation for each sentence

The MILE (MInor Language Elicitation) Corpus
  10,000-20,000 words
  translations done by one person
  7 languages per year for next 5 years
    e.g. Thai, Bengali, Punjabi
    may have a lot of speakers, but fewer electronic resources

Constraints
  maximize range of semantic categories and constructions
  minimize corpus size

Constraints
  different languages complex in different areas
  only one corpus, for this project
  ultimate goal: dynamically navigate through features
    e.g. no sing./pl. distinction → no dual
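The "dynamically navigate through features" goal can be sketched as a prerequisite check: a feature is skipped when a distinction it depends on was already found to be absent. This is only an illustration, not the project's tool. The function `features_to_elicit`, the table `PREREQUISITES`, and the label `num-sg-vs-pl` are hypothetical; only the singular/plural → dual dependency comes from the slide.

```python
# Hypothetical dependency table: feature -> distinction it presupposes.
# Only the sing./pl. -> dual dependency is taken from the slide.
PREREQUISITES = {
    "num-dual": "num-sg-vs-pl",
}


def features_to_elicit(candidates, absent):
    """Return the candidates not ruled out by an absent distinction.

    A candidate is dropped if it, or anything in its prerequisite chain,
    is in `absent` (distinctions the language was found not to make).
    """
    keep = []
    for feature in candidates:
        current = feature
        blocked = current in absent
        # Walk up the prerequisite chain until it ends or hits an absent item.
        while not blocked and current in PREREQUISITES:
            current = PREREQUISITES[current]
            blocked = current in absent
        if not blocked:
            keep.append(feature)
    return keep


remaining = features_to_elicit(
    ["num-sg-vs-pl", "num-dual", "bio-gender"],
    absent={"num-sg-vs-pl"},
)
print(remaining)
```

With no singular/plural distinction in the language, the dual is never asked about, which is the pruning the slide's example describes.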

Method
  1. create semantic representations first (instead of starting with English)
  2. write English sentences based on them
  3. translate sentences into various languages

Example: feature structure
srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker;
((actor ((np-function fn-actor)
         (np-general-type interrogative-type)
         (np-person person-third)
         (np-number num-dual)
         (np-biological-gender bio-gender-male)
         (np-animacy anim-human)
         (np-pronoun-antecedent antecedent-n/a)
         (np-specificity specificity-neutral)
         (np-identifiability identifiability-neutral)
         (np-distance distance-neutral)
         (np-pronoun-exclusivity inclusivity-n/a)))
 (undergoer ((np-person person-third)
             (np-identifiability unidentifiable)
             (np-number num-pl)
             (np-specificity non-specific)
             (np-animacy anim-inanimate)
             (np-biological-gender bio-gender-n/a)
             (np-function fn-undergoer)
             (np-general-type common-noun-type)
             (np-pronoun-exclusivity inclusivity-n/a)
             (np-pronoun-antecedent antecedent-n/a)
             (np-distance distance-neutral)))
 (c-polarity polarity-positive)
 (c-v-absolute-tense future)
 (c-general-type open-question)
 (c-question-gap gap-actor)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a)
 (c-comparator-function comparator-n/a)
 (c-causee-control control-n/a)
 (c-our-situations situations-n/a)
 (c-comparand-type comparand-n/a)
 (c-causation-directness directness-n/a)
 (c-source source-neutral)
 (c-causee-volitionality volition-n/a)
 (c-assertiveness assertiveness-neutral)
 (c-solidarity solidarity-neutral)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-adjunct-clause-type adjunct-clause-type-n/a)
 (c-v-phase-aspect phase-aspect-neutral)
 (c-v-lexical-aspect activity-accomplishment)
 (c-secondary-type secondary-neutral)
 (c-event-modality event-modality-none)
 (c-function fn-main-clause)
 (c-minor-type minor-n/a)
 (c-copula-type copula-n/a)
 (c-power-relationship power-peer)
 (c-our-shared-subject shared-subject-n/a))

Example: feature structure (abridged)
srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker;
((ACTOR ((NP-FUNCTION FN-ACTOR)
         (NP-GENERAL-TYPE INTERROGATIVE-TYPE)
         (NP-PERSON PERSON-THIRD)
         (NP-NUMBER NUM-DUAL)
         (NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)
             (NP-IDENTIFIABILITY UNIDENTIFIABLE)
             (NP-NUMBER NUM-PL)
             (NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)
 (C-V-ABSOLUTE-TENSE FUTURE))
Each inner pair has the form (feature-name value): for example, NP-NUMBER is a feature name and NUM-DUAL is its value.
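Because the feature structure is written as nested parenthesised pairs, it can be read into nested dictionaries with a few lines of code. The following is a minimal sketch, assuming well-balanced input in which every item is a (name value) pair; `parse_feature_structure` is a hypothetical helper, not part of the MILE tooling, and the sample string is the abridged structure from the slide.

```python
import re


def parse_feature_structure(text):
    """Parse a parenthesised feature structure into nested dicts.

    Assumes balanced parentheses and (name value) pairs throughout;
    a value is either an atom or a nested list of pairs.
    """
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def parse_list():
        nonlocal pos
        pos += 1  # skip the opening "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(parse_list())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing ")"
        return items

    def to_dict(pairs):
        result = {}
        for name, value in pairs:
            result[name] = to_dict(value) if isinstance(value, list) else value
        return result

    return to_dict(parse_list())


sample = """
((ACTOR ((NP-FUNCTION FN-ACTOR)(NP-GENERAL-TYPE INTERROGATIVE-TYPE)
         (NP-PERSON PERSON-THIRD)(NP-NUMBER NUM-DUAL)
         (NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)(NP-IDENTIFIABILITY UNIDENTIFIABLE)
             (NP-NUMBER NUM-PL)(NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)(C-V-ABSOLUTE-TENSE FUTURE))
"""
fs = parse_feature_structure(sample)
print(fs["ACTOR"]["NP-NUMBER"], fs["C-V-ABSOLUTE-TENSE"])
```

Once parsed, individual features can be looked up by name, which is convenient when comparing the representations of two sentences that should form a minimal pair.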

Using semantic representation
  Advantages:
    more precise
    more complete
    encode actual linguistic features to elicit

1. Naturalness
  naturalness of sentences vs. holding lexical items constant
    minimal pairs ideal (A tree fell/The tree fell)
    but also want natural sentences
    natural → easier to translate → fewer mistakes
      She hurt herself.
      *It hurt itself.
  sentences are hand-written vs. using natural language generators (GenKit)

2. Restrictions
  need to find restrictions on combinations of features
  some combinations invalid/unnatural
    e.g. inclusive and third-person
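One way to picture such restrictions is as sets of feature-value pairs that must not co-occur, applied as a filter over candidate combinations. This is a sketch under assumptions: the value inventories below are illustrative (only person-third and inclusivity-n/a appear in the feature structure shown earlier), and `valid_combinations` is a hypothetical helper.

```python
from itertools import product

# Illustrative value inventories; not the corpus's actual value lists.
FEATURES = {
    "np-person": ["person-first", "person-second", "person-third"],
    "np-pronoun-exclusivity": ["inclusive", "exclusive", "inclusivity-n/a"],
}

# Each restriction is a set of (feature, value) pairs that may not co-occur.
# This one encodes the slide's example: inclusive and third-person.
RESTRICTIONS = [
    {("np-person", "person-third"), ("np-pronoun-exclusivity", "inclusive")},
]


def valid_combinations(features, restrictions):
    """Yield every feature-value combination that violates no restriction."""
    names = list(features)
    for values in product(*(features[name] for name in names)):
        combo = dict(zip(names, values))
        pairs = set(combo.items())
        # A restriction is violated when all of its pairs are present.
        if not any(restriction <= pairs for restriction in restrictions):
            yield combo


combos = list(valid_combinations(FEATURES, RESTRICTIONS))
print(len(combos))
```

Of the nine possible person/exclusivity pairings in this toy inventory, the single restricted one is dropped and the rest remain available for sentence writing.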

3. Definition of values
  use language-independent semantic categories
  precise
    e.g. specificity better than definiteness
  agreement on definitions
    intercoder agreement (informal experiment)
    writers agreed on English forms to use

Avoiding language-specificity
  many-to-many translations of determiners
    I have a cat.                 J’ai un chat.
    The cat is fat.               Le chat est gros.
    I like chocolate.             J’aime le chocolat.
    I eat chocolates.             Je mange des chocolats.
    Communism failed.             Le communisme a échoué.
    He has (some) money.          Il a de l’argent.
    I am a teacher.               Je suis professeur.
    England                       L’Angleterre
    I don’t have a/any cat(s).    Je n’ai pas de chat.
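The many-to-many claim can be checked mechanically from the sentence pairs above. In this sketch the determiner attached to each sentence is our own reading of the slide (an empty string stands for "no article"), so the segmentation is an assumption rather than part of the original analysis.

```python
from collections import defaultdict

# (English sentence, English determiner, French sentence, French determiner).
# "" means no article; the determiner column is our reading of the slide.
PAIRS = [
    ("I have a cat.", "a", "J'ai un chat.", "un"),
    ("The cat is fat.", "the", "Le chat est gros.", "le"),
    ("I like chocolate.", "", "J'aime le chocolat.", "le"),
    ("I eat chocolates.", "", "Je mange des chocolats.", "des"),
    ("Communism failed.", "", "Le communisme a échoué.", "le"),
    ("He has (some) money.", "some", "Il a de l'argent.", "de l'"),
    ("I am a teacher.", "a", "Je suis professeur.", ""),
    ("I don't have a cat.", "a", "Je n'ai pas de chat.", "de"),
]

en_to_fr = defaultdict(set)
fr_to_en = defaultdict(set)
for _, en_det, _, fr_det in PAIRS:
    en_to_fr[en_det].add(fr_det)
    fr_to_en[fr_det].add(en_det)

# Many-to-many: one English form maps to several French forms and vice versa.
one_to_many = {en: frs for en, frs in en_to_fr.items() if len(frs) > 1}
many_to_one = {fr: ens for fr, ens in fr_to_en.items() if len(ens) > 1}
print(one_to_many)
print(many_to_one)
```

English "a" lines up with three different French forms, while French "le" lines up with both "the" and no article, which is why determiner choice cannot serve as a language-independent feature.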

Avoiding language-specificity
  Have to break it down by function:
    indefinite quantity (some water)
    generic (the moose is a noble animal)
    predicate nominal (I am a doctor)
    definite noun phrase (the dog is sick)
    etc.

Definiteness
  example of a problem in design of features and values
  how to define definiteness, while avoiding using English definiteness categories?

Criteria for definiteness
  Lyons (1999):
    uniqueness
    familiarity
    identifiability
    specificity
    inclusiveness

Criteria for definiteness
  chose the most important criteria:
    identifiability
    specificity

Definiteness
  You and I are in a room.
  I say “The chair is on fire!”

Definiteness
  Why did I say “the chair”?
    identifiability: I know that you know what chair I’m talking about
    specificity: I’m referring to a particular chair

Grammatical feature: specificity
  John wants to marry a Norwegian.
  Feature: np-specificity
  Values:
    specific: John wants to marry a (specific) Norwegian.
    non-specific: John wants to marry some Norwegian.
    specificity-neutral: She is a Norwegian.

Grammatical feature: specificity
  Turkish direct objects:
    Ali bir kitap okudu.
    Ali one book  read
    'Ali read a book.'
    Ali bir kitab-ı  okudu.
    Ali one book-acc read
    'Ali read a (specific) book.'

Definiteness: corner cases
  e.g. Who will be the manager?
    not about a specific manager, but it is about a specific role
  e.g. She is a teacher.
    identifiability-neutral, specificity-neutral
    no article here in French
  e.g. A dog has four legs.
    identifiability-generic, specificity-neutral

Layout of Corpus
  1. Clause types, negation, and formality
  2. Discourse setting/Speaker-hearer features
  3. Basic NP features
  4. Verbal Tense and Aspect
  5. Evidentiality and Modality
  6. Causatives
  7. Comparatives
  8. Modifiers
  9. Conjunctions
  10. Clause-combining

Layout of Corpus
  combine feature values systematically
  why combine?
    some features interact
    e.g. Will the woman be happy? (interrogative, future tense)
  what to combine?
    some features known to interact
    e.g. person, number (I am, we are, he is)
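The trade-off between coverage and corpus size can be made concrete by counting combinations. The sketch below compares a full cross-product of four features with combining only the groups said to interact (person with number, clause type with tense). The value lists and the grouping are illustrative assumptions, not the corpus's actual inventory, and `group_combinations` is a hypothetical helper.

```python
from itertools import product
from math import prod

# Illustrative value lists; several values are invented for the example.
FEATURES = {
    "np-person": ["person-first", "person-second", "person-third"],
    "np-number": ["num-sg", "num-dual", "num-pl"],
    "c-general-type": ["declarative", "open-question"],
    "c-v-absolute-tense": ["past", "present", "future"],
}

# Only features within the same group are crossed with each other.
INTERACTING_GROUPS = [
    ("np-person", "np-number"),
    ("c-general-type", "c-v-absolute-tense"),
]


def group_combinations(features, group):
    """Return all value combinations for the features in one group."""
    value_lists = [features[name] for name in group]
    return [dict(zip(group, values)) for values in product(*value_lists)]


# Crossing everything with everything: 3 * 3 * 2 * 3 combinations.
full_size = prod(len(values) for values in FEATURES.values())

# Crossing only within groups: 3 * 3 plus 2 * 3 combinations.
grouped = [
    combo
    for group in INTERACTING_GROUPS
    for combo in group_combinations(FEATURES, group)
]
print(full_size, len(grouped))
```

Even in this toy inventory, restricting combination to interacting groups cuts the number of configurations from 54 to 15, which is the kind of saving needed to keep the corpus within a size one translator can handle.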

Status
  delivered 21,133 words (sampled version)
  translated into Thai, Bengali
  Spanish -> Guarani