Published by Victor Patterson. Modified over 9 years ago.
1
Designing an Elicitation Corpus with Semantic Representations
Simon Fung
Advisor: Lori Levin
November 2006
2
Overview
- motivation
- the elicitation corpus
- constraints
- issue: definiteness
- status
4
Corpus example
Was there an apple? / 那裡曾經有一個蘋果嗎？
Wasn't there an apple? / 那裡不是曾經有一個蘋果嗎？
Will there be an apple? / 那裡會有一個蘋果嗎？
Won't there be an apple? / 那裡不是會有一個蘋果嗎？
There was an apple. / 那裡曾經有一個蘋果。
There was not an apple. / 那裡曾經沒有一個蘋果。
There will be an apple. / 那裡會有一個蘋果。
There will not be an apple. / 那裡不會有一個蘋果。
...
5
Uses for a parallel corpus
- training data for statistical MT
- learning about the grammar of a new language
6
Motivation
How do languages form various constructions (e.g. relative clauses)?
1. The student whom I saw
2. 我見過的學生 (Mandarin; lit. "I saw DE student")
7
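Motivation
What semantic distinctions are important in different languages?
He is talking. / Tā zài jiǎng huà. / Il parle.
They are talking. / Tāmen zài jiǎng huà. / Ils parlent.
He talks. (habitually) / Tā jiǎng huà. / Il parle.
Mandarin marks the progressive with zài but does not mark number on the verb; French marks number on the verb (parle/parlent) but here uses the same form for progressive and habitual.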
8
The MILE (MInor Language Elicitation) Corpus
- sentences covering various semantic categories and constructions, e.g. number, gender, relative clauses
- to be translated into the language under study
- a semantic representation for each sentence
9
The MILE (MInor Language Elicitation) Corpus
- 10,000-20,000 words
- translations done by one person
- 7 languages per year for the next 5 years, e.g. Thai, Bengali, Punjabi
- these languages may have many speakers but few electronic resources
10
Constraints
- maximize the range of semantic categories and constructions
- minimize corpus size
11
Constraints
- different languages are complex in different areas
- only one corpus for this project
- ultimate goal: navigate through the features dynamically, e.g. if a language has no singular/plural distinction, there is no need to elicit the dual
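The "no singular/plural distinction, so no dual" idea can be pictured as filtering elicitation items by prerequisites. This is a minimal sketch, not the project's implementation: the class `ElicitationItem`, the function `items_to_elicit`, the prerequisite label `"sg-pl"`, the value name `num-sg` and the example sentences are all hypothetical (only `num-pl` and `num-dual` appear in the feature structures in this talk).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ElicitationItem:
    """One corpus sentence and the feature value it targets (hypothetical schema)."""
    sentence: str
    feature: str
    value: str
    # Distinctions the language must have for this item to be worth eliciting.
    requires: frozenset = field(default_factory=frozenset)


def items_to_elicit(items, absent_distinctions):
    """Keep only items none of whose prerequisites are known to be absent."""
    return [item for item in items if not (item.requires & absent_distinctions)]


# Invented example items; num-sg is an assumed value name.
ITEMS = [
    ElicitationItem("The man is here.", "np-number", "num-sg"),
    ElicitationItem("The men are here.", "np-number", "num-pl"),
    # The dual is only worth eliciting if singular and plural are distinguished.
    ElicitationItem("The two men are here.", "np-number", "num-dual", frozenset({"sg-pl"})),
]

# Suppose earlier answers showed the language has no singular/plural distinction.
remaining = items_to_elicit(ITEMS, frozenset({"sg-pl"}))
print([item.value for item in remaining])  # ['num-sg', 'num-pl']
```

Keeping prerequisites on each item, rather than hard-coding a question order, lets the same filter skip any branch once an earlier answer rules it out.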
12
Method
1. create semantic representations first (instead of starting with English)
2. write English sentences based on them
3. translate the sentences into various languages
14
Example: feature structure
srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker
    ((actor ((np-function fn-actor) (np-general-type interrogative-type)
             (np-person person-third) (np-number num-dual)
             (np-biological-gender bio-gender-male) (np-animacy anim-human)
             (np-pronoun-antecedent antecedent-n/a) (np-specificity specificity-neutral)
             (np-identifiability identifiability-neutral) (np-distance distance-neutral)
             (np-pronoun-exclusivity inclusivity-n/a)))
     (undergoer ((np-person person-third) (np-identifiability unidentifiable)
                 (np-number num-pl) (np-specificity non-specific)
                 (np-animacy anim-inanimate) (np-biological-gender bio-gender-n/a)
                 (np-function fn-undergoer) (np-general-type common-noun-type)
                 (np-pronoun-exclusivity inclusivity-n/a) (np-pronoun-antecedent antecedent-n/a)
                 (np-distance distance-neutral)))
     (c-polarity polarity-positive) (c-v-absolute-tense future)
     (c-general-type open-question) (c-question-gap gap-actor)
     (c-my-causer-intentionality intentionality-n/a) (c-comparison-type comparison-n/a)
     (c-relative-tense relative-n/a) (c-our-boundary boundary-n/a)
     (c-comparator-function comparator-n/a) (c-causee-control control-n/a)
     (c-our-situations situations-n/a) (c-comparand-type comparand-n/a)
     (c-causation-directness directness-n/a) (c-source source-neutral)
     (c-causee-volitionality volition-n/a) (c-assertiveness assertiveness-neutral)
     (c-solidarity solidarity-neutral) (c-v-grammatical-aspect gram-aspect-neutral)
     (c-adjunct-clause-type adjunct-clause-type-n/a) (c-v-phase-aspect phase-aspect-neutral)
     (c-v-lexical-aspect activity-accomplishment) (c-secondary-type secondary-neutral)
     (c-event-modality event-modality-none) (c-function fn-main-clause)
     (c-minor-type minor-n/a) (c-copula-type copula-n/a)
     (c-power-relationship power-peer) (c-our-shared-subject shared-subject-n/a))
15
Example: feature structure
srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker
    ((ACTOR ((NP-FUNCTION FN-ACTOR) (NP-GENERAL-TYPE INTERROGATIVE-TYPE)
             (NP-PERSON PERSON-THIRD) (NP-NUMBER NUM-DUAL)
             (NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
     (UNDERGOER ((NP-PERSON PERSON-THIRD) (NP-IDENTIFIABILITY UNIDENTIFIABLE)
                 (NP-NUMBER NUM-PL) (NP-SPECIFICITY NON-SPECIFIC)))
     (C-POLARITY POLARITY-POSITIVE) (C-V-ABSOLUTE-TENSE FUTURE))
In each pair, the first element is the feature name and the second is its value: in (NP-NUMBER NUM-DUAL), NP-NUMBER is the feature name and NUM-DUAL is the value.
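To make the name/value structure concrete, here is a minimal reader that turns the abbreviated feature structure into nested Python dictionaries. It is a sketch assuming well-formed input in which every inner list is a (name value) pair; the function names `tokenize`, `parse`, `to_features` and `read_feature_structure` are hypothetical, not the project's own tools. Names and values are lowercased so that the uppercase and lowercase renderings used on these slides compare equal.

```python
import re
from collections import deque


def tokenize(text):
    """Split a feature structure into '(' , ')' and atom tokens."""
    return deque(re.findall(r"[()]|[^\s()]+", text))


def parse(tokens):
    """Read one s-expression from the token queue into nested lists."""
    token = tokens.popleft()
    if token != "(":
        return token
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.popleft()  # consume the closing ")"
    return items


def to_features(pairs):
    """Turn a list of (feature-name value) pairs into a dict, recursing into NPs."""
    return {
        name.lower(): to_features(value) if isinstance(value, list) else value.lower()
        for name, value in pairs
    }


def read_feature_structure(text):
    return to_features(parse(tokenize(text)))


FS = """((ACTOR ((NP-FUNCTION FN-ACTOR)(NP-GENERAL-TYPE INTERROGATIVE-TYPE)
  (NP-PERSON PERSON-THIRD)(NP-NUMBER NUM-DUAL)(NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)(NP-IDENTIFIABILITY UNIDENTIFIABLE)
  (NP-NUMBER NUM-PL)(NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)(C-V-ABSOLUTE-TENSE FUTURE))"""

fs = read_feature_structure(FS)
print(fs["actor"]["np-number"])  # num-dual
print(fs["c-v-absolute-tense"])  # future
```

Clause-level features come out as plain strings, while the two noun phrases (actor, undergoer) come out as nested dictionaries, mirroring the two levels of the notation.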
19
Using semantic representations
Advantages:
- more precise
- more complete
- they encode the actual linguistic features to elicit
24
1. Naturalness
- naturalness of sentences vs. holding lexical items constant
- minimal pairs are ideal (A tree fell / The tree fell), but we also want natural sentences
- natural sentences are easier to translate, which leads to fewer mistakes
- e.g. She hurt herself. / *It hurt itself. (* = unnatural)
- sentences are hand-written rather than produced with a natural language generator (GenKit)
25
2. Restrictions
- we need to find restrictions on combinations of features
- some combinations are invalid or unnatural, e.g. inclusive and third person
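A restriction table can be checked mechanically before any sentence is written. The sketch below handles pairwise rules only and encodes just the slide's example; `INVALID_PAIRS`, `violations` and `is_valid` are hypothetical names, and the values `inclusive` and `person-first` are assumed (the feature structures in this talk only show `person-third` and `inclusivity-n/a`).

```python
from itertools import combinations

# Hypothetical restriction table: each entry is a pair of (feature, value)
# settings that must not co-occur. "inclusive" is an assumed value name.
INVALID_PAIRS = {
    frozenset({("np-person", "person-third"), ("np-pronoun-exclusivity", "inclusive")}),
}


def violations(features):
    """Return every forbidden pair of settings present in `features`."""
    pairs = (frozenset(p) for p in combinations(features.items(), 2))
    return [pair for pair in pairs if pair in INVALID_PAIRS]


def is_valid(features):
    return not violations(features)


print(is_valid({"np-person": "person-third", "np-pronoun-exclusivity": "inclusive"}))  # False
print(is_valid({"np-person": "person-first", "np-pronoun-exclusivity": "inclusive"}))  # True
```

Pairwise rules cover the example on the slide; restrictions involving three or more features at once would need a more general rule format.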
26
3. Definition of values
- use language-independent semantic categories
- be precise, e.g. specificity is better than definiteness
- reach agreement on definitions: intercoder agreement (informal experiment); the writers agreed on which English forms to use
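The slides mention an informal intercoder-agreement experiment but do not say how agreement was measured. One common option is Cohen's kappa, which corrects raw agreement for the agreement two coders would reach by chance. The sketch below is an illustration with invented labels, not the study's data or method; the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which the two coders chose the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for chance, for two coders labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Probability that both coders pick the same label by chance.
    expected = sum(count_a[v] * count_b[v] for v in count_a.keys() | count_b.keys()) / n**2
    if expected == 1:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented np-specificity codes for four noun phrases.
coder_1 = ["specific", "non-specific", "specificity-neutral", "specific"]
coder_2 = ["specific", "non-specific", "specific", "specific"]
print(percent_agreement(coder_1, coder_2))       # 0.75
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.556
```

Here the coders agree on 3 of 4 items, but because both use `specific` often, kappa is noticeably lower than the raw 75%.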
27
Avoiding language-specificity
Many-to-many translations of determiners:
I have a cat. / J’ai un chat.
The cat is fat. / Le chat est gros.
I like chocolate. / J’aime le chocolat.
I eat chocolates. / Je mange des chocolats.
Communism failed. / Le communisme a échoué.
He has (some) money. / Il a de l’argent.
I am a teacher. / Je suis professeur.
England / L’Angleterre
I don’t have a/any cat(s). / Je n’ai pas de chat.
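Tabulating the pairs above makes the many-to-many pattern explicit: one English form maps to several French determiners and vice versa. The determiner labels are an annotation added here for illustration (`(none)` marks a bare noun, and `l'` is kept as the elided surface form of the article); `alignments` is a hypothetical helper name.

```python
from collections import defaultdict

BARE = "(none)"  # noun phrase with no determiner

# (English determiner, French determiner), read off the sentence pairs above.
PAIRS = [
    ("a", "un"),          # I have a cat. / J'ai un chat.
    ("the", "le"),        # The cat is fat. / Le chat est gros.
    (BARE, "le"),         # I like chocolate. / J'aime le chocolat.
    (BARE, "des"),        # I eat chocolates. / Je mange des chocolats.
    (BARE, "le"),         # Communism failed. / Le communisme a échoué.
    ("(some)", "de l'"),  # He has (some) money. / Il a de l'argent.
    ("a", BARE),          # I am a teacher. / Je suis professeur.
    (BARE, "l'"),         # England / L'Angleterre
    ("a/any", "de"),      # I don't have a/any cat(s). / Je n'ai pas de chat.
]


def alignments(pairs):
    """Build English->French and French->English determiner maps."""
    en_to_fr, fr_to_en = defaultdict(set), defaultdict(set)
    for en, fr in pairs:
        en_to_fr[en].add(fr)
        fr_to_en[fr].add(en)
    return en_to_fr, fr_to_en


en_to_fr, fr_to_en = alignments(PAIRS)
print(sorted(en_to_fr[BARE]))  # ['des', "l'", 'le']
print(sorted(fr_to_en["le"]))  # ['(none)', 'the']
```

Neither direction is a function, which is why the features have to be defined by the function of the noun phrase rather than by either language's determiners.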
28
Avoiding language-specificity
We have to break determiner use down by function:
- indefinite quantity (some water)
- generic (the moose is a noble animal)
- predicate nominal (I am a doctor)
- definite noun phrase (the dog is sick)
- etc.
29
Definiteness
An example of a problem in the design of features and values: how do we define definiteness while avoiding English definiteness categories?
30
Criteria for definiteness
Lyons (1999):
- uniqueness
- familiarity
- identifiability
- specificity
- inclusiveness
31
Criteria for definiteness
We chose the most important criteria: identifiability and specificity.
32
Definiteness
You and I are in a room. I say “The chair is on fire!”
34
Definiteness
Why did I say “the chair”?
- identifiability: I know that you know which chair I’m talking about
- specificity: I’m referring to a particular chair
35
Grammatical feature: specificity
John wants to marry a Norwegian.
Feature: np-specificity
Values:
- specific: John wants to marry a (specific) Norwegian.
- non-specific: John wants to marry some Norwegian.
- specificity-neutral: She is a Norwegian.
36
Grammatical feature: specificity
Turkish direct objects:
Ali bir kitap okudu.
Ali one book read
"Ali read a book."
Ali bir kitab-ı okudu.
Ali one book-ACC read
"Ali read a (specific) book."
37
Definiteness: corner cases
- Who will be the manager? Not about a specific manager, but about a specific role.
- She is a teacher. identifiability-neutral, specificity-neutral (no article here in French; cf. Je suis professeur.)
- A dog has four legs. identifiability-generic, specificity-neutral
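Grouping examples by their English form shows why the features cannot simply mirror English articles: the same article "a" carries different value pairs. A minimal sketch; the value pairs are the ones assigned in this talk (for "windows" they come from the undergoer of the "Who will break windows?" feature structure), while `EXAMPLES` and `values_by_english_form` are hypothetical names.

```python
from collections import defaultdict

# (sentence, English form introducing the noun phrase, identifiability, specificity).
EXAMPLES = [
    ("She is a teacher.", "a", "identifiability-neutral", "specificity-neutral"),
    ("A dog has four legs.", "a", "identifiability-generic", "specificity-neutral"),
    ("Who will break windows?", "(none)", "unidentifiable", "non-specific"),
]


def values_by_english_form(examples):
    """Group the (identifiability, specificity) pairs under each English form."""
    groups = defaultdict(set)
    for _sentence, form, identifiability, specificity in examples:
        groups[form].add((identifiability, specificity))
    return groups


groups = values_by_english_form(EXAMPLES)
print(len(groups["a"]))  # 2: the article "a" does not determine the feature values
```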
38
Layout of Corpus
1. Clause types, negation, and formality
2. Discourse setting/Speaker-hearer features
3. Basic NP features
4. Verbal Tense and Aspect
5. Evidentiality and Modality
6. Causatives
7. Comparatives
8. Modifiers
9. Conjunctions
10. Clause-combining
39
Layout of Corpus
- combine feature values systematically
- why combine? Some features interact, e.g. Will the woman be happy? (interrogative, future tense)
- what to combine? Features known to interact, e.g. person and number (I am, we are, he is)
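Systematic combination is a Cartesian product over the chosen features. As an illustration, crossing clause type, tense and polarity yields exactly the eight "apple" sentences from the corpus example at the start of the talk, in the same order. This is a sketch under assumptions: only `polarity-positive` and `future` appear in the talk's feature structure; `yes-no-question`, `declarative`, `past` and `polarity-negative` are assumed value names, and `combine` is a hypothetical helper.

```python
from itertools import product

# Assumed value inventory; feature order controls the output order.
FEATURES = {
    "c-general-type": ["yes-no-question", "declarative"],
    "c-v-absolute-tense": ["past", "future"],
    "c-polarity": ["polarity-positive", "polarity-negative"],
}

SENTENCES = [
    "Was there an apple?", "Wasn't there an apple?",
    "Will there be an apple?", "Won't there be an apple?",
    "There was an apple.", "There was not an apple.",
    "There will be an apple.", "There will not be an apple.",
]


def combine(features):
    """Every combination of one value per feature (Cartesian product)."""
    names = list(features)
    return [dict(zip(names, values)) for values in product(*features.values())]


combos = combine(FEATURES)
print(len(combos))  # 8
for combo, sentence in zip(combos, SENTENCES):
    print(sentence, combo)
```

Applying `combine` only to features known to interact (such as person and number) keeps the product small, in line with the constraint to minimize corpus size.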
40
Status
- delivered: 21,133 words (sampled version)
- translated into Thai and Bengali
- Spanish → Guarani