Designing an Elicitation Corpus with Semantic Representations Simon Fung Advisor: Lori Levin November 2006.


2 Overview: motivation; the elicitation corpus; constraints; an issue: definiteness; status

4 Corpus example (English sentences with Chinese translations):
Was there an apple? / 那裡曾經有一個蘋果嗎？
Wasn't there an apple? / 那裡不是曾經有一個蘋果嗎？
Will there be an apple? / 那裡會有一個蘋果嗎？
Won't there be an apple? / 那裡不是會有一個蘋果嗎？
There was an apple. / 那裡曾經有一個蘋果。
There was not an apple. / 那裡曾經沒有一個蘋果。
There will be an apple. / 那裡會有一個蘋果。
There will not be an apple. / 那裡不會有一個蘋果。
...

5 Uses for a parallel corpus: training data for statistical MT; learning about the grammar of a new language

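6 Motivation: How do languages form various constructions (e.g. relative clauses)? 1. The student whom I saw 2. 我見過的學生。 (Chinese: the relative clause 我見過 "I have seen" precedes the noun 學生 "student" and is linked to it by 的)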

7 Motivation: What semantic distinctions are important in different languages?
He is talking. / Tā zài jiǎng huà. / Il parle.
They are talking. / Tāmen zài jiǎng huà. / Ils parlent.
He talks. {habitually} / Tā jiǎng huà. / Il parle.

8 The MILE (MInor Language Elicitation) Corpus: sentences covering various semantic categories and constructions (e.g. number, gender, relative clauses), to be translated into the language under study, with a semantic representation for each sentence

9 The MILE (MInor Language Elicitation) Corpus: 10,000-20,000 words; translations done by one person; 7 languages per year for the next 5 years (e.g. Thai, Bengali, Punjabi); these languages may have many speakers but few electronic resources

10 Constraints: maximize the range of semantic categories and constructions; minimize corpus size

11 Constraints: different languages are complex in different areas; there is only one corpus for this project; ultimate goal: navigate through the features dynamically (e.g. no singular/plural distinction → no dual)
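One way to picture the "dynamic navigation" goal is as a filter over the remaining corpus items: once a language is found to lack a distinction, items testing features that depend on it are skipped. The sketch below is hypothetical: the item format, the feature labels and the implication table are assumptions made for illustration, not the project's navigation logic. The table simply echoes the typological number hierarchy (Greenberg's Universal 34: no dual without a plural, no trial without a dual).

```python
# Hypothetical implication table (an assumption, not MILE data): if a language
# lacks the key feature, the features in its value set are ruled out too.
IMPLICATIONS = {
    "sg-pl-distinction": {"dual"},
    "dual": {"trial"},
}


def ruled_out(absent_features):
    """Close the set of absent features under IMPLICATIONS."""
    out = set(absent_features)
    stack = list(absent_features)
    while stack:
        feature = stack.pop()
        for implied in IMPLICATIONS.get(feature, ()):
            if implied not in out:
                out.add(implied)
                stack.append(implied)
    return out


def items_to_elicit(remaining_items, absent_features):
    """Keep only items whose tested feature has not been ruled out."""
    skip = ruled_out(absent_features)
    return [item for item in remaining_items if item["tests"] not in skip]


# Hypothetical items still to be translated.
remaining = [
    {"id": 2, "tests": "dual", "sentence": "The two dogs barked."},
    {"id": 3, "tests": "trial", "sentence": "The three of us left."},
    {"id": 4, "tests": "tense", "sentence": "Will there be an apple?"},
]

kept = items_to_elicit(remaining, {"sg-pl-distinction"})
print([item["id"] for item in kept])  # prints [4]
```

Computing the closure, rather than a single lookup, lets one observation (no singular/plural distinction) prune a whole chain of dependent items.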

12 Method: 1. Create semantic representations first (instead of starting with English). 2. Write English sentences based on them. 3. Translate the sentences into various languages.
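The order of the three steps can be made concrete as a record that is filled in stage by stage. This is a minimal sketch: the class and method names (`CorpusEntry`, `write_english`, `add_translation`) are hypothetical, the field names mirror the `srcsent` and `context` labels on the feature-structure slides, and only two features from the deck are shown.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusEntry:
    # Step 1: the semantic representation comes first.
    features: dict
    # Step 2: an English sentence written to express those features.
    srcsent: str = ""
    context: str = ""
    # Step 3: translations keyed by language name.
    translations: dict = field(default_factory=dict)

    def write_english(self, sentence, context=""):
        self.srcsent = sentence
        self.context = context

    def add_translation(self, language, sentence):
        # Enforce the method's order: no translation before the English exists.
        if not self.srcsent:
            raise ValueError("write the English sentence before translating")
        self.translations[language] = sentence


entry = CorpusEntry(features={
    "c-polarity": "polarity-positive",
    "c-v-absolute-tense": "future",
})
entry.write_english("Will there be an apple?")
entry.add_translation("Chinese", "那裡會有一個蘋果嗎？")
print(entry.translations)
```

Keeping `features` as the only required field reflects step 1: the English sentence is just one rendering of the representation, not its source.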

14 Example: feature structure srcsent: Who will break windows? context: "Who" refers to two men; spoken to a co-worker; ((actor ((np-function fn-actor)(np-general-type interrogative-type) (np-person person-third)(np-number num-dual) (np-biological-gender bio-gender-male)(np-animacy anim-human)(np-pronoun-antecedent antecedent-n/a) (np-specificity specificity-neutral)(np-identifiability identifiability-neutral) (np-distance distance-neutral)(np-pronoun-exclusivity inclusivity-n/a))) (undergoer ((np-person person-third)(np-identifiability unidentifiable)(np-number num-pl) (np-specificity non-specific)(np-animacy anim-inanimate)(np-biological-gender bio-gender-n/a)(np-function fn-undergoer)(np-general-type common-noun-type)(np-pronoun-exclusivity inclusivity-n/a)(np-pronoun-antecedent antecedent-n/a)(np-distance distance-neutral))) (c-polarity polarity-positive) (c-v-absolute-tense future) (c-general-type open-question)(c-question-gap gap-actor)(c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)(c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)(c-comparator-function comparator-n/a)(c-causee-control control-n/a)(c-our-situations situations-n/a)(c-comparand-type comparand-n/a)(c-causation-directness directness-n/a)(c-source source-neutral)(c-causee-volitionality volition-n/a)(c-assertiveness assertiveness-neutral)(c-solidarity solidarity-neutral)(c-v-grammatical-aspect gram-aspect-neutral)(c-adjunct-clause-type adjunct-clause-type-n/a)(c-v-phase-aspect phase-aspect-neutral)(c-v-lexical-aspect activity-accomplishment)(c-secondary-type secondary-neutral)(c-event-modality event-modality-none)(c-function fn-main-clause)(c-minor-type minor-n/a)(c-copula-type copula-n/a)(c-power-relationship power-peer)(c-our-shared-subject shared-subject-n/a))

15-18 Example: feature structure (abbreviated) srcsent: Who will break windows? context: "Who" refers to two men; spoken to a co-worker; ((ACTOR ((NP-FUNCTION FN-ACTOR)(NP-GENERAL-TYPE INTERROGATIVE-TYPE)(NP-PERSON PERSON-THIRD)(NP-NUMBER NUM-DUAL)(NP-BIOLOGICAL-GENDER BIO-GENDER-MALE))) (UNDERGOER ((NP-PERSON PERSON-THIRD)(NP-IDENTIFIABILITY UNIDENTIFIABLE)(NP-NUMBER NUM-PL)(NP-SPECIFICITY NON-SPECIFIC))) (C-POLARITY POLARITY-POSITIVE)(C-V-ABSOLUTE-TENSE FUTURE)) Each parenthesized pair gives a feature name followed by its value: in (NP-NUMBER NUM-DUAL), NP-NUMBER is the feature name and NUM-DUAL is the value.
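Because the feature structures are nested, parenthesized name/value lists, they can be read with a small s-expression parser. The sketch below assumes every list is a sequence of `(name value)` pairs, as in the examples above, and lowercases all symbols so that the lowercase and uppercase renderings on these slides parse to the same result. The function names are hypothetical and not part of any MILE tool.

```python
import re

# Tokens: an open paren, a close paren, or a run of anything that is
# neither whitespace nor a paren (e.g. "antecedent-n/a").
TOKEN = re.compile(r"[()]|[^\s()]+")


def parse(tokens, pos=0):
    """Parse one expression starting at tokens[pos]; return (value, next_pos)."""
    token = tokens[pos]
    if token != "(":
        return token.lower(), pos + 1
    items = []
    pos += 1
    while tokens[pos] != ")":
        item, pos = parse(tokens, pos)
        items.append(item)
    return items, pos + 1


def to_dict(node):
    """Turn lists of (name value) pairs into nested dicts."""
    if isinstance(node, str):
        return node
    return {name: to_dict(value) for name, value in node}


def read_feature_structure(text):
    value, _ = parse(TOKEN.findall(text))
    return to_dict(value)


fs = read_feature_structure(
    "((ACTOR ((NP-PERSON PERSON-THIRD)(NP-NUMBER NUM-DUAL)))"
    " (UNDERGOER ((NP-NUMBER NUM-PL)(NP-SPECIFICITY NON-SPECIFIC)))"
    " (C-POLARITY POLARITY-POSITIVE)(C-V-ABSOLUTE-TENSE FUTURE))"
)
print(fs["actor"]["np-number"])  # prints num-dual
print(fs["c-v-absolute-tense"])  # prints future
```

Turning the pairs into dicts makes lookups such as "what is the undergoer's number?" direct; malformed input (unbalanced parentheses, lists that are not pairs) raises an exception rather than being repaired.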

19 Using semantic representations. Advantages: they are more precise and more complete, and they encode the actual linguistic features to elicit.

24 1. Naturalness: there is a tension between natural sentences and holding lexical items constant. Minimal pairs are ideal (A tree fell / The tree fell), but we also want natural sentences: natural → easier to translate → fewer mistakes. For example, "She hurt herself." is natural, but "*It hurt itself." is not. The sentences are hand-written rather than generated with a natural language generator (GenKit).
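The minimal-pair ideal can be stated as a check: two items form a minimal pair when their feature structures differ in exactly one feature. The sketch below uses flattened, hypothetical feature structures; the value names `identifiable` and `past` are assumptions (only `unidentifiable` and `future` appear in the deck's examples).

```python
from itertools import combinations


def differing_features(a, b):
    """Names of features whose values differ (a missing feature counts as different)."""
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


def minimal_pairs(entries):
    """Return (sentence1, sentence2, feature) for items differing in exactly one feature."""
    pairs = []
    for (s1, f1), (s2, f2) in combinations(entries, 2):
        diff = differing_features(f1, f2)
        if len(diff) == 1:
            pairs.append((s1, s2, diff.pop()))
    return pairs


# Hypothetical, flattened feature structures for three hand-written items.
entries = [
    ("A tree fell.", {"np-identifiability": "unidentifiable", "c-v-absolute-tense": "past"}),
    ("The tree fell.", {"np-identifiability": "identifiable", "c-v-absolute-tense": "past"}),
    ("The tree will fall.", {"np-identifiability": "identifiable", "c-v-absolute-tense": "future"}),
]

for pair in minimal_pairs(entries):
    print(pair)
# prints ('A tree fell.', 'The tree fell.', 'np-identifiability')
# prints ('The tree fell.', 'The tree will fall.', 'c-v-absolute-tense')
```

Since the sentences are hand-written for naturalness, minimal pairs are not guaranteed by construction; a check like this can only report which written items happen to form them.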

25 2. Restrictions: we need to find restrictions on combinations of features, because some combinations are invalid or unnatural (e.g. inclusive and third person).
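Restrictions of this kind can be written as sets of feature values that must not co-occur, and each candidate combination checked against them. A minimal sketch: the rule list holds only the slide's inclusive/third-person example, and the value names `inclusive` and `person-first` are assumptions (the deck's example structure shows only `inclusivity-n/a` and `person-third`).

```python
# Each rule is a set of (feature, value) pairs that must not all co-occur.
FORBIDDEN = [
    {("np-person", "person-third"), ("np-pronoun-exclusivity", "inclusive")},
]


def violations(features):
    """Return every rule whose pairs are all present in `features`."""
    present = set(features.items())
    return [rule for rule in FORBIDDEN if rule <= present]


def is_valid(features):
    return not violations(features)


print(is_valid({"np-person": "person-third", "np-pronoun-exclusivity": "inclusive"}))  # prints False
print(is_valid({"np-person": "person-first", "np-pronoun-exclusivity": "inclusive"}))  # prints True
```

Representing each restriction as a set makes the check a subset test, so a rule can mention any number of features without changing the code.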

26 3. Definition of values: use language-independent semantic categories; make them precise (e.g. specificity is better than definiteness); reach agreement on definitions through intercoder agreement (an informal experiment); the writers also agreed on which English forms to use.
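The slide reports only an informal intercoder experiment and does not say how agreement was measured. If one wanted a number for it, Cohen's kappa is a standard chance-corrected agreement measure for two coders; the sketch below implements its textbook definition, and the coder labels are invented for illustration, not data from the experiment.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's own label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented np-specificity labels for four noun phrases.
coder_a = ["specific", "non-specific", "specific", "specificity-neutral"]
coder_b = ["specific", "specific", "specific", "specificity-neutral"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # prints 0.556
```

Here the coders agree on 3 of 4 items (0.75), but chance agreement is 7/16, so kappa is (0.75 - 0.4375) / (1 - 0.4375) = 5/9.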

27 Avoiding language-specificity: determiners translate many-to-many.
I have a cat. / J’ai un chat.
The cat is fat. / Le chat est gros.
I like chocolate. / J’aime le chocolat.
I eat chocolates. / Je mange des chocolats.
Communism failed. / Le communisme a échoué.
He has (some) money. / Il a de l’argent.
I am a teacher. / Je suis professeur.
England / L’Angleterre
I don’t have a/any cat(s). / Je n’ai pas de chat.

28 Avoiding language-specificity: we have to break it down by function: indefinite quantity (some water); generic (The moose is a noble animal); predicate nominal (I am a doctor); definite noun phrase (The dog is sick); etc.
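The many-to-many claim of slide 27 can be checked mechanically by tabulating its own examples. In this sketch the determiner taken from each sentence is a hand annotation, "(none)" marks a bare noun, and the function labels are this sketch's reading of the examples in terms of the categories on slide 28 (the label "negated indefinite" is not from the deck).

```python
from collections import defaultdict

# (function, English determiner, French determiner, example pair)
# "(none)" marks a bare noun with no determiner.
EXAMPLES = [
    ("indefinite", "a", "un", "I have a cat. / J'ai un chat."),
    ("definite", "the", "le", "The cat is fat. / Le chat est gros."),
    ("generic", "(none)", "le", "I like chocolate. / J'aime le chocolat."),
    ("indefinite quantity", "(none)", "des", "I eat chocolates. / Je mange des chocolats."),
    ("generic", "(none)", "le", "Communism failed. / Le communisme a échoué."),
    ("indefinite quantity", "(some)", "de l'", "He has (some) money. / Il a de l'argent."),
    ("predicate nominal", "a", "(none)", "I am a teacher. / Je suis professeur."),
    ("negated indefinite", "a/any", "de", "I don't have a/any cat(s). / Je n'ai pas de chat."),
]

english_to_french = defaultdict(set)
french_to_english = defaultdict(set)
by_function = defaultdict(set)
for function, en, fr, _example in EXAMPLES:
    english_to_french[en].add(fr)
    french_to_english[fr].add(en)
    by_function[function].add((en, fr))

print(sorted(english_to_french["a"]))       # prints ['(none)', 'un']
print(sorted(french_to_english["le"]))      # prints ['(none)', 'the']
print(sorted(english_to_french["(none)"]))  # prints ['des', 'le']
print(sorted(by_function["generic"]))       # prints [('(none)', 'le')]
```

English "a" maps to two French forms and French "le" to two English forms, so neither language's determiners can serve as the feature inventory; in these examples the two generic items, by contrast, share one English/French pairing.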

29 Definiteness: an example of a problem in the design of features and values. How do we define definiteness while avoiding English definiteness categories?

30 Criteria for definiteness, following Lyons (1999): uniqueness, familiarity, identifiability, specificity, inclusiveness

31 Criteria for definiteness: we chose the most important criteria, identifiability and specificity.

32 Definiteness: You and I are in a room. I say “The chair is on fire!”

34 Definiteness: Why did I say “the chair”? Identifiability: I know that you know which chair I’m talking about. Specificity: I’m referring to a particular chair.

35 Grammatical feature: specificity
John wants to marry a Norwegian. (ambiguous in English)
Feature: np-specificity
Values:
specific: John wants to marry a (specific) Norwegian.
non-specific: John wants to marry some Norwegian.
specificity-neutral: She is a Norwegian.

36 Grammatical feature: specificity in Turkish direct objects
Ali bir kitap okudu.
Ali one book read
'Ali read a book.'
Ali bir kitab-ı okudu.
Ali one book-ACC read
'Ali read a (specific) book.'
The accusative suffix -ı marks the indefinite object as specific.

37 Definiteness: corner cases
Who will be the manager? This is not about a specific manager, but it is about a specific role.
She is a teacher. identifiability-neutral, specificity-neutral (French uses no article here)
A dog has four legs. identifiability-generic, specificity-neutral
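Splitting definiteness into two features means each noun phrase carries an (identifiability, specificity) pair drawn from closed value sets. The sketch below encodes examples from the slides that way. The value sets are collected from the deck, except `identifiable`, which is an assumed name for the counterpart of `unidentifiable`; the class name `Definiteness` is hypothetical.

```python
from dataclasses import dataclass

IDENTIFIABILITY = {"identifiable", "unidentifiable",
                   "identifiability-neutral", "identifiability-generic"}
SPECIFICITY = {"specific", "non-specific", "specificity-neutral"}


@dataclass(frozen=True)
class Definiteness:
    identifiability: str
    specificity: str

    def __post_init__(self):
        # Reject values outside the two closed value sets.
        if self.identifiability not in IDENTIFIABILITY:
            raise ValueError(f"unknown identifiability: {self.identifiability}")
        if self.specificity not in SPECIFICITY:
            raise ValueError(f"unknown specificity: {self.specificity}")


ANNOTATED = {
    "The chair is on fire!": Definiteness("identifiable", "specific"),
    "Who will break windows? (windows)": Definiteness("unidentifiable", "non-specific"),
    "She is a teacher.": Definiteness("identifiability-neutral", "specificity-neutral"),
    "A dog has four legs.": Definiteness("identifiability-generic", "specificity-neutral"),
}

# Two noun phrases with the English article "a" receive different values.
print(ANNOTATED["She is a teacher."] == ANNOTATED["A dog has four legs."])  # prints False
```

Validating in `__post_init__` means an English-centric value such as "definite" is rejected when the annotation is created, instead of slipping into the corpus.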

38 Layout of the corpus: 1. Clause types, negation, and formality; 2. Discourse setting / speaker-hearer features; 3. Basic NP features; 4. Verbal tense and aspect; 5. Evidentiality and modality; 6. Causatives; 7. Comparatives; 8. Modifiers; 9. Conjunctions; 10. Clause-combining

39 Layout of the corpus: feature values are combined systematically. Why combine? Some features interact (e.g. Will the woman be happy?: interrogative and future tense). What to combine? Features known to interact, e.g. person and number (I am, we are, he is).
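Systematic combination amounts to taking the Cartesian product of the value lists of features known to interact. A minimal sketch using Python's `itertools.product`; the value names `person-first`, `person-second` and `num-sg` are assumed by analogy with `person-third` and `num-pl` from the deck, and `combine` is a hypothetical helper.

```python
from itertools import product


def combine(feature_values):
    """Return every combination of the given feature values as a dict."""
    names = list(feature_values)
    return [dict(zip(names, values))
            for values in product(*(feature_values[name] for name in names))]


grid = combine({
    "np-person": ["person-first", "person-second", "person-third"],
    "np-number": ["num-sg", "num-pl"],
})
print(len(grid))  # prints 6
print(grid[0])    # prints {'np-person': 'person-first', 'np-number': 'num-sg'}
```

The grid grows multiplicatively (three persons times two numbers already gives six items), which is why combining only features known to interact helps keep the corpus small, in line with the constraints on slide 10.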

40 Status: delivered 21,133 words (sampled version); translated into Thai and Bengali; Spanish → Guarani

