On the Ambiguity of Serbian Texts and Methods to Disambiguate It. Cvetana Krstev, Duško Vitas, University of Belgrade. 8th Intex/NooJ Workshop.

Presentation transcript:

1 On the Ambiguity of Serbian Texts and Methods to Disambiguate It. Cvetana Krstev, Duško Vitas, University of Belgrade. 8th Intex/NooJ Workshop.

2 What is ambiguity? The assignment of different lemmas, or of different grammatical categories, to the same word form.

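3 The ambiguity in Serbian. In Serbian many word forms are homographs although not homophones, because stress marks are not recorded. Readings of the written form "gore":
gőre: adv., up
gőrē: adv., worse
gòrē: P3s of goreti (V+Ek), to burn
gòre: A3s of goreti (V+Ek), to burn
gòrē: P3s of gorjeti (V+Ijk), to burn
gòre: A3s of gorjeti (V+Ijk), to burn
gòre: fs2 of gora, forest
Stress marks: up, short ő and long ô; down, short ò and long ó.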

4 The ambiguity in Serbian (2). Dictionary entries for forms of the adjective rodoslovni:
rodoslovna,rodoslovni.A2+PosQ:akms2g:akms4v:aefs1g:aefs5g:akns2g:aenp1g:aenp4g:aenp5g
rodoslovne,rodoslovni.A2+PosQ:aemp4g:aefs2g:aefp1g:aefp4g:aefp5g
rodoslovni,rodoslovni.A2+PosQ:adms1g:aems4q:aems5g:aemp1g:aemp5g
rodoslovnih,rodoslovni.A2+PosQ:aemp2g:aefp2g:aenp2g
rodoslovnim,rodoslovni.A2+PosQ:aems6g:aemp3g:aemp6g:aemp7g:aefp3g:aefp6g:aefp7g:aens6g:aenp3g:aenp6g:aenp7g
rodoslovnima,rodoslovni.A2+PosQ:aemp3g:aemp6g:aemp7g:aefp3g:aefp6g:aefp7g:aenp3g:aenp6g:aenp7g
rodoslovno,rodoslovni.A2+PosQ:aens1g:aens4g:aens5g
rodoslovnog,rodoslovni.A2+PosQ:adms2g:adms4v:adns2g
rodoslovnoga,rodoslovni.A2+PosQ:adms2g:adms4v:adns2g
rodoslovnoj,rodoslovni.A2+PosQ:aefs3g:aefs7g
rodoslovnom,rodoslovni.A2+PosQ:adms3g:adms7g:aefs6g:adns3g:adns7g
…
Note: a single form can carry 9 sets of grammatical categories (rodoslovnima above). In the category codes, e means the form is the same for definite and indefinite; g means the form is the same for animate and inanimate.
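To make the entry layout above concrete, here is a minimal Python sketch that splits one entry into form, lemma, inflection code, markers, and category sets. The function name `parse_entry` and the returned dictionary keys are illustrative choices, not part of Intex/NooJ; the sketch only assumes the `form,lemma.CODE+Marker:cat:cat` shape visible on the slide.

```python
def parse_entry(line):
    """Split 'form,lemma.CODE+Marker:cat1:cat2' into its parts.

    Assumes the layout seen on the slide; an empty lemma (as in 'da,.CONJ')
    is taken to mean that the lemma equals the form.
    """
    form, rest = line.split(",", 1)
    lemma, info = rest.split(".", 1)
    head, *categories = info.split(":")
    code, *markers = head.split("+")
    return {
        "form": form,
        "lemma": lemma or form,
        "code": code,
        "markers": markers,
        "categories": categories,
    }


entry = parse_entry(
    "rodoslovnima,rodoslovni.A2+PosQ:aemp3g:aemp6g:aemp7g"
    ":aefp3g:aefp6g:aefp7g:aenp3g:aenp6g:aenp7g"
)
print(entry["lemma"], len(entry["categories"]))  # rodoslovni 9
```

Counting the category sets per form in this way gives a rough measure of how ambiguous a single written form is.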

5 Disambiguation process: reconstructing word forms; using filter dictionaries; using restricted dictionaries; using dictionaries of compounds; using disambiguation grammars.

6 Reconstructing word forms – date adverbial phrases

7 Reconstructing word forms – date adverbial phrases (2). Concordance excerpts:
i izdavanxem YUBA kartica 20. februara godine. celog sistema.
Zato je josx pocyetkom godine jedan i
U petom mjesecu 2001.godine smo oformlx
cxe biti odrzxan u novembru ove godine u Neumu, a za prvog
Table columns (values not preserved in the transcript): Simple forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.

8 Reconstructing word forms – forms written with digits, etc.

9 Reconstructing word forms – forms written with digits (2). Concordance excerpts:
sxkovi iznosili oko 500 hilxada maraka. Znacyajna usxteda
poput SAP-ovog ili IBM-ovog, dobijate i organizaciju firme
cyelicyne industrije 1890-ih nije postojao. Ali, poznata je
sveta drma tezxinom od 81,7 milijardi dolara u 160 zemalxa, odnosno
ukupno bezmalo pola milijarde (464 miliona)! Predxe
Table columns (values not preserved in the transcript): Simple forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.

10 Using filter dictionaries. Example entries:
mi,ja.PRO01+Prs:sx3i
mi,mi.PRO03+Prs:px1r
mi,miti.V35+Imperf+Tr+Iref+Ref:Ays:Azs
li,li.PAR
li,liti.V98+Imperf+Tr+It+Iref:Ays:Azs

11 Using filter dictionaries (2). A very cautious filter dictionary with only 41 entries. Table columns (values not preserved in the transcript): Simple forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.
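A minimal Python sketch of how a filter dictionary might take priority over the general dictionary follows. The slides do not say which readings the filter keeps, so the contents of `FILTER` (dropping the verb readings miti and liti) are an assumption made purely for illustration, as are the names `GENERAL`, `FILTER`, and `lookup`.

```python
# General dictionary: every reading known for a form (taken from the slides).
GENERAL = {
    "mi": ["ja.PRO01", "mi.PRO03", "miti.V35"],
    "li": ["li.PAR", "liti.V98"],
    "obzira": ["obzir.N1", "obzirati.V519"],
}

# Hypothetical filter dictionary: for the few forms it lists, only these
# readings survive. Which readings are kept is an assumption.
FILTER = {
    "mi": ["ja.PRO01", "mi.PRO03"],
    "li": ["li.PAR"],
}


def lookup(form):
    """Return the filtered readings if the form is filtered, else all readings."""
    return FILTER.get(form, GENERAL.get(form, []))


print(lookup("li"))      # ['li.PAR']
print(lookup("obzira"))  # ['obzir.N1', 'obzirati.V519']
```

Because the filter lists only a handful of very frequent forms, every other form keeps all of its readings.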

12 Using restricted dictionaries. Dictionaries contain lemmas for both standard pronunciations, Ekavian and Ijekavian. Texts, however, are usually written in only one. Dictionaries also contain lemmas for both the Serbian and Croatian languages (or variants of Serbo-Croatian).

13 Using restricted dictionaries (2). Example entries:
crvene,crven.A17+Col:aemp4g:aefs2g:aefp1g:aefp4g:aefp5g
crvene,crveneti.V547+Imperf+It+Iref+Ref+Ek:Pzp:Ays:Azs
crvene,crveniti.V54+Imperf+Tr+Iref:Pzp
crvene,crvenxeti.V747+Imperf+It+Iref+Ref+Ijk:Pzp
Table columns (values not preserved in the transcript): Simple forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.
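As a rough illustration of restricting the dictionary to one pronunciation, the Python sketch below drops entries that carry a given marker. It assumes that `+Ek` and `+Ijk` mark Ekavian and Ijekavian lemmas, as the entries above suggest; the function name `restrict` is hypothetical.

```python
ENTRIES = [
    "crvene,crven.A17+Col:aemp4g:aefs2g:aefp1g:aefp4g:aefp5g",
    "crvene,crveneti.V547+Imperf+It+Iref+Ref+Ek:Pzp:Ays:Azs",
    "crvene,crveniti.V54+Imperf+Tr+Iref:Pzp",
    "crvene,crvenxeti.V747+Imperf+It+Iref+Ref+Ijk:Pzp",
]


def restrict(entries, excluded_marker):
    """Keep only the entries that do not carry the excluded marker."""
    kept = []
    for entry in entries:
        # 'A17+Col:aemp4g:...' -> 'A17+Col' -> markers ['Col']
        head = entry.split(".", 1)[1].split(":", 1)[0]
        markers = head.split("+")[1:]
        if excluded_marker not in markers:
            kept.append(entry)
    return kept


# For a text known to be Ekavian, drop the Ijekavian-only lemma.
ekavian = restrict(ENTRIES, "Ijk")
print(len(ekavian))  # 3
```

Entries without either marker are shared by both pronunciations, so they are kept in both restricted dictionaries.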

14 Using dictionary of compounds. Example entries:
bez obzira na,bez obzira na.PREP+C+Ncn+p4
bez,bez.PREP+p2
na,na.INT
na,na.PREP+p4+p7
obzira,obzir.N1:ms2q:mp2q
obzira,obzirati.V519+Imperf+It+Ref:Ays:Azs
Table columns (values not preserved in the transcript): Simple forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.
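The effect of a compound dictionary can be sketched in Python as a longest-match lookup: when a token sequence is a known compound, it is tagged as one unit and the readings of its parts are never consulted. The names `COMPOUNDS` and `tag_compounds` are illustrative, and the test phrase "bez obzira na cenu" is an invented example, not taken from the corpus.

```python
# Compound dictionary keyed by token tuples (entry taken from the slide).
COMPOUNDS = {
    ("bez", "obzira", "na"): "bez obzira na.PREP+C+Ncn+p4",
}
MAX_LEN = max(len(key) for key in COMPOUNDS)


def tag_compounds(tokens):
    """Greedy longest-match: return (text, compound_entry_or_None) pairs."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-token sequences.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in COMPOUNDS:
                out.append((" ".join(tokens[i:i + n]), COMPOUNDS[key]))
                i += n
                break
        else:
            # No compound starts here; leave the token for simple-word lookup.
            out.append((tokens[i], None))
            i += 1
    return out


print(tag_compounds(["bez", "obzira", "na", "cenu"]))
# [('bez obzira na', 'bez obzira na.PREP+C+Ncn+p4'), ('cenu', None)]
```

Once the compound is recognised, the ambiguous simple entries for bez, obzira, and na no longer apply to those three tokens.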

15 Using disambiguation grammars – positional constraint. A word form is an interjection if it is followed by an exclamation mark.
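The positional rule can be sketched in Python as follows. This is not the Intex/NooJ grammar itself, only an illustration under the assumption that each token comes with a set of candidate part-of-speech tags; the function name and the `PUNCT` tag are hypothetical.

```python
def resolve_interjection(tokens, readings):
    """Keep only the INT reading for a token directly followed by '!'.

    tokens   -- list of token strings
    readings -- list of sets of candidate tags, one set per token
    """
    result = []
    for i, tags in enumerate(readings):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if "INT" in tags and following == "!":
            result.append({"INT"})
        else:
            result.append(set(tags))
    return result


tokens = ["na", "!"]
readings = [{"INT", "PREP"}, {"PUNCT"}]
print(resolve_interjection(tokens, readings))  # [{'INT'}, {'PUNCT'}]
```

Tokens that have no interjection reading, or that are not followed by an exclamation mark, are left untouched.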

16 Using disambiguation grammars – positional constraint (2). After a sentence or phrase boundary, “mi” and “ti” are personal pronouns in the nominative case (once other possibilities have been excluded).

17 Using disambiguation grammars – sequential constraint. “da” is a conjunction (and not a form of the verb dati, “to give”) if it is followed by an auxiliary verb in clitic form.
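A minimal Python sketch of this sequential rule is given below. The set `AUX_CLITICS` lists only the present-tense clitic forms of biti and is deliberately incomplete; it and the function name `resolve_da` are illustrative assumptions, not the authors' grammar.

```python
# Present-tense clitic forms of the auxiliary 'biti'; an illustrative,
# incomplete list.
AUX_CLITICS = {"sam", "si", "je", "smo", "ste", "su"}


def resolve_da(tokens, i, tags):
    """Return {'CONJ'} for 'da' at position i if an auxiliary clitic follows;
    otherwise return the candidate tags unchanged."""
    is_da = tokens[i].lower() == "da"
    has_next = i + 1 < len(tokens)
    if is_da and has_next and tokens[i + 1].lower() in AUX_CLITICS:
        return {"CONJ"}
    return set(tags)


tokens = "ne samo da je prihvatila".split()
candidates = {"CONJ", "ADV", "INT", "PAR", "V"}
print(resolve_da(tokens, 2, candidates))  # {'CONJ'}
```

When no clitic auxiliary follows, as in "da da automobil", the rule does not fire and all readings remain for later grammars.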

18 Using disambiguation grammars – sequential and positional constraints. Concordance excerpt:
sxargarepe evropska unija ne samo da je prihvatila nasxu i
Dictionary entries for “da”:
da,.CONJ
da,.ADV
da,.INT
da,.PAR
da,dati.V103+Perf+Tr+Iref+Ref:Pzs:Ays:Azs
Table columns (values not preserved in the transcript): Forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.

19 Using disambiguation grammars – agreement An adjective, possessive pronoun or numeral has to agree in gender, number, and case with a noun that follows
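The agreement check can be sketched in Python as an intersection over gender, number, and case. The position of these three values inside the category codes (characters 3 to 5 of an adjective code such as `adms2g`, characters 1 to 3 of a noun code such as `ms2q`) is inferred from the entries shown on earlier slides and is an assumption. The pairing of rodoslovnog with obzira is artificial and serves only to exercise the code.

```python
def gnc_adj(code):
    """Gender, number, case of an adjective code, e.g. 'adms2g' -> 'ms2'.
    Assumed layout: degree, definiteness, gender, number, case, animacy."""
    return code[2:5]


def gnc_noun(code):
    """Gender, number, case of a noun code, e.g. 'ms2q' -> 'ms2'.
    Assumed layout: gender, number, case, animacy."""
    return code[0:3]


def agree(adj_codes, noun_codes):
    """Keep only adjective and noun readings that agree with each other."""
    noun_keys = {gnc_noun(n) for n in noun_codes}
    kept_adj = [a for a in adj_codes if gnc_adj(a) in noun_keys]
    adj_keys = {gnc_adj(a) for a in kept_adj}
    kept_noun = [n for n in noun_codes if gnc_noun(n) in adj_keys]
    return kept_adj, kept_noun


# rodoslovnog: adms2g, adms4v, adns2g; obzira (noun): ms2q, mp2q
print(agree(["adms2g", "adms4v", "adns2g"], ["ms2q", "mp2q"]))
# (['adms2g'], ['ms2q'])
```

One pass removes readings on both sides: two of the three adjective readings and one of the two noun readings are discarded.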

20 Using disambiguation grammars – agreement (2). Concordance excerpt:
povecxati nxegov proboj u regionu. Rumunska proporcija
Dictionary entries:
u,.PREP+p2
u,.PREP+p4
u,.PREP+p7
regionu,region.N1:ms3q
regionu,region.N1:ms7q
Table columns (values not preserved in the transcript): Forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.
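For the "u regionu" example, a small Python sketch shows how the readings of the preposition and the noun can prune each other. It assumes that the markers `p2`, `p4`, and `p7` name the cases the preposition can govern and that the third character of a noun code is its case; the function name `govern` is hypothetical.

```python
def govern(prep_cases, noun_codes):
    """Keep the preposition cases and noun readings that are compatible.

    prep_cases -- case digits the preposition can govern, e.g. ['2', '4', '7']
    noun_codes -- noun category codes, e.g. ['ms3q', 'ms7q'] (case at index 2)
    """
    noun_cases = {code[2] for code in noun_codes}
    kept_cases = [c for c in prep_cases if c in noun_cases]
    kept_nouns = [code for code in noun_codes if code[2] in prep_cases]
    return kept_cases, kept_nouns


# u: PREP+p2, PREP+p4, PREP+p7; regionu: ms3q, ms7q
print(govern(["2", "4", "7"], ["ms3q", "ms7q"]))  # (['7'], ['ms7q'])
```

Under these assumptions only case 7 survives on both sides, leaving a single reading for each of the two words.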

21 Using disambiguation grammars – agreement of personal names. Special rules for the agreement of first name and surname.

22 Using disambiguation grammars – agreement (2). Concordance excerpt:
raspalio je Mladxan Dinkicx sxakom o okrugli sto "Platne kartice -
Dictionary entries:
Mladxan,Mladxan.N1002+Hum+NProp+First+SR:ms1v
Mladxan,mladxan.A7:akms1g:akms4q
Dinkicx,Dinkicx.N28+NProp+Hum+Last+SR:ms1v
Table columns (values not preserved in the transcript): Forms; Assoc. lemmas; ratio; Lemmas + categ.; ratio.

23 The order of grammar application. [Figure: two grammars, one labelled "Apply first" and the other "Apply second".]

24 Careful construction of grammars. Syntactic ambiguity:
Zalagacxu se da ti trosxkovi budu minimalni.
I will do my best to minimize these expenses.
I will do my best to minimize your expenses.
Although some cases are much more frequent...
Kličke je bio voljan da da automobil.
Kličke was willing to give the car.
Mislio sam da ti tvoja gospođa ne da da je viđaš.
I thought that your wife would not let you see her.

25 Thank you!