1 Dictionary priorities, e- dictionaries of compounds, morphological mode Cvetana Krstev & Duško Vitas.

Slides:



Advertisements
Similar presentations
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Advertisements

Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
Introduction to C Programming
 2000 Prentice Hall, Inc. All rights reserved. Chapter 2 - Introduction to C Programming Outline 2.1Introduction 2.2A Simple C Program: Printing a Line.
Introduction to C Programming
Software Applications for Processing Romanian Texts. Demonstration and Comparison Sanda Cherata Babeş-Bolyai University Faculty of Letters.
 2007 Pearson Education, Inc. All rights reserved Introduction to C Programming.
Chapter 3 Program translation1 Chapt. 3 Language Translation Syntax and Semantics Translation phases Formal translation models.
Introduction to C Programming
Entering Data in Excel. Entering numbers, text, a date, or a time n 1Click the cell where you want to enter data. n 2Type the data and press ENTER or.
Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.
Chapter 3: Introduction to C Programming Language C development environment A simple program example Characters and tokens Structure of a C program –comment.
UNIX Filters.
Computer Science 1000 Spreadsheets II Permission to redistribute these slides is strictly prohibited without permission.
XP Tutorial 14 New Perspectives on HTML, XHTML, and DHTML, Comprehensive 1 Working with Forms and Regular Expressions Validating a Web Form with JavaScript.
 2003 Prentice Hall, Inc. All rights reserved. 1 Introduction to C++ Programming Outline Introduction to C++ Programming A Simple Program: Printing a.
Chapter 4 Basics of English Grammar Business Communication Copyright 2010 South-Western Cengage Learning.
Microsoft Office Word 2003 Tutorial 1 Creating a Document.
1 Lab Session-III CSIT-120 Fall 2000 Revising Previous session Data input and output While loop Exercise Limits and Bounds Session III-B (starts on slide.
Fortran 1- Basics Chapters 1-2 in your Fortran book.
Stack Applications.
For Friday Finish chapter 23 Homework: –Chapter 22, exercise 9.
Visual Basic 2010 How to Program © by Pearson Education, Inc. All Rights Reserved.
T U T O R I A L  2009 Pearson Education, Inc. All rights reserved Typing Application Introducing Keyboard Events, Menus, Dialogs and the Dictionary.
Mastering Char to ASCII AND DOING MORE RELATED STRING MANIPULATION Why VB.Net ?  The Language resembles Pseudocode - good for teaching and learning fundamentals.
Lexical Analysis Hira Waseem Lecture
Review: Regular expression: –How do we define it? Given an alphabet, Base case: – is a regular expression that denote { }, the set that contains the empty.
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved. Chapter 2 Chapter 2 - Introduction to C Programming.
Working with the VB IDE. Running a Program u Clicking the”start” tool begins the program u The “break” tool pauses a program in mid-execution u The “end”
ITIP © Ron Poet Lecture 12 1 Finding Out About Objects.
Lecture 2: Introduction to C Programming. OBJECTIVES In this lecture you will learn:  To use simple input and output statements.  The fundamental data.
Data TypestMyn1 Data Types The type of a variable is not set by the programmer; rather, it is decided at runtime by PHP depending on the context in which.
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
VARIABLES Programmes work by manipulating data placed in memory. The data can be numbers, text, objects, pointers to other memory areas, and more besides.
CSE573 Autumn /23/98 Natural Language Processing Administrative –PS3 due today –PS4 out Wednesday, due Friday 3/13 (last day of class) special.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
III. MORPHOLOGY. III. Morphology 1. Morphology The study of the internal structure of words and the rules by which words are formed. 1.1 Open classes.
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved. 1 Chapter 2 - Introduction to C Programming Outline.
Restrictions Objectives of the Lecture : To consider the algebraic Restrict operator; To consider the Restrict operator and its comparators in SQL.
XP Tutorial 7 New Perspectives on JavaScript, Comprehensive 1 Working with Forms and Regular Expressions Validating a Web Form with JavaScript.
Chapter Looping 5. The Increment and Decrement Operators 5.1.
1 Local Grammars – 2 nd part Cvetana Krstev University of Belgrade Faculty of Philology.
Dictionary graphs Duško Vitas University of Belgrade, Faculty of Mathematics.
Chapter Looping 5. The Increment and Decrement Operators 5.1.
© 2006 Lawrenceville Press Slide 1 Chapter 4 Variables  A variable is a name for a value stored in memory.  Variables are created using a declaration.
FG Group -Afrilia BP -Liana F.B.I -Maulidatun Nisa -Riza Amini F.
1 Lecture 2 - Introduction to C Programming Outline 2.1Introduction 2.2A Simple C Program: Printing a Line of Text 2.3Another Simple C Program: Adding.
Bill Tucker Austin Community College COSC 1315
A variable is a name for a value stored in memory.
System Software Unit-1 (Language Processors) A TOY Compiler
Chapter 2 - Introduction to C Programming
Containers and Lists CIS 40 – Introduction to Programming in Python
Compiler Construction
Classes.
Chapter 2 - Introduction to C Programming
Syntax.
Chapter 2 - Introduction to C Programming
Introduction to C++ Programming
Variables ICS2O.
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
Chapter 2 - Introduction to C Programming
CS 3304 Comparative Languages
CS 3304 Comparative Languages
Text Analyzer BIS1523 – Lecture 14.
Chapter 2 - Introduction to C Programming
Teori Bahasa dan Automata Lecture 9: Contex-Free Grammars
Chapter 2 - Introduction to C Programming
Python Strings.
Presentation transcript:

1 Dictionary priorities, e- dictionaries of compounds, morphological mode Cvetana Krstev & Duško Vitas

2 The morphological mode  When working with a text that Unitex has already tokenized, then the only way to pose a query that searches inside a token is to use morphological filters (from what we learned by now).  But morphological filters have their limits because they cannot refer to dictionaries.  They cannot be used to pose a following query: a token formed by a string which is a legal prefix and a string which is a legal verb form.  What would be a result of a search with this graph?  Nothing! Because it looks for a prefix which is a token and a token that is a verb form – that is two separate tokens.

3 How to use the morphological mode?  A part of a grammar that we intend to apply in Locate pattern which should be used in the morphological mode should be enclosed with special boxes $.  These boxes will appear in a graph like violet angular brackets.  In this mode matching is performed letter by letter, and not token by token.  What would be result of a search with this graph?  Again nothing! Because graphs should abide to special rules when they enter the morphological mode.

4 Rules of the morphological mode(1)  The implicit space does not exist between boxes (as outside the morphological mode). If a space should be matched in the morphological mode, then it should be explicitly written  .  Sub-graphs can be used, but the beginning and the end of the morphological mode have to be in the same graph.  Variables cannot be introduced in the morphological mode with $x( and $x).  Lexical patterns that refer to dictionaries can be used (like ), as well as morphological filters on.  Left and right contexts are prohibited.  Transducer outputs can be used.  Morphological filters can apply to but they will actually apply only to one character (which is “token” in this case), like in >.

5 Rules of the morphological mode(2)  matches any letter (as defined in Alphabet).  matches any lower-case letter (as defined in Alphabet).  matches any upper-case letter (as defined in Alphabet).  matches any word present in a morphological dictionary.  Lexical patterns referring to morphological dictionaries can be used.  Patterns #,,,, and are forbidden.  If a program Locate reaches the end of the morphological zone (a box $> ) before reaching the end of a token, the match will fail. For instance, the previous graph can not match (pre)(vodi) in prevodilac although it matches a prefix followed by a verb form.

6 What are morphological dictionaries and how to use them?  In the morphological mode it is possible to use queries that refer to dictionaries, in order to recognize, for instance (pot)(krpili).  The verb form krpiti – krpili – (from krpiti ) need not be in a text itself, and so the dictionary of a text cannot be used.  Because of that a user should prepare a list of dictionaries that he/she wishes to use in the morphological mode.  These dictionaries may be chosen from those normally used but can also be specific for recognitions inside tokens (like dictionaries of affixes).

7 Defining a list of morphological dicitonaries  Use the option Preferences menu Info, a card Morphological dictionaries.  Dictionaries – only in.bin format are added by pressing Add, and removing from a list by pressing Remove.  For Serbian a dictionary of prefixes is selected delaf- aprefix.bin that is not used for normal processing, and  a general dictionary delaf- Srpki.bin used for normal processing.

8 Results of searching with a graph in the morphological mode  When applied to a collection 5izvora following verb forms obtained by prefixation from existing verbs (with a LOT of noise).verb forms  What does a form pobede do here? A prefix po and a form of verb bediti, bede.  If we add the following negative context – outside the morphological mode – graph will extract forms that are maybe verbs obtained by prefixation.verbs  What does protivnica do here? A prefix protiv and a verb nicati, a form nica (aorist 3 rd person singualar).

9 Dictionary entry variables  User can associate variables with patterns that refer to morphological dictionaries (except ).  The output of such a box is the associated variable $x$. $x.LEMMA$ - a lemma of a recognized form $x.INFLECTED$ - a recognized form $x.CODE$ - codes associated to a lema  We get the following decomposition if we use this graph in the MERGE mode.decomposition

10 Additional dictionary entry variables  The CODE variable can be used with three additions: $x.CODE.GRAM$ returns the first grammatical code, usually that is a PoS code. $x.CODE.SEM$ returns remaining grammatical codes, separated with plus sign + ; usually semantic markers. $x.CODE.FLEX$ returns all inflectional codes separated with a colon :.  We get the new decomposition if we use this graph in the MERGE mode.decomposition

11 Dictionary graphs that use the morphological mode  The morphological mode, together with dictionary entry variables can be used in dictionary graphs.  This is one such graph – it recognizes adverbs that were constructed by prefixation from adverbs already in a dictionary.  A “prefix” ca be a “true” prefix (from a morphological dictionary of prefixes) or an adjective form in the neuter singular form.  Such a dictionary graph should be applied with the lowest priority ( + ).  In the collection 5-izvora following “new” adverbs. are recognized.“new” adverbs.

12 Output variables  Normal variables, introduced by boxes $xxx( and $xxx) capture a part of a input text – a part that matched a part of a grammar.  Output variables captures a part of an output produced by a grammar.  They are introduced by $|xxx( and $|xxx).  They appear as blue parenthesis in a graph.  Important! They do not actually produce the output – the output is stored as a value of corresponding output variable.  Important! If output is a variable, like $a.LEMMA$, then this string will not be the value of corresponding output variable; its value will be a lemma corresponding to the input string stored in $a$.

13 An example of the use of output variables  The value of the output variable is the type of recognized input strings.  When applied to the text 5izvora-izvod in MERGE mode, following concordance lines are obtained.concordance  Note! No output is produced around recognized input strings.

14 Operations on variables  Two types of operation on variables are possible: testing variables comparing variables  Both operations on variables apply to all kind of variables: normal, output and dictionary.

15 Testing variables  It is possible to test whether a variable is set or not in order to block a current matching operation if a condition is not satisfied.  In order to test whether a variable is set enter an empty box with the output set to $xxx.SET$. This output will be ignored, and if the variable xxx has been defined, the matching operation will continue, otherwise it will fail.  The reverse test is $xxx.UNSET$

16 An example with testing variables  Between a noun phrase and a verb there can be an adverb. This graph produces different output consequently.  When applied to the text 5izvora-izvod in MERGE mode, following concordance lines are obtained.concordance

17 Comparing variables  This is another kind of a test.  User can compare a value of a variable against another variable or a constant value.  Use $xxx.EQUAL=yyy$ as the output of an empty box to test whether variables xxx and yyy have the same value. If the test fails, the grammar will block.  Use $xxx.EQUAL=#yyy$ as the output of an empty box to test whether variables xxx has the value yyy. If the test fails, the grammar will block.  The reverse test is $xxx.UNEQUAL=yyy$

18 An example with comparing variables (1)  The first graph recognizes an adjective/noun construction and produces an output important for agreement.  More such graphs exist: demonstrative pronoun/noun, possessive pronoun/noun, etc.  The second graph recognizes a simple verb phrase: Aux/Adjective, past perfect, present passive. This verb phrase agrees with gender and number.

19 An example with comparing variables (2)  This graph recognizes a simple phrases: NP [ADV] VP  When applied to the text 5izvora-izvod in MERGE mode, following concordance lines are obtained if output agreement variables of noun and verb phrases are NOT tested.concordance  If output agreement variables of noun and verb phrases ARE tested, then retrieval produces different results.different results