Linguistics 187/287 Week 2 Engineering and Linguistic Generalizations.

Homework
– Due Friday
  » Can discuss in class, or ask us for office hours
– Last assignment:
  » How much time?
  » Trouble: access, procedure?
  » Issues: XLE, LFG, grammar?

Topics for this week
Notation in LFG (more background)
Templates
Lexical rules
Configurations
Feature declaration
METARULEMACRO

Grammar engineering for deep processing
Draws on theoretical linguistics, software engineering
– Theoretical linguistics => papers
  » Generalizations, universality, idealization (competence)
– Software engineering => programs
  » Coverage, interface, QA, maintainability, efficiency, practicality
Grammar engineering
– Grammar : Theory = Program : Programming language
– Reflect linguistic generalizations
– Respect special cases of ordinary language
– Deal with large-scale interactions
– Theory/practice trade-offs

Grammar Engineering and Linguistic Theory
Description vs. representation
– Program vs. data
Expressiveness of notation
– Regular predicates for c-structure
– Boolean combinations (esp. disjunction)
– Equality, set-membership
Defaults and marking conventions
– Constraining vs. defining, existentials, defaults
Abbreviation and factoring
– Templates, macros, lexical rules
Configuration management
– Combining rules, templates, lexicons…
– Priority of core/specializations/extensions

Description vs. Representation
Complexity trades (program vs. data)
– Simplify descriptions but complicate representations
– Complicate descriptions but simplify representations
Example: Arguments and adjuncts
– Different behavior
  » Arguments: selected by predicate, unique
  » Adjuncts: modify predicate, multiple instances
– Similar behavior: both can be questioned
– Representation solution (HPSG): ARG, ADJ, and a new type DEP that combines ARG and ADJ
– Description solution (LFG): ARG, ADJ, and the disjunction ARG | ADJ in descriptions

Description vs. Representation
External constraints on representation
– Linguistic theory
– Applications
– Multilingual/cross-grammar similarity

Expressiveness of notation
Regular predicates for c-structure
Compact notation                       Simple context-free rules
NP --> (Det) N          (optionality)  NP --> N      NP --> Det N
NP --> { N | Pron }     (disjunction)  NP --> N      NP --> Pron
NP --> { (Det) N | Pron }              NP --> N      NP --> Det N      NP --> Pron
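
The correspondence between the compact notation and plain context-free rules can be sketched in Python. This is only an illustration of the idea, not XLE's rule compiler: the nested-tuple encoding of optionality and disjunction and the function name `expand_rhs` are assumptions made for the example.

```python
from itertools import product

# A right-hand side is a list of items. An item is one of:
#   "N"                      a category symbol
#   ("opt", [items])         an optional sequence, written (X) on the slide
#   ("alt", [[items], ...])  a disjunction, written { X | Y } on the slide

def expand_rhs(seq):
    """Return every plain category sequence that the compact sequence allows."""
    choices = []
    for item in seq:
        if isinstance(item, str):
            choices.append([(item,)])
        elif item[0] == "opt":
            # Either nothing, or any expansion of the optional material.
            choices.append([()] + expand_rhs(item[1]))
        elif item[0] == "alt":
            # Any expansion of any branch of the disjunction.
            choices.append([e for branch in item[1] for e in expand_rhs(branch)])
        else:
            raise ValueError(f"unknown item: {item!r}")
    results = []
    for combo in product(*choices):
        flat = tuple(sym for part in combo for sym in part)
        if flat not in results:
            results.append(flat)
    return results

# NP --> { (Det) N | Pron }
compact_np = [("alt", [[("opt", ["Det"]), "N"], ["Pron"]])]
for rhs in expand_rhs(compact_np):
    print("NP -->", " ".join(rhs))
```

Running the sketch on `{ (Det) N | Pron }` lists the three simple rules from the last row of the slide's table.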

Expressiveness of notation and Representation
Equality: attribute values
Set-membership: sets and elements
– Adjuncts: PP: (^ ADJUNCT)=! allows one adjunct; PP*: ! $ (^ ADJUNCT) collects any number in a set
– Coordination (more next week): NP --> NP: ! $ ^; CONJ NP: ! $ ^.
Semantic forms
– (^ PRED)='kick<(^ SUBJ)(^ OBJ)>'
– Semantic relations, instantiation, subcategorization

Defaults and Marking Conventions
Constraining vs. defining
– Must be assigned nom: (^ SUBJ CASE)=c nom
– Is nom: (^ SUBJ CASE)=nom
Existentials
– Must have case: (^ CASE)
Defaults
– NTYPE: proper, pronoun, common
– { (^ NTYPE) (^ NTYPE)~=common | (^ NTYPE)=common } (make choices disjoint)
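
The difference between defining, constraining, and existential statements can be sketched with a toy solver. This is a simplified model under stated assumptions, not XLE's unifier: f-structures are plain nested dicts, and the tuple encoding (`"define"`, `"constrain"`, `"exists"`) and helper names are invented for the illustration.

```python
def get_path(fs, path):
    """Follow a tuple of attribute names; return None if any step is missing."""
    for attr in path:
        if not isinstance(fs, dict) or attr not in fs:
            return None
        fs = fs[attr]
    return fs

def set_path(fs, path, value):
    """Defining equation: build the path if needed; return False on a clash."""
    for attr in path[:-1]:
        fs = fs.setdefault(attr, {})
        if not isinstance(fs, dict):
            return False
    last = path[-1]
    if last in fs and fs[last] != value:
        return False
    fs[last] = value
    return True

def satisfiable(statements):
    """Apply all defining equations first, then check the constraints."""
    fs = {}
    for kind, path, *value in statements:
        if kind == "define" and not set_path(fs, path, value[0]):
            return False
    for kind, path, *value in statements:
        if kind == "constrain" and get_path(fs, path) != value[0]:
            return False
        if kind == "exists" and get_path(fs, path) is None:
            return False
    return True

case = ("SUBJ", "CASE")
# (^ SUBJ CASE)=c nom alone: nothing has assigned nom, so it fails.
print(satisfiable([("constrain", case, "nom")]))
# (^ SUBJ CASE)=nom plus (^ SUBJ CASE)=c nom: the constraint is met.
print(satisfiable([("define", case, "nom"), ("constrain", case, "nom")]))
```

The point of the sketch is that a constraining equation never supplies a value itself; some defining equation elsewhere has to.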

Abbreviations and Factoring
Templates
– Capture generalizations of annotations
– Maintainability: changes, mistakes
– Compare: HPSG type hierarchy
Macros
– Capture generalizations of rules
Lexical Rules
– Theoretical proposal to manipulate predicates
– Implemented to expand lexicons consistently

Example: The verb bakes
Belongs to several classes
– Third-person, singular, present-tense verb
– Transitive or intransitive
Shares
– Some properties with falls
– Other properties with cooked

The lexicon à la Kiparsky
A dumping ground for exceptions
“A kind of appendix to the grammar, whose function is to list what is unpredictable and irregular about the words of a language”

The lexicon à la Bresnan
A repository of linguistic generalizations
Active and passive forms are related by lexical rules, not syntactic transformations
– (^ SUBJ) --> (^ OBL-AG)
– (^ OBJ) --> (^ SUBJ)
Rules relating lexical items are a prime locus of syntactic generalizations

The lexicon à la Flickinger
A hierarchical structure of classes
Each class represents some piece of syntactic information
bakes belongs to:
– the third-person singular present-tense class (like appears)
– the transitive/intransitive class (like cooked)
– and others
Classes may be subclasses of other classes
Classes may partition other classes along several dimensions

LFG: Relations between descriptions
An LFG functional description is a collection of equations
These can be named
The name can stand for those equations in linguistic descriptions
Named descriptions are referred to as templates
Interpretation: simple substitution
The template description is substituted for the template name that appears in (is invoked by) another description
LFG can encode linguistic generalizations as relations between descriptions of structures

3SG and PRESENT templates
3SG = (^ SUBJ PERSON)=3 (^ SUBJ NUM)=SG.
– “3SG names (^ SUBJ PERSON)=3 (^ SUBJ NUM)=SG”
PRESENT = (^ TENSE)=PRES.
@ marks invocation (in lexicon, rules, templates)
– @PRESENT: substitute (^ TENSE)=PRES for @PRESENT in other descriptions

Templates enable hierarchical generalizations
Template definitions can refer to other templates by name
– E.g. further divide 3SG into:
  3PERS = (^ SUBJ PERSON)=3.
  SING = (^ SUBJ NUM)=SG.
  then 3SG = @3PERS @SING.
Hierarchy of references represents inclusion hierarchy of named descriptions
Frequently repeated subdescriptions
– specified in one place
– effective in many
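
Because templates are interpreted by simple substitution, the hierarchy can be sketched as recursive string expansion. This is a toy model in Python, not XLE's implementation; the `TEMPLATES` table uses the template names from the slides (with `PRES3SG` assumed to invoke `PRESENT` and `3SG`), and the function name `expand_templates` is hypothetical.

```python
# Template name -> description; "@NAME" inside a description invokes a template.
TEMPLATES = {
    "3PERS": "(^ SUBJ PERSON)=3",
    "SING": "(^ SUBJ NUM)=SG",
    "3SG": "@3PERS @SING",
    "PRESENT": "(^ TENSE)=PRES",
    "PRES3SG": "@PRESENT @3SG",   # assumed definition, following the hierarchy
}

def expand_templates(description, active=()):
    """Replace every @NAME by its (recursively expanded) description."""
    out = []
    for token in description.split():
        if token.startswith("@"):
            name = token[1:]
            if name in active:
                raise ValueError(f"template {name} invokes itself")
            out.append(expand_templates(TEMPLATES[name], active + (name,)))
        else:
            out.append(token)
    return " ".join(out)

print(expand_templates("@PRES3SG"))
```

Changing the definition of `SING` in one place changes the expansion of every description that reaches it through `3SG` or `PRES3SG`, which is the maintainability argument the slide makes.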

Hierarchy of template invocations
Sharing in verb agreement
– PRES3SG invokes PRESENT and 3SG; 3SG invokes SING and 3PERS
– PRESNOT3SG invokes PRESENT and the negation of 3SG
PRESNOT3SG = @PRESENT ~@3SG.
– ~@3SG ⇒ ~[@SING @3PERS] ⇒ ~[(^ SUBJ NUM)=SG (^ SUBJ PERS)=3]
Boolean combinations of template references (just like ordinary descriptions)
Sharing is distinct from mode of combination

Functional description for bakes
{ (^ PRED)='bake<(^ SUBJ)(^ OBJ)>' | (^ PRED)='bake<(^ SUBJ)>' }
(^ TENSE)=PRES (^ SUBJ PERS)=3 (^ SUBJ NUM)=SG
With agreement template:
{ (^ PRED)='bake<(^ SUBJ)(^ OBJ)>' | (^ PRED)='bake<(^ SUBJ)>' } @PRES3SG
Agreement template invoked by other verbs

Templates with parameters: Valency
TRANS-OR-INTRANS(_p) = { (^ PRED)='_p<(^ SUBJ)(^ OBJ)>' | (^ PRED)='_p<(^ SUBJ)>' }.
PRED value as a parameter of the template
@TRANS-OR-INTRANS(bake) ⇒ { (^ PRED)='bake<(^ SUBJ)(^ OBJ)>' | (^ PRED)='bake<(^ SUBJ)>' }
Arguments can substitute for any part of an f-description
– Attributes
– Values
– Semantic relation-names
– Descriptions
ParGram convention: parameters begin with _
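
Parameter substitution can be sketched the same way as template expansion: the argument text replaces the parameter name wherever it occurs in the template body. The Python below is an illustrative model only. The subcategorization frames inside the PRED values are an assumption (the slide text does not preserve them), and `invoke` is a hypothetical helper, not an XLE function.

```python
import re

# Template name -> (parameter names, body). Parameters begin with "_".
TEMPLATES = {
    "TRANS-OR-INTRANS": (
        ("_p",),
        "{ (^ PRED)='_p<(^ SUBJ)(^ OBJ)>' | (^ PRED)='_p<(^ SUBJ)>' }",
    ),
    "ADJ": (("_T",), "! $ (^ ADJUNCT) (! ADJ-TYPE)=_T"),
}

def invoke(name, *args):
    """Substitute the arguments for the parameters in one pass over the body."""
    params, body = TEMPLATES[name]
    if len(args) != len(params):
        raise TypeError(f"{name} expects {len(params)} argument(s)")
    binding = dict(zip(params, args))
    # Match a parameter only when it is not part of a longer identifier.
    names = "|".join(re.escape(p) for p in params)
    pattern = re.compile(r"(?<!\w)(" + names + r")(?!\w)")
    return pattern.sub(lambda m: binding[m.group(1)], body)

print(invoke("TRANS-OR-INTRANS", "bake"))
print(invoke("ADJ", "VP"))
```

The same mechanism covers values (`_T` in `ADJ`) and semantic relation names (`_p` inside the PRED value), since the substitution is purely textual.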

Valency hierarchy
TRANS-OR-INTRANS(p) = { @INTRANSITIVE(p) | @TRANSITIVE(p) }.
INTRANSITIVE(p) = (^ PRED)='p<(^ SUBJ)>'.
TRANSITIVE(p) = (^ PRED)='p<(^ SUBJ)(^ OBJ)>'.
Hierarchy: TRANS-OR-INTRANS invokes INTRANSITIVE and TRANSITIVE

Templates and generalizations: bakes
bakes: @TRANS-OR-INTRANS(bake) @PRES3SG
TRANS-OR-INTRANS(p): shared by eat, cooked,…
PRES3SG: shared by appears, goes, cooks,…
PRESENT:
– used by PRES3SG template
– shared by bake, laugh, etc.

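Lexical sharing
[Diagram: the template hierarchy (TRANS-OR-INTRANS over INTRANSITIVE and TRANSITIVE; PRES3SG over PRESENT and 3SG; 3SG over 3PERS and SING) with the lexical entries bakes, cooked, and falls attached to the templates they invoke]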

Type hierarchy vs. templates
Templates can play the same role as hierarchical type systems in theories like HPSG
A notational device for factoring descriptions
– Interpreted as simple substitution
– Not part of a formal ontology
– Do not require an elaborate mathematical characterization

Templates also invoked by Rules
Rule annotations can also call templates
– Global changes, typo prevention
Example: adjunct annotation
– Without a template:
  PP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP
  ADVP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP
– With a template:
  ADJ(_T) = ! $ (^ ADJUNCT) (! ADJ-TYPE)=_T.
  called with the adjunct type as argument, e.g. @(ADJ VP), @(ADJ NP), @(ADJ S)

Templates: Rules
Example: null pronouns
– Push it!
– They left (in order) to be on time.
NULL-PRON(_P) = (_P PRED)='pro' (_P PRON-TYPE)=null.
With the template: VPimp --> VP: @(NULL-PRON (^ SUBJ)).
Without the template: VPimp --> VP: (^ SUBJ PRED)='pro' (^ SUBJ PRON-TYPE)=null.

Templates: Extend notation
DEFAULT(D V) = { D D~=V | D=V }.
– Example call: @(DEFAULT (^ NTYPE) common)
IF(P1 P2) = { ~P1 | P2 }.
IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }.
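
The logic behind these three templates can be checked with a small Python model. It is a sketch under simplifying assumptions: an f-structure is a flat dict, `P1` and `P2` are plain booleans rather than f-descriptions, and the function names are invented for the illustration.

```python
def default(fs, attr, value):
    """DEFAULT(D V) = { D D~=V | D=V }.

    First disjunct: D already has a value different from V, so keep it.
    Second disjunct: otherwise D is V (set it if nothing else did).
    """
    if attr in fs and fs[attr] != value:
        return dict(fs)
    result = dict(fs)
    result[attr] = value
    return result

def if_(p1, p2):
    """IF(P1 P2) = { ~P1 | P2 }: material implication."""
    return (not p1) or p2

def iff(p1, p2):
    """IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }: both hold or neither holds."""
    return (p1 and p2) or (not p1 and not p2)

print(default({"NTYPE": "proper"}, "NTYPE", "common"))  # keeps proper
print(default({}, "NTYPE", "common"))                    # falls back to common
```

The two disjuncts of `DEFAULT` never hold at the same time, which is what the earlier slide means by making the choices disjoint.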

Templates and “Principles”
Subject principle: every verb has a subject.
Implementation: VERB = (^ SUBJ).
– Call @VERB in every verbal entry,
– or in the templates called by the verbal entries.

Lexical Rules
Theoretical construct
Templates can often achieve the same result
– Disjunction of several templates
– Parameterization of a complex template

Lexical Rules: Example
Active: They ate the cake. (^ PRED)='eat<(^ SUBJ)(^ OBJ)>'
Passive: The cake was eaten. (^ PRED)='eat<NULL (^ SUBJ)>'
Could have VTRANS have two disjuncts
Or: manipulate PRED with a lexical rule

Lexical Rules: Example
Passive lexical rule
– _SCHEMA is a subcategorization frame
PASSIVE(_SCHEMA) = { _SCHEMA (^ PASSIVE)=-
                   | _SCHEMA (^ SUBJ) --> NULL (^ OBJ) --> (^ SUBJ) (^ PASSIVE)=c + }.
Example calls
– TRANS(_P) = @(PASSIVE (^ PRED)='_P<(^ SUBJ)(^ OBJ)>').
– DITRANS(_P) = @(PASSIVE (^ PRED)='_P<…>').
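
The effect of the passive rule on a subcategorization frame can be sketched in Python. This is a hedged illustration of the two rewrites named in the rule, not XLE's lexical-rule machinery; the list encoding of frames and the function names are assumptions.

```python
def passivize(frame):
    """Apply (^ SUBJ) --> NULL and (^ OBJ) --> (^ SUBJ) simultaneously."""
    rename = {"SUBJ": "NULL", "OBJ": "SUBJ"}
    return [rename.get(role, role) for role in frame]

def semantic_form(pred, frame):
    """Render a frame in the PRED notation used on the slides."""
    args = " ".join(r if r == "NULL" else f"(^ {r})" for r in frame)
    return f"(^ PRED)='{pred}<{args}>'"

def passive_alternatives(pred, frame):
    """Both disjuncts of PASSIVE(_SCHEMA): the active and the passive reading."""
    return [
        (semantic_form(pred, frame), "(^ PASSIVE)=-"),
        (semantic_form(pred, passivize(frame)), "(^ PASSIVE)=c +"),
    ]

for form, marking in passive_alternatives("eat", ["SUBJ", "OBJ"]):
    print(form, marking)
```

The renaming has to be simultaneous: applying `OBJ --> SUBJ` first and then `SUBJ --> NULL` one after the other would wrongly delete the promoted object as well.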

Lexical Rules: Summary
Lexical rules manipulate arguments of predicates
– capture systematic alternations like active-passive
Rename and remove roles
No good implementation for adding roles
– causative
– complex predicates
– benefactives

Configuration Management
Combining rules, templates, lexicons, …
– System needs to know where everything is
– For large grammars, need modularization (multiple grammar rule files, multiple lexicons)
Priority of core/specializations/extensions
– Want to specialize a grammar
  » No questions in instruction manuals
  » Loosen subj-V agreement
– Have lexicons of varying quality

Combining Rules, Templates, Lexicons
XLE: configuration section
– Specify what files are called
– Specify which rule, template, and lexicon sections are used
  RULES (TOY ENGLISH).
  RULES (CORE ENGLISH) (SPECIAL ENGLISH).
– Other grammar information

Configurations and Declarations
Configurations
– File management
– Priority
Declarations
– Governable relations and semantics
– Features
Global Operators
– METARULEMACRO

Files
Priority ordered; rules/entries in later files override those in earlier ones
Example:
FILES standard-english-rules.lfg
      eureka-english-rules.lfg
      standard-english-lexicon.lfg
      eureka-english-lexicon.lfg.
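
The override behavior can be pictured as a dictionary that is filled in file order, so that a later definition of the same category replaces an earlier one. The Python below is only a model of that priority idea; the rule bodies are placeholder strings, and `load_in_priority_order` is a hypothetical name rather than anything XLE provides.

```python
def load_in_priority_order(files):
    """files: (filename, {category: right-hand side}) pairs, lowest priority first.

    Returns {category: (right-hand side, file that supplied it)}.
    """
    active = {}
    for filename, rules in files:
        for category, rhs in rules.items():
            active[category] = (rhs, filename)   # later files win
    return active

# Placeholder rule bodies; only the category names matter here.
files = [
    ("standard-english-rules.lfg", {"N": "standard N rule", "NP": "standard NP rule"}),
    ("eureka-english-rules.lfg", {"N": "eureka N rule", "NOUN-EUREKA": "eureka-only rule"}),
]

for category, (rhs, source) in load_in_priority_order(files).items():
    print(category, "<-", source)
```

In this sketch the specialized file replaces `N`, adds `NOUN-EUREKA`, and leaves `NP` from the standard file untouched.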

Eureka vs. Standard rules
STANDARD ENGLISH RULES (1.0)
N --> NOUN-COMMON -> … NOUN-PROPER -> …
EUREKA ENGLISH RULES (1.0)
N --> | N PL }.
NOUN-EUREKA --> { EUR-PART | EUR-NUM }.

Sections Used
All lexicon, rule, and template sections have names and versions*.
These are called in priority order in the config.
Use with the file order to create overrides.
RULES (STANDARD RULES) (EUREKA RULES).
LEXENTRIES (all all).
*Versions allow for future XLE upgrades

Multiple Lexicon Sections
LEXENTRIES (AUTOMATIC ENGLISH) (CORRECTED ENGLISH).
AUTOMATIC ENGLISH LEXICON (1.0)
appear V XLE appear) appear)}.
CORRECTED ENGLISH LEXICON (1.0)
appear V XLE appear) appear)}.

Other Configuration Information
ROOTCAT: default top level category
– Standard: ROOT, Eureka: FIELD
Nondistributives for coordination
External attributes for applications
Character encoding
Reparse category and Optimality order for robustness
See XLE documentation for complete list

Declarations
Must declare grammatical and semantic functions for each grammar.
– Used for completeness and coherence
GOVERNABLERELATIONS
– Functions (features) that must be subcategorized for in the PRED
– SUBJ OBJ OBL-?* ?COMP etc.
SEMANTICFUNCTIONS
– Functions that must have a PRED
– ADJUNCT NMOD
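
How the governable-relations declaration feeds completeness and coherence can be sketched as two set comparisons. This is a simplified Python model: the `GOVERNABLE` set is an illustrative subset, f-structures are flat dicts, and `check` is a hypothetical helper.

```python
# Illustrative subset of a GOVERNABLERELATIONS declaration (an assumption).
GOVERNABLE = {"SUBJ", "OBJ", "COMP", "XCOMP"}

def check(fstructure, frame):
    """Compare the governable functions present with those the PRED selects.

    complete: every function in the PRED's frame is present.
    coherent: every governable function present is in the PRED's frame.
    """
    present = {attr for attr in fstructure if attr in GOVERNABLE}
    required = {gf for gf in frame if gf != "NULL"}
    return {"complete": required <= present, "coherent": present <= required}

# "They ate the cake" with eat<(^ SUBJ)(^ OBJ)>, plus an adjunct.
ate = {"PRED": "eat", "SUBJ": {}, "OBJ": {}, "ADJUNCT": [{}]}
print(check(ate, ["SUBJ", "OBJ"]))

# An OBJ with an intransitive frame is incoherent.
print(check({"PRED": "fall", "SUBJ": {}, "OBJ": {}}, ["SUBJ"]))
```

ADJUNCT is ignored by both checks because it is not governable, which is why the grammar has to say which functions are.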

Feature Declaration
List of all the features
– GGF and semantic functions need not be listed
– all other features must be listed
List of their possible values
– atomic
– f-structure
Multiple feature declarations
– multilingual setting
– grammar specialization

Why a feature declaration?
Good engineering practice
Catch typos and old analyses
Grammar easier to read
NB: Theory doesn’t have typos

Declaration format
STANDARD LANGUAGE FEATURES (1.0)
feature1: -> $ { val1 val2 val3 }.
feature2: -> $ { val4 val5 }.
feature3: -> << [ feature1 feature2 ].
feature

Sample feature declaration
TOY ENGLISH FEATURES (1.0)
NUM: -> $ { sg pl }.
PERS: -> $ { }.
TNS-ASP: -> << [ TENSE MOOD ASPECT ].
TENSE.
MOOD: -> $ { indicative subjunctive }.
ASPECT: -> << [ PERF PROG ].
PERF: -> $ { + - }.
PROG: -> $ { + - }.
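
What a feature declaration buys can be sketched with a small checker in Python. This is an illustration of the idea rather than XLE's own check: the dict encoding of the TOY ENGLISH declaration is an assumption, PERS is left out because its values are not preserved in the transcript, and `violations` is a hypothetical helper.

```python
# set  -> atomic feature with the listed values
# list -> f-structure-valued feature with the listed sub-features
# None -> declared feature whose values are not restricted
DECLARATION = {
    "NUM": {"sg", "pl"},
    "TNS-ASP": ["TENSE", "MOOD", "ASPECT"],
    "TENSE": None,
    "MOOD": {"indicative", "subjunctive"},
    "ASPECT": ["PERF", "PROG"],
    "PERF": {"+", "-"},
    "PROG": {"+", "-"},
}

def violations(fs, allowed=None, path=""):
    """Return a list of messages for everything the declaration rules out."""
    problems = []
    for attr, value in fs.items():
        where = f"{path}{attr}"
        if attr not in DECLARATION:
            problems.append(f"{where}: undeclared feature")
            continue
        if allowed is not None and attr not in allowed:
            problems.append(f"{where}: not allowed inside this f-structure")
        spec = DECLARATION[attr]
        if isinstance(spec, set):
            if isinstance(value, dict) or value not in spec:
                problems.append(f"{where}: value {value!r} not declared")
        elif isinstance(spec, list):
            if isinstance(value, dict):
                problems.extend(violations(value, spec, where + " "))
            else:
                problems.append(f"{where}: expected an f-structure")
    return problems

print(violations({"NUMB": "sg"}))    # a typo in the feature name
print(violations({"NUM": "dual"}))   # a value from an old analysis
```

Both kinds of mistake named on the slides, typos and leftovers from old analyses, show up as violations instead of silently producing odd f-structures.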

XLE and the feature declaration
XLE will not load a grammar with a violation of the feature declaration.
To catch violations in the lexicon, the generator must be loaded.
– regenerate “some-sentence-to-parse”
– parse, then choose “generate” in f-str window
– create-generator grammar-name.lfg
print-unused-feature-declarations

Multiple feature declarations
List in priority order in the configuration
– FEATURES (STANDARD COMMON) (STANDARD ENGLISH).
– New features are listed as usual
– Changes to features use edit operators
  » + add a new value
  » & intersect the values
  » ! replace the feature entirely

Multiple feature declarations
STANDARD COMMON FEATURES (1.0)
NUM: -> $ { sg pl dual }.
CASE: -> $ { nom acc }.
TENSE: -> << [ PAST FUTURE ].
PAST: -> $ { + - }.
FUTURE: -> $ { + - }.
STANDARD ENGLISH FEATURES (1.0)          Resulting declaration
PERS: -> $ { }.                          PERS: -> $ { }.
&NUM: -> $ { sg pl }.                    NUM: -> $ { sg pl }.
+CASE: -> $ { gen }.                     CASE: -> $ { nom acc gen }.
!TENSE: -> $ { pres past fut }.          TENSE: -> $ { pres past fut }.
!PAST: -> $ { }.
!FUTURE: -> $ { }.
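
The three edit operators behave like set operations on the earlier declaration. The Python below is a sketch of that reading, using the values from the slide; the tuple encoding of an edit and the name `apply_edits` are assumptions, and PERS is omitted because its values are not preserved in the transcript.

```python
COMMON = {
    "NUM": {"sg", "pl", "dual"},
    "CASE": {"nom", "acc"},
    "TENSE": ["PAST", "FUTURE"],     # f-structure valued in the common file
    "PAST": {"+", "-"},
    "FUTURE": {"+", "-"},
}

# (operator, feature, values) in the order they appear in STANDARD ENGLISH.
ENGLISH_EDITS = [
    ("&", "NUM", {"sg", "pl"}),
    ("+", "CASE", {"gen"}),
    ("!", "TENSE", {"pres", "past", "fut"}),
    ("!", "PAST", set()),
    ("!", "FUTURE", set()),
]

def apply_edits(base, edits):
    """Return a new declaration; the base declaration is left unchanged."""
    result = dict(base)
    for op, feature, values in edits:
        if op == "+":                       # add new values
            result[feature] = result[feature] | values
        elif op == "&":                     # intersect the values
            result[feature] = result[feature] & values
        elif op == "!" or op == "":         # replace entirely, or new feature
            result[feature] = set(values)
        else:
            raise ValueError(f"unknown edit operator: {op!r}")
    return result

english = apply_edits(COMMON, ENGLISH_EDITS)
for feature in ("NUM", "CASE", "TENSE"):
    print(feature, sorted(english[feature]))
```

The results for NUM, CASE, and TENSE match the right-hand column of the slide: `dual` is dropped, `gen` is added, and TENSE changes from an f-structure to an atomic feature.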

Using Multiple Feature Declarations
Multilingual contexts
– Language universal features
– Customize to particular language
Grammar specialization
– Add new features for odd constructions
– Remove unused choices

Global Operations: METARULEMACRO
System defined function
– Operates on every category
Global statements
– Linguistic: subject condition, SUBJ < OBJ, coordination
– Engineering: quotes, bracketing

METARULEMACRO
Right-hand side of each grammar rule is the result of applying the macro to the rule
METARULEMACRO(_CAT _BASECAT _RHS) = _RHS.

Punctuation and METARULEMACRO
Surround any constituent with quotes
METARULEMACRO(_CAT _BASECAT _RHS) = { _RHS
                                    | L-QT _CAT R-QT
                                    | L-DQT _CAT R-DQT }.
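
The way one macro definition reaches every rule can be sketched in Python. This is a toy model, not XLE's rule compiler: the two grammar rules are hypothetical, right-hand sides are lists of category names, and `_BASECAT` is simply passed the same value as `_CAT`.

```python
def metarulemacro(cat, basecat, rhs):
    """{ _RHS | L-QT _CAT R-QT | L-DQT _CAT R-DQT } as a list of alternatives."""
    return [rhs, ["L-QT", cat, "R-QT"], ["L-DQT", cat, "R-DQT"]]

def apply_macro(grammar, macro):
    """The effective right-hand side of every rule is macro(category, rule)."""
    return {cat: macro(cat, cat, rhs) for cat, rhs in grammar.items()}

# Hypothetical two-rule grammar; the rules themselves never mention quotes.
grammar = {
    "NP": ["(DET)", "N"],
    "PP": ["P", "NP"],
}

for cat, alternatives in apply_macro(grammar, metarulemacro).items():
    for alt in alternatives:
        print(cat, "-->", " ".join(alt))
```

With the identity macro `lambda cat, basecat, rhs: [rhs]` the grammar comes back unchanged, which corresponds to the default definition `METARULEMACRO(_CAT _BASECAT _RHS) = _RHS.`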

Punctuation cont.
`Mary and John’ left them there.
We saw them “in the garden”.
They `appeared and then disappeared.'
[Tree: NP --> L-QT NP R-QT, where the inner NP --> NP CONJ NP covers "Mary and John"]

Punctuation: Problem
Vacuous branching results in many analyses
[Trees: several analyses of quoted "bagels" that differ only in which non-branching node (NP, NPzero, Nzero, N) the L-QT and R-QT pair attaches to]

Solution: PUSHUP
If non-branching, push up to highest node.
METARULEMACRO(_CAT _BASECAT _RHS) = { _RHS | L-QT _CAT: @PUSHUP; R-QT }.
How to define PUSHUP?
– Need to test existence of sister nodes: * MOTHER SISTER
PUSHUP = { (* MOTHER LEFT_SISTER)
         | (* MOTHER RIGHT_SISTER) ~(* MOTHER LEFT_SISTER)
         | ~(* MOTHER MOTHER) }.

Summary
Lexical rules allow for generalizations over predicate alternations
Configurations and declarations allow management of large-scale grammars
– readability and consistency
– maintenance
– specialization
Global operators allow for cross-grammar generalizations
– coordination

The HPSG lexicon: a type hierarchy
More specific types inherit information from less specific
Types and subtypes:
– A mathematical relation between structures: AND/OR lattice
– Different subtypes represent alternatives/disjunction
– Multiple supertypes represent conjunction
… but type inheritance is not the only (best?) way to express generalizations
LFG does not use typed feature structures for lexical generalizations
[Diagram (Malouf): AND/OR type lattice with the nodes head, noun, relational, c-noun, gerund, verb]

Coordination without METARULEMACRO
Want to coordinate any constituent
Coordination macro
SCCOORD(_CAT) = [ _CAT: ! $ ^; COMMA ]* _CAT: ! $ ^; CONJ _CAT: ! $ ^.
Put call in each rule: NP: { (DET) AP* N PP* | @(SCCOORD NP) }.
Engineering problem:
– forget to call
– put in wrong category

Coordination with METARULEMACRO
Call SCCOORD as part of MRM
METARULEMACRO(_CAT _BASECAT _RHS) = { _RHS | @(SCCOORD _CAT) }.
NP rule now: NP: (DET) AP* N PP*.
Effectively: NP: { (DET) AP* N PP* | @(SCCOORD NP) }.