Husse-9 Conference Pécs, 22-24 January, 2009 HunGram vs. EngGram in ParGram On the Comparison of Hungarian and English in an International Computational.


1 Husse-9 Conference, Pécs, 22-24 January 2009
HunGram vs. EngGram in ParGram: On the Comparison of Hungarian and English in an International Computational Linguistic Project
Tibor Laczkó, György Rákosi & Ágoston Tóth
Department of English Linguistics, University of Debrecen
{laczkot, rakosigy, tagoston}@delfin.unideb.hu
Sponsored by OTKA research grant K 72983

2 Overview
1. Lexical-Functional Grammar (LFG)
2. The ParGram Project at PARC
3. The HunGram Project in Debrecen
4. A short demonstration: possible ParGram treatments of certain elliptical noun phrases in English and Hungarian

3 1/1 Stanford and LFG
LFG as a linguistic theory was developed in the late 1970s.
One of the principal aims was to create a framework suitable for massive computational applications, and there has been lively cooperation between theory and computational linguistic practice ever since.
The two co-founders:
Joan Bresnan (Stanford University, SU) → mainly linguistic aspects
Ronald Kaplan (Palo Alto Research Center, PARC, and SU; now at Powerset, Inc.) → mainly computational aspects
General information on LFG is available at: http://www.essex.ac.uk/linguistics/LFG/

4 1/2 Design Principles of LFG
Lexicalism
Modularism
Parallel architecture
Generating and parsing structures are equally important
Rule system that is directly renderable in a mathematical formalism

5 1/3 Central Modules of LFG
constituent structure (language-specific): word order
functional structure (universal): grammatical relations
lexicon (powerful)
phonology
semantics

6 1/4 Adpositional phrases in LFG
c-structures:
  [PP [Pr near] [NP [Det the] [N box]]]
  [PP [NP [Det a] [N doboz]] [Po mellett]]
  [NP [Det a] [N doboz-ban]]  (-ban: "in")
f-structure:
  PRED  ‘NEAR/IN ’  (near/in/mellett/-ban)
  OBJ   [ PRED ‘BOX’, DEF +, PERS 3, NUM sg ]
lexical entries:
  near/in, Pr ‘NEAR/IN ’
  mellett, Po ‘NEAR ’
  -ban, Nsuff ‘IN ’
  box, N ‘BOX’

7 2/1 ParGram at PARC
The Parallel Grammar (ParGram) project: launched and organized by PARC
LFG-based computational program
Capitalizes on LFG’s flexible general linguistic and computationally implementable architecture
Parser and generator
Goal: to analyze more and more languages on a maximally uniform platform, in the spirit of Universal Grammar

8 2/2 ParGram at PARC
A truly international project:
English, German, French, Norwegian, Japanese, Chinese, Urdu (India), Malagasy (Madagascar), Arabic, Vietnamese, Spanish, Welsh, Indonesian, Turkish, Georgian, & Hungarian
Further information: http://www2.parc.com/isl/groups/nltt/default.html

9 2/3 XLE parser
a deep, grammar-based parsing system for implementing lexical-functional grammars; constructed as part of the ParGram project
output: c-structures and f-structures
supports tokenization and morphological analysis through finite-state transducers (with alternative analyses)
can select the most probable analysis from the potentially large candidate set using stochastic disambiguation (if implemented)
has a generator mode
implemented in C; runs on Solaris, Linux, and Mac OS X
bottom line: a facility for writing syntactic rules and lexical entries, and for testing and editing them

10 toy-eng.lfg
  the   D * (^ DEF)=+.
  girl  N * (^ PRED)='GIRL'.
  walk  V * (^ PRED)='WALK ';
        N * (^ PRED)='WALK'.
c-structure: context-free phrase-structure tree encoding constituency and linear order
f-structure: attribute-value matrices that encode predicate-argument relations and other grammatical information (e.g. number, tense, case)
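How lexical entries like these contribute to an attribute-value matrix can be sketched with a minimal unification routine in Python. This is an illustrative sketch, not XLE's implementation: the `LEXICON` dict, the `unify` helper, and the assumption that the determiner and the noun both feed the same f-structure of their NP are hypothetical. It also simplifies LFG in one respect: real PRED values are uniquely instantiated and never unify with each other, whereas here equal values simply merge.

```python
# Hypothetical Python rendering of the toy lexicon on the slide:
# word -> list of (category, f-structure contribution).
LEXICON = {
    "the": [("D", {"DEF": "+"})],
    "girl": [("N", {"PRED": "GIRL"})],
    "walk": [("V", {"PRED": "WALK"}), ("N", {"PRED": "WALK"})],
}


def unify(a, b):
    """Merge two attribute-value matrices; return None if any value clashes."""
    result = dict(a)
    for attr, value in b.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            merged = unify(result[attr], value)
            if merged is None:
                return None
            result[attr] = merged
        elif result[attr] != value:
            return None  # conflicting atomic values, e.g. NUM sg vs. NUM pl
    return result


if __name__ == "__main__":
    det = LEXICON["the"][0][1]
    noun = LEXICON["girl"][0][1]
    print(unify(det, noun))  # f-structure for "the girl"
    print([cat for cat, _ in LEXICON["walk"]])  # "walk" is category-ambiguous
```

The two entries for "walk" show why a parser must keep alternative analyses: the same word form can enter the tree as either V or N.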

11 2/4 Challenging natural language phenomena
Lexical ambiguity
  Homonymy: bank, fluke; ár ("price" / "flood" / "awl"), légy ("fly" / "be!"), ír ("write" / "Irish" / "balm")
  Polysemy: bulb, line; körte ("pear" / "light bulb"), toll ("feather" / "pen")
Structural ambiguity
  I saw the girl with the telescope.
  Részegen láttam Jánost. ("I saw János drunk.")
  Egész nap a hajókat néztük a Dunán. ("All day we watched the ships on the Danube.")
Word formation (compounding, derivation, minor processes)
  horror, horrid, horrify; terror, (*terrid), terrify; candor, candid, (*candify)
  student film society committee scandal video…
Anaphoric references
  a) We gave the bananas to the monkeys because they were hungry.
  b) We gave the bananas to the monkeys because they were ripe.
Ellipsis
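The structural ambiguity of "I saw the girl with the telescope" can be made concrete by counting parse trees with a small chart parser. The sketch below uses a CKY-style counting algorithm over a toy context-free grammar; the grammar, the category names, and the function `count_parses` are assumptions made for this illustration and do not come from EngGram or XLE.

```python
from collections import defaultdict

# Hypothetical toy grammar in binary form: (parent, left daughter, right daughter).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase (instrument reading)
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase (the girl has it)
    ("PP", "P", "NP"),
]
LEXICAL = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "girl": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees of `words` with a CKY-style chart."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, word in enumerate(words):
        for cat in LEXICAL.get(word, []):
            chart[(i, i + 1, cat)] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


if __name__ == "__main__":
    print(count_parses("I saw the girl with the telescope".split()))
    print(count_parses("I saw the girl".split()))
```

With this grammar the full sentence has two trees, one per attachment site of the prepositional phrase, while the sentence without the phrase has one.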

12 2/5 Direct challenges
Non-toy lexicon of the Hungarian language, empirical techniques
Tokenization, morphological analysis
  walks → walk +Verb +Pres +3sg
          walk +Noun +Pl
Named entity recognition
  Types: person, role, location, organization, brand, title, etc.
  This is the website of [the University of Debrecen org ].
  [The University of Debrecen loc ] is not far from us.
Parsing performance: trade-off between accuracy, usability and speed
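The two analyses of "walks" shown on this slide can be reproduced with a toy analyzer. XLE relies on finite-state transducers for this step; the sketch below swaps in a plain dictionary lookup with suffix stripping, so it only illustrates the idea of returning alternative analyses. The stem list, the rule table, and the function `analyze` are hypothetical.

```python
# Hypothetical stem lexicon: stem -> categories it can belong to.
STEMS = {
    "walk": {"Verb", "Noun"},
    "girl": {"Noun"},
}

# Hypothetical suffix rules: (suffix, category, tags added to the analysis).
SUFFIX_RULES = [
    ("s", "Verb", "+Verb +Pres +3sg"),
    ("s", "Noun", "+Noun +Pl"),
]


def analyze(token):
    """Return every stem-plus-tags analysis licensed by the rules above."""
    analyses = []
    for suffix, category, tags in SUFFIX_RULES:
        if not token.endswith(suffix):
            continue
        stem = token[: -len(suffix)]
        if category in STEMS.get(stem, set()):
            analyses.append(f"{stem} {tags}")
    return analyses


if __name__ == "__main__":
    print(analyze("walks"))  # both the verbal and the nominal reading
    print(analyze("girls"))  # only the nominal reading
```

Keeping both readings and letting the syntax choose between them is what "with alternative analyses" amounts to in practice; a later disambiguation step can then rank what remains.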

13 3/1 HunGram
Tibor Laczkó, 2005/2006: Fulbright research grant to Stanford University
a ParGram invitation to PARC
→ research at two host institutions
two goals at PARC:
(i) familiarity with the formalism (XLE)
(ii) starting the implementation in XLE of the results of the research on the morpho-syntax of Hungarian noun phrases (in an LFG framework)

14 3/2 HunGram
LFG Research Group (LFGRG) at the Department of English Linguistics, UD
Tibor Laczkó
György Rákosi
Ágoston Tóth
2 PhD students
XLE software licence from PARC

15 3/3 HunGram
OTKA (Hungarian Scientific Research Fund) research grant for 2008-2012 (K 72983)
objectives
1. developing a comprehensive LFG grammar of the Hungarian language (morphology, syntax, lexicon, semantic issues)
2. implementing it in HunGram/ParGram
3. launching an English vs. Hungarian comparative research project on the ParGram platform
4. incorporating the results in various course materials at the English Linguistics Department
(1 & 2) → 3 → 4

16 4/1 Demo Elliptical noun phrases

17 4/2 “az öt nagy zöldet”
c-structure; c-structure + morphology; f-structure

18 4/3 “the five large green ones”

19 4/4 “a három ügyes fiú öt nagy zöldjét”

20 4/5 “the three boys’ five large green ones”

21 4/6 Elliptical noun phrases
differences
1. c-structure
2. morphology (e.g. +case vs. –case)
3. English: ‘pro’ realized by an overt element, in the lexicon, in c-structure, in f-structure
4. Hungarian: ‘pro’ is covert, introduced by a functional annotation in c-structure, present in f-structure
5. EngGram vs. HunGram with respect to the number of features
similarities
1. f-structure, except for typological differences (case etc. features)
2. as HunGram gets more and more developed, it shares more and more EngGram (ParGram) features
proposal
(previous talk) a more lexical solution: ‘pro’ introduced by (case-marked) adjectival lexical items
plan
testing its implementability in HunGram (and EngGram?)

22 Hungram1.lfg
FIRST HUNGARIAN CONFIG (1.0)
  ROOTCAT ROOT.
  FILES common.templates.lfg hun-lex.lfg hun-templates.lfg hun-morphconfig.lfg hun-rules.lfg.
  LEXENTRIES (FIRST HUNGARIAN).
  CHARACTERENCODING iso-8859-2.
  MORPHOLOGY (STANDARD HUNGARIAN).
  RULES (FIRST HUNGARIAN).
  TEMPLATES (STANDARD COMMON) (FIRST HUNGARIAN).
  GOVERNABLERELATIONS SUBJ OBJ POSS OBL OBL-? COMP XCOMP PREDLINK.
  SEMANTICFUNCTIONS ADJUNCT TOPIC FOCUS.
----

