Hypermedia Lexica and Lexicon Metadata The MetaLex model in the ModeLex project Dafydd Gibbon U Bielefeld Europe E-MELD Workshop, Detroit, August 2002.

1 Hypermedia Lexica and Lexicon Metadata The MetaLex model in the ModeLex project Dafydd Gibbon U Bielefeld Europe E-MELD Workshop, Detroit, August 2002

2 Overview
Metalex goals
- Background: DATR, HyprLex, Speech, Language Documentation
Metalex design: theory and practice
- Lexical documents & metadocuments
- Lexical objects, properties, structures
Metalex implementation
- Ivory Coast encyclopaedia project
- Ega documentation model project
- The Modelex (multimodal lexicon) project
- Ivory Coast + Nigeria documentation curriculum project
Extending metalex
- Modalities & submodalities
- Data-driven lexicography
- Data structures & algorithms: trees, lattices; induction, inference

3 Metalex goals: background
General objectives:
- Versatile high quality spoken language lexicography
- Motivated balance of high-tech + low tech
- Good resources are data-driven and theory-informed
Specific project objectives:
- DATR/ILEX: formal lexicon theory and implementation
- VerbMobil: integrated HyprLex dissemination model
- HyprLex encyclopaedia model for Ivory Coast Languages
- Ega endangered language documentation model
- Modelex - theory and design of multimodal lexica
- Ivory Coast and Nigeria curricula for language documentation

6 Metalex design: data and theory
Data-driven data + metadata acquisition: systematic metatext derived from and supporting...
- Computational fieldwork
- Induction of lexica
Theory-informed data + metadata acquisition: Integrated Lexicon (ILEX) consisting of...
- Abstract Lexicon (ALEX) - "theory" in the mathematical sense
- Object Lexicon (OLEX) - "model" in the mathematical sense

7 Metalex design: data
Data-driven acquisition:
- Computational fieldwork: portable metadatabase with restricted vocabulary and general metatext, and
  - Definition of and support for transcription + annotation
  - Portable support for scenarios, scripts
  - Portable support for lexicon processing
- Induction of lexica: lexicon tools for
  - Extraction of macrostructural elements (lexeme elements)
  - Induction of microstructural information (media concordance, POS,...)
  - Induction of mesostructural regularities and subregularities (grammar,...)

8 Metalex design: theory
Theory-informed formalisation:
- Abstract Lexicon (ALEX) - "theory" in the mathematical sense
  - Decomposition (componential A-V description)
  - Generalisation (inheritance)
  - Composition (multilinear operations)
- Object Lexicon (OLEX) - "model" in the mathematical sense
  - XML archiving and dissemination formats
  - Object-relational database acquisition and processing formats
= Integrated Lexicon (ILEX)

11 Metalex implementation: architecture
Data model
- Theory = shared lexicon architecture
- Macrostructure: declarative and procedural components
  - Lexicon architecture: relational, inheritance, text,...
  - Lexical objects: entry types
  - Lexical access: fact query, semasiological / onomasiological indexing
- Mesostructure:
  - Generalisations: grammar, phonetics, cultural background,...
  - Composition of lexicon object types: idioms, words, morphemes,...
  - Lexical access: inferential query
- Microstructure:
  - Lexical entry (article, lemma structure - atom, string, tree,...)
  - Types of lexical information - standardly: "lexicon model"
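The two macrostructure access routes named on this slide can be sketched as a pair of indices: semasiological (from form to meanings) and onomasiological (from meaning to forms). This is a minimal illustration, not the Metalex implementation. The function name `build_indices` and the convention of splitting entry identifiers at `_` are assumptions; the three entries are the homophones listed in the field lexicon example (slide 24).

```python
from collections import defaultdict

# Entry identifiers and English glosses as listed on slide 24.
ENTRIES = [
    ("Anouman_1", "bird"),
    ("Anouman_2", "grandchild"),
    ("Anouman_3", "yesterday"),
]

def build_indices(entries):
    """Return (semasiological, onomasiological) indices as plain dicts."""
    by_form = defaultdict(list)     # headword -> list of meanings
    by_meaning = defaultdict(list)  # meaning -> list of entry identifiers
    for entry_id, gloss in entries:
        # Assumption: the text before "_" is the shared headword form.
        headword = entry_id.split("_")[0]
        by_form[headword].append(gloss)
        by_meaning[gloss].append(entry_id)
    return dict(by_form), dict(by_meaning)

by_form, by_meaning = build_indices(ENTRIES)
print(by_form["Anouman"])   # all meanings recorded for one form
print(by_meaning["bird"])   # all entries that express one meaning
```

Both indices here are "fact query" structures in the slide's sense: they only look up stored pairs and do no inference.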

12 Metalex implementation: microstructure
Microstructure specification philosophy:
- Anybody can specify any kind of unpredictable detail
- Questionnaire / Experiment / Corpus / Archive dependence
- Lexicon architecture: relational, inheritance, text,...
- Intelligent (semi-)automatic classification, not fixed attributes
- Theory-informed coarse grouping is possible
  - Media attributes: visual, auditory, tactile,...
  - Meaning attributes: definition, gloss, lexical relations,...
  - Composition attributes: context/category, parts, operations
  - Use attributes: style, register, concordance, media illustrations,...
  - Micrometadata attributes: lexicographer DB indices, source (e.g. fieldwork metadata) DB indices, modification,...

13 Metalex implementation: fieldwork metadata source (1) Situation dimensions  participant: fieldworker, partners, contacts  channel: modalities, media  locale: indoor/outdoor, spatial configuration  temporal: date, time, calendar event  functional: affiliation, role, occasion; observation (prompt, metadata management) Language dimension  affiliation  discourse level: discourse type, genre + prosody  phrase level: recursive phrasal categories/relations + prosody  word level: clitics, inflexion, word formation + prosody

14 Metalex implementation: fieldwork metadata source (2) Technical dimension  physical characteristics of participants: age, sex, health  physical characteristics of locale: indoor/outdoor, spatial configuration, temporal sequence, date (season), time (of day)  audio: mike type, position, room; A/D; channels, f sample, resolution; formats  video: camera & microphone type, analogue/digital; filters, lenses; audio; formats  other sensors: laryngograph, airflow, data glove,... Metalinguistic dimension  empirical method: introspection, experiment, corpus elicitation  materials: questionnaire, experiment layout, corpus scenario  metadata specification: index, metatext type, metacatalogue type
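One way to picture a fieldwork metadata record along the four dimensions above (situation, language, technical, metalinguistic) is a small typed record with a restricted vocabulary for one field. The sketch below is an assumption-laden illustration: the class name `FieldworkMetadata`, its field names, and the choice of which fields to include are hypothetical, and the sample values are borrowed from the example entry on slide 24 (treating the lexicographer as the fieldworker is also an assumption).

```python
from dataclasses import dataclass, field, asdict

# Restricted vocabulary for the empirical method, as listed on the slide.
EMPIRICAL_METHODS = {"introspection", "experiment", "corpus elicitation"}

@dataclass
class FieldworkMetadata:
    # Situation dimension
    participants: dict            # e.g. {"fieldworker": ..., "partners": [...]}
    locale: str
    date: str
    # Language dimension
    language: str
    discourse_type: str
    # Technical dimension (free-form in this sketch)
    audio: dict = field(default_factory=dict)
    # Metalinguistic dimension
    empirical_method: str = "corpus elicitation"

    def __post_init__(self):
        # Enforce the restricted vocabulary at record creation time.
        if self.empirical_method not in EMPIRICAL_METHODS:
            raise ValueError(f"unknown empirical method: {self.empirical_method!r}")

record = FieldworkMetadata(
    participants={"fieldworker": "S. Adouakou"},  # assumption: lexicographer = fieldworker
    locale="Adaou village, CI",
    date="March 2002",
    language="Anyi",
    discourse_type="narrative",
)
print(asdict(record))
```

Validating against a closed vocabulary at entry time is one plausible reading of "restricted vocabulary"; free metatext would simply be an unconstrained string field.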

15 Metalex implementation: fieldwork metadata entry tool LREC 2002, Workshop on Portability Issues

16 Metalex implementation: fieldwork metadata entry tool HanDBase DBMS for PalmOS

17 Metalex objects in conjunction with work in ISLE CLWG (Computational Lexicon Working Group) (see Gibbon in reading list) LEXICON:  {, }  Macrostructure: Ordering( {ENTRY,...} )  Mesostructure:  Mesostructure: ENTRY:  

18 The LEXICON object Front Matter Metadata:  Bibliographical: creator, publisher, title, date,...  Medium / format: paper, CD-ROM/DVD, web,... Macrostructure type:  access: semasiological/onomasiological,  n-lingual/langue(s),  special: taxonomy (thesaurus), concordance  structure, e.g. tabular: f(type,attrib)=value

19 The ENTRY object: metadata Entry Metadata: (see Gibbon & al. in reading list)  Entry type (wrt macrostructure specification):  encyclopaedic  multiword unit, word,...  Microstructure data model specification:  entry structure: flat, tree, graph (net),...  data categories specification (attribute, field, information type)  DC groups - structural skeleton  DCs  DC substructure - homography, homophony, polysemy...

20 The ENTRY object: DC groups Media ("surface"):  acoustic (phonetic, earcon, sonification,...), visual (orthography, icon, gesture,...) Composition (structure):  part (e.g. morphology for words), context (e.g. POS, subcat for words) Meaning (definition, illustration):  semantic (components, relations, senses, ontology)  pragmatic (speech act, dialogue, disfluency,...) Use: typically: media (e.g. audio) concordance,... Metadata: lexicographer,...

21 The ENTRY object: DCs Countless Data Category models: (see reading list)  every existing dictionary  linguistic "types of lexical information"  several European projects (GENELEX, MULTILEX, ACQUILEX,...)  ISO terminology norms (cf. MARTIF etc....)

22 The ENTRY object: DC structures Computationally relevant properties of fields:  type (atomic, complex: tree, string, xyz-formatted text)  character encoding spec.: ASCII, Unicode, xyz  tree (or other graph/net):  finite depth  flat, disjunctive tree  recursive graph (net)  table, non-tree graph, anchor/link/index structure  generated text:  print, hypertext (compiled vs. dynamic, i.e. generated on the fly)

23 Metalex microstructure application Media ("surface"):  phonemic & tonemic transcription (SAMPA ASCII - still waiting for Unicode...) Composition (structure):  morphemic substructure, category & subcategory Meaning (definition, illustration):  glosses (English, French, German)  definitions, senses, relations, components; audio-visual illustration Use: genres; examples (e.g. concordance link); free text notes Metadata: first record; last field

24 Metalex field lexicon microstructure
Anouman_1:
- Media attributes:
  - Phonemic tier: `an'U~m`'a~
  - Skeletal tier: VNVNV
  - Tonal tier: L H LH
  - Signal tier: Audio
- Meaning attributes:
  - F-gloss: Oiseau
  - E-gloss: Bird
  - G-gloss: Vogel
  - Definition: avis
  - Homophone full: Anouman_2: grandchild
  - Homophone phonemic: Anouman_3: yesterday
- Use:
  - Genre: narrative
- Metadata:
  - Lexicographer: S. Adouakou
  - Source: Bielefeld-Anyi-Corpus, Adaou village, CI
  - Date: March 2002
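Since the design slides name XML as the archiving and dissemination format, the entry above can be sketched as a nested attribute-value structure serialised to XML. This is only an illustration: the element names (`entry`, `media`, `e_gloss`, and so on) and the function `entry_to_xml` are hypothetical and do not reproduce any Metalex schema; the values are copied from the slide.

```python
import xml.etree.ElementTree as ET

# The Anouman_1 entry as a nested dict: DC group -> data category -> value.
entry = {
    "id": "Anouman_1",
    "media": {"phonemic": "`an'U~m`'a~", "skeletal": "VNVNV", "tonal": "L H LH"},
    "meaning": {"f_gloss": "Oiseau", "e_gloss": "Bird", "g_gloss": "Vogel"},
    "use": {"genre": "narrative"},
    "metadata": {"lexicographer": "S. Adouakou", "date": "March 2002"},
}

def entry_to_xml(entry):
    """Build an XML element with one child per DC group (hypothetical schema)."""
    root = ET.Element("entry", id=entry["id"])
    for group, fields in entry.items():
        if group == "id":
            continue  # already stored as an XML attribute
        group_el = ET.SubElement(root, group)
        for name, value in fields.items():
            ET.SubElement(group_el, name).text = value
    return root

xml_text = ET.tostring(entry_to_xml(entry), encoding="unicode")
print(xml_text)
```

Keeping the DC groups as the first level of nesting mirrors the "structural skeleton" role the slides assign to them.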

25 Metalex portable lexical database Relational database:  Metalex specs flattened  structure re-constitution via metalex specs  HanDBase for PalmOS  Features:  standard full RelDBMS  XML, CSV, text export  export/import via GSM  inexpensive (wrt laptop)  stylus, keyboard, sync input  light weight  low power consumption  inconspicuous in use  interfaces to Scheme, C
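The "specs flattened" and "structure re-constitution" steps can be sketched as a round trip between a nested entry and flat relational rows. The row layout (entry, group, attribute, value) and the function names are assumptions made for this illustration; the slide does not specify the actual table design used in HanDBase.

```python
def flatten(entry_id, groups):
    """Flatten nested {group: {attribute: value}} into relational rows."""
    return [
        (entry_id, group, attribute, value)
        for group, fields in groups.items()
        for attribute, value in fields.items()
    ]

def reconstitute(rows):
    """Rebuild {entry_id: {group: {attribute: value}}} from flat rows."""
    lexicon = {}
    for entry_id, group, attribute, value in rows:
        lexicon.setdefault(entry_id, {}).setdefault(group, {})[attribute] = value
    return lexicon

# Values taken from the field lexicon example on slide 24.
groups = {
    "meaning": {"e_gloss": "Bird", "f_gloss": "Oiseau"},
    "use": {"genre": "narrative"},
}
rows = flatten("Anouman_1", groups)
for row in rows:
    print(row)
print(reconstitute(rows) == {"Anouman_1": groups})
```

The group and attribute columns carry the structural information, which is why a flat table can be expanded back into the nested form without loss in this two-level case.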

26 Metalex extension The Modelex project: "Theory and Design of Multimodal Lexica" Goals:  Data-driven, theory-informed lexicon models  Formal properties of abstract data models for multimodal lexica  Interpretation of abstract data models in XML  Integration of parallel annotation lattices for modalities and submodalities  Development of a prototype multimodal lexicon

27 The Modelex domain: modalities and submodalities

28 Modelex: data driven lexicography

29 Modelex: gesture annotation Time Aligned Signal Corpus System (Java, GPL) Jan-Torsten Milde, U Bielefeld TASX annotator:  Phonological tier  ToBI tiers  Gesture tier  Speech Act tier Anyi, Ega, German

30 Model-theoretic compilation in ILEX: INTERPRETATION ( ALEX ) = OLEX
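The equation INTERPRETATION ( ALEX ) = OLEX can be pictured with a toy default-inheritance lexicon: the abstract lexicon stores only what is not predictable from a parent node, and interpretation expands every node into a fully specified record. This is a simplified stand-in written with plain dictionaries, not DATR and not the ILEX compiler. The node `AnyiWord`, the function `interpret`, and the assumption that all entries share the same source are hypothetical.

```python
# Hypothetical abstract lexicon (ALEX): each node names an optional parent
# and lists only its own, non-inherited attribute-value pairs.
ALEX = {
    "AnyiWord":  {"parent": None,
                  "avs": {"source": "Bielefeld-Anyi-Corpus", "signal": "Audio"}},
    "Anouman_1": {"parent": "AnyiWord",
                  "avs": {"e_gloss": "Bird", "tonal": "L H LH"}},
    "Anouman_2": {"parent": "AnyiWord",
                  "avs": {"e_gloss": "grandchild"}},
}

def interpret(alex):
    """Expand every node into a fully specified attribute-value record."""
    def expand(name, seen=()):
        if name in seen:
            raise ValueError(f"inheritance cycle at {name!r}")
        node = alex[name]
        inherited = expand(node["parent"], seen + (name,)) if node["parent"] else {}
        return {**inherited, **node["avs"]}  # local values override inherited ones
    return {name: expand(name) for name in alex}

OLEX = interpret(ALEX)
print(OLEX["Anouman_1"])
```

The resulting object lexicon is flat and redundant, which is what makes it suitable for export to XML or a relational table, while the abstract lexicon stays compact.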

31 Metalex in the Modelex project: Multimodal concordance as microstructure DC Prototype: http://www.spectrum.uni-bielefeld.de/langdoc/PAX/
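A multimodal concordance can be sketched as a time-overlap query across parallel annotation tiers: for each occurrence of a word, collect the labels on the other tiers that co-occur with it. The code below is an illustration under stated assumptions and does not describe the PAX prototype or the TASX format. The class `Interval`, the functions `overlapping` and `concordance`, and all times and labels are invented placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float  # seconds
    end: float
    label: str

def overlapping(target, tier):
    """Return the intervals in `tier` that overlap `target` in time."""
    return [iv for iv in tier if iv.start < target.end and target.start < iv.end]

def concordance(word, word_tier, other_tiers):
    """For each occurrence of `word`, collect co-occurring labels per tier."""
    hits = []
    for occ in word_tier:
        if occ.label == word:
            context = {name: [iv.label for iv in overlapping(occ, tier)]
                       for name, tier in other_tiers.items()}
            hits.append((occ.start, occ.end, context))
    return hits

# Invented toy data: times and labels are placeholders, not corpus values.
words = [Interval(0.0, 0.4, "w1"), Interval(0.4, 0.9, "w2"), Interval(0.9, 1.3, "w1")]
gestures = [Interval(0.3, 0.8, "g1")]
speech_acts = [Interval(0.0, 1.3, "a1")]

hits = concordance("w1", words, {"gesture": gestures, "speech_act": speech_acts})
for hit in hits:
    print(hit)
```

Each hit could then be stored as a value of a "concordance" data category in the entry's Use group, which is one reading of "concordance as microstructure DC".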

32 Metalex in the Modelex project: underspecified ALEX microstructure for gesture coordinates Hand: == "Palm" "Digit" == " " "> == " " " " " " " " <> ==. Palm: == == palm == pw == ph == == ( + ( - ) / 3 ) == ( + ( - ) * 2 / 3 ) == == px1 == py1 == ( + ) <> == Hand.

33 Metalex in the Modelex project: fully specified ALEX microstructure for gesture coordinates Hand: = palm px1 py1 ( px1 + pw ) ( py1 + ph ) thumb px1 py1 ( px1 - lt ) py1 fore px1 py1 px1 ( py1 - lf ) middle ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) ( py1 - lm ) ring ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) ( py1 - lr ) pinky ( px1 + pw ) py1 ( px1 + pw ) ( py1 - lp )
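The fully specified coordinate expressions above can be evaluated directly once the parameters have numeric values. The sketch below transcribes the slide's arithmetic into a function; reading each group of four values as a segment or box `(x1, y1, x2, y2)` is an assumption, as are the function name `hand_coordinates` and the sample numbers.

```python
def hand_coordinates(px1, py1, pw, ph, lt, lf, lm, lr, lp):
    """Evaluate the slide's expressions for palm and finger coordinates.

    px1, py1: palm origin; pw, ph: palm width and height;
    lt, lf, lm, lr, lp: lengths of thumb, fore, middle, ring, pinky.
    """
    px2 = px1 + pw
    py2 = py1 + ph
    middle_x = px1 + (px2 - px1) / 3      # one third across the palm
    ring_x = px1 + (px2 - px1) * 2 / 3    # two thirds across the palm
    return {
        "palm":   (px1, py1, px2, py2),
        "thumb":  (px1, py1, px1 - lt, py1),
        "fore":   (px1, py1, px1, py1 - lf),
        "middle": (middle_x, py1, middle_x, py1 - lm),
        "ring":   (ring_x, py1, ring_x, py1 - lr),
        "pinky":  (px2, py1, px2, py1 - lp),
    }

# Invented sample parameters, chosen only to make the arithmetic easy to follow.
coords = hand_coordinates(px1=0, py1=0, pw=30, ph=40, lt=10, lf=20, lm=25, lr=20, lp=15)
for part, values in coords.items():
    print(part, values)
```

With these inputs the middle and ring fingers start at x = 10 and x = 20, one third and two thirds of the way across a palm of width 30.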

34 Metalex: conclusion & prospects
User complexity:
- demands an open, data-driven approach
Domain:
- demands a theory-informed approach
- with computational acquisition & inference
Data-driven and theory-informed lexica
- are possible (METALEX)
- need an integrated model-theoretic approach (ILEX): INTERPRETATION (ALEX) = OLEX
- a formal problem remains: differing complexity of
  - trees (archive): simulation of other graphs via semantics only
  - annotation lattices (data), tables (lexica): regular relations if non-recursive, indexed grammars if recursive?
