E-Meld Workshop on Digitization of lexical Information 3-5 August 2002, EMU, Ypsilanti Working Group on Lexicon Macrostructures Chairman’s Report Dafydd.

1 E-Meld Workshop on Digitization of lexical Information, 3-5 August 2002, EMU, Ypsilanti
Working Group on Lexicon Macrostructures
Chairman’s Report
Dafydd Gibbon

2 Definitions
The macrostructure of a lexicon is the arrangement of lexical entries in the lexicon (in an extended sense it also includes front matter, mesostructure, …).
Declarative determining factors:
  microstructure (arrangement of types of lexical information)
  mesostructure (arrangement of generalisations)
Procedural/operational determining factors:
  medium: print, electronic, multimodal + multimedia channels
  consultation, navigation: onomasiological, semasiological, general search
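The two consultation directions named above can be illustrated with a minimal Python sketch. It assumes a toy set of hypothetical entries (headword, gloss, concept label; the concept labels borrow the domains “fish” and “work” from slide 6) and derives a semasiological arrangement (form to meaning, ordered by headword) and an onomasiological one (meaning to form, headwords grouped by concept). It only illustrates the distinction; it is not a model proposed by the working group.

```python
from collections import defaultdict

# Hypothetical toy entries: each pairs a headword (form) with a gloss and a concept label.
ENTRIES = [
    {"headword": "trout", "gloss": "freshwater fish of the salmon family", "concept": "FISH"},
    {"headword": "carp", "gloss": "freshwater fish, often farmed", "concept": "FISH"},
    {"headword": "hoe", "gloss": "tool for weeding", "concept": "WORK"},
    {"headword": "plough", "gloss": "tool for turning soil", "concept": "WORK"},
]

def semasiological(entries):
    """Form-to-meaning arrangement: (headword, gloss) pairs ordered by headword."""
    return [(e["headword"], e["gloss"]) for e in sorted(entries, key=lambda e: e["headword"])]

def onomasiological(entries):
    """Meaning-to-form arrangement: sorted headwords grouped under their concept label."""
    groups = defaultdict(list)
    for e in entries:
        groups[e["concept"]].append(e["headword"])
    return {concept: sorted(words) for concept, words in sorted(groups.items())}

if __name__ == "__main__":
    for headword, gloss in semasiological(ENTRIES):
        print(headword, "-", gloss)
    print(onomasiological(ENTRIES))
```

Both arrangements are computed from the same entry list; only the ordering and grouping key differ, which is what separates the macrostructure from the entries themselves.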

3 Main points discussed
Types of lexicon in the OLAC linguistic type vocabulary: dictionary, wordlist, wordnet, thesaurus, terminology, proper nouns, bilingual, etymological, phonetic, frequency, analytical; plus: concordance, glossary, multilingual, encyclopaedic, help text index, thesaurus index, …
Granularity of the linguistic type hierarchy (additional levels beyond Dublin Core) and complexity of lexicon type
Factorization of common subtypes out of the hierarchy (not only for lexicon type)
Heterogeneity of types (structural, subject and functional types)

4 Structure criteria (3rd-level subtypes)
Semasiological: dictionary (complex microstructure), wordlist (glossed; comparative; …), glossary (with definitions), terminology (ISO-(non-)conformant), concordance
Onomasiological: wordnet, thesaurus, encyclopaedia, …, index (help, thesaurus, …), catalogue?

5 Formats, media
Format + medium: MIME types, modalities, …
Print format; database format; word-processor format; hypertext; system component (e.g. for spell checker, dictation); multimedia (digitized signals: audio, photos, video, …); XML + stylesheets, XSLT mappings, …
Question: are there lexicon-specific formats which are not covered by the OLAC format type?
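As a hedged illustration of the “XML + stylesheets” option, the sketch below serializes lexicon entries with Python's standard xml.etree.ElementTree module. The element and attribute names (lexicon, entry, headword, pos, gloss, lang) are hypothetical and follow no particular standard schema; the point is only that one XML source can afterwards be mapped by XSLT stylesheets to print, hypertext or other presentations.

```python
import xml.etree.ElementTree as ET

def entry_to_xml(headword, pos, glosses):
    """Build one <entry> element; all element and attribute names are hypothetical."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    for lang, text in glosses:
        # One <gloss> per language, as in a bilingual or multilingual lexicon.
        ET.SubElement(entry, "gloss", {"lang": lang}).text = text
    return entry

def lexicon_to_xml(entries):
    """Serialize (headword, pos, glosses) tuples into one <lexicon> document string."""
    root = ET.Element("lexicon")
    for headword, pos, glosses in entries:
        root.append(entry_to_xml(headword, pos, glosses))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(lexicon_to_xml([("Forelle", "noun", [("en", "trout"), ("fr", "truite")])]))
```

A stylesheet could then render the same entry elements as, say, a printed page or an HTML view; that mapping step is not shown here.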

6 Subject, content criteria
Specialized lexica based on subject and on linguistic types of lexical information:
Domain: fish, work, …
Linguistic levels of description and categories: phonetic/pronunciation, verb, proper name, …
Rank: idiom, (un)inflected word, stem, morpheme, …
Other: frequency, etymological/historical, translation, bilingual, multilingual, …

7 User criteria (construction and/or consultation)
Non-linguist (L1 speaker, L2 speaker, …)
Research linguist (field, theoretical, …)
Computational linguist (machine learning from corpora, inheritance lexica, …)
Language and/or speech system developer (currently several such projects for minority languages)

8 Recommendations
Consider revising the OLAC linguistic type controlled vocabulary to factor out linguistic levels as a common parameter.
Consider using actual linguistic genres such as “sketch grammar”, “field notes”, “domain lexicon”.
Consider cross-classifying a low-granularity type vocabulary with format, content and user types.
Definitely provide improved definitions of lexicon types.
Definitely point users to examples of existing lexica of a given lexicon genre.
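A minimal sketch of the cross-classification recommendation, assuming a hypothetical faceted record in Python: a coarse lexicon type is combined with independent facets for format, linguistic level and intended users, so that a “phonetic wordlist” needs no type of its own. The class, function, facet names and catalogue records are invented for illustration and are not part of the OLAC vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexiconDescription:
    """Hypothetical faceted record: one coarse type cross-classified with other facets."""
    lexicon_type: str                 # low-granularity type, e.g. "dictionary", "wordlist"
    formats: frozenset = frozenset()  # e.g. {"print", "XML"}
    levels: frozenset = frozenset()   # linguistic levels, factored out as a parameter
    users: frozenset = frozenset()    # intended users

def matches(desc, lexicon_type=None, fmt=None, level=None, user=None):
    """True if desc satisfies every facet constraint given; None means 'any value'."""
    return ((lexicon_type is None or desc.lexicon_type == lexicon_type)
            and (fmt is None or fmt in desc.formats)
            and (level is None or level in desc.levels)
            and (user is None or user in desc.users))

# Invented catalogue records, not descriptions of real lexica.
CATALOGUE = [
    LexiconDescription("dictionary", frozenset({"print"}),
                       frozenset({"phonetic", "etymological"}), frozenset({"L2 speaker"})),
    LexiconDescription("wordlist", frozenset({"XML"}),
                       frozenset({"phonetic"}), frozenset({"field linguist"})),
]

if __name__ == "__main__":
    print([d.lexicon_type for d in CATALOGUE if matches(d, level="phonetic")])
    print([d.lexicon_type for d in CATALOGUE
           if matches(d, level="phonetic", user="field linguist")])
```

The design choice is that the level facet is shared by every lexicon type, so adding a new level does not multiply the number of types in the vocabulary.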

9 Some remaining questions
Specific points to address:
Is the OLAC list of lexicon types comprehensive enough?
Is a taxonomy of lexicon types adequate, or must we parametrise?
Which sub-attributes are needed from the relevant components?
And of course a very basic question: can all macrostructures be derived formally, i.e. automatically, from a generic declarative macrostructure (like views/indexings of a database) with appropriate microstructure and mesostructure?
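One way to read the last question is sketched below with Python's built-in sqlite3 module: entries are stored once in a single generic table, and each macrostructure (alphabetical, retrograde, frequency-ordered) is a view exposing a sort key. The table, column and view names and the reverse_form helper are assumptions made for illustration; the sketch covers only orderings of flat entries, not the microstructure and mesostructure the question also mentions.

```python
import sqlite3

MACROSTRUCTURES = {"alphabetical", "retrograde", "by_frequency"}

def build_lexicon(rows):
    """Store (headword, gloss, freq) rows once and define one view per macrostructure."""
    con = sqlite3.connect(":memory:")
    # Hypothetical helper: SQLite has no built-in string reversal, so register one.
    con.create_function("reverse_form", 1, lambda s: s[::-1])
    con.execute("CREATE TABLE entry (headword TEXT PRIMARY KEY, gloss TEXT, freq INTEGER)")
    con.executemany("INSERT INTO entry VALUES (?, ?, ?)", rows)
    # Each view exposes the headword plus the key that defines its ordering.
    con.execute("CREATE VIEW alphabetical AS SELECT headword, headword AS sort_key FROM entry")
    con.execute("CREATE VIEW retrograde AS SELECT headword, reverse_form(headword) AS sort_key FROM entry")
    con.execute("CREATE VIEW by_frequency AS SELECT headword, -freq AS sort_key FROM entry")
    return con

def macrostructure(con, view):
    """Headwords in the order defined by one view; ties are broken alphabetically."""
    if view not in MACROSTRUCTURES:  # view names come from a fixed set, never user input
        raise ValueError(f"unknown macrostructure: {view}")
    query = f"SELECT headword FROM {view} ORDER BY sort_key, headword"
    return [row[0] for row in con.execute(query)]

if __name__ == "__main__":
    con = build_lexicon([("trout", "fish", 5), ("carp", "fish", 12),
                         ("hoe", "tool", 3), ("plough", "tool", 12)])
    for name in sorted(MACROSTRUCTURES):
        print(name, macrostructure(con, name))
```

The ORDER BY is applied when each view is queried rather than inside the view definition, because SQL does not guarantee that an ordering inside a view survives a later SELECT.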

10 Working Group participants
Helen Aristar-Dry, Dafydd Gibbon, Veronica Grondona, Michael Maxwell, David Weber, Jeff …, …
Oops, sorry: I forgot to make a participant list.