E-Meld Workshop on Digitization of lexical Information 3-5 August 2002, EMU, Ypsilanti Working Group on Lexicon Macrostructures Chairman’s Report Dafydd.

Presentation transcript:

E-MELD Workshop on Digitization of Lexical Information, 3-5 August 2002, EMU, Ypsilanti
Working Group on Lexicon Macrostructures: Chairman’s Report
Dafydd Gibbon

Definitions
The macrostructure of a lexicon is the arrangement of lexical entries in the lexicon (the extended meaning includes front matter, mesostructure, …).
Declarative determining factors:
- microstructure (arrangement of types of lexical information)
- mesostructure (arrangement of generalisations)
Procedural/operational determining factors:
- medium: print, electronic, multimodal + multimedia channels
- consultation, navigation: onomasiological, semasiological, general search

Main points discussed
- Types of lexicon in the OLAC linguistic type vocabulary: dictionary, wordlist, wordnet, thesaurus, terminology, proper nouns, bilingual, etymological, phonetic, frequency, analytical; PLUS concordance, glossary, multilingual, encyclopaedic, help text index, thesaurus index, …
- Granularity of the linguistic type hierarchy (additional levels beyond Dublin Core) and complexity of lexicon type
- Factorization of common subtypes out of the hierarchy (not only for lexicon type)
- Heterogeneity of types (structural, subject and functional types)

Structure criteria (3rd level subtypes)
Semasiological:
- dictionary (complex microstructure)
- wordlist (glossed; comparative; …)
- glossary (with definitions)
- terminology (ISO (non-)conformant)
- concordance
Onomasiological:
- wordnet
- thesaurus
- encyclopaedia
- …
- index (help, thesaurus…), catalogue?

Formats, media
Format + medium: MIME types, modalities, …
- Print format
- Database format
- Word-processor format
- Hypertext
- System component (e.g. for spell checker, dictation)
- Multimedia (digitized signals: audio, photos, video, …)
- XML + stylesheets, XSLT mappings
- …
Question: are there lexicon-specific formats which are not covered by the OLAC format type?

Subject, content criteria
Specialized lexica based on subject.linguistic types of lexical information:
- Domain: fish, work, …
- Linguistic levels of description and categories: phonetic/pronunciation, verb, proper name, …
- Rank: idiom, (un-)inflected word, stem, morpheme, …
- Other: frequency, etymological/historical, translation, bilingual, multilingual, …

User criteria (construction and/or consultation)
- Non-linguist (L1 speaker, L2 speaker, …)
- Research linguist (field, theoretical, …)
- Computational linguist (machine learning from corpora, inheritance lexica, …)
- Language and/or speech system developer (currently several such projects for minority languages)

Recommendations
- Consider revising the OLAC linguistic type controlled vocabulary to factor out linguistic levels as a common parameter.
- Consider using actual linguistic genres such as “sketch grammar”, “field notes”, “domain lexicon”.
- Consider cross-classifying a low-granularity type vocabulary with format, content and user types.
- Definitely provide improved definitions of lexicon types.
- Definitely point to examples of existing lexica of a given lexicon genre to help users.
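The cross-classification recommended above can be pictured as a small faceted record instead of one deep type hierarchy. The following is only a minimal sketch under stated assumptions: the class `LexiconType`, the helper `find`, and the facet value sets are hypothetical names chosen for illustration, with values borrowed from the examples on the slides. None of this is an official OLAC vocabulary or schema.

```python
from dataclasses import dataclass

# Hypothetical facet values, copied from the examples on the slides.
# They are illustrative only and do not reproduce any OLAC vocabulary.
STRUCTURES = {"dictionary", "wordlist", "glossary", "terminology",
              "concordance", "wordnet", "thesaurus", "encyclopaedia", "index"}
FORMATS = {"print", "database", "word-processor", "hypertext",
           "system component", "multimedia", "xml"}
CONTENTS = {"domain", "phonetic", "frequency", "etymological",
            "bilingual", "multilingual"}
USERS = {"non-linguist", "research linguist", "computational linguist",
         "system developer"}


@dataclass(frozen=True)
class LexiconType:
    """A coarse structural type cross-classified with three other facets."""
    structure: str
    fmt: str
    content: tuple = ()
    users: tuple = ()

    def __post_init__(self):
        # Reject values that are not in the (assumed) controlled vocabularies.
        if self.structure not in STRUCTURES:
            raise ValueError(f"unknown structure type: {self.structure}")
        if self.fmt not in FORMATS:
            raise ValueError(f"unknown format: {self.fmt}")
        for value in self.content:
            if value not in CONTENTS:
                raise ValueError(f"unknown content type: {value}")
        for value in self.users:
            if value not in USERS:
                raise ValueError(f"unknown user type: {value}")


def find(lexica, structure=None, fmt=None, content=None, user=None):
    """Return the descriptions that match every facet that was given."""
    result = []
    for lex in lexica:
        if structure is not None and lex.structure != structure:
            continue
        if fmt is not None and lex.fmt != fmt:
            continue
        if content is not None and content not in lex.content:
            continue
        if user is not None and user not in lex.users:
            continue
        result.append(lex)
    return result
```

With such a record, "bilingual print dictionary for non-linguists" is a combination of four independent values, for example `LexiconType("dictionary", "print", ("bilingual",), ("non-linguist",))`, so the structural vocabulary itself can stay small.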

Some remaining questions
Specific points to address:
- Is the OLAC list of lexicon types comprehensive enough?
- Is a taxonomy of lexicon types adequate, or must we parametrise?
- Which sub-attributes are needed from the relevant components?
And of course a very basic question:
- Can all macrostructures be derived formally, i.e. automatically, from a generic declarative macrostructure (like views/indexings of a database) with appropriate microstructure and mesostructure?
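The database analogy in the last question can be made concrete with a toy example. The sketch below assumes a hypothetical set of three entries with invented field names (`form`, `gloss`, `domain`) and derives two macrostructures from it: a semasiological view (from form to meaning, ordered alphabetically) and an onomasiological view (from domain to forms). It only illustrates the idea of derived views and does not answer whether every macrostructure can be obtained this way.

```python
from collections import defaultdict

# Hypothetical entry store: one record per lexical entry.
# The field names and the data are invented for illustration.
ENTRIES = [
    {"form": "salmon", "gloss": "a large migratory fish", "domain": "fish"},
    {"form": "net", "gloss": "a mesh used for catching fish", "domain": "work"},
    {"form": "trout", "gloss": "a freshwater fish", "domain": "fish"},
]


def semasiological_view(entries):
    """Form-to-meaning access: entries ordered alphabetically by form."""
    return sorted(entries, key=lambda entry: entry["form"].casefold())


def onomasiological_view(entries):
    """Meaning-to-form access: forms grouped under their domain."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["domain"]].append(entry["form"])
    return {domain: sorted(forms) for domain, forms in sorted(index.items())}


if __name__ == "__main__":
    for entry in semasiological_view(ENTRIES):
        print(entry["form"], "-", entry["gloss"])
    print(onomasiological_view(ENTRIES))
```

Both functions read the same records and leave them unchanged, which is the sense in which the two arrangements are views of one declarative store.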

Working Group participants
- Helen Aristar-Dry
- Dafydd Gibbon
- Veronica Grondona
- Michael Maxwell
- David Weber
- Jeff …
- … oops, sorry - I forgot to make a participant list ☹