Machine Translation, Digital Libraries, and the Computing Research Laboratory
  Indo-US Workshop on Digital Libraries
  June 23, 2003
The Computing Research Laboratory (CRL)
  New Mexico State University
  Las Cruces, New Mexico
  Stephen Helmreich (505)
Machine Translation (MT)
  Component technologies
  Comparable technologies
  Composed technologies
MT -- Purposes
  Dissemination (high quality): sublanguages, controlled languages
  Assimilation (broad coverage)
  Communication (speed)
MT -- Types
  Direct – string-for-string
  Transfer – structure-for-structure
  Interlingual – to and from a meaning representation
  Statistical – most probable translation given a corpus
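
To make two of the types above concrete, here is a minimal Python sketch, not a description of any CRL system. The dictionary, the corpus, and the function names `direct_translate` and `most_probable_translation` are hypothetical and exist only for illustration. The first function does string-for-string substitution; the second picks the translation seen most often in a tiny aligned corpus, a rough stand-in for "most probable translation given a corpus".

```python
from collections import Counter

# Hypothetical toy Spanish-to-English dictionary, for illustration only.
TOY_DICT = {"la": "the", "casa": "house", "blanca": "white"}

def direct_translate(sentence, dictionary=TOY_DICT):
    """Direct MT: replace each source word with its dictionary entry.

    Source word order is kept, and unknown words pass through unchanged.
    """
    return " ".join(dictionary.get(word, word) for word in sentence.lower().split())

# Hypothetical toy aligned corpus of (source word, target word) pairs.
TOY_CORPUS = [
    ("banco", "bank"),
    ("banco", "bank"),
    ("banco", "bench"),
]

def most_probable_translation(source, corpus=TOY_CORPUS):
    """Statistical MT in miniature: return the target most often paired with source."""
    counts = Counter(target for src, target in corpus if src == source)
    if not counts:
        return None  # the corpus gives no evidence for this source word
    return counts.most_common(1)[0][0]

print(direct_translate("La casa blanca"))
print(most_probable_translation("banco"))
```

The direct output keeps Spanish word order ("the house white"), which is the kind of error that transfer and interlingual systems address by working on structure or meaning rather than on strings.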
Component technologies -- I
  Character encoding and representation, text editing (Unicode)
  Text segmenting (OCR, sandhi?)
  Morphological analysis
  Lexical annotation (part of speech tagging, proper name identification, others)
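
As a small illustration of why character representation comes first in this list, the sketch below uses Python's standard `unicodedata` module to normalize text before splitting it into tokens. The helper names `normalize_text` and `naive_tokenize` are hypothetical, and whitespace splitting is only a placeholder: it does not handle OCR errors or sandhi, which need language-specific segmenters.

```python
import unicodedata

def normalize_text(text):
    """Return the NFC form so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)

def naive_tokenize(text):
    """Normalize, then split on whitespace (a placeholder for real segmentation)."""
    return normalize_text(text).split()

composed = "caf\u00e9"     # 'é' stored as a single code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

# The two strings look identical on screen but differ until normalized.
print(composed == decomposed)
print(normalize_text(composed) == normalize_text(decomposed))
print(naive_tokenize("cafe\u0301 con leche"))
```

Without the normalization step, dictionary lookups and morphological analysis further down the pipeline can silently miss words that differ only in how their characters are encoded.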
Component technologies -- II
  Syntactic analyzers (grammars, parsers)
  Bilingual/multilingual dictionaries
  Ontologies (WordNet, OntoSem, Cyc) (lexical, linguistic, world-knowledge)
  Generation systems
Comparable technologies
  Information Retrieval (IR) (URSA)
  Information Extraction (IE) (MUC)
  Text Summarization (DUC)
  Word Sense Disambiguation (SensEval)
  Cross-Document Named Entity Identification (Coreference Resolution)
Composed technologies
  All of the above (IR/IE/Summarization)
  multi-lingual
  multi-modal
  with attention to human-computer interaction (HCI)
Composed technologies -- II
  Personal Profiler – searches the web to find information about a particular person, translates it if appropriate, and organizes it in temporal order
  Quick Ramp-up MT (Expedition) – allows a non-linguist language user and a computer expert to construct a simple MT system
Question-Answering Systems
  Advanced Question Answering for Intelligence (AQUAINT)
  MOQA – Meaning-Oriented Question Answering
  Allows the user to pose structured or natural-language queries, obtains an answer from a variety of sources, and presents the answer appropriately
Summary
  Choose an appropriate purpose and type
  Look at related technologies: component, comparable, composed
  Search for an appropriate research partner