Language Technology 2005/06 Hans Uszkoreit Universität des Saarlandes

Slides:



Advertisements
Similar presentations
Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University at Saarbruecken Hans Uszkoreit German Research Center for Artificial.
Advertisements

GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Introduction to Computational Linguistics
Introduction to Computational Linguistics
Chapter 5: Introduction to Information Retrieval
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING NLP-AI IIIT-Hyderabad CIIL, Mysore ICON DECEMBER, 2003.
Natural Language and Speech Processing Creation of computational models of the understanding and the generation of natural language. Different fields coming.
CSE111: Great Ideas in Computer Science Dr. Carl Alphonce 219 Bell Hall Office hours: M-F 11:00-11:
The Unreasonable Effectiveness of Data Alon Halevy, Peter Norvig, and Fernando Pereira Kristine Monteith May 1, 2009 CS 652.
Shallow Processing: Summary Shallow Processing Techniques for NLP Ling570 December 7, 2011.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
1/7 INFO60021 Natural Language Processing Harold Somers Professor of Language Engineering.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
ITCS 6010 Natural Language Understanding. Natural Language Processing What is it? Studies the problems inherent in the processing and manipulation of.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
تمرين شماره 1 درس NLP سيلابس درس NLP در دانشگاه هاي ديگر ___________________________ راحله مکي استاد درس: دکتر عبدالله زاده پاييز 85.
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
1/23 Applications of NLP. 2/23 Applications Text-to-speech, speech-to-text Dialogues sytems / conversation machines NL interfaces to –QA systems –IR systems.
Overview of Search Engines
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
AQUAINT Kickoff Meeting – December 2001 Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for Multimedia Question Answering.
CAREERS IN LINGUISTICS OUTSIDE OF ACADEMIA CAREERS IN INDUSTRY.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Introduction to NLP.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Computational Linguistics Yoad Winter *General overview *Examples: Transducers; Stanford Parser; Google Translate; Word-Sense Disambiguation * Finite State.
University of Dublin Trinity College Localisation and Personalisation: Dynamic Retrieval & Adaptation of Multi-lingual Multimedia Content Prof Vincent.
Experiments on Building Language Resources for Multi-Modal Dialogue Systems Goals identification of a methodology for adapting linguistic resources for.
1 Computational Linguistics Ling 200 Spring 2006.
Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved. Decision Support Systems Chapter 10.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
Language Technology I © 2005 Hans Uszkoreit Language Technology I 2005/06 Hans Uszkoreit Universität des Saarlandes and German Research Center for Artificial.
Introduction to CL & NLP CMSC April 1, 2003.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
1 Pattern Recognition Pattern recognition is: 1. A research area in which patterns in data are found, recognized, discovered, …whatever. 2. A catchall.
NLP ? Natural Language is one of fundamental aspects of human behaviors. One of the final aim of human-computer communication. Provide easy interaction.
Major Disciplines in Computer Science Ken Nguyen Department of Information Technology Clayton State University.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
Introduction to Computational Linguistics
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
CSE467/567 Computational Linguistics Carl Alphonce Computer Science & Engineering University at Buffalo.
Translingual Information Management Stephan Busemann Language Technology Lab German Research Center for Artificial Intelligence.
Contact : Bernadette Bouchon-Meunier, Patrick Gallinari, Jean-Gabriel Ganascia LIP6, UPMC, 8 rue du Capitaine Scott, Paris, France
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties Liang Chen*,Naoyuki Tokuda+, Hisahiro Adachi+ *University of Northern.
1 An Introduction to Computational Linguistics Mohammad Bahrani.
Chapter 7 Speech Recognition Framework  7.1 The main form and application of speech recognition  7.2 The main factors of speech recognition  7.3 The.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Pattern Recognition. What is Pattern Recognition? Pattern recognition is a sub-topic of machine learning. PR is the science that concerns the description.
AQUAINT Mid-Year PI Meeting – June 2002 Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for Multimedia Question Answering.
INTRODUCTION TO APPLIED LINGUISTICS
School of Computer Science & Engineering
Information Retrieval and Web Search
Information Retrieval and Web Search
Machine Learning in Natural Language Processing
CSE 635 Multimedia Information Retrieval
Lecture 8 Information Retrieval Introduction
Artificial Intelligence 2004 Speech & Natural Language Processing
Information Retrieval
Presentation transcript:

Language Technology 2005/06 Hans Uszkoreit Universität des Saarlandes German Research Center for Artificial Intelligence (DFKI)

Overview What is Language Technology Some Selected Technologies Some Selected Applications Information Extraction Cross-Linguistic Information Retrieval Email Management Language Checking

Motivations linguistics CL engineering cognition

Motivationen modells of grammar linguistics engineering cognition language technology applications models of human language processing

What is a Technology Technology: methods and techniques that together enable some application. In real life usage of the word there is a continuum between methods and applications. method/technique finite state transduction component technology tokenizer technology named entity recognition high precision text indexing application concept based search engine

Types of Technologies Communication partners: humans and machines (technology), humans and humans humans and infostructure Modes and media for input and output: text, speech, pictures, gestures Synchronicity: synchronous vs. asynchronous Situatedness: sensitivity to context, location, time, plans Type of linguality: monolingual, multilingual, translingual Type of processing: Categorization, summarization, extraction, understanding, translating, responding Level of linguistic description: phonology, morphology, syntax, semantics,pragmatics

Language Technologies multimedia & multimodality technologies speech technologies text technologies language technologies knowledge technologies

Language Technologies

Language Technologies Text Technologies

LANGUAGE TECHNOLOGIES Text Technologies Speech Technologies

Language Technologies gathering indexing categorization clustering summarization Text Technologies Speech Technologies

Language Technologies text understanding text translation information extraction report generation Text Technologies Speech Technologies

Language Technologies Voice Recognition Speech Verification Speech Recognition Voice Modelling Speech Synthesis Speaker Identification Language Indentification Text Technologies Speech Technologies

Language Technologies Speech Generation Speech Unterstanding Spoken Dialogue Systems Speech Translation Systems Text Technologies Speech Technologies

language understanding LANGUAGE TECHNOLOGIES Language Technologies language understanding language generation dialogue modelling machine translation Text Technologies Speech Technologies

Speech recognition Spoken language is recognized and transformed in into text as in dictation systems, into commands as in robot control systems, or into some other internal representation.

Speech Synthesis (also Speech Generation) Utterances in spoken language are produced from text (text-to-speech systems) or from internal representations of words or sentences (concept-to-speech systems)

Text Categorization This technology assigns texts to categories. Texts may belong to more than one category, categories may contain other categories. Filtering is a special case of categorization with just two categories.

Text Summarization The most relevant portions of a text are extracted as a summary. The task depends on the needed lengths of the summaries. Summarization is harder if the summary has to be specific to a certain query.

Text Indexing As a precondition for document retrieval, texts are are stored in an indexed database. Usually a text is indexed for all word forms or – after lemmatization – for all lemmas. Sometimes indexing is combined with categorization and summarization.

Text Retrieval Texts are retrieved from a database that best match a given query or document. The candidate documents are ordered with respect to their expected relevance. Indexing, categorization, summarization and retrieval are often subsumed under the term information retrieval.

Information Extraction Relevant information pieces of information are discovered and marked for extraction. The extracted pieces can be: the topic, named entities such as company, place or person names, simple relations such as prices, desti- nations, functions etc. or complex relations describing accidents, company mergers or football matches.

Data Fusion and Text Data Mining Extracted pieces of information from several sources are combined in one database. Previously undetected relationships may be discovered.

Question Answering Question Answering Natural language queries are used to access information in a database. The database may be a base of structured data or a repository of digital texts in which certain parts have been marked as potential answers.

Report Generation A report in natural language is produced that describes the essential contents or changes of a database. The report can contain accumulated numbers, maxima, minima and the most drastic changes.

Spoken Dialogue Systems The system can carry out a dialogue with a human user in which the user can solicit information or conduct purchases, reservations or other transactions.

Translation Technologies Technologies that translate texts or assist human trans- lators. Automatic translation is called machine translation. Translation memories use large amounts of texts together with existing translations for efficient look-up of possible translations for words, phrases and sentences.

Formal and Computational Methods Generic CS Methods Programming languages, algorithms for generic data types, and software engineering methods for structuring and organizing software development and quality assurance. Specialized Algorithms Dedicated algorithms have been designed for parsing, generation and translation, for morphological and syntactic processing with finite state automata/transducers and many other tasks. Nondiscrete Mathematical Methods Statistical techniques have become especially successful in speech processing, information retrieval, and the automatic acquisition of language models. Other methods in this class are neural networks and powerful techniques for optimization and search.

Linguistic Methods and Resources Logical and Linguistic Formalisms For deep linguistic processing, constraint based grammar formalisms are employed. Complex formalisms have been developed for the representation of semantic content and knowledge. Linguistic Knowledge Linguistic knowledge resources for many languages are utilized: dictionaries, morphological and syntactic grammars, rules for semantic interpretation, pronunciation and intonation. Corpora and Corpus Tools Large collections of application-specific or generic collections of spoken and written language are exploited for the acquisition and testing of statistical or rule-based language models.

Methods from Cognitive Science (Psychology) Models of Cognitive Systems and their Components The interaction of perception, knowledge, reasoning and action including communication is modelled in cognitive psychology. Such models can be consulted or employed for the design of language processing systems. Formalized models of components such as memory, reasoning and auditive perception are also often utilized for models of language processing. Empirical methods fromn Experimental Psychology Since cognitive psychology investigates the intelligent behavior of human organisms, many methods have been developed for the observation and empirical analysis of language production and comprehension. Such methods can be extremely useful for building computer models of human language processing (Examples: "Wizard of Oz Experiments" and measurements of syntactic and semantic processing complexity.

State of the Art 95%-98% Correct recognition of word categories (part-of-speech-tagging) recognition of names of people, companies, places, products (named-entity-recognition) statistical recognition of major phrases (HMM chunk parsing) parsing of newspaper texts by statistically trained parsers (probibilistic context free parsing) deep parsing of newspaper texts (HPSG or LFG parsing with large lexicon) 85%-98% 95% 91% 40%-60%

Maturity of Speech Technologies Voice Control Systems Dictation Systems Text-to-Speech Systems Machine Initiative Spoken Dialogue Systems Identification and Verification Systems Spoken Information Access Mixed Initiative Spoken Dialogue Systems Speech Translation Systems Deployed. On the market Mature or close to maturity Research prototypes in R&D

Maturity of Text Technologies Spell Checkers Machine-Assisted Human Translation Translation Memories Indicative Machine Translation Grammar Checkers Information Extraction Human Assisted Machine Translation Report Generation High Quality Text Translation Text Generation Systems Deployed. On the market Mature or close to maturity Research prototypes in R&D

Maturity of IM Technologies Word-Based Information Retrieval Summarization by Simple Condensation Simple Statistical Categorization Simple Automatic Hyperlinking Cross-Lingual Information Retrieval Automatic Hyperlinking With Disambiguation Simple Information Extraction (Unary, Binary Relations) Complex Information Extraction (Ternary+ Relations) Dense Associative Hyperlinking Concept-Based Information Retrieval Text Understanding Deployed. On the market Mature or close to maturity Research prototypes in R&D

MEGATRENDS global infostructure ambient computing collective memory collective knowledge learning organizations meta-knowledge repositories ambient computing ubiquitous computing situated computing pervasive computing disappearing computers ubiquitous access personalization adaptation learning

Vector Space Model Imagine a vector whose length is equal to the number of content words of the language. v= (w1. w2, ..., wn) A document is represented as a vector d= (t1, t2, ..., tn) where ti represents the number of occurences of word wi in the document. a query is represented as a vector as well q= (t1, t2, ..., tn) The distance between vectors is expressed by the cosine value.

Classification Methods knn (k nearest neighbours) simple neural networks hierarchically organized neural network built up from a number of independent self-organizing maps Kohonen type self-organizing maps support vector machines genetic programming naive Bayes classifier hierarchical Bayesian clustering Bayesian network classifier