The contribution of NLP Corpus processing Ontologies and terminologies


The contribution of NLP Corpus processing Ontologies and terminologies Kivik 2013 NLP. Corpus processing, Ontologies

What is NLP?
Natural Language Processing
  natural language vs. computer languages
Other names
  Computational Linguistics (emphasizes the scientific, not the technological)
  Language Engineering
  Language Technology

NLP and linguistics
Linguistics to NLP
  supply ideas
  interpret results
NLP to linguistics
  test theories
  expose gaps
  plus turn into technology

Example: regular morphology
LINGUISTICS
  Rules: stems -> inflected forms
NLP
  program the rules
  apply the rules to a lexicon of stems
  Is the output correct? Errors? Refine the theory.
Needed for: web search, spell-checkers, machine translation, speech recognition systems etc.
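
The loop described on this slide (program the rules, apply them to a lexicon of stems, inspect the errors) can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function name, the three rules and the four-word lexicon are all assumptions chosen for the example.

```python
def inflect_regular_verb(stem):
    """Apply naive regular-verb rules to a stem (illustrative rules only)."""
    # Third person singular: add "es" after a sibilant ending, otherwise "s".
    third = stem + ("es" if stem.endswith(("s", "sh", "ch", "x", "z")) else "s")
    # Drop a final "e" before "ing" / "ed" (arrive -> arriving, arrived).
    base = stem[:-1] if stem.endswith("e") else stem
    return {"3sg": third, "ing": base + "ing", "past": base + "ed"}


# Hypothetical lexicon of stems; a real one would come from a dictionary.
lexicon = ["help", "arrive", "watch", "stop"]

for stem in lexicon:
    print(stem, inflect_regular_verb(stem))
```

Running this over the lexicon gives correct forms for help, arrive and watch, but "stoping" and "stoped" for stop. That error is exactly the kind of gap the slide has in mind: the rules need refining with a consonant-doubling rule.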

Applications
web search
  basic search
  filtering results
spelling and grammar checking
machine translation (MT)
talk to computers
  speech processing as well
information extraction
  finding facts in a database of documents
  answering questions

How can NLP make better dictionaries?
By pre-processing a corpus:
  tokenization
  sentence splitting
  lemmatization
  POS-tagging
  parsing
Each step builds on its predecessors.

Tokenization
“identifying the words”
from: he didn't arrive.
to: He did n’t arrive .

Automatic tokenization
Western writing systems
  easy! Space is the separator.
Chinese, Japanese and some other writing systems
  do not use a word separator
  hard, like POS-tagging (below)

Why isn't space=separator enough (even for English)?
What is a space?
  linebreaks, paragraph breaks, tabs
Punctuation
  characters that do not form parts of words but may be attached to words (with no spaces)
  brackets, quotation marks
Hyphenation
  is co-op one word or two?
  is well-managed?
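
The three problems listed here (kinds of whitespace, attached punctuation, hyphens) can be handled by a small regular-expression tokenizer. The sketch below is one possible approach under simple assumptions: it treats any whitespace as a separator, splits off single punctuation characters, keeps hyphenated forms as one token, and separates the clitic n't as in the earlier example. The pattern and the function name are illustrative.

```python
import re

TOKEN_RE = re.compile(
    r"""
    \w+(?=n['’]t\b)      # word part before a negative clitic: did | n't
    | n['’]t\b           # the clitic itself
    | \w+(?:-\w+)*       # words, keeping internal hyphens: co-op
    | [^\w\s]            # any single punctuation character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens; whitespace of any kind is discarded."""
    return TOKEN_RE.findall(text)


print(tokenize("he didn't arrive."))
print(tokenize('a "well-managed" co-op'))
```

Keeping co-op and well-managed as single tokens is a design decision, not a fact about English. A corpus builder who wants two tokens would drop the hyphen branch from the pattern.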

Sentence splitting
“identifying the sentences”
from: he didn't arrive.
to: <s> He did n’t arrive . </s>
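
A very rough sentence splitter can be sketched with one regular expression. It assumes that a sentence ends at ".", "!" or "?" followed by whitespace and a capital letter. That heuristic, and the function name, are assumptions made for illustration; production splitters also need abbreviation lists or a trained model.

```python
import re


def split_sentences(text):
    """Split after . ! ? when followed by whitespace and a capital letter,
    then wrap each sentence in <s> ... </s> markers."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [f"<s> {part} </s>" for part in parts if part]


for sentence in split_sentences("He didn't arrive. She left early."):
    print(sentence)
```

The heuristic fails on abbreviations: "Dr. Smith arrived." is split after "Dr.", which is why this step is harder than it first looks.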

Lemmatization
Mapping from text-word to lemma: help (verb)
  text-word   lemma
  help        help (v)
  helps       help (v)
  helping     help (v)
  helped      help (v)

Lemmatization
Mapping from text-word to lemma: help (verb), help (noun), helping (noun)
  text-word   lemma
  help        help (v), help (n)
  helps       help (v), help (n)**
  helping     help (v), helping (n)
  helped      help (v)
  helpings    helping (n)
**help (n): usually a mass noun, but part of the compound "home help", which is a count noun and takes the "s" ending.
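
The mapping on this slide is naturally a lookup table from text-word to a list of (lemma, part of speech) pairs. The sketch below encodes the slide's five rows, reading "helps" as a form of the noun lemma help, as the footnote implies. The names LEMMA_TABLE and lemmas are illustrative.

```python
# Text-word -> possible (lemma, part-of-speech) pairs, taken from the slide.
LEMMA_TABLE = {
    "help": [("help", "v"), ("help", "n")],
    "helps": [("help", "v"), ("help", "n")],
    "helping": [("help", "v"), ("helping", "n")],
    "helped": [("help", "v")],
    "helpings": [("helping", "n")],
}


def lemmas(word):
    """Return every candidate lemma for a text-word (empty list if unknown)."""
    return LEMMA_TABLE.get(word.lower(), [])


print(lemmas("helping"))
print(lemmas("helpings"))
```

A word with more than one entry, such as helping, stays ambiguous at this stage. Choosing between the verb and the noun reading is left to POS-tagging.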

Lemmatization
Dictionary entries are for lemmas
Matching a text-word to a dictionary-word: lemmatization

Lemmatization
Searching by lemma
  English: little inflection
  French: 36 forms per verb
  Finno-Ugric: 2000
Not always wanted: English royalty
  singular: kings and queens
  plural royalties: payments to authors

Automatic lemmatization
Write rules: if a word ends in "ing", delete "ing"; if the remainder is a verb lemma, add it to the list of possible lemmas
If a detailed grammar is available, use it
A full lemma list is also required
  often available from dictionary companies
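
The rule on this slide (strip "ing", then check the remainder against a list of verb lemmas) can be sketched as below. Only the plain "ing" rule comes from the slide. The other suffix rules, the tiny lemma list and all names are assumptions added to make the example runnable.

```python
# Hypothetical lemma list; the slide notes that a full list is required
# and is often available from dictionary companies.
VERB_LEMMAS = {"help", "arrive", "sing", "buckle"}

# (suffix to delete, string to append). Only ("ing", "") is from the slide;
# the remaining rules are illustrative additions.
SUFFIX_RULES = [("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"), ("s", ""), ("", "")]


def candidate_verb_lemmas(word):
    """Apply each suffix rule and keep remainders that are known verb lemmas."""
    found = []
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if stem in VERB_LEMMAS and stem not in found:
                found.append(stem)
    return found


for word in ["helping", "arriving", "sing", "thing"]:
    print(word, candidate_verb_lemmas(word))
```

The lemma list is what stops the rules from over-applying: "thing" ends in "ing", but "th" is not a verb lemma, so no candidate is proposed.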

Part-of-speech (POS) tagging
“identifying parts of speech”
from: he didn't arrive.
to:
  He      PNP   pers pronoun
  did     VVD   past tense verb
  n’t     XNOT  not
  arrive  VV    base form of verb
  .       C     punctuation
  </s>

Tagsets
The set of part-of-speech tags to choose between
Basic: noun, verb, pronoun …
Advanced: examples from the CLAWS English tagset
  NN2  plural noun
  VVG  -ing form of lexical verb
Based on the linguistics of the language.

POS-tagging: why?
Use grammar when searching
  Nouns modified by buckle
  Verbs that buckle is object of

POS-tagging: how?
Big topic for computational linguistics
  well understood
  taggers available for major languages
Some taggers use lemmatized input, others do not
Methods
  Constraint-based: a set of rules of the form: if the previous word is "the" and VERB is one of the possibilities, delete VERB
  Statistical: machine learning from a tagged corpus
  Various methods
Ref: Manning and Schütze, Foundations of Statistical Natural Language Processing, MIT Press 1999.
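
The constraint-based rule quoted on this slide can be sketched as follows: each word starts with every tag its lexicon entry allows, and the rule deletes VERB after "the". The lexicon, the coarse tag names and the function name are assumptions for illustration. The guard that never deletes a word's last remaining tag is also an addition, not something the slide states.

```python
# Hypothetical lexicon: each word lists its possible coarse tags.
LEXICON = {
    "the": {"DET"},
    "they": {"PRON"},
    "buckle": {"NOUN", "VERB"},
    "broke": {"VERB"},
}


def apply_constraint(words):
    """Assign all possible tags, then delete VERB after "the"."""
    tags = [set(LEXICON.get(word.lower(), {"UNKNOWN"})) for word in words]
    for i in range(1, len(words)):
        previous_is_the = words[i - 1].lower() == "the"
        # Only delete VERB when another reading remains (added safeguard).
        if previous_is_the and "VERB" in tags[i] and len(tags[i]) > 1:
            tags[i].discard("VERB")
    return list(zip(words, tags))


for word, possible in apply_constraint(["the", "buckle", "broke"]):
    print(word, sorted(possible))
```

After "the", buckle keeps only its NOUN reading. After "they", both readings survive and a further rule or a statistical model would be needed to decide.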

Parsing
Find the structure:
  Phrase structure (trees): The cat sat on the mat
  Dependency structure (links): The cat sat on the mat

Automatic parsing
Big topic
  see Jurafsky and Martin or another NLP textbook
Many methods are too slow for large corpora
Sketch Engine usually uses “shallow parsing”
  Patterns of POS-tags
  Regular expressions
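
Shallow parsing as "regular expressions over patterns of POS-tags" can be sketched by encoding each tag as one character and running an ordinary regex over the resulting string. This is a generic illustration of the idea, not Sketch Engine's actual grammar: the tag names, the one-letter codes and the noun-phrase pattern are all assumptions.

```python
import re

# One character per tag, so string positions line up with token positions.
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N", "VERB": "V", "PREP": "P"}

# Noun phrase: optional determiner, any number of adjectives, one or more nouns.
NP_PATTERN = re.compile(r"D?A*N+")


def noun_chunks(tagged):
    """Return noun-phrase chunks from a list of (word, tag) pairs."""
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[match.start():match.end()])
        for match in NP_PATTERN.finditer(codes)
    ]


sentence = [
    ("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
    ("on", "PREP"), ("the", "DET"), ("mat", "NOUN"),
]
print(noun_chunks(sentence))  # ['The cat', 'the mat']
```

Because the regex engine makes a single left-to-right pass over the tag string, this scales to large corpora far more easily than full parsing, at the cost of finding only flat chunks.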

Summary
What is NLP?
How can it help?
  Tokenizing
  Sentence splitting
  Lemmatizing
  POS-tagging
  Parsing

Ontologies and Terminology and how they relate to lexicography

Terminology
Contains terms for the objects and concepts in a domain
  organized according to relations between objects
Different language
  Same objects, so
  Same organization
  Different terms

Ontology
Artificial Intelligence
Like terminology, with reasoning
  Tweety is-a swallow
  A swallow is-a bird
  Birds fly
  Inference -----------------------
  Tweety flies
The rationalist dream of automated reasoning
[Hierarchy diagram: Bird (flies) at the top; swallow, robin, … below it; Tweety below swallow]
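
The Tweety inference can be sketched as inheritance along is-a links: a property attached to bird is inherited by everything below it. The dictionaries and function names below are illustrative. The facts are the ones on the slide, plus robin from the diagram.

```python
# is-a links: child -> parent.
IS_A = {"Tweety": "swallow", "swallow": "bird", "robin": "bird"}

# Properties asserted directly on an item ("Birds fly").
PROPERTIES = {"bird": {"flies"}}


def ancestors(item):
    """Follow is-a links upwards and return the chain of parents."""
    chain = []
    while item in IS_A:
        item = IS_A[item]
        chain.append(item)
    return chain


def inferred_properties(item):
    """Own properties plus everything inherited through is-a links."""
    props = set(PROPERTIES.get(item, set()))
    for parent in ancestors(item):
        props |= PROPERTIES.get(parent, set())
    return props


print(ancestors("Tweety"))            # ['swallow', 'bird']
print(inferred_properties("Tweety"))  # {'flies'}
```

The sketch has no way to express exceptions, such as a bird that does not fly. Handling those is one reason automated reasoning over ontologies is harder than this example suggests.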

Ontology
  Chris is-a dentist
  Chris has-practice in Lancing
  Chris works 9am-3pm Mon-Fri
  …
  You live-near Lancing
  You want-to-visit dentist
  You are-available …
  Inference -----------------------
  Appointment, you, Chris, Lancing, 10am, Thursday

Items in an ontology
Defined by relations in the ontology
Labelled (only) by words/phrases in various languages
  X1  EN: bird     FR: oiseau
  X2  EN: swallow  FR: hirondelle
  …
Ontology/things: language independent
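
One way to picture "labelled (only) by words" is a table keyed by language-independent identifiers, with the words stored as labels per language. The sketch below uses the two items from the slide. The translate helper is a hypothetical convenience that goes from a label to the item and back out to another language's label.

```python
# Language-independent items, each labelled in several languages.
CONCEPTS = {
    "X1": {"EN": "bird", "FR": "oiseau"},
    "X2": {"EN": "swallow", "FR": "hirondelle"},
}


def translate(term, source, target):
    """Find the item labelled `term` in `source`; return its `target` label.

    Returns None when the term is unknown or the item has no label
    in the target language.
    """
    for labels in CONCEPTS.values():
        if labels.get(source) == term:
            return labels.get(target)
    return None


print(translate("bird", "EN", "FR"))        # oiseau
print(translate("hirondelle", "FR", "EN"))  # swallow
```

Translation here never compares words directly. It always passes through the item, which is what makes the structure reusable across languages.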

Mismatches and gaps
  Y1  EN: body parts  SP: …
  Y2  SP: dedo
  Y3  EN: finger
  Y4  EN: toe
  Y5  EN: arm  SP: brazo

Thesaurus (e.g. Roget)
Looks like a simple ontology
  hierarchy only
Supports inference?
  usually fudged
Language independent?

WordNet
Princeton University project, from ca. 1990
Thesaurus
  Synonym sets, or synsets
  Hyponyms/hyperonyms, antonyms, part-of, other lexical relations
Free, online and available for download
Very widely used
Replicated for many languages (Global WordNet Association)

Lexicon/dictionary
About words
Organized by words
Language specific

Rationalists    Empiricists
Structure       Data
Depth           Breadth
Logic           Statistics
Semantic Web    Google
Terminology     Lexicography

Terminology
  What is the thing called in languages x, y, z?
  What kind of thing is it?
    Is-a link
    Its place in the ontology
  Well-structured hierarchy
Lexicography
  How does the word behave?
  What does it denote?
  Where does it occur?

Synthesis
Thesis
  Ontology, terminology, taxonomical lexicography
  Semantic web, Roget, WordNets
Antithesis
  Corpus lexicography
Synthesis: integrating
  language-independent structure
  language-specific word/phrase behaviour
  Corpus-based terminology
  FrameNet

Summary
words
  Lexicon
  Thesaurus/Terminology
  Ontology
things