LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10.


Administrivia
Homework #4
– due tonight
– by midnight to

Homework #4
Help on Homework Question 1
– (A) (1pt) How many English sentences are explored by the translator before a compatible sentence is found?
– Depending on how you run SWI-Prolog, the window may not be large enough to hold all the answers
Query
– ?- sbar(P,S,[]).
– P = buy(who,who),
– S = [who,does,who,buy] ? ;
– P = buy(what,who),
– S = [who,does,what,buy] ? ;
The following query writes each S into the file output
– You can use an editor or word-processor program on output to find the appropriate line
– ?- tell(output), sbar(P,S,[]), write(S), nl, fail ; told.
Reading the query piece by piece:
– tell(output): tells Prolog to send writes to the file output
– write(S): writes sentence S
– nl: writes a newline
– fail: sends Prolog back to look for more solutions
– told: we’re done writing to the file
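The tell/told query above is a failure-driven loop. A minimal self-contained sketch of the same pattern follows, assuming SWI-Prolog. Everything named here is hypothetical: toy_sbar/2 is a three-fact stand-in for the homework call sbar(P,S,[]) (the third fact is invented), and dump_sentences/1 and sentence_position/2 are illustration-only helpers, not part of the course grammar.

```prolog
:- use_module(library(lists)).   % nth1/3

% toy_sbar(?P, ?S): hypothetical stand-in for sbar(P,S,[]).
% The first two answers are the ones shown on the slide; the third is invented.
toy_sbar(buy(who,who),  [who,does,who,buy]).
toy_sbar(buy(what,who), [who,does,what,buy]).
toy_sbar(buy(john,who), [who,does,john,buy]).

% dump_sentences(+File): failure-driven loop, same shape as the slide's query.
% Each solution is written on its own line; fail forces backtracking into
% toy_sbar/2, and told closes the file once no solutions remain.
dump_sentences(File) :-
    tell(File),
    (   toy_sbar(_, S), write(S), nl, fail
    ;   told
    ).

% sentence_position(+Target, -N): N is the 1-based position of Target in the
% order in which Prolog enumerates the sentences.
sentence_position(Target, N) :-
    findall(S, toy_sbar(_, S), All),
    nth1(N, All, Target),
    !.
```

With the toy facts, ?- sentence_position([who,does,what,buy], N). binds N to 2, which is the same information you would get by counting lines in the output file.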

Last Time
Started on a new area:
– Morphology and Stemming
Morphology
– root + morphemes
– Inflectional: no change in category, e.g. V -ed → V
– Derivational: category-changing, e.g. V -able → A
Stemming
– Important not only in sentence parsing
– but also for internet-related applications such as information retrieval (IR)
Porter Stemmer:
– Rule-based, no dictionary
– Efficacy questioned: Harman (1991)

Today’s Topics
Stemming and Search
A brief look at Finite State Transducers (FST) for Morphological Processing

Stemming and Search
Up until very recently...
– Word Variations (Stemming)
To provide the most accurate results, Google does not use "stemming" or support "wildcard" searches. In other words, Google searches for exactly the words that you enter in the search box. Searching for "book" or "book*" will not yield "books" or "bookstore". If in doubt, try both forms: "airline" and "airlines," for instance.
Google didn’t use stemming

Stemming and Search
Google is more successful than other search engines in part because it returns “better”, i.e. more relevant, information
– its algorithm (a trade secret) is called PageRank
SEO (Search Engine Optimization) is a topic of considerable commercial interest
– Goal: how to get your webpage listed higher by PageRank
– Techniques:
– e.g. by writing keyword-rich text in your page
– e.g. by listing morphological variants of keywords
Google does not use stemming everywhere
– and it does not reveal its algorithm, to prevent people “optimizing” their pages

Stemming and Search
Search on: diet requirements
Notes:
can’t use quotes around phrase
– blocks stemming
Statistics:
– “dietary requirements” 117,000 hits
– “diet requirements” 5,200 hits
6th-ranked page

Stemming and Search
Search on: diet requirements
Notes:
Top-ranked page has the words diet, dietary and requirements but not the phrase “diet requirements”
BBC - Health - Healthy living - Dietary requirements
<meta name="keywords" content="health, cancer, diabetes, cardiovascular disease, osteoporosis, restricted diets, vegetarian, vegan"/>

Stemming and Search
Search on: diet requirements
Notes:
5th-ranked page does not have the word diet in it
– an academic conference
– unlikely to be “optimized” for page hits
– ranks above the 6th-ranked page, which does have the exact phrase “diet requirements” in it
PATLIB Dietary requirements
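The difference between exact-word matching and stem-based matching in the searches above can be illustrated with a toy Prolog sketch. This is an assumption-laden illustration only: the suffix list (ary, s) is invented to cover the diet/dietary and airline/airlines examples, it is not the Porter stemmer, and it says nothing about what Google actually does. All predicate names are hypothetical.

```prolog
% Hypothetical suffix list, for illustration only (NOT the Porter stemmer).
toy_suffix(ary).
toy_suffix(s).

% toy_stem(+Word, -Stem): strip the first listed suffix that leaves a stem of
% at least three letters; otherwise the word is its own stem.
toy_stem(Word, Stem) :-
    toy_suffix(Suffix),
    atom_concat(Stem0, Suffix, Word),
    atom_length(Stem0, Len),
    Len >= 3,
    !,
    Stem = Stem0.
toy_stem(Word, Word).

% exact_match(+Query, +Word): the words must be identical.
exact_match(Query, Word) :-
    Query == Word.

% stem_match(+Query, +Word): the words match if their toy stems are identical.
stem_match(Query, Word) :-
    toy_stem(Query, Stem),
    toy_stem(Word, Stem).
```

Under these toy rules stem_match(diet, dietary) and stem_match(airline, airlines) succeed while exact_match(diet, dietary) fails, and stem_match(book, bookstore) fails because no listed suffix relates the two words.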

The Mapping Problem
To map between a surface form and the decomposition of a word into its components
– e.g. root + φ (person/number/gender) and other features
using spelling rules
Example: (3.11)
Notes:
– ^ marks a morpheme boundary
– # is the end-of-word marker
[Figure labels: Stage 1, Stage 2]

Stage 1: Lexical ↔ Intermediate Levels
Example:
– f o x +N +PL (lexical)
– f o x ^s# (intermediate)
Key
– +N : noun
  (fox is also a verb, but fox +V cannot combine with +PL)
– +PL : (abstract) plural morpheme, realized in English as s (basic case)
– boundary markers ^ and #, to be used by the spelling rule machine (later)

Stage 1: Lexical ↔ Intermediate Levels
Example:
– f o x +N +PL (lexical)
– f o x ^s# (intermediate)
Idea (Correspondences)
– f ↔ f
– o ↔ o
– x ↔ x
– +N ↔ ε (ε = empty string)
– +PL ↔ ^s#
Use a Finite State Machine with correspondences:
– Finite State Transducer (FST)

Stage 1: Lexical ↔ Intermediate Levels
Example:
– g o o s e +N +PL (lexical)
– g e e s e # (intermediate)
Example:
– g o o s e +N +SG (lexical)
– g o o s e # (intermediate)
Example:
– m o u s e +N +PL (lexical)
– m i ε c e # (intermediate)
Example:
– s h e e p +N +PL (lexical)
– s h e e p # (intermediate)
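The Stage 1 correspondences can be sketched as a small Prolog transducer. This is only an illustration under assumptions: the predicate names (arc/4, transduce/3, stage1/2), the state names q0 to q7 and the toy lexicon are made up here, and the arcs are loosely organized by noun class rather than copied from the textbook figure. Tapes are lists of atoms, and a class arc consumes a whole stem at once instead of being expanded letter by letter. The ε in the mouse/mice alignment is simply not written to the output list.

```prolog
:- use_module(library(lists)).   % append/3

% Toy lexicon (hypothetical, chosen to match the slide examples).
reg_noun([f,o,x]).
reg_noun([c,a,t]).
irreg_noun([g,o,o,s,e], [g,e,e,s,e]).   % singular stem, plural form
irreg_noun([m,o,u,s,e], [m,i,c,e]).
irreg_noun([s,h,e,e,p], [s,h,e,e,p]).

% arc(From, LexicalSymbols, IntermediateSymbols, To)
% An empty list on either side plays the role of ε.
arc(q0, Stem, Stem, q1) :- reg_noun(Stem).
arc(q0, Sg,   Sg,   q2) :- irreg_noun(Sg, _).
arc(q0, Sg,   Pl,   q3) :- irreg_noun(Sg, Pl).
arc(q1, ['+N'],  [],            q4).   % +N : ε
arc(q2, ['+N'],  [],            q5).
arc(q3, ['+N'],  [],            q6).
arc(q4, ['+PL'], ['^',s,'#'],   q7).   % +PL : ^s#
arc(q4, ['+SG'], ['#'],         q7).
arc(q5, ['+SG'], ['#'],         q7).
arc(q6, ['+PL'], ['#'],         q7).

final(q7).

% transduce(+State, ?Lexical, ?Intermediate): follow arcs, consuming symbols
% from the lexical tape and producing symbols on the intermediate tape.
transduce(State, [], []) :-
    final(State).
transduce(State, Lex, Int) :-
    arc(State, In, Out, Next),
    append(In, LexRest, Lex),
    append(Out, IntRest, Int),
    transduce(Next, LexRest, IntRest).

% stage1(?Lexical, ?Intermediate)
stage1(Lex, Int) :-
    transduce(q0, Lex, Int).
```

Because the arcs are just a relation between two tapes, the same code runs in either direction: ?- stage1([f,o,x,'+N','+PL'], I). produces the intermediate form for fox, and ?- stage1(L, [g,e,e,s,e,'#']). recovers g o o s e +N +PL.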

Stage 1: Lexical ↔ Intermediate Levels
Figure 3.11, Jurafsky & Martin (2000)
Notation:
– input : output
– f means f:f

Stage 1: Lexical ↔ Intermediate Levels
Figure 3.17 (top half):

The Mapping Problem
Example: (3.11)
(Context-Sensitive) Spelling Rule: (3.5)
– ε → e / { x, s, z } ^ __ s#
ε rewrites to letter e given
– left context x^ or s^ or z^, and
– right context s#
i.e. insert e after the ^
– when you see x^s# or s^s# or z^s#
in particular, we have
– x^s# → x^es#
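Spelling rule (3.5) can be written directly as a Prolog list rewrite. The sketch below is an assumption-level illustration, not the rule machine from the textbook: e_insert/2 is a hypothetical name, tapes are lists of atoms, and the boundary markers ^ and # are kept in the output so that the result matches the x^s# to x^es# rewrite stated above.

```prolog
:- use_module(library(lists)).   % memberchk/2

% e_insert(+Intermediate, -Rewritten)
% Copy the tape left to right; when x^, s^ or z^ is followed by s#,
% insert e right after the ^.
e_insert([], []).
e_insert([C,'^',s,'#'|Rest], [C,'^',e,s,'#'|Out]) :-
    memberchk(C, [x,s,z]),        % left context: x^, s^ or z^
    !,
    e_insert(Rest, Out).
e_insert([C|Rest], [C|Out]) :-    % anything else is copied unchanged
    e_insert(Rest, Out).
```

For example, ?- e_insert([f,o,x,'^',s,'#'], R). inserts the e for fox, while the tape for cat (c a t ^ s #) comes back unchanged because t is not in the left-context set.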

Stage 2: Intermediate ↔ Surface Levels
Can also be implemented using an FST
– we can implement an FST in Prolog
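One possible way to write Stage 2 in Prolog is as a small state machine over the intermediate tape. This is a sketch under assumptions, not the transducer from the textbook: the names stage2/2 and fst/3 and the three states are invented here (q0 = default, q1 = just read x, s or z, q2 = just read x^, s^ or z^), the machine looks two symbols ahead to check the right context s#, and it is meant to be called with the intermediate tape given and the surface tape unbound. It applies e-insertion and maps the markers ^ and # to ε.

```prolog
:- use_module(library(lists)).   % memberchk/2

% stage2(+Intermediate, -Surface)
stage2(Int, Surf) :-
    fst(q0, Int, Surf).

% fst(+State, +IntermediateTape, -SurfaceTape)
fst(_, [], []).
% In q2 with s# ahead: ε:e (insert e), s:s, #:ε.
fst(q2, [s,'#'|T], Surf) :- !,
    Surf = [e,s|S],
    fst(q0, T, S).
% Morpheme boundary ^:ε. After x, s or z move to q2, otherwise back to q0.
fst(q1, ['^'|T], S) :- !, fst(q2, T, S).
fst(_,  ['^'|T], S) :- !, fst(q0, T, S).
% End-of-word marker #:ε.
fst(_, ['#'|T], S) :- !, fst(q0, T, S).
% Letters are copied; x, s and z lead to q1, every other letter to q0.
fst(_, [C|T], [C|S]) :-
    memberchk(C, [x,s,z]), !,
    fst(q1, T, S).
fst(_, [C|T], [C|S]) :-
    fst(q0, T, S).
```

With this sketch, ?- stage2([f,o,x,'^',s,'#'], S). yields the letters of foxes, and ?- stage2([c,a,t,'^',s,'#'], S). yields the letters of cats, since the e is only inserted in state q2.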

Stage 2: Intermediate ↔ Surface Levels
Example (3.17)