Published by Drusilla Simpson. Modified over 9 years ago.
1
LING 388: Language and Computers
Sandiway Fong
Lecture 22: 11/10
2
Administrivia
Homework #4
–due tonight
–email by midnight to sandiway@email.arizona.edu
3
Homework #4
Help on Homework Question 1
–(A) (1pt) How many English sentences are explored by the translator before a compatible sentence is found?
–Depending on how you run SWI-Prolog, the window may not be large enough to hold all the answers
Query
–?- sbar(P,S,[]).
–P = buy(who,who),
–S = [who,does,who,buy] ? ;
–P = buy(what,who),
–S = [who,does,what,buy] ? ;
The following query writes each S into the file output
–You can use an editor or word processor on output to find the appropriate line
–?- tell(output), sbar(P,S,[]), write(S), nl, fail ; told.
Key to the goals in the query
–tell(output): tells Prolog to send writes to file output
–write(S): write sentence S
–nl: write a newline
–fail: send Prolog back to look for more solutions
–told: we’re done writing to the file
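Rather than scrolling through the file in an editor, the position of the target sentence can be counted programmatically. The sketch below assumes that output holds one sentence per line, in the form produced by write(S) (e.g. [who,does,what,buy]); the function name sentences_explored and the sample data are made up for illustration.

```python
def sentences_explored(lines, target):
    """Return the 1-based position of the first line equal to target, or None if absent."""
    for position, line in enumerate(lines, start=1):
        if line.strip() == target:
            return position
    return None


# Stand-in for the contents of the file output (the two answers shown on the slide).
sample_lines = ["[who,does,who,buy]\n", "[who,does,what,buy]\n"]
print(sentences_explored(sample_lines, "[who,does,what,buy]"))  # prints 2
```

With the real file, an open file object can be passed in place of the sample list, since iterating over a file yields its lines.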
4
Last Time
Started on a new area:
–Morphology and Stemming
Morphology
–root + morphemes
–Inflectional: no change in category, e.g. V + -ed → V
–Derivational: category-changing, e.g. V + -able → A
Stemming
–Important not only in sentence parsing
–but also for internet-related applications such as information retrieval (IR)
Porter Stemmer:
–Rule-based, no dictionary
–Efficacy questioned: Harman (1991)
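To make "rule-based, no dictionary" concrete, here is a minimal suffix-stripping sketch. The rules, the minimum stem length and the name toy_stem are assumptions chosen for illustration; they are not the Porter stemmer's actual rules, which are more numerous and conditioned on the shape of the stem.

```python
# Toy suffix rules, tried in order. Illustrative only: NOT the Porter stemmer's rules.
SUFFIXES = ["able", "ing", "ed", "s"]
MIN_STEM = 3  # do not strip a suffix if fewer than this many letters would remain


def toy_stem(word):
    """Strip the first matching suffix, with no dictionary lookup."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)]
    return word


for w in ["walked", "walking", "walks", "readable", "red"]:
    print(w, "->", toy_stem(w))
```

Because no dictionary is consulted, rules like these also fire where they should not: toy_stem("glass") returns "glas". Errors of this kind are one reason the usefulness of stemming for retrieval has been questioned.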
5
Today’s Topics
Stemming and Search
A brief look at Finite State Transducers (FST) for Morphological Processing
6
Stemming and Search
Up until very recently...
–Word Variations (Stemming)
To provide the most accurate results, Google does not use "stemming" or support "wildcard" searches. In other words, Google searches for exactly the words that you enter in the search box. Searching for "book" or "book*" will not yield "books" or "bookstore". If in doubt, try both forms: "airline" and "airlines," for instance.
Google didn’t use stemming
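The difference between exact-word search and search with stemming can be sketched in a few lines. Everything here is a simplification for illustration: strip_plural is a crude stand-in for a stemmer, and the page is just a list of words; nothing about how Google actually matches queries is implied.

```python
def strip_plural(word):
    """Crude normaliser (an assumption for illustration): drop one final 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def exact_match(query, page_words):
    """True only if the query word appears on the page exactly as typed."""
    return query in page_words


def stemmed_match(query, page_words):
    """True if the query and some page word share the same normalised form."""
    return strip_plural(query) in {strip_plural(w) for w in page_words}


page = ["cheap", "airlines", "and", "books"]
print(exact_match("airline", page))    # False
print(stemmed_match("airline", page))  # True
```

Under exact matching the user has to try both "airline" and "airlines", as the quoted advice suggests; under stemmed matching either form finds the page.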
7
Stemming and Search
Google is more successful than other search engines in part because it returns “better”, i.e. more relevant, information
–its algorithm (a trade secret) is called PageRank
SEO (Search Engine Optimization) is a topic of considerable commercial interest
–Goal: How to get your webpage listed higher by PageRank
–Techniques:
–e.g. by writing keyword-rich text in your page
–e.g. by listing morphological variants of keywords
Google does not use stemming everywhere
–and it does not reveal its algorithm to prevent people “optimizing” their pages
8
Stemming and Search
Search on: diet requirements
Notes:
–can’t use quotes around phrase
–blocks stemming
Statistics:
–“dietary requirements” 117,000 hits
–“diet requirements” 5,200 hits
6th-ranked page
9
Stemming and Search
Search on: diet requirements
Notes:
–Top-ranked page has words diet, dietary and requirements but not the phrase “diet requirements”
BBC - Health - Healthy living - Dietary requirements
<meta name="keywords" content="health, cancer, diabetes, cardiovascular disease, osteoporosis, restricted diets, vegetarian, vegan"/>
10
Stemming and Search
Search on: diet requirements
Notes:
–5th-ranked page does not have the word diet in it
–it is an academic conference page, unlikely to be “optimized” for page hits
–it ranks above the 6th-ranked page, which does have the exact phrase “diet requirements” in it
PATLIB 2002 - Dietary requirements
11
The Mapping Problem
To map between a surface form and the decomposition of a word into its components
–e.g. root + (person/number/gender) and other features
using spelling rules
Example: (3.11)
Notes:
–^ marks a morpheme boundary
–# is the end-of-word marker
Figure labels: Stage 1, Stage 2
12
Stage 1: Lexical → Intermediate Levels
Example:
–f o x +N +PL (lexical)
–f o x ^s# (intermediate)
Key
–+N : noun
–fox is also a verb, but fox +V cannot combine with +PL
–+PL : (abstract) plural morpheme
–realized in English as s (basic case)
–boundary markers ^ and #
–to be used by the spelling rule machine (later)
13
Stage 1: Lexical → Intermediate Levels
Example:
–f o x +N +PL (lexical)
–f o x ^s# (intermediate)
Idea (Correspondences)
–f → f
–o → o
–x → x
–+N → ε (ε = empty string)
–+PL → ^s#
Use a Finite State Machine with correspondences:
–Finite State Transducer (FST)
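The correspondences can be written down as a small transition table. This is a minimal sketch for the single word fox, not the transducer of the textbook figure: the state names q0 to q5 are invented, and the +SG arc (which outputs only #) is taken from the singular example given elsewhere in these slides.

```python
# (state, input symbol) -> (output string, next state); "" plays the role of the empty string.
TRANSITIONS = {
    ("q0", "f"): ("f", "q1"),
    ("q1", "o"): ("o", "q2"),
    ("q2", "x"): ("x", "q3"),
    ("q3", "+N"): ("", "q4"),
    ("q4", "+PL"): ("^s#", "q5"),
    ("q4", "+SG"): ("#", "q5"),
}
FINAL_STATES = {"q5"}


def transduce(symbols):
    """Map a lexical symbol sequence to its intermediate form, or None if it is rejected."""
    state, output = "q0", []
    for symbol in symbols:
        if (state, symbol) not in TRANSITIONS:
            return None  # no arc for this symbol in this state
        emitted, state = TRANSITIONS[(state, symbol)]
        output.append(emitted)
    return "".join(output) if state in FINAL_STATES else None


print(transduce(["f", "o", "x", "+N", "+PL"]))  # prints fox^s#
```

Because there is no +V arc leading to +PL, the input f o x +V +PL is rejected (transduce returns None), which mirrors the restriction that fox +V cannot combine with +PL.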
14
Stage 1: Lexical → Intermediate Levels
Example:
–g o o s e +N +PL (lexical)
–g e e s e # (intermediate)
Example:
–g o o s e +N +SG (lexical)
–g o o s e # (intermediate)
Example:
–m o u s e +N +PL (lexical)
–m i c e # (intermediate)
Example:
–s h e e p +N +PL (lexical)
–s h e e p # (intermediate)
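These examples show that irregular plurals go straight to their stored form plus #, with no ^s. The sketch below captures that behaviour with a plain dictionary lookup rather than a transducer; the function name and the regular fallback (stem + ^s#, as in the fox example) are assumptions made for illustration.

```python
# Irregular plural forms taken from the examples above; regular nouns are not listed.
IRREGULAR_PLURALS = {"goose": "geese", "mouse": "mice", "sheep": "sheep"}


def noun_to_intermediate(stem, number):
    """Return the intermediate-level string for a noun stem with +SG or +PL."""
    if number == "+SG":
        return stem + "#"
    if number == "+PL":
        if stem in IRREGULAR_PLURALS:
            return IRREGULAR_PLURALS[stem] + "#"  # irregular: no ^s added
        return stem + "^s#"  # regular: plural morpheme realized as s
    raise ValueError(f"unknown number feature: {number}")


for stem, number in [("goose", "+PL"), ("goose", "+SG"), ("mouse", "+PL"), ("sheep", "+PL")]:
    print(stem, number, "->", noun_to_intermediate(stem, number))
```

In a full FST the same information is encoded as separate paths through the machine for each irregular noun, instead of a lookup table.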
15
Stage 1: Lexical → Intermediate Levels
Figure 3.11, Jurafsky & Martin (2000)
Notation:
–input : output
–f means f:f
16
Stage 1: Lexical → Intermediate Levels
Figure 3.17 (top half):
17
The Mapping Problem
Example: (3.11)
(Context-Sensitive) Spelling Rule: (3.5)
–ε → e / { x, s, z } ^ __ s#
ε rewrites to letter e given
–left context x^ or s^ or z^, and
–right context s#
i.e. insert e after the ^
–when you see x^s# or s^s# or z^s#
in particular, we have x^s# → x^es#
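The e-insertion rule can be tried out with a regular-expression substitution. This is a sketch using Python's re module, not a transducer; the function names are invented, and deleting ^ and # to obtain the surface form is an assumption about what happens after the rule has applied.

```python
import re


def insert_e(intermediate):
    """Insert e after the boundary ^ when x, s or z precedes it and s# follows."""
    return re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)


def to_surface(intermediate):
    """Apply e-insertion, then delete the boundary symbols ^ and #."""
    return insert_e(intermediate).replace("^", "").replace("#", "")


print(insert_e("fox^s#"))    # prints fox^es#
print(to_surface("fox^s#"))  # prints foxes
print(to_surface("cat^s#"))  # prints cats
```

The lookahead (?=s#) checks the right context without consuming it, and the captured group supplies the left context, so strings such as cat^s# pass through unchanged.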
18
Stage 2: Intermediate → Surface Levels
Can also be implemented using an FST
–we can implement an FST in Prolog
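As a rough illustration of what such a machine does, here is a hand-coded deterministic transducer for e-insertion, written in Python rather than Prolog. It is a sketch under stated assumptions, not the transducer from the textbook: the four state names are invented, an s seen right after a sibilant plus ^ is held back until the next symbol shows whether # follows, and ^ and # are deleted on output.

```python
SIBILANTS = set("xsz")


def intermediate_to_surface(tape):
    """Read an intermediate-level string symbol by symbol and emit its surface form."""
    state, out = "start", []
    for ch in tape:
        held = "s" if state == "held_s" else ""  # an s whose output was delayed
        if ch == "#":
            out.append("es" if state == "held_s" else "")  # x^s# surfaces as xes
            state = "end"
            break  # assume # is the last symbol on the tape
        if ch == "^":
            out.append(held)
            state = "after_boundary" if state in ("sibilant", "held_s") else "start"
        elif ch == "s" and state == "after_boundary":
            state = "held_s"  # wait: insert e only if # comes next
        else:
            out.append(held + ch)
            state = "sibilant" if ch in SIBILANTS else "start"
    if state == "held_s":
        out.append("s")  # tape ended without #: release the held s unchanged
    return "".join(out)


for word in ["fox^s#", "cat^s#", "kiss^s#", "geese#"]:
    print(word, "->", intermediate_to_surface(word))
```

Delaying the output of s keeps the machine deterministic; a transducer drawn with ε arcs would instead guess nondeterministically whether to insert e and discard the guesses that fail.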
19
Stage 2: Intermediate → Surface Levels
Example (3.17)