ANLE 1
  CC 437: Advanced Natural Language Engineering
  ASSIGNMENT 2: Implementing a query expansion component for a Web Search Engine
ANLE 2 Goal of this class
  We'll go over the assignment in more detail
  Software (SW) that may be used
ANLE 3 The system you have to implement
  Input: a string of words (possibly a complete sentence)
    LIST THE ESTATE AGENTS IN STRATFORD, LONDON.
    I AM LOOKING FOR A CAR MECHANIC IN WIVENHOE
  Minimum output: a query for a Web search engine
    (“ESTATE AGENT” OR PROPERTY OR “REAL ESTATE”) AND STRATFORD AND LONDON
  Possible extension (10%): actually access a search engine
    E.g., GOOGLE: http://www.google.com/search?q=stratford+london+%22estate+agent%22+OR+%22real+estate%22+OR+property
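The search URL on this slide can be produced with standard URL encoding. The following is a minimal sketch, assuming Python and its standard library; the function name `build_search_url` and its parameters are hypothetical and not part of the assignment software.

```python
from urllib.parse import urlencode

def build_search_url(required_terms, alternative_terms,
                     base="http://www.google.com/search"):
    """Build a search URL from required terms plus one OR-group of alternatives.

    Hypothetical helper: the parameter names and the lower-casing are
    illustrative choices, not requirements of the assignment.
    """
    def quote_phrase(term):
        # Multi-word terms are wrapped in double quotes so they match as phrases.
        return f'"{term}"' if " " in term else term

    parts = [quote_phrase(t.lower()) for t in required_terms]
    parts.append(" OR ".join(quote_phrase(t.lower()) for t in alternative_terms))
    # urlencode turns spaces into '+' and double quotes into '%22'.
    return base + "?" + urlencode({"q": " ".join(parts)})

url = build_search_url(["Stratford", "London"],
                       ["estate agent", "real estate", "property"])
print(url)
```

With these inputs the query string matches the Google example on the slide.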
ANLE 4 Reminder: the basic pipeline in IE systems
  PREPROCESSING -> LEXICAL PROCESSING -> SYNTACTIC PROCESSING -> SEMANTIC PROCESSING -> DISCOURSE PROCESSING
ANLE 5 The pipeline for a query expansion system
  Input: List the estate agents in Stratford, London.
  PREPROCESSING -> LEXICAL PROCESSING -> SYNTACTIC PROCESSING -> SEMANTIC PROCESSING -> WEB ACCESS
  Steps: TOKENIZATION, POS TAGGING, TERM IDENTIFICATION, STOP WORDS, SYNONYMS
ANLE 6 Processing Steps, II
  PREPROCESSING:
    Possibly: eliminate stop words
      LIST THE ESTATE AGENTS IN STRATFORD LONDON
    Possibly: XML markup
ANLE 7 Preprocessing, I: tokenizing
  Input: List the estate agents in Stratford, London
  PARAGRAPH MARKUP; TOKENIZER
  Output: List the estate agents in Stratford, London
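The tokenizing step can be sketched with a regular expression, in the spirit of the "Java / Perl + Regular Expressions" option. This is a minimal sketch in Python; the pattern, the function names, and the `<p>` / `<w>` tag names are assumptions made for illustration and are not the markup used by any particular tool.

```python
import re
from xml.sax.saxutils import escape

# A word is a run of letters/digits, optionally joined by ' or - (e.g. "don't");
# any other non-space character becomes its own punctuation token.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*|[^\sA-Za-z0-9]")

def tokenize(text):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

def mark_up(tokens):
    """Wrap tokens in hypothetical <w> elements inside one <p> paragraph."""
    words = "".join(f"<w>{escape(t)}</w>" for t in tokens)
    return f"<p>{words}</p>"

tokens = tokenize("List the estate agents in Stratford, London.")
print(tokens)
print(mark_up(tokens))
```

Keeping punctuation as separate tokens lets a later step discard it by tag rather than losing it during tokenization.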
ANLE 8 Processing Steps, II
  LEXICAL PROCESSING:
    POS TAGGING
      THE -> THE/DT; ESTATE -> ESTATE/NN
    STEMMING / LEMMATIZATION
      AGENTS -> AGENT (or even: AGENT + N + PL)
ANLE 9 Lexical Processing, I: POS tagging
  List the estate agents in Stratford, London
ANLE 10 Lexical Processing, II: lemmatizing / stemming
  List the estate agent in Stratford, London
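The AGENTS -> AGENT step could be approximated with a few suffix rules applied only to tokens tagged as plural nouns. This is a rough sketch, not the behaviour of Connexor or any other lemmatizer; the rule set, the small irregular-plural table, and the hand-assigned Penn-style tags are all assumptions for illustration.

```python
# Hypothetical, deliberately tiny table of irregular plurals.
IRREGULAR_PLURALS = {"men": "man", "women": "woman", "children": "child"}

def lemmatize_noun(word):
    """Reduce an English plural noun to its singular form with naive rules."""
    lower = word.lower()
    if lower in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[lower]
    if lower.endswith("ies") and len(lower) > 4:
        return lower[:-3] + "y"          # properties -> property
    if lower.endswith(("sses", "xes", "ches", "shes")):
        return lower[:-2]                # classes -> class, boxes -> box
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return lower[:-1]                # agents -> agent
    return lower

def lemmatize(tagged_tokens):
    """Lemmatize only tokens tagged NNS (plural noun); leave the rest unchanged."""
    return [(lemmatize_noun(w) if tag == "NNS" else w, tag)
            for w, tag in tagged_tokens]

# Tags below are assigned by hand for the example, not produced by a tagger.
tagged = [("List", "VB"), ("the", "DT"), ("estate", "NN"), ("agents", "NNS"),
          ("in", "IN"), ("Stratford", "NNP"), (",", ","), ("London", "NNP")]
lemmas = lemmatize(tagged)
print(" ".join(w for w, _ in lemmas))
```

Restricting the rules to NNS tokens avoids stripping the final "s" from singular words and proper names.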
ANLE 11 Processing Steps, II
  SYNTACTIC PROCESSING:
    Identify terms: “ESTATE AGENT”
    Remove stopwords (e.g., words tagged as DT, IN, VB, … )
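Term identification and tag-based stopword removal can be sketched together: adjacent common nouns are merged into one term, and tokens whose tag is in a stop list are dropped. This is a minimal sketch under stated assumptions; the stop-tag set extends the slide's DT, IN, VB with punctuation tags, and the function name and hand-assigned tags are hypothetical.

```python
# Tags treated as stopwords; DT, IN, VB come from the slide, the punctuation
# tags are an added assumption.
STOP_TAGS = {"DT", "IN", "VB", ",", "."}
NOUN_TAGS = {"NN", "NNS"}

def identify_terms(tagged_tokens):
    """Merge runs of adjacent common nouns into terms and drop stop-tagged tokens."""
    terms, current = [], []
    for word, tag in tagged_tokens:
        if tag in NOUN_TAGS:
            current.append(word)         # extend the current noun sequence
            continue
        if current:                      # a noun sequence just ended
            terms.append(" ".join(current))
            current = []
        if tag not in STOP_TAGS:
            terms.append(word)           # keep e.g. proper nouns as single terms
    if current:
        terms.append(" ".join(current))
    return terms

# Lemmatized input with hand-assigned tags, for illustration only.
tagged = [("List", "VB"), ("the", "DT"), ("estate", "NN"), ("agent", "NN"),
          ("in", "IN"), ("Stratford", "NNP"), (",", ","), ("London", "NNP")]
terms = identify_terms(tagged)
print(terms)
```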
ANLE 12 Practical (partial) parsing: identifying search terms, filtering
  estate agent
  Stratford, London
ANLE 13 Processing Steps, II
  SEMANTIC PROCESSING:
    “ESTATE AGENT” OR PROPERTY
  QUERY FORMATION:
    Abstract query
    Concrete query
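One way to read the abstract / concrete distinction is as a nested structure that is later rendered into a search engine's query syntax. The sketch below assumes that reading; the `SYNONYMS` table is a hand-written stand-in for a WordNet lookup, and all function names are hypothetical.

```python
# Hand-written stand-in for a lexical resource such as WordNet (assumption).
SYNONYMS = {"estate agent": ["property", "real estate"]}

def expand(term):
    """Return the term followed by any known alternatives."""
    return [term] + SYNONYMS.get(term.lower(), [])

def abstract_query(terms):
    """Abstract query: a conjunction of groups, each a disjunction of alternatives."""
    return ("AND", [("OR", expand(t)) for t in terms])

def concrete_query(query):
    """Render the abstract query as a Boolean query string."""
    def quote_phrase(term):
        term = term.upper()
        return f'"{term}"' if " " in term else term

    _, groups = query
    rendered = []
    for _, alternatives in groups:
        text = " OR ".join(quote_phrase(a) for a in alternatives)
        # Parenthesize only groups that actually contain alternatives.
        rendered.append(f"({text})" if len(alternatives) > 1 else text)
    return " AND ".join(rendered)

query = abstract_query(["estate agent", "Stratford", "London"])
print(concrete_query(query))
```

Keeping the abstract form separate means the same expansion can be rendered for a different engine by swapping only the rendering function.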
ANLE 14 Semantic processing: finding synonyms (or better keywords); interpreting stop words
  estate agent
  real estate
  Stratford, London
ANLE 15 Available tools
  LINUX:
    Overall system control: Shell scripts, Perl, Java
    Tokenizing: Java / Perl + Regular Expressions
    POS: Brill tagger, QTAG
    Lexical Expansion: WordNet (Java interface, command line)
  WINDOWS:
    Overall system control: Java, Batch files, Perl
    Tokenizing: Java / Perl + Regular Expressions
    Tokenizing, POS tagging: Connexor (Tokenizer, POS + Lemmatizer)
    POS: QTAG
    WordNet: Use Java interface
ANLE 16 Marking Scheme
  Engineering a complete system that takes input,
    produces output, and calls the appropriate modules    20%
  Pre-processing (tokenizing, normalization)              15%
  Part-of-speech tagging                                  15%
  Removing stopwords                                      15%
  Lexical expansion using WordNet                         15%
  Report                                                  10%
  Calling a search engine                                 10%
  Total                                                  100%
ANLE 17 Optionals
  Write a simple Web page interface to your search engine
  Write your own lexical resource (see following classes)
ANLE 18 Deadline
  Friday, December 16th, 12:00