ANLE1 CC 437: Advanced Natural Language Engineering ASSIGNMENT 2: Implementing a query expansion component for a Web Search Engine.


1 ANLE1 CC 437: Advanced Natural Language Engineering ASSIGNMENT 2: Implementing a query expansion component for a Web Search Engine

2 Goal of this class
   - Go over the assignment in more detail
   - Software that may be used

3 The system you have to implement
   Input: a string of words (possibly a complete sentence), e.g.
      LIST THE ESTATE AGENTS IN STRATFORD, LONDON.
      I AM LOOKING FOR A CAR MECHANIC IN WIVENHOE
   Minimum output: a query for a Web search engine, e.g.
      (“ESTATE AGENT” OR PROPERTY OR “REAL ESTATE”) AND STRATFORD AND LONDON
   Possible extension (10%): actually access a search engine, e.g. Google:
      http://www.google.com/search?q=stratford+london+%22estate+agent%22+OR+%22real+estate%22+OR+property
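
The search URL in the possible extension can be produced by URL-encoding the query string. The following is a minimal sketch in Python, which is not one of the tools named in these slides; the helper name build_search_url is hypothetical, and only the base URL and the q parameter are taken from the example above.

```python
from urllib.parse import urlencode


def build_search_url(query: str, base: str = "http://www.google.com/search") -> str:
    """Return a search URL with the query passed in the `q` parameter.

    urlencode turns spaces into '+' and double quotes into '%22',
    which is the encoding visible in the example URL on the slide.
    """
    return base + "?" + urlencode({"q": query})


# Query text taken from the slide's example URL (decoded).
query = 'stratford london "estate agent" OR "real estate" OR property'
url = build_search_url(query)
print(url)
```

Fetching the page itself (for example with urllib.request) is left out here, since the assignment only asks for the query as minimum output.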

4 Reminder: the basic pipeline in IE systems
   1. PREPROCESSING
   2. LEXICAL PROCESSING
   3. SYNTACTIC PROCESSING
   4. SEMANTIC PROCESSING
   5. DISCOURSE PROCESSING

5 The pipeline for a query expansion system
   Stages: PREPROCESSING, LEXICAL PROCESSING, SYNTACTIC PROCESSING, SEMANTIC PROCESSING, WEB ACCESS
   Example input: List the estate agents in Stratford, London.
   Steps: TOKENIZATION, POS TAGGING, TERM IDENTIFICATION, STOP WORDS, SYNONYMS

6 Processing Steps, II
   PREPROCESSING:
   - Possibly: eliminate stop words
        LIST THE ESTATE AGENTS IN STRATFORD LONDON
   - Possibly: XML markup

7 Preprocessing, I: tokenizing
   Input:  List the estate agents in Stratford, London
   Steps:  PARAGRAPH MARKUP; TOKENIZER
   Output: List the estate agents in Stratford, London
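
The tokenizing step can be sketched with a single regular expression. This is an illustrative Python example, not the tokenizer used in the course; the pattern and the function name tokenize are assumptions, and paragraph or XML markup is not produced here.

```python
import re

# A word (optionally with an internal apostrophe, e.g. "don't"),
# or any single character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


tokens = tokenize("List the estate agents in Stratford, London.")
print(tokens)
```

Keeping punctuation as separate tokens lets a later step decide whether to drop it, rather than losing it silently during tokenization.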

8 Processing Steps, II
   LEXICAL PROCESSING:
   - POS TAGGING: THE -> THE/DT; ESTATE -> ESTATE/NN
   - STEMMING / LEMMATIZATION: AGENTS -> AGENT (or even: AGENT + N + PL)

9 Lexical Processing, I: POS tagging
   List the estate agents in Stratford, London

10 Lexical Processing, II: lemmatizing / stemming
   List the estate agent in Stratford, London

11 Processing Steps, II
   SYNTACTIC PROCESSING:
   - Identify terms: “ESTATE AGENT”
   - Remove stopwords (e.g., words tagged as DT, IN, VB, …)
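
Both steps on this slide can be sketched over POS-tagged input. The Python example below is a simplification under stated assumptions: the tags are written by hand rather than produced by a tagger, the stop-tag set contains only the three tags named on the slide, and grouping adjacent common nouns into one term is a stand-in heuristic for term identification, not the method prescribed by the assignment.

```python
# Tags named on the slide as stopword tags.
STOP_TAGS = {"DT", "IN", "VB"}
# Assumption: adjacent common nouns form one multiword term.
COMMON_NOUN_TAGS = {"NN", "NNS"}


def extract_terms(tagged):
    """Return search terms from a list of (word, tag) pairs."""
    terms = []
    current = []  # run of adjacent common nouns being collected
    for word, tag in tagged:
        if tag in COMMON_NOUN_TAGS:
            current.append(word)
            continue
        if current:  # a non-noun ends the current run
            terms.append(" ".join(current))
            current = []
        if tag in STOP_TAGS or not word[0].isalnum():
            continue  # drop stopwords and punctuation
        terms.append(word)
    if current:
        terms.append(" ".join(current))
    return terms


# Hand-tagged example sentence (hypothetical tagger output).
tagged = [("List", "VB"), ("the", "DT"), ("estate", "NN"), ("agents", "NNS"),
          ("in", "IN"), ("Stratford", "NNP"), (",", ","), ("London", "NNP")]
terms = extract_terms(tagged)
print(terms)
```

The input here is not lemmatized, so the term comes out as "estate agents"; running lemmatization first would give "estate agent" as on the slides.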

12 Practical (partial) parsing: identifying search terms, filtering
   estate agent Stratford, London

13 Processing Steps, II
   SEMANTIC PROCESSING: “ESTATE AGENT” OR PROPERTY
   QUERY FORMATION:
   - Abstract query
   - Concrete query
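
One way to read the abstract/concrete distinction is that the abstract query is a data structure and the concrete query is the string sent to a particular search engine. The Python sketch below assumes that reading; the representation (a list of alternative groups) and the names quote and to_boolean_query are hypothetical, not part of the assignment.

```python
def quote(term: str) -> str:
    """Wrap multiword terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def to_boolean_query(groups) -> str:
    """Render an abstract query as a Boolean string.

    `groups` is a list of lists: alternatives inside a group are joined
    with OR, and the groups themselves are joined with AND.
    """
    parts = []
    for alternatives in groups:
        clause = " OR ".join(quote(t) for t in alternatives)
        if len(alternatives) > 1:
            clause = f"({clause})"
        parts.append(clause)
    return " AND ".join(parts)


abstract_query = [["ESTATE AGENT", "PROPERTY", "REAL ESTATE"],
                  ["STRATFORD"],
                  ["LONDON"]]
concrete_query = to_boolean_query(abstract_query)
print(concrete_query)
```

Keeping the abstract form separate means a different rendering function could target a search engine with different operator syntax without touching the earlier pipeline stages.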

14 Semantic processing: finding synonyms (or better, keywords); interpreting stop words
   estate agent, real estate; Stratford, London
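
Synonym expansion can be sketched as a dictionary lookup. In the assignment the synonyms would come from WordNet; the Python example below deliberately uses a tiny hand-written table (TOY_SYNONYMS, a hypothetical stand-in) so that it runs without any external resource.

```python
# Hypothetical stand-in for a real lexical resource such as WordNet.
TOY_SYNONYMS = {
    "estate agent": ["real estate", "property"],
}


def expand(terms, lexicon=TOY_SYNONYMS):
    """Return one group of alternatives per term: the term plus its synonyms."""
    return [[term] + lexicon.get(term.lower(), []) for term in terms]


expanded = expand(["estate agent", "Stratford", "London"])
print(expanded)
```

Terms without an entry, such as place names, are passed through unchanged as single-element groups.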

15 Available tools
   LINUX:
   - Overall system control: Shell scripts, Perl, Java
   - Tokenizing: Java / Perl + regular expressions
   - POS: Brill tagger, QTAG
   - Lexical expansion: WordNet (Java interface, command line)
   WINDOWS:
   - Overall system control: Java, batch files, Perl
   - Tokenizing: Java / Perl + regular expressions
   - Tokenizing, POS tagging: Connexor (Tokenizer, POS + Lemmatizer)
   - POS: QTAG
   - WordNet: use Java interface

16 Marking Scheme
   Engineering a complete system that takes input,
     produces output, and calls the appropriate modules    20%
   Pre-processing (tokenizing, normalization)              15%
   Part-of-speech tagging                                  15%
   Removing stopwords                                      15%
   Lexical expansion using WordNet                         15%
   Report                                                  10%
   Calling a search engine                                 10%
   Total                                                  100%

17 Optionals
   - Write a simple Web page interface to your search engine
   - Write your own lexical resource (see following classes)

18 Deadline
   Friday, December 16th, 12:00

