2009.05.04 - SLIDE 1IS 240 – Spring 2009 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.

Slides:



Advertisements
Similar presentations
ThemeInformation Extraction for World Wide Web PaperUnsupervised Learning of Soft Patterns for Generating Definitions from Online News Author Cui, H.,
Advertisements

January 12, Statistical NLP: Lecture 2 Introduction to Statistical NLP.
IR & Metadata. Metadata Didn’t we already talk about this? We discussed what metadata is and its types –Data about data –Descriptive metadata is external.
Intelligent Information Retrieval CS 336 –Lecture 2: Query Language Xiaoyan Li Spring 2006 Modified from Lisa Ballesteros’s slides.
Evaluation of NLP Systems Martin Hassel KTH NADA Royal Institute of Technology Stockholm
ANLE1 CC 437: Advanced Natural Language Engineering ASSIGNMENT 2: Implementing a query expansion component for a Web Search Engine.
7/16/2002JCDL 2002, Ray Larson The “Entry Vocabulary Index” Approach to Multilingual Search Ray R. Larson, Fredric Gey, Aitao Chen, Michael Buckland University.
SLIDE 1IS 202 – FALL 2002 Lecture 20: Evaluation Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00.
SLIDE 1IS 240 – Spring 2011 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
1 SIMS 290-2: Applied Natural Language Processing Marti Hearst Sept 20, 2004.
SLIDE 1IS 240 – Spring 2009 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
SLIDE 1IS 240 – Spring 2009 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.
SLIDE 1IS 240 – Spring 2006 Prof. Ray Larson University of California, Berkeley School of Information Management & Systems Tuesday and Thursday.
SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
ISP 433/633 Week 9 NLP in IR. Natural Language Processing Simple Definition: –A study of how to use computers to do things with human languages. What.
Properties of Text CS336 Lecture 3:. 2 Information Retrieval Searching unstructured documents Typically text –Newspaper articles –Web pages Other documents.
SLIDE 1IS 240 – Spring 2010 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
SLIDE 1IS 240 – Spring 2011 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval Lecture.
Natural Language Query Interface Mostafa Karkache & Bryce Wenninger.
SLIDE 1IS 240 – Spring 2006 Prof. Ray Larson University of California, Berkeley School of Information Management & Systems Tuesday and Thursday.
SLIDE 1IS 202 – FALL 2002 Lecture 20: Lexical Relations & WordNet Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday.
- SLAYT 1BBY220 Content Analysis & Stemming Yaşar Tonta Hacettepe Üniversitesi yunus.hacettepe.edu.tr/~tonta/ BBY220 Bilgi Erişim.
SLIDE 1IS 240 – Spring 2009 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
CSE 730 Information Retrieval of Biomedical Data The use of medical lexicon in biomedical IR.
SLIDE 1IS 240 – Spring 2011 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
SLIDE 1IS 240 – Spring 2009 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
SLIDE 1IS 240 – Spring 2010 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval Lecture.
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.
Indexing Overview Approaches to indexing Automatic indexing Information extraction.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
1 Statistical NLP: Lecture 6 Corpus-Based Work. 2 4 Text Corpora are usually big. They also need to be representative samples of the population of interest.
Artificial intelligence & natural language processing Mark Sanderson Porto, 2000.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
For Friday Finish chapter 23 Homework: –Chapter 22, exercise 9.
SLIDE 1IS 240 – Spring 2013 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval Lecture.
Question Answering.  Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Information Retrieval and Web Search Cross Language Information Retrieval Instructor: Rada Mihalcea Class web page:
Natural Language Based Reformulation Resource and Web Exploitation for Question Answering Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu University of.
SLIDE 1IS 240 – Spring 2013 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
GTRI.ppt-1 NLP Technology Applied to e-discovery Bill Underwood Principal Research Scientist “The Current Status and.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
Natural Language Processing for Information Retrieval -KVMV Kiran ( )‏ -Neeraj Bisht ( )‏ -L.Srikanth ( )‏
Using Semantic Relations to Improve Passage Retrieval for Question Answering Tom Morton.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Data Mining: Text Mining
1 Question Answering and Logistics. 2 Class Logistics  Comments on proposals will be returned next week and may be available as early as Monday  Look.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Using Semantic Relations to Improve Information Retrieval
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Feature Assignment LBSC 878 February 22, 1999 Douglas W. Oard and Dagobert Soergel.
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Information Retrieval and Web Search
Information Retrieval and Web Search
What is IR? In the 70’s and 80’s, much of the research focused on document retrieval In 90’s TREC reinforced the view that IR = document retrieval Document.
Finding Out About I (Belew)
Thanks to Bill Arms, Marti Hearst
Question Answering via Question-to-Question Mapping
CSE 635 Multimedia Information Retrieval
CS246: Information Retrieval
Artificial Intelligence 2004 Speech & Natural Language Processing
Information Retrieval and Web Design
Presentation transcript:

SLIDE 1IS 240 – Spring 2009 Prof. Ray Larson University of California, Berkeley School of Information Principles of Information Retrieval Lecture 23: More NLP and IE

SLIDE 2IS 240 – Spring 2009 Today Review –NLP for IR –Text Summarization Cross-Language Information Retrieval –Introduction –Cross-Language EVIs Credit for some of the material in this lecture goes to Doug Oard (University of Maryland) and to Fredric Gey and Aitao Chen

SLIDE 3IS 240 – Spring 2009 Today Review –NLP for IR More on NLP and Information Extraction –From Christopher Manning (Stanford) “Opportunities in Natural Language Processing”

SLIDE 4IS 240 – Spring 2009 Natural Language Processing and IR The main approach in applying NLP to IR has been to attempt to address –Phrase usage vs individual terms –Search expansion using related terms/concepts –Attempts to automatically exploit or assign controlled vocabularies

SLIDE 5IS 240 – Spring 2009 NLP and IR Much early research showed that (at least in the restricted test databases tested) –Indexing documents by individual terms corresponding to words and word stems produces retrieval results at least as good as when indexes use controlled vocabularies (whether applied manually or automatically) –Constructing phrases or “pre-coordinated” terms provides only marginal and inconsistent improvements

SLIDE 6IS 240 – Spring 2009 NLP and IR Not clear why intuitively plausible improvements to document representation have had little effect on retrieval results when compared to statistical methods –E.g. Use of syntactic role relations between terms has shown no improvement in performance over “bag of words” approaches

SLIDE 7IS 240 – Spring 2009 General Framework of NLP Morphological and Lexical Processing Syntactic Analysis Semantic Analysis Context processing Interpretation John runs. John run+s. P-N V 3-pre N plu S NP P-N John VP V run Pred: RUN Agent:John John is a student. He runs. Slide from Prof. J. Tsujii, Univ of Tokyo and Univ of Manchester

SLIDE 8IS 240 – Spring 2009 Using NLP Strzalkowski (in Reader) TextNLPrepres Dbase search TAGGER NLP: PARSERTERMS

SLIDE 9IS 240 – Spring 2009 Using NLP INPUT SENTENCE The former Soviet President has been a local hero ever since a Russian tank invaded Wisconsin. TAGGED SENTENCE The/dt former/jj Soviet/jj President/nn has/vbz been/vbn a/dt local/jj hero/nn ever/rb since/in a/dt Russian/jj tank/nn invaded/vbd Wisconsin/np./per

SLIDE 10IS 240 – Spring 2009 Using NLP TAGGED & STEMMED SENTENCE the/dt former/jj soviet/jj president/nn have/vbz be/vbn a/dt local/jj hero/nn ever/rb since/in a/dt russian/jj tank/nn invade/vbd wisconsin/np./per

SLIDE 11IS 240 – Spring 2009 Using NLP PARSED SENTENCE [assert [[perf [have]][[verb[BE]] [subject [np[n PRESIDENT][t_pos THE] [adj[FORMER]][adj[SOVIET]]]] [adv EVER] [sub_ord[SINCE [[verb[INVADE]] [subject [np [n TANK][t_pos A] [adj [RUSSIAN]]]] [object [np [name [WISCONSIN]]]]]]]]]

SLIDE 12IS 240 – Spring 2009 Using NLP EXTRACTED TERMS & WEIGHTS President soviet President+soviet president+former Hero hero+local Invade tank Tank+invade tank+russian Russian wisconsin

SLIDE 13IS 240 – Spring 2009 NLP & IR Indexing –Use of NLP methods to identify phrases Test weighting schemes for phrases –Use of more sophisticated morphological analysis Searching –Use of two-stage retrieval Statistical retrieval Followed by more sophisticated NLP filtering

SLIDE 14IS 240 – Spring 2009 NLP & IR New “Question Answering” track at TREC has been exploring these areas –Usually statistical methods are used to retrieve candidate documents –NLP techniques are used to extract the likely answers from the text of the documents

SLIDE 15IS 240 – Spring 2009 Mark’s idle speculation What people think is going on always Keywords NLP From Mark Sanderson, University of Sheffield

SLIDE 16IS 240 – Spring 2009 Mark’s idle speculation What’s usually actually going on Keywords NLP From Mark Sanderson, University of Sheffield

SLIDE 17IS 240 – Spring 2009 Additional Slides on NLP and IE From Christopher Manning, Stanford