The Linguist’s Search Engine 02/04/2004

Background
- Address:
- Developed at the University of Maryland by Resnik, Elkiss et al. in collaboration with Fellbaum (Princeton) and Olsen (Microsoft)
- Accessible to a general audience since 20 January 2004 (brand new!)
- No fees or complicated registration process

Some Facts – Built-in Corpus
- Preprocessed corpus of about three million sentences taken from the Internet Archive
- Automatically annotated with Penn Treebank style syntactic bracketing
- Relies on computational linguistic tools (such as MXTERMINATOR, MXPOST, Charniak’s stochastic parser, the Minipar parser, WordNet, etc.)
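
To illustrate what Penn Treebank style bracketing looks like, here is a minimal Python sketch that reads one bracketed sentence into a nested structure and lists its words with their part-of-speech tags. The example sentence and the function names `parse_bracketing` and `tagged_words` are assumptions made for illustration; they are not taken from the LSE or its corpus.

```python
def parse_bracketing(text):
    """Parse a Penn Treebank style bracketing into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()


def tagged_words(tree):
    """Collect (word, POS tag) pairs from the preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


# Hypothetical example sentence, not taken from the LSE corpus.
EXAMPLE = "(S (NP (DT the) (NN linguist)) (VP (VBZ searches) (NP (DT the) (NN corpus))))"
tree = parse_bracketing(EXAMPLE)
print(tagged_words(tree))
```

The nested tuples keep the phrase labels (S, NP, VP), so structural queries can inspect constituents rather than flat word sequences.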

Searching the built-in corpus
- Nice features:
  – Query by example
  – Limited regular expression support (e.g. disjunction, negation)
  – WordNet relations are supported
  – Save queries for later reuse
  – Offensive content filter (for less embarrassing live demonstrations)
- Problems:
  – Only English is supported (and the documentation never mentions this fact!)

Demo – Simple Search
- Simple search of the built-in corpus
  – Query by example: search for of-genitive constructions
  – Query by hand: search for 's-genitives where the possessor is not a proper name (i.e. not NNP / NNPS)
- Searching for synonyms of fearsome: fearsome#a#1/syns
- GO TO THE LSE
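
The two hand-written queries from this demo can be approximated outside the LSE. The Python sketch below is not the LSE query language: it works on a flat list of (word, tag) pairs, assumes the Penn tag POS marks the possessive ending, and treats the token directly before it as the possessor. The reading of `fearsome#a#1/syns` as lemma, part of speech, sense number and relation is likewise an assumption, and all function names are hypothetical.

```python
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def s_genitives_without_proper_possessor(tagged):
    """Return (possessor, following word) pairs for 's-genitives
    whose possessor is a noun but not a proper name.

    Simplification: the possessor is the single token directly before
    the POS-tagged possessive ending.
    """
    hits = []
    for i in range(1, len(tagged) - 1):
        _, tag = tagged[i]
        if tag != "POS":
            continue
        possessor, possessor_tag = tagged[i - 1]
        if not possessor_tag.startswith("NN") or possessor_tag in PROPER_NOUN_TAGS:
            continue
        hits.append((possessor, tagged[i + 1][0]))
    return hits


def parse_wordnet_query(query):
    """Split a query such as 'fearsome#a#1/syns' into its assumed parts."""
    head, _, relation = query.partition("/")
    lemma, pos, sense = head.split("#")
    return {"lemma": lemma, "pos": pos, "sense": int(sense), "relation": relation}


# Hypothetical tagged input, not taken from the LSE corpus.
SENTENCE = [
    ("Mary", "NNP"), ("'s", "POS"), ("book", "NN"), ("and", "CC"),
    ("the", "DT"), ("linguist", "NN"), ("'s", "POS"), ("query", "NN"),
]
print(s_genitives_without_proper_possessor(SENTENCE))
print(parse_wordnet_query("fearsome#a#1/syns"))
```

A search over parse trees, as the LSE offers, can constrain the whole possessor phrase; the flat version here only checks one token.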

Some Facts – Customized Corpora
- You can build your own collection of sentences and have them annotated
- Uses AltaVista as a basis for web-wide search (about pages)
- Extracts sentences from retrieved pages and annotates them
- Job-based with fair scheduling procedures
- Query syntax restricted to AltaVista queries plus expansion of inflectional forms
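
The expansion of inflectional forms mentioned above can be pictured as turning one verb into a disjunction of its forms before the query is sent to the web search engine. The Python sketch below is only an illustration: the lookup table, the naive rule for regular verbs, the `OR` syntax and the function names are all assumptions, not the mechanism the LSE actually uses.

```python
# Hand-written table for illustration only; a real system would use a morphological lexicon.
IRREGULAR_FORMS = {
    "give": ["give", "gives", "gave", "given", "giving"],
}


def inflected_forms(verb):
    """Return a list of inflected forms for a verb (naive sketch)."""
    if verb in IRREGULAR_FORMS:
        return list(IRREGULAR_FORMS[verb])
    # Naive regular rule: drop a final 'e' before -ed / -ing.
    # Ignores consonant doubling, -es endings and similar spelling changes.
    stem = verb[:-1] if verb.endswith("e") else verb
    return [verb, verb + "s", stem + "ed", stem + "ing"]


def expand_query(verb):
    """Build a disjunctive query string covering all forms of the verb."""
    return "(" + " OR ".join(inflected_forms(verb)) + ")"


print(expand_query("give"))
print(expand_query("walk"))
```

Expanding at query time means the user types one form, such as give, and still collects sentences containing gave or given.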

Demo – Customized Collection
- Demo search on a collection of sentences with the verb give
- How to start a new collection
- GO TO THE LSE

Further Information
- LSE Starter’s Guide: lse.umiacs.umd.edu/lse_guide.html
- LSE User’s Guide: lse.umiacs.umd.edu/lseuser/lseuser.pdf
- LSE Users’ Forum: lse.umiacs.umd.edu/forum
- AltaVista Documentation:
- Penn Tagset:
- Still ugly but flexible alternative: