The Linguist’s Search Engine 02/04/2004
Background
Address:
Developed at the University of Maryland by Resnik, Elkiss et al. in collaboration with Fellbaum (Princeton) and Olsen (Microsoft)
Accessible to a general audience since 20 January 2004 (brand new!)
No fees or complicated registration process
Some Facts – Built-in Corpus
Preprocessed corpus of about three million sentences taken from the Internet Archive
Automatically annotated with Penn Treebank style syntactic bracketing
Relies on computational linguistic tools (such as MXTERMINATOR, MXPOST, Charniak’s stochastic parser, the Minipar parser, WordNet, etc.)
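To make "Penn Treebank style syntactic bracketing" concrete, here is a minimal sketch of reading one bracketed sentence into a nested structure. The example sentence and the function names `parse_bracketing` and `leaves` are illustrative assumptions, not part of the LSE or its tools.

```python
def parse_bracketing(text):
    """Parse one Penn Treebank style bracketing into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a constituent must start with '('"
        pos += 1
        label = tokens[pos]  # category or part-of-speech label, e.g. NP, NN
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the words of a parsed tree from left to right."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hypothetical example sentence, bracketed by hand for illustration.
example = "(S (NP (DT the) (NN roof)) (VP (VBZ leaks)))"
tree = parse_bracketing(example)
```

Here `tree[0]` is the root label `S`, and `leaves(tree)` recovers the plain word sequence of the sentence.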
Searching the built-in corpus
Nice features:
–Query by example
–Limited regular expression support (e.g. disjunction, negation)
–WordNet relations are supported
–Save queries for later reuse
–Offensive content filter (for less embarrassing live demonstrations)
Problems:
–Only English is supported (and the documentation never mentions this anywhere!)
Demo – Simple Search
Simple search of the built-in corpus
–Query by example
Search for of-genitive constructions
–Query by hand
Search for ’s-genitives where the possessor is not a proper name (i.e. not tagged NNP / NNPS)
Searching for synonyms of fearsome: fearsome#a#1/syns
GO TO THE LSE
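The ’s-genitive query can be approximated outside the LSE on flat part-of-speech-tagged text. The sketch below is an assumption-laden simplification: it looks only at the tag immediately before the possessive marker (Penn tag POS) instead of matching tree structure as the LSE does, and the sample sentence and function name are made up for illustration.

```python
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def s_genitives_without_proper_possessor(tagged):
    """Return (possessor, marker) pairs where a POS token follows a common noun.

    `tagged` is a list of (word, Penn tag) pairs for one sentence.
    """
    hits = []
    for i in range(1, len(tagged)):
        word, tag = tagged[i]
        prev_word, prev_tag = tagged[i - 1]
        is_common_noun = prev_tag.startswith("NN") and prev_tag not in PROPER_NOUN_TAGS
        if tag == "POS" and is_common_noun:
            hits.append((prev_word, word))
    return hits


# Hypothetical hand-tagged sentence: "the dog's tail hit Mary's leg"
sentence = [
    ("the", "DT"), ("dog", "NN"), ("'s", "POS"), ("tail", "NN"),
    ("hit", "VBD"), ("Mary", "NNP"), ("'s", "POS"), ("leg", "NN"),
]
matches = s_genitives_without_proper_possessor(sentence)
```

Only the possessor "dog" is reported; "Mary" is skipped because it is tagged NNP.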
Some Facts – Customized Corpora
You can build your own collection of sentences and have them annotated
Uses AltaVista as a basis for web-wide search (about pages)
Extracts sentences from retrieved pages and annotates them
Job-based with fair scheduling procedures
Query syntax restricted to AltaVista queries plus expansion of inflectional forms
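The idea of expanding inflectional forms can be sketched as follows. The lookup table, the function name `expand_query`, and the use of `OR` to join the forms are assumptions for illustration only; the LSE's actual expansion mechanism and the exact AltaVista syntax are described in their own documentation.

```python
# Hypothetical, hand-written table of inflected forms (not the LSE's lexicon).
INFLECTIONS = {
    "give": ["give", "gives", "gave", "given", "giving"],
}


def expand_query(lemma, table=INFLECTIONS):
    """Turn a lemma into a disjunction over its known inflected forms."""
    forms = table.get(lemma, [lemma])  # fall back to the bare lemma
    return " OR ".join(forms)


query = expand_query("give")
```

With this table, the single search term give becomes one disjunctive query covering all five forms, which is the kind of expansion a collection for the verb give would need.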
Demo – Customized Collection
Demo search on a collection of sentences with the verb give
How to start a new collection
GO TO THE LSE
Further Information
LSE Starter’s Guide: lse.umiacs.umd.edu/lse_guide.html
LSE User’s Guide: lse.umiacs.umd.edu/lseuser/lseuser.pdf
LSE Users’ Forum: lse.umiacs.umd.edu/forum
AltaVista Documentation:
Penn Tagset:
Still ugly but flexible alternative: