STEWARD: A Spatio-Textual Document Search Engine for HUDUSER.ORG Prof. Hanan Samet Department of Computer Science, University of Maryland, College Park,


STEWARD: A Spatio-Textual Document Search Engine for HUDUSER.ORG Prof. Hanan Samet Department of Computer Science, University of Maryland, College Park, MD Joint work of Mike Lieberman and Jagan Sankaranarayanan of UMD and Jon Sperling of HUD PD&R.

STEWARD!
STEWARD is a document search engine, like Google.
A user specifies a search consisting of a keyword and a location specifier:
– “HUD Housing Projects” (keyword)
– “El Paso, TX” (location)
It uses a Document Tagger that identifies geographical locations in English-language documents (text, DOC, PDF, HTML).
STEWARD stands for Spatio-Textual Extraction on the Web Aiding Retrieval of Documents.
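To make the keyword-plus-location query concrete, here is a minimal Python sketch. It assumes each document has already been run through a tagger and carries a set of resolved location strings; the `Document` class and `search` function are hypothetical names, and because STEWARD's ranking is not described in these slides, the sketch only filters and does not rank.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical indexed document: its text plus locations found by a tagger."""
    doc_id: str
    text: str
    locations: set[str] = field(default_factory=set)  # e.g. {"El Paso, TX"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens, so punctuation does not block matches."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(docs: list[Document], keyword: str, location: str) -> list[str]:
    """Return ids of documents containing every keyword term and tagged with the location."""
    terms = tokenize(keyword)
    hits = []
    for doc in docs:
        words = set(tokenize(doc.text))
        if all(t in words for t in terms) and location in doc.locations:
            hits.append(doc.doc_id)
    return hits
```

For example, a query with keyword “HUD Housing Projects” and location “El Paso, TX” returns only the documents whose text contains all three terms and whose tag set includes that location.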

The Document Tagger
Uses a large corpus of geographical locations:
– 2.06 million locations in the USA and 1.6 million locations around the world
– Gleaned from GNIS (the Geographic Names Information System)
Uses data mining and document modeling techniques to disambiguate and correctly identify geographical locations in documents.
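A minimal sketch of the first tagging step, looking up candidate place names in a gazetteer, assuming a tiny in-memory dictionary stands in for the GNIS-derived corpus. `GAZETTEER`, `find_candidates`, and the three-word name limit are illustrative assumptions; longest-match n-gram lookup is a common technique, not necessarily the one STEWARD uses.

```python
import re

# Toy gazetteer standing in for the real corpus (assumption); keys are lowercase.
# Each entry lists candidate interpretations as (name, containing region) pairs.
GAZETTEER: dict[str, list[tuple[str, str]]] = {
    "el paso": [("El Paso", "Texas")],
    "london": [("London", "United Kingdom"), ("London", "Ontario")],
    "jefferson": [("Jefferson", "Iowa"), ("Jefferson", "Georgia")],
}
MAX_NAME_WORDS = 3  # longest place name (in words) the lookup will try


def find_candidates(text: str) -> list[tuple[str, list[tuple[str, str]]]]:
    """Return (matched phrase, candidate places) for each gazetteer hit, preferring longer matches."""
    tokens = re.findall(r"[A-Za-z]+", text)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at token i first, then shorter ones.
        for n in range(min(MAX_NAME_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in GAZETTEER:
                hits.append((phrase, GAZETTEER[phrase]))
                i += n  # skip past the matched name
                break
        else:
            i += 1  # no name starts here; move on one token
    return hits
```

The lookup only proposes candidates: “Jefferson” would be returned whether it names a person or a place, and “London” comes back with every matching entry, which is why disambiguation is still needed.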

Tagging Issues
Identifying geographical references in text
– Is “Jefferson” the name of a person or a geographical location?
Disambiguating a geographical reference
– “London” in a document can correspond to “London, UK”, “London, Ontario”, or any of 2,570 other geographical entities in our corpus.
Determining the spatial focus of a document
– Is “Singapore” relevant to a news article about Hurricane Katrina printed in the Singapore Straits Times?
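The three issues can be illustrated with simple heuristics. This Python sketch is not STEWARD's method (the slides mention data mining and document modeling, not these rules); it assumes a toy candidate table, skips a name preceded by an honorific such as “Mr.” as a likely person, prefers the interpretation whose containing region is also mentioned in the text, falls back to the first listed candidate, and takes the most frequently resolved place as the document's spatial focus. All names here are hypothetical.

```python
from collections import Counter
from typing import Optional

# Toy candidate table (assumption); a real gazetteer lists thousands of "London"s.
CANDIDATES: dict[str, list[tuple[str, str]]] = {
    "London": [("London", "United Kingdom"), ("London", "Ontario")],
    "Jefferson": [("Jefferson", "Iowa"), ("Jefferson", "Georgia")],
    "Toronto": [("Toronto", "Ontario")],
}
# Honorifics that suggest the following word is a person's name.
PERSON_TITLES = {"mr", "mrs", "ms", "dr", "president", "senator"}


def resolve(text: str) -> list[tuple[str, str]]:
    """Resolve each recognized place name in text to one (name, region) reading."""
    words = [w.strip(".,;:!?") for w in text.split()]
    lowered = text.lower()
    resolved = []
    for i, word in enumerate(words):
        if word not in CANDIDATES:
            continue
        # Person-or-place check: skip names preceded by an honorific.
        if i > 0 and words[i - 1].lower() in PERSON_TITLES:
            continue
        options = CANDIDATES[word]
        # Prefer a reading whose containing region also appears in the text.
        in_context = [c for c in options if c[1].lower() in lowered]
        resolved.append(in_context[0] if in_context else options[0])
    return resolved


def spatial_focus(text: str) -> Optional[tuple[str, str]]:
    """Take the most frequently resolved place as the document's focus."""
    counts = Counter(resolve(text))
    return counts.most_common(1)[0][0] if counts else None
```

In “Flooding hit London, Ontario. Officials in London and Toronto responded. Mr. Jefferson spoke.”, both mentions of London resolve to Ontario because “Ontario” appears in the text, “Jefferson” is skipped as a likely person, and London, Ontario becomes the focus. A frequency count alone would not settle the Singapore question, which is where learned document models come in.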

The STEWARD System
Maps provided by Google Maps
Search results powered by the SAND database system
Available to anyone with an Internet connection and a Web browser
– E.g., Microsoft Internet Explorer, Firefox, or Mozilla
STEWARD is on the Web at –
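Because results are shown on a map, a natural query is “which documents mention a place inside the current map view?” A minimal sketch, assuming each tagged location has been resolved to latitude/longitude and the viewport does not cross the 180th meridian; `BBox` and `docs_in_view` are hypothetical names. It uses a linear scan for clarity; a spatial database such as SAND would typically answer this with a spatial index instead.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Map viewport in degrees; assumed not to cross the 180th meridian."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def docs_in_view(doc_points: dict[str, list[tuple[float, float]]], view: BBox) -> list[str]:
    """Return sorted ids of documents with at least one tagged location inside the viewport."""
    return sorted(
        doc_id
        for doc_id, points in doc_points.items()
        if any(view.contains(lat, lon) for lat, lon in points)
    )
```

Panning or zooming the map would simply re-run the query with the new viewport.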

STEWARD as a research tool
STEWARD could be used for document retrieval, data exploration, and knowledge discovery
Potential users:
– Researchers at HUD
– Users of HUDUSER.ORG
STEWARD complements the existing search tools at HUDUSER.ORG

Natural Language Cues Research
Named-entity tagging
– Tags text phrases with the type of information they represent, such as “location,” “organization,” or “person”
– An improperly trained tagger will produce incorrect entity classifications
Part-of-speech tagging
– Tags every word with its part of speech
– Locations tend to be tagged as proper nouns
– Does not distinguish between locations and people’s names
Other language-related cues
– Addresses and ZIP codes
– City, State combinations
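The “other language-related cues” can be captured with regular expressions. A minimal Python sketch, assuming US-style “City, ST” pairs and five-digit ZIP codes (with an optional ZIP+4 extension); the abbreviated `STATE_CODES` set, the patterns, and `language_cues` are illustrative assumptions, not STEWARD's rules.

```python
import re

# A few USPS state/district codes (abbreviated for the sketch; extend as needed).
STATE_CODES = {"TX", "MD", "OH", "CA", "NY", "DC"}

# "City, ST": one to three capitalized words, a comma, then a two-letter code.
CITY_STATE = re.compile(r"\b((?:[A-Z][a-z]+\s){0,2}[A-Z][a-z]+),\s([A-Z]{2})\b")
# Five-digit ZIP code with an optional ZIP+4 extension.
ZIP_CODE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def language_cues(text: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (city, state) pairs with a known state code, and ZIP codes, found in text."""
    cities = [(city, st) for city, st in CITY_STATE.findall(text) if st in STATE_CODES]
    zips = ZIP_CODE.findall(text)
    return cities, zips
```

Applied to “Send reports to HUD, 451 7th Street SW, Washington, DC 20410. Projects in El Paso, TX 79901-1234 are listed.”, this yields the pairs (Washington, DC) and (El Paso, TX) and the ZIP codes 20410 and 79901-1234. The city pattern is greedy over capitalized words, so a capitalized word directly before a city (for example “In El Paso, TX”) would be absorbed into the match; a real tagger would combine these cues with the gazetteer.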

Future Work
1. Hidden Web
2. Incorporation into Other Mapping APIs
   a) Google Earth
   b) Microsoft Virtual Earth
3. Full Spatial Query Capabilities à la the SAND Spatial Browser
4. Natural Language Cues
5. Document Meta-language
6. Incorporation of Machine Learning Techniques to Identify Principal Geographic Focus
7. User Interface and Graphics
8. Applications
   a) Other Federal Agencies
   b) News Reading
   c) Common Alerting Protocol (CAP), as used by USGS, for exchanging all-hazard emergency alerts and public warnings

Acknowledgements
– HUD PD&R
– Digital Government Program at the NSF
– University of Maryland

Live Demo