Design and Implementation of a Geographic Search Engine
Alexander Markowetz, Yen-Yu Chen, Torsten Suel, Xiaohui Long, Bernhard Seeger

The Internet is so big
– Most web searches return hundreds of thousands of results.
– Most of them are not that interesting.
– The interesting ones might be buried inside the iceberg.
– Just adding more terms to the query is probably not the solution.

Geography is a useful constraint
– It is one of the two fundamental human conditions: space and time.
– It allows intuitive constraints.
– It reflects our everyday perception of the world.

Many of us already search geographically
By adding terms with a geographic meaning:
– Yoga “New York”
– Yoga Brooklyn
– Yoga “Park Slope”
– Yoga Queens
But this is far from perfect.

Problems
– Multiple queries are needed for the same search task, so many results have to be seen over and over.
– The user needs to know the geographic surroundings.
– Many geographic hints are ignored: telephone numbers, zip codes, etc., and the link structure.
– There is no concept of continuous space.

Applications
– Location-based services
– Locally targeted web advertising
– Mining geographic properties, e.g. for market research

Related Work
– L. Gravano: Geosearch
– Divine Inc.: Northern Light Geosearch
– Eventax GmbH
– Yahoo Local Search
– Google Local Search
– K. McCurley: “Geo Coding”
– Ding, Gravano, Shivakumar: “Geo Scope”
– Raber Information Management GmbH
– Open GIS Consortium
– Daviel

Our Contributions
– An actual implementation of large-scale geographic web search
– A combination of known and new techniques for deriving geographic data from the web
– Efficient query execution in large geographic search engines

Structure of Engine
– A crawler to gather pages (we crawled 31 million pages in the .de domain)
– Build a text inverted index
– Calculate a global ranking (i.e. PageRank)
– Preprocess geographic information
– Run a search engine on top of these
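
The slide names PageRank as the global ranking but gives no details. As a rough illustration only, here is a textbook power-iteration PageRank; the damping factor 0.85, the fixed iteration count and the even redistribution of rank from pages without outlinks are common conventions assumed here, not settings reported by the authors.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list (page -> outlinks).

    Assumed conventions (not from the slides): damping 0.85, fixed number
    of iterations, dangling rank spread evenly over all pages.
    """
    nodes = set(links) | {dst for outs in links.values() for dst in outs}
    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Every page receives the random-jump share.
        new_rank = {page: (1.0 - damping) / n for page in nodes}
        # Rank held by pages without outlinks ("dangling" pages).
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        for page in nodes:
            outs = links.get(page, [])
            for dst in outs:
                # Each outlink passes on an equal share of the page's rank.
                new_rank[dst] += damping * rank[page] / len(outs)
        for page in nodes:
            new_rank[page] += damping * dangling / n
        rank = new_rank
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` gives page `c`, which has two inlinks, a higher score than page `b`, which has none; the scores always sum to 1.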

Geo Coding
Three steps:
1. Geo extraction: find all elements that might indicate a location.
2. Geo matching: map these elements to actual locations (coordinates).
3. Geo propagation: increase the quality and coverage of the geo coding.

Geo Extraction
Reduce a document to the subset of its terms that have geographic meaning:
– Town names
– Phone numbers
– Zip codes
Terms are further distinguished as strong terms vs. weak terms, and as killer terms and validator terms.
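
A minimal extraction sketch, assuming a hypothetical set of known town names and a hypothetical set of town names that are also ordinary German words (for instance "Essen", which is both a city and the word for "food"). The strong/weak rule used here (zip codes and unambiguous town names are strong, ambiguous town names are weak) is an assumption for illustration; phone numbers, multi-word town names, and killer and validator terms are not modeled.

```python
import re
from typing import Set, Tuple

# German postal codes have exactly five digits.
ZIP_RE = re.compile(r"(?<!\d)\d{5}(?!\d)")
# Lower-case word tokens, including umlauts and sharp s.
WORD_RE = re.compile(r"[a-zäöüß]+")


def extract_geo_terms(text: str, town_names: Set[str],
                      common_words: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (strong, weak) geographic terms found in `text`.

    Assumed rule for this sketch: zip codes and town names that are not
    ordinary words are strong; town names that double as words are weak.
    """
    strong: Set[str] = set(ZIP_RE.findall(text))
    weak: Set[str] = set()
    for token in WORD_RE.findall(text.lower()):
        if token in town_names:
            if token in common_words:
                weak.add(token)
            else:
                strong.add(token)
    return strong, weak
```

On the text "Yoga in Essen und Berlin, 10115 Berlin. Gutes Essen!" this yields the strong terms `10115` and `berlin` and the weak term `essen`, which would need further evidence before being matched to a location.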

Geo Matching
Problem: geo-geo ambiguity (several places share the same name).
Two assumptions:
– Single source of discourse
– The author most likely meant the largest town with that name
Measuring geo matching:
– Number of matched terms
– Fraction of matched terms

Matching Strategy: Best of the Big Towns First
1. Group towns into several categories according to their size.
2. Start with the category of the largest towns.
3. Determine the subset of all towns in this category whose names contain at least one term in found-strong (the strong terms extracted from the page).
4. Rank them according to a mix of the two measures.
5. Add the best-matched town to the result.
6. Remove all terms found in this town's name from the set.
7. Start over at step 3 as long as there are new results.
8. If there are no new results, repeat the algorithm for the next category.
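
The eight steps can be sketched as follows. This is a minimal illustration, not the authors' code: the size thresholds, the whitespace tokenization of town names and the "mix" of the two measures (here simply their sum, with ties broken by population) are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Town:
    name: str
    population: int


def name_terms(town: Town) -> List[str]:
    return town.name.lower().split()


def match_measures(town: Town, found: Set[str]) -> Tuple[int, float]:
    """Number and fraction of the town's name terms that occur in `found`."""
    terms = name_terms(town)
    matched = sum(1 for t in terms if t in found)
    return matched, matched / len(terms)


def best_of_big_towns_first(towns: Iterable[Town], found_strong: Set[str],
                            size_thresholds: Sequence[int] = (100_000, 10_000, 0)
                            ) -> List[Town]:
    found = {t.lower() for t in found_strong}
    # Step 1: group towns into size categories, largest first
    # (the thresholds are hypothetical).
    categories: List[List[Town]] = [[] for _ in size_thresholds]
    for town in towns:
        for i, threshold in enumerate(size_thresholds):
            if town.population >= threshold:
                categories[i].append(town)
                break
    result: List[Town] = []
    # Steps 2 and 8: walk the categories from largest to smallest.
    for category in categories:
        while True:
            # Step 3: towns sharing at least one term with found-strong.
            candidates = [t for t in category
                          if any(term in found for term in name_terms(t))]
            if not candidates:
                break  # Step 8: no new results, go to the next category.
            # Step 4: hypothetical mix = count + fraction, ties -> larger town.
            best = max(candidates,
                       key=lambda t: (sum(match_measures(t, found)), t.population))
            result.append(best)               # Step 5
            found -= set(name_terms(best))    # Step 6 (loop = step 7)
    return result
```

With illustrative populations, a page mentioning "Frankfurt", "Main" and "Berlin" resolves to Frankfurt am Main and Berlin, and a bare "Frankfurt" resolves to the larger Frankfurt am Main rather than Frankfurt (Oder), in line with the largest-town assumption. Each round removes at least one term from the set, so the loop always terminates.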

Geographic Footprints of Web Pages
Raster data model:
– The geographic footprint of a page is represented as a bitmap on an underlying 1024x1024 grid of Germany.
– Each point on the grid has an integer amplitude.
– Bitmaps are kept as quadtree structures.
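
A minimal quadtree sketch for such a raster, assuming the grid side is a power of two (1024 on the slide, 4 in the example). The node layout (leaves hold one amplitude for a uniform block, inner nodes hold four children in NW, NE, SW, SE order) is an illustrative choice; the slides do not describe the engine's actual encoding.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QuadNode:
    # Leaves store one amplitude for a whole uniform square block.
    amplitude: Optional[int] = None
    # Inner nodes store four children in the order NW, NE, SW, SE.
    children: Optional[Tuple["QuadNode", "QuadNode", "QuadNode", "QuadNode"]] = None


def build(grid: List[List[int]], x: int = 0, y: int = 0,
          size: Optional[int] = None) -> QuadNode:
    """Build a quadtree for the size x size block with top-left corner (x, y)."""
    if size is None:
        size = len(grid)  # assumed to be a power of two, e.g. 1024
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first
           for dy in range(size) for dx in range(size)):
        return QuadNode(amplitude=first)
    half = size // 2
    return QuadNode(children=(
        build(grid, x, y, half),
        build(grid, x + half, y, half),
        build(grid, x, y + half, half),
        build(grid, x + half, y + half, half),
    ))


def amplitude_at(node: QuadNode, px: int, py: int, size: int) -> Optional[int]:
    """Look up the amplitude of grid point (px, py) by descending the tree."""
    x = y = 0
    while node.children is not None:
        size //= 2
        east = px >= x + size
        south = py >= y + size
        if east:
            x += size
        if south:
            y += size
        node = node.children[2 * south + east]
    return node.amplitude


def count_leaves(node: QuadNode) -> int:
    if node.children is None:
        return 1
    return sum(count_leaves(child) for child in node.children)
```

A 4x4 grid whose top-left 2x2 block has amplitude 3 and is zero elsewhere needs only four leaves, and a completely uniform grid collapses to a single leaf; this is where the compression for sparse footprints comes from.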

Geographic Footprints of Web Pages
Two advantages:
1. Aggregation and other operations are efficient.
2. Footprints are highly compressed: less than 100 bytes on average after simplification.
(Figure: footprint of the page 0-badewanne.baby--shop.de)
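
To illustrate why aggregation is cheap on quadtrees, here is a sketch of pointwise addition of two footprints. It assumes a compact representation chosen for this example (an `int` is a uniform block, a 4-tuple holds the NW, NE, SW, SE quadrants) and assumes pointwise summation as the aggregation operator; the slides do not state which operator the engine uses. Uniform blocks are combined in one step instead of cell by cell, and equal quadrants are merged back into a single leaf.

```python
from typing import Tuple, Union

# Illustrative representation: an int is a uniform block with that amplitude,
# a 4-tuple holds the NW, NE, SW, SE sub-quadtrees.
QuadTree = Union[int, Tuple["QuadTree", "QuadTree", "QuadTree", "QuadTree"]]


def _children(tree: QuadTree) -> Tuple[QuadTree, QuadTree, QuadTree, QuadTree]:
    # A uniform block behaves like four identical uniform quadrants.
    return tree if isinstance(tree, tuple) else (tree, tree, tree, tree)


def _collapse(children: Tuple[QuadTree, ...]) -> QuadTree:
    # Merge four equal uniform quadrants back into a single leaf.
    if all(isinstance(c, int) for c in children) and len(set(children)) == 1:
        return children[0]
    return children


def add(a: QuadTree, b: QuadTree) -> QuadTree:
    """Pointwise sum of two footprints (assumed aggregation operator)."""
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return _collapse(tuple(add(x, y) for x, y in zip(_children(a), _children(b))))
```

Adding `(1, 0, 0, 0)` and `(0, 1, 1, 1)` collapses to the single leaf `1`, so the result stays as compact as the inputs allow.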

Geo Propagation
– Links: propagation of footprints through forward and backward links
  – Radius-one hypothesis
  – Radius-two hypothesis (co-citation)
– Sites: aggregation of bitmaps across a site
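
A minimal sketch of one round of link propagation under the radius-one hypothesis, plus site-level aggregation. Footprints are stored sparsely as cell-to-amplitude dictionaries, and the decay (a neighbor contributes half of its amplitude, by integer division) is a hypothetical choice; the radius-two (co-citation) variant is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]
Footprint = Dict[Cell, int]


def propagate_radius_one(footprints: Dict[str, Footprint],
                         links: Iterable[Tuple[str, str]]) -> Dict[str, Footprint]:
    """One round of propagation along forward and backward links.

    Hypothetical decay: a neighbour contributes half its amplitude (integer).
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)  # forward link
        neighbours[dst].add(src)  # backward link
    result = {page: dict(fp) for page, fp in footprints.items()}
    for page, adjacent in neighbours.items():
        target = result.setdefault(page, {})
        for other in adjacent:
            # Read the original footprints so one round spreads one hop only.
            for cell, amp in footprints.get(other, {}).items():
                share = amp // 2
                if share:
                    target[cell] = target.get(cell, 0) + share
    return result


def aggregate_site(footprints: Dict[str, Footprint],
                   site_of: Dict[str, str]) -> Dict[str, Footprint]:
    """Sum the footprints of all pages that belong to the same site."""
    sites: Dict[str, Footprint] = defaultdict(dict)
    for page, fp in footprints.items():
        acc = sites[site_of[page]]
        for cell, amp in fp.items():
            acc[cell] = acc.get(cell, 0) + amp
    return dict(sites)
```

A page without any geographic terms of its own thus inherits a weakened footprint from the pages it links to and the pages that link to it.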

Geographic Query Processing
Traditional search:
1. The user enters keywords.
2. Boolean operations on the inverted index.
3. Ranking according to subject relevance.
Geographic search:
1. The user enters keywords and a geographic position.
2. Boolean operations on the inverted index and the footprints.
3. Ranking according to subject relevance and distance.

Geographic Ranking
– Customizable query footprint
– The geographic score is based on the intersection of the query footprint and the page footprint
– It is combined with PageRank and a term-based score
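
A hedged sketch of such a ranking. It assumes sparse cell-to-amplitude footprints, measures the intersection as the sum of cell-wise minimum amplitudes normalized by the query's total amplitude, and mixes the three scores linearly with hypothetical weights; the slides do not give the actual formula.

```python
from typing import Dict, Tuple

Footprint = Dict[Tuple[int, int], int]


def geo_score(query: Footprint, page: Footprint) -> float:
    """Overlap of two footprints, normalised by the query's total amplitude.

    Assumed definition: sum over shared cells of the smaller amplitude.
    """
    total = sum(query.values())
    if total == 0:
        return 0.0
    overlap = sum(min(amp, page[cell]) for cell, amp in query.items() if cell in page)
    return overlap / total


def combined_score(term_score: float, pagerank: float, geo: float,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical linear mix of term-based score, PageRank and geo score."""
    w_term, w_pr, w_geo = weights
    return w_term * term_score + w_pr * pagerank + w_geo * geo
```

Because the query footprint is just another bitmap, "customizable" can mean anything from a single town to a hand-drawn region with graded amplitudes; the same overlap function applies unchanged.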

Efficient Geo Query Processing
1. Intersect the posting lists from the inverted index.
2. Calculate approximate geo scores.
3. For the top-k results, calculate precise geo scores.
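
The three steps can be sketched as a merge intersection of sorted posting lists followed by a two-phase geo scoring. The scoring functions are passed in as callables because the slides do not say how the approximate score is obtained (a coarse footprint or bounding box are plausible guesses); ranking uses the geo score alone to keep the example short.

```python
import heapq
from typing import Callable, Dict, List


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-intersect two sorted posting lists of document ids."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def geo_query(index: Dict[str, List[int]], terms: List[str],
              approx_geo: Callable[[int], float],
              precise_geo: Callable[[int], float], k: int) -> List[int]:
    # Step 1: conjunctive (AND) intersection of the posting lists.
    docs = index.get(terms[0], [])
    for term in terms[1:]:
        docs = intersect(docs, index.get(term, []))
    # Step 2: cheap approximate geo score for every matching document.
    candidates = heapq.nlargest(k, docs, key=approx_geo)
    # Step 3: expensive precise geo score only for the top-k candidates.
    return sorted(candidates, key=precise_geo, reverse=True)
```

With postings `yoga: [1, 2, 3, 5]` and `studio: [2, 3, 5, 8]`, the intersection is `[2, 3, 5]`. If the approximate scores rank document 5 last, it never reaches the precise phase even if its precise score would be the highest, so the quality of the approximation directly limits the quality of the top-k result.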

Conclusion and Future Work
– Automatically identify and exploit geographic terms using data mining techniques
– Optimized geographic query processing algorithms
– Focused crawling of a given geographic area
– Mining geographic properties

Thank You