Smarter Search Engines
Using Personalization to Improve Search Results
Eugene Cushman, Dan Murphy, George Stuart
Advised by Professor Mark Claypool



The Problem
- There are billions of web pages on the Internet
- They vary greatly in quality
- Growth is exponential
- Search engines must adapt to keep up

Existing Systems
- Google
  - Layered architecture
  - PageRank™
- GroupLens
  - Applied to USENET
  - Different domain space
  - Uses collaborative filtering

Personalization
- “Qualitative” rankings
  - Example: “Good Low-Fat Dessert Recipes”
  - Example: “Theories of dinosaur extinction”
- Contrast with specific, factual searches
  - Example: “The batting lineup for the Boston Red Sox on October 28, 1986”
- Exploratory versus “narrow-band” searches

Collaborative Filtering
- Uses aggregate data to predict user preference
  - User A likes Foo
  - User B trusts User A’s preference
  - User B can be predicted to prefer Foo
  - (extremely simplified)
- Algorithms
  - Pearson correlation coefficient
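The slide names the Pearson correlation coefficient without showing how it feeds a prediction. The following is a minimal sketch, not Foible's actual code: the function names `pearson` and `predict`, the dictionary-based rating format, and the mean-centered weighted prediction are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to measure agreement
    mean_a = sum(ratings_a[i] for i in common) / n
    mean_b = sum(ratings_b[i] for i in common) / n
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user with constant ratings carries no signal
    return cov / sqrt(var_a * var_b)


def predict(target, others, item):
    """Predict target's rating for item (assumed scheme): the target's mean
    rating plus the correlation-weighted deviations of the other users."""
    target_mean = sum(target.values()) / len(target)
    numerator = 0.0
    denominator = 0.0
    for other in others:
        if item not in other:
            continue
        weight = pearson(target, other)
        other_mean = sum(other.values()) / len(other)
        numerator += weight * (other[item] - other_mean)
        denominator += abs(weight)
    if denominator == 0:
        return target_mean  # no usable neighbors: fall back to the mean
    return target_mean + numerator / denominator


# Hypothetical ratings on a 1-5 scale: User A has rated "foo", User B has not.
user_a = {"p1": 4, "p2": 3, "p3": 2, "foo": 5}
user_b = {"p1": 5, "p2": 3, "p3": 1}
predicted_foo = predict(user_b, [user_a], "foo")
```

Because User B's past ratings move in step with User A's, A's high rating for "foo" pulls B's predicted rating above B's own average, which is the "User B trusts User A" idea in numeric form.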

Foible: the best of both worlds
- Foible integrates disparate technologies to provide a powerful web-searching experience
  - Search engine indexing
  - Collaborative filtering
- Results in demonstrable improvement in search results

Foible Architecture
- Spider
- Analyzer
- Cache
- Collaborative Engine
- Search Engine
- Web Interface

Web Spider
- Parallelized depth-first crawl of the web
- Creates lists of nodes by parsing HTML, looking for links
- Starts with a link-heavy “seed node”
  - Custom seed node incorporating search results on “dinosaurs” from Yahoo, Google, and others
- Foible statistics
  - Over 27,000 web pages crawled
  - In excess of 500 MB of web data cached
  - Total database size of 1 GB
  - Million rows in Word Frequency table
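The crawl described on this slide can be sketched as a stack-based depth-first traversal that parses each page for links. This is an illustrative, single-threaded sketch rather than Foible's spider: the names `LinkExtractor` and `crawl`, the injected `fetch` callable, and the in-memory `fake_web` are assumptions, and the parallelism mentioned on the slide is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed_url, fetch, max_pages=100):
    """Depth-first crawl from seed_url.

    fetch(url) returns the page HTML as a string, or None if unavailable.
    Returns the URLs in the order they were visited.
    """
    visited = []
    seen = {seed_url}
    stack = [seed_url]
    while stack and len(visited) < max_pages:
        url = stack.pop()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        # Push in reverse so the first link on the page is explored first.
        for link in reversed(parser.links):
            if link not in seen:
                seen.add(link)
                stack.append(link)
    return visited


# Hypothetical in-memory "web" so the sketch runs without network access.
fake_web = {
    "http://seed/": '<a href="/a">A</a><a href="/b">B</a>',
    "http://seed/a": '<a href="/c">C</a>',
    "http://seed/b": "",
    "http://seed/c": '<a href="/">home</a>',
}
order = crawl("http://seed/", fake_web.get)
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking; a real crawler would supply a function that downloads the page and would also need politeness rules such as rate limiting.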

Analyzer
- Parses HTML to describe attributes of each web page
  - Document size, number of sentences
  - Reading level (Fog, Flesch-Kincaid)
  - Number of images
  - Content-to-HTML ratio
  - Number of links
- Precomputes word-frequency tables
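A rough sketch of how the attributes listed above might be computed from raw HTML follows. The names `PageStats`, `count_syllables`, and `analyze` are hypothetical, the syllable counter is a crude vowel-group heuristic, and only the Flesch-Kincaid grade level is shown (Fog is omitted); none of this is taken from Foible's implementation.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Gathers visible text plus image and link counts from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.images = 0
        self.links = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "a":
            self.links += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def count_syllables(word):
    """Heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def analyze(html):
    """Return a dictionary of page attributes similar to those on the slide."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    text = " ".join(stats.text_parts)
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    grade = 0.0
    if n_words:
        grade = 0.39 * n_words / n_sentences + 11.8 * syllables / n_words - 15.59
    return {
        "document_size": len(html),
        "sentences": len(sentences),
        "images": stats.images,
        "links": stats.links,
        "content_to_html_ratio": len(text) / len(html) if html else 0.0,
        "flesch_kincaid_grade": grade,
        "word_frequency": Counter(words),
    }


sample = (
    "<html><body><p>Dinosaurs were big. Dinosaurs are extinct!</p>"
    '<img src="t.png"><a href="/x">more</a></body></html>'
)
report = analyze(sample)
```

Computing the word-frequency table once at analysis time, as the slide suggests, means a query only has to look up counts instead of rescanning cached pages.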

Collaborative Searching
- Three components of the search algorithm
  1. Word frequency
  2. Profile correlation
  3. Recommender system
- Computes a ranking of all pages
- Returns results to the user
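The slide does not say how the three components are combined into one ranking. One plausible approach, shown here purely as an assumption, is a weighted sum of per-page scores. The function name `rank_pages`, the score keys, the weights, and the sample values are all hypothetical.

```python
def rank_pages(pages, weights=(0.5, 0.25, 0.25)):
    """Order pages by a weighted sum of three component scores.

    Each page is a dict with a "url" and three scores assumed to lie in [0, 1]:
    "word_frequency", "profile_correlation", and "recommendation".
    The weights are illustrative and are not taken from the presentation.
    """
    w_freq, w_profile, w_rec = weights
    scored = []
    for page in pages:
        score = (
            w_freq * page["word_frequency"]
            + w_profile * page["profile_correlation"]
            + w_rec * page["recommendation"]
        )
        scored.append((score, page["url"]))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


# Hypothetical component scores for three candidate pages.
candidates = [
    {"url": "a", "word_frequency": 0.9, "profile_correlation": 0.1, "recommendation": 0.1},
    {"url": "b", "word_frequency": 0.6, "profile_correlation": 0.9, "recommendation": 0.9},
    {"url": "c", "word_frequency": 0.2, "profile_correlation": 0.2, "recommendation": 0.2},
]
collaborative_order = rank_pages(candidates)
control_order = rank_pages(candidates, weights=(1.0, 0.0, 0.0))
```

Setting the two collaborative weights to zero reduces the ranking to word frequency alone, which corresponds to the control condition used in the user study.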

User Study
- Approximately 50 users
  - 20 completed the study in its entirety
- Consisted of 5 searches
  - Predefined broad topics
  - Users provided explicit feedback
- Search results presented in two-column format
  - Enhanced collaborative results
  - Control: word frequency only

User Study Data 1

User Study Data 2

Results and Conclusion
- Users unanimously prefer collaborative ratings to non-collaborative
- Smarter searches produced pages ranked in better order, according to the study
- Introducing collaborative filtering into traditional search engine technology results in better search results!