1
Smarter Search Engines
Using Personalization to Improve Search Results
Eugene Cushman, Dan Murphy, George Stuart
Advised by Professor Mark Claypool
2
The Problem
- There are billions of web pages on the Internet
- They vary greatly in quality
- Growth is exponential
- Search engines must adapt to keep up
3
Existing Systems
- Google
  - Layered architecture
  - PageRank™
- GroupLens
  - Applied to USENET
  - Different domain (USENET news rather than the web)
  - Uses collaborative filtering
4
Personalization
- “Qualitative” rankings
  - Example: “Good Low-Fat Dessert Recipes”
  - Example: “Theories of dinosaur extinction”
- Contrast with specific, factual searches
  - Example: “The batting lineup for the Boston Red Sox on October 28, 1986”
- Exploratory versus “narrow-band” searches
5
Collaborative Filtering
- Uses aggregate data to predict user preference
  - User A likes Foo
  - User B trusts User A’s preference
  - User B can be predicted to prefer Foo
  - (extremely simplified)
- Algorithms
  - Pearson Correlation Coefficient
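The slide names the Pearson correlation coefficient without showing how it turns into a prediction. Below is a minimal Python sketch of user-based collaborative filtering, assuming each user's explicit ratings are stored as a dictionary from page to score. The names `pearson` and `predict` and the data layout are hypothetical, and the prediction step uses a common formulation (a mean-centered average weighted by correlation, as in GroupLens-style recommenders), which may differ from the formula Foible actually used.

```python
from math import sqrt
from typing import Dict, Optional

# Hypothetical layout: user -> {page: rating}
Ratings = Dict[str, Dict[str, float]]


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over pages both users rated (0.0 when undefined)."""
    common = [page for page in a if page in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[p] for p in common) / len(common)
    mean_b = sum(b[p] for p in common) / len(common)
    cov = sum((a[p] - mean_a) * (b[p] - mean_b) for p in common)
    var_a = sum((a[p] - mean_a) ** 2 for p in common)
    var_b = sum((b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings: Ratings, user: str, page: str) -> Optional[float]:
    """Predict `user`'s rating of `page` from correlated users' ratings."""
    own = ratings[user]
    user_mean = sum(own.values()) / len(own)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or page not in theirs:
            continue
        weight = pearson(own, theirs)  # how far `user` "trusts" `other`
        if weight == 0.0:
            continue
        other_mean = sum(theirs.values()) / len(theirs)
        num += weight * (theirs[page] - other_mean)
        den += abs(weight)
    if den == 0.0:
        return None  # no correlated neighbour has rated this page
    return user_mean + num / den


ratings = {
    "A": {"foo": 5, "bar": 1, "baz": 4},
    "B": {"bar": 1, "baz": 5},  # B has not rated "foo" yet
    "C": {"foo": 1, "bar": 5, "baz": 2},
}
print(round(predict(ratings, "B", "foo"), 2))  # 4.67
```

B agrees with A and disagrees with C on the pages they share, so A's liking of `foo` and C's dislike of it both push B's predicted rating above B's own average, which mirrors the "User B trusts User A" example on the slide.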
6
Foible: the best of both worlds
- Foible integrates disparate technologies to provide a powerful web-searching experience:
  - Search engine indexing
  - Collaborative filtering
- Results in a demonstrable improvement in search results
7
Foible Architecture
- Spider
- Analyzer
- Cache
- Collaborative Engine
- Search Engine
- Web Interface
8
Web Spider
- Parallelized depth-first crawl of the web
- Creates lists of nodes by parsing HTML and looking for links
- Starts with a link-heavy “seed node”
  - Custom seed node incorporating search results on “dinosaurs” from Yahoo, Google, and others
- Foible statistics
  - Over 27,000 web pages crawled
  - More than 500 MB of web data cached
  - Total database size of 1 GB
  - 7.269 million rows in the Word Frequency table
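The crawl described above can be sketched as a depth-first traversal driven by an explicit stack. This is a sequential sketch, not Foible's parallelized spider, and the `fetch` callable is a hypothetical stand-in for an HTTP client so that the example runs against an in-memory dictionary of pages instead of the live web. Link extraction uses Python's standard `html.parser` and `urllib.parse` modules.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Set
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    # Resolve relative links and drop #fragments so one page is not queued twice.
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl_dfs(seed: str, fetch: Callable[[str], Optional[str]],
              max_pages: int = 100) -> List[str]:
    """Depth-first crawl from `seed`; returns page URLs in visit order."""
    stack = [seed]
    seen: Set[str] = set()
    visited: List[str] = []
    while stack and len(visited) < max_pages:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)  # hypothetical fetcher: returns HTML, or None on failure
        if html is None:
            continue
        visited.append(url)
        # Push links in reverse so the first link on the page is explored first.
        for link in reversed(extract_links(url, html)):
            if link not in seen:
                stack.append(link)
    return visited


# In-memory stand-in for the web, keyed by URL (hypothetical example data).
PAGES = {
    "http://seed.example/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://seed.example/a": '<a href="/c">C</a>',
    "http://seed.example/b": '<a href="/a#top">A again</a>',
    "http://seed.example/c": "<p>No links here.</p>",
}
print(crawl_dfs("http://seed.example/", PAGES.get))
# ['http://seed.example/', 'http://seed.example/a', 'http://seed.example/c', 'http://seed.example/b']
```

The crawler follows `/a` down to `/c` before backtracking to `/b`, which is what makes it depth-first. A parallel version would hand the `fetch` calls to a pool of workers while sharing the `seen` set, at the cost of a strict visit order.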
9
Analyzer
- Parses HTML to describe attributes of each web page:
  - Document size, number of sentences
  - Reading level (Gunning Fog, Flesch-Kincaid)
  - Number of images
  - Content-to-HTML ratio
  - Number of links
- Precomputes word-frequency tables
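The Analyzer's per-page attributes can be approximated with the Python standard library alone. In this sketch the syllable counter is a rough vowel-group heuristic, sentences are counted by terminal punctuation, and document size is measured in bytes; all three are assumptions, since the slides do not say how Foible measured them. The readability scores use the standard Gunning Fog index and Flesch-Kincaid grade level formulas, and the function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List


class PageStats(HTMLParser):
    """Counts <img> and <a> tags and collects visible text (script/style skipped)."""

    def __init__(self) -> None:
        super().__init__()
        self.images = 0
        self.links = 0
        self.text_parts: List[str] = []
        self._skip = 0  # depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "a":
            self.links += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def count_syllables(word: str) -> int:
    """Heuristic: count vowel groups, ignoring a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def analyze(html: str) -> Dict[str, float]:
    parser = PageStats()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "size_bytes": len(html.encode("utf-8")),
        "sentences": sentences,
        "images": parser.images,
        "links": parser.links,
        "content_to_html": len(text.strip()) / max(len(html), 1),
        # Gunning Fog: 0.4 * (words/sentence + 100 * complex words/words)
        "fog": 0.4 * (n_words / sentences + 100 * complex_words / n_words),
        # Flesch-Kincaid grade level
        "flesch_kincaid": 0.39 * (n_words / sentences)
        + 11.8 * (syllables / n_words) - 15.59,
    }


sample = ('<html><body><p>The cat sat. The dog ran!</p>'
          '<img src="cat.png"><a href="/home">Go home.</a>'
          '<script>var x = 1;</script></body></html>')
stats = analyze(sample)
print(stats["images"], stats["links"], stats["sentences"])  # 1 1 3
```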
10
Collaborative Searching
- Three components of the search algorithm:
  1. Word frequency
  2. Profile correlation
  3. Recommender system
- Computes a ranking of all pages
- Returns results to the user
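One way the three components could be combined into a single ranking is sketched below. The word-frequency score is the share of a page's words that match the query; the predicted ratings are assumed to come from the profile-correlation and recommender steps (for example a Pearson-weighted prediction) and are passed in as plain numbers. The linear blend controlled by `alpha`, the neutral fallback for unrated pages, and all function names are hypothetical, because the slides do not specify how Foible weighted the components.

```python
import re
from collections import Counter
from typing import List, Mapping, Optional, Tuple


def word_frequency_table(text: str) -> Counter:
    """Lower-cased word counts for one page (what the Analyzer would precompute)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def word_frequency_score(query: str, table: Counter) -> float:
    """Fraction of the page's words that are query terms."""
    total = sum(table.values())
    if total == 0:
        return 0.0
    terms = set(re.findall(r"[a-z]+", query.lower()))
    return sum(table[t] for t in terms) / total


def rank_pages(query: str,
               tables: Mapping[str, Counter],
               predicted: Mapping[str, Optional[float]],
               alpha: float = 0.5,
               max_rating: float = 5.0) -> List[Tuple[str, float]]:
    """Blend normalized word-frequency scores with predicted ratings (hypothetical weighting)."""
    wf = {url: word_frequency_score(query, t) for url, t in tables.items()}
    best = max(wf.values(), default=0.0) or 1.0
    scored = []
    for url, score in wf.items():
        if score == 0.0:
            continue  # page does not mention the query at all
        rating = predicted.get(url)
        # Pages with no prediction fall back to a neutral mid-scale rating.
        cf = (rating if rating is not None else max_rating / 2) / max_rating
        scored.append((url, alpha * (score / best) + (1 - alpha) * cf))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


tables = {
    "p1": word_frequency_table("dinosaur extinction asteroid dinosaur"),
    "p2": word_frequency_table("dinosaur toys for sale"),
    "p3": word_frequency_table("low fat dessert recipes"),
}
predicted = {"p1": 2.0, "p2": 5.0}  # hypothetical recommender output on a 1-5 scale
print([url for url, _ in rank_pages("dinosaur extinction", tables, predicted, alpha=1.0)])  # ['p1', 'p2']
print([url for url, _ in rank_pages("dinosaur extinction", tables, predicted, alpha=0.3)])  # ['p2', 'p1']
```

With `alpha=1.0` the ranking reduces to word frequency alone, which corresponds to the control column in the user study; lowering `alpha` lets the collaborative predictions reorder the results.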
11
User Study
- Approximately 50 users
  - 20 completed the study in its entirety
- Consisted of 5 searches
  - Predefined broad topics
  - Users provided explicit feedback
- Search results presented in a two-column format
  - Enhanced collaborative results
  - Control: word frequency only
12
User Study Data 1
13
User Study Data 2
14
Results and Conclusion
- Users unanimously preferred collaborative ratings to non-collaborative ones
- According to the study, smarter searches ranked pages in a better order
- Introducing collaborative filtering into traditional search engine technology results in better search results!