Site Search That Doesn't Stink
Avi Rappoport, SearchTools.com
InternetWorld NY 2001
Speaker Info
– Avi Rappoport
    – Libraries, software, information architecture, user experience, always search engines
– SearchTools.com
    – Guides, analysis, product info and news
– Search Tools Consulting
    – Helps sites and intranets implement effective search engines
What Is Site Search?
– A search engine for a single web site or intranet
– Local search for local information
    – Not web-wide search or a vertical portal
– Server process or remote service (ASP, application service provider)
    – Great products available
    – Don't bother to write your own
What Sites Need Search?
– Informational sites
– Commerce sites
– Sites with support materials
    – Documentation
    – FAQs
    – Message boards
    – Return policies
Why Do People Search?
– 40% of users are “search-dominant”
    – Jakob Nielsen, July 1997
– To supplement site navigation
    – Skip layers of organizational hierarchy
– When they don’t see a perfect category
    – Power users and frequent visitors
– Many search by part number
    – Avoids language confusion
How a Search Engine Works
– Create an index
– Receive a query: a set of search terms and commands
– Look in the index file for matches
– Gather the matching page entries and rank them by relevance
– Format the results
– Return the result page in HTML to the searcher’s web browser
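The six steps above can be sketched as a toy in-memory pipeline. This is a minimal illustration under stated assumptions, not how any particular product works: the sample pages, the `build_index` and `search` names, and the ranking rule (the number of distinct query words a page contains) are all hypothetical.

```python
import html
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Step 1: map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, pages, query):
    """Steps 2-5: split the query, look up matches, rank them, format HTML."""
    terms = set(tokenize(query))
    # Look in the index: any page containing at least one query term.
    matches = set()
    for term in terms:
        matches |= index.get(term, set())

    # Crude relevance: how many distinct query terms the page contains.
    def score(url):
        words = set(tokenize(pages[url]))
        return sum(1 for t in terms if t in words)

    ranked = sorted(matches, key=lambda u: (-score(u), u))
    # Format the results; step 6 would send this HTML to the browser.
    items = "".join(
        f'<li><a href="{html.escape(u)}">{html.escape(u)}</a></li>' for u in ranked
    )
    return ranked, f"<ol>{items}</ol>"

pages = {
    "/returns.html": "Return policy for all orders",
    "/faq.html": "Frequently asked questions about orders and shipping",
    "/about.html": "About our company",
}
index = build_index(pages)
ranked, result_html = search(index, pages, "return orders")
print(ranked)  # prints ['/returns.html', '/faq.html']
```

Real engines store the index on disk and use far richer ranking, but the division into an indexing phase and a query phase is the same.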
Search Engine Diagram (figure; components: Indexer, Indexed Pages, Index, Search Form, Search Engine, Results Page; steps: send search query, look in index, get list of results, return formatted results, user opens a found page)
Database vs. Text Search
– Databases
    – Number-oriented
    – Problems with multiword searches
    – Field limitations
    – Sort results by field: date, price, product ID
– Text search
    – Text-oriented, with operators
    – Separate query processing
    – Relevance ranking!
Indexing Process
– Indexer application
    – Gathers and stores text
– Inverted index file contains an entry for each instance of each word:
    – Location within the file (for phrase matching)
    – Enclosing field or meta tag
    – Pointer to document info
– Document information file
    – URL, title, size, date, description, etc.
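A minimal sketch of the inverted index described above, assuming two hypothetical fields (`title` and `body`) and a list of dicts standing in for the document information file. Each entry records the document ID (the pointer), the enclosing field, and the word's position, and the positions are what make phrase matching possible.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Document information file: one record per document; index entries point here by ID.
docs = [
    {"url": "/shears.html", "title": "Garden shears", "body": "Sharp garden shears for pruning"},
    {"url": "/tools.html", "title": "Garden tools", "body": "Shears, rakes and garden hoses"},
]

# Inverted index: word -> list of (doc_id, field, position) entries, one per occurrence.
index = defaultdict(list)
for doc_id, doc in enumerate(docs):
    for field in ("title", "body"):
        for position, word in enumerate(tokenize(doc[field])):
            index[word].append((doc_id, field, position))

def phrase_search(phrase):
    """Return URLs of documents where the phrase's words appear consecutively in one field."""
    words = tokenize(phrase)
    if not words:
        return []
    # One set of (doc_id, field, position) entries per phrase word.
    postings = [set(index.get(w, [])) for w in words]
    hits = set()
    for doc_id, field, position in postings[0]:
        # Word i of the phrase must sit at position + i in the same field of the same doc.
        if all((doc_id, field, position + i) in postings[i] for i in range(1, len(words))):
            hits.add(docs[doc_id]["url"])
    return sorted(hits)
```

Both sample pages contain "garden" and "shears", but only `/shears.html` has them next to each other, so only it matches the phrase "garden shears".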
What Gets Indexed
– Plain text works
    – Graphic text is ignored
    – Problems with Flash, Java, JavaScript, etc.
    – Binaries: PDF, MS Word, Excel, WordPerfect
– Ignore navigation & boilerplate
    – Special pages for indexing
    – or similar tagging
– Short vs. long documents
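One way an indexer can skip navigation and boilerplate is to ignore text inside marked-off regions. This sketch uses Python's standard `html.parser` and assumes hypothetical `<!-- noindex -->` and `<!-- /noindex -->` comment markers; real products each define their own tags or settings for this, so check your engine's documentation. It also drops `<script>` and `<style>` contents, which are not readable page text.

```python
from html.parser import HTMLParser

class IndexableTextExtractor(HTMLParser):
    """Collect page text, skipping <script>/<style> and marked-off boilerplate."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self, start_marker="noindex", end_marker="/noindex"):
        super().__init__()
        self.start_marker = start_marker  # hypothetical marker names
        self.end_marker = end_marker
        self.skip_tag_depth = 0           # > 0 while inside <script> or <style>
        self.in_marked_block = False      # True between the two comment markers
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_tag_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_tag_depth > 0:
            self.skip_tag_depth -= 1

    def handle_comment(self, data):
        marker = data.strip().lower()
        if marker == self.start_marker:
            self.in_marked_block = True
        elif marker == self.end_marker:
            self.in_marked_block = False

    def handle_data(self, data):
        if self.skip_tag_depth == 0 and not self.in_marked_block and data.strip():
            self.chunks.append(data.strip())

def indexable_text(html_source):
    """Return the text an indexer should store for this page."""
    parser = IndexableTextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.chunks)

page = """<html><body>
<!-- noindex --><a href="/">Home</a> | <a href="/help">Help</a><!-- /noindex -->
<h1>Return Policy</h1><p>Items may be returned within 30 days.</p>
<script>var x = 1;</script>
</body></html>"""
print(indexable_text(page))  # prints Return Policy Items may be returned within 30 days.
```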
Other Indexing Issues
– Duplicate detection
– Completeness
    – Index everything
    – Hide archives a little
– Freshness
    – Must keep the index in sync with the content
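A simple form of duplicate detection hashes each page's text after normalizing case and whitespace, so the same page reached through two URLs (for example `/` and `/index.html`) is indexed once. This sketch catches exact duplicates only; near-duplicates need fuzzier comparison, which is not shown. The page data is hypothetical.

```python
import hashlib
import re

def content_fingerprint(text):
    """SHA-1 of the text with case and whitespace differences removed."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def drop_duplicates(pages):
    """Keep the first URL seen for each fingerprint; map later copies to it."""
    seen = {}
    kept, duplicates = [], {}
    for url, text in pages:
        fp = content_fingerprint(text)
        if fp in seen:
            duplicates[url] = seen[fp]
        else:
            seen[fp] = url
            kept.append(url)
    return kept, duplicates

pages = [
    ("/index.html", "Welcome to   our store"),
    ("/", "welcome to our store"),
    ("/contact.html", "Contact us"),
]
kept, duplicates = drop_duplicates(pages)
```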
File System Indexers
– Index files from local disks and mounted servers
– Simple, fast, easy to update
– Require tidy server folders
    – Nothing obsolete
    – Nothing in progress
    – Synchronized with access controls
Robot Spider Indexers
– Crawl from URLs and follow links
– View pages like an end user
    – Dynamic page content is pre-rendered
– Channeled through server access control
– Multiple and remote servers
– Slower than local indexing
– Problem links, especially JavaScript
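The crawl-and-follow-links loop can be sketched as a breadth-first traversal that stays on the starting host. To keep the sketch self-contained, `fetch_links` is a hypothetical stand-in for real HTTP fetching and link extraction, backed here by an in-memory link map. Note how the `javascript:` link resolves to something with no host and is never followed, which is the "problem link" case above.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch_links, max_pages=100):
    """Breadth-first crawl from start_url, following links on the same host only.

    fetch_links(url) must return the href values found on that page.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real spider would index the page here
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited

# Hypothetical site: page URL -> hrefs found on that page.
site = {
    "http://www.example.com/": ["products.html", "http://other.example.org/", "#top"],
    "http://www.example.com/products.html": ["/", "shears.html", "javascript:openWin('x')"],
    "http://www.example.com/shears.html": [],
}
visited = crawl("http://www.example.com/", lambda url: site.get(url, []))
```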
Database Indexers
– Work best locally
    – Most use JDBC or ODBC
    – Can index via the web
– Easiest with straightforward tables
    – Perform a join to build listings for indexing
    – Problems with legacy systems
– May not include modification dates for records
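"Perform a join to build listings for indexing" might look like this, using Python's built-in `sqlite3` in place of a JDBC or ODBC connection. The two-table schema and its rows are hypothetical; the point is that each indexable listing flattens one product plus its related category name into a single block of text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (
    id INTEGER PRIMARY KEY, name TEXT, description TEXT,
    color TEXT, category_id INTEGER REFERENCES categories(id)
);
INSERT INTO categories VALUES (1, 'Garden tools');
INSERT INTO products VALUES (10, 'Pruning shears', 'Bypass blades', 'red', 1);
INSERT INTO products VALUES (11, 'Leaf rake', NULL, 'green', 1);
""")

# The join pulls in the category name, so a search for "garden" can find
# products whose own fields never mention it.
rows = conn.execute("""
    SELECT p.id, p.name, COALESCE(p.description, ''), p.color, c.name
    FROM products AS p JOIN categories AS c ON p.category_id = c.id
    ORDER BY p.id
""").fetchall()

# One text listing per product, keyed by a synthetic document ID.
listings = {
    f"product:{pid}": " ".join(part for part in (name, desc, color, category) if part)
    for pid, name, desc, color, category in rows
}
```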
Major Search Features
– Search operators
    – Internet query (+, -, "quotes")
    – Boolean (AND, OR, NOT, parentheses)
    – Radio buttons or menus
– Field searching
    – File info
    – Meta tags & XML
    – Database fields
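A sketch of Internet-style query syntax: `+term` is required, `-term` is excluded, and a "quoted phrase" stays together. The parsing rules and the `parse_query`/`matches` names are assumptions for illustration; engines differ in the details. Here unmarked terms are treated as optional, so they do not filter results and would only affect ranking (not shown).

```python
import re

# Optional + or - sign, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def normalize(text):
    """Lowercase and reduce to space-separated alphanumeric words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def parse_query(query):
    """Split a query into required, excluded, and optional terms or phrases."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for sign, phrase, word in TOKEN.findall(query):
        term = normalize(phrase or word)
        if not term:
            continue  # skip empty quotes and stray punctuation
        key = {"+": "required", "-": "excluded"}.get(sign, "optional")
        parsed[key].append(term)
    return parsed

def matches(parsed, text):
    """True if the text has every required term and none of the excluded ones."""
    padded = f" {normalize(text)} "
    has = lambda term: f" {term} " in padded  # whole-word or whole-phrase match
    return all(has(t) for t in parsed["required"]) and not any(
        has(t) for t in parsed["excluded"]
    )

query = parse_query('+shears -electric "Garden Tools" pruning')
```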
Search Synonyms (Thesaurus)
– Search for alternate terms
    – Numbers: 40 / forty
    – Alternate spellings: color / colour
    – Spacing issues: Super Bowl / Superbowl
    – Technical terms: hives / urticaria
    – True synonyms: shears / clippers, doctor / physician
– Exact match option
– Stemming (language-based)
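A thesaurus can be as simple as groups of interchangeable terms, with each query term expanded to its whole group unless the searcher asks for an exact match. The groups below reuse the slide's examples. The data structure and `expand_term` are a hypothetical sketch, and stemming is not shown. A multiword entry such as "super bowl" still needs phrase matching in the engine.

```python
# Hypothetical site thesaurus: each group lists interchangeable terms.
SYNONYM_GROUPS = [
    frozenset({"40", "forty"}),
    frozenset({"color", "colour"}),
    frozenset({"super bowl", "superbowl"}),
    frozenset({"hives", "urticaria"}),
    frozenset({"shears", "clippers"}),
    frozenset({"doctor", "physician"}),
]

# Lookup table: each term -> the group it belongs to.
THESAURUS = {term: group for group in SYNONYM_GROUPS for term in group}

def expand_term(term, exact=False):
    """Return the set of terms to search for one query term or phrase."""
    term = " ".join(term.lower().split())  # normalize case and spacing
    if exact:
        return {term}  # exact-match option: no expansion
    return set(THESAURUS.get(term, {term}))
```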
Possibly Useful Features
– Spellchecking
– Fuzzy matching
    – Handles typos and misspellings
    – Tends to return far too many results
– Natural-language searching
    – Logs show few sentence searches
– Concept search
    – Hard to determine "aboutness"
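Spellchecking against the site's own index vocabulary avoids suggesting words that would find nothing. This sketch uses the standard library's `difflib.get_close_matches`; the vocabulary list and the 0.8 cutoff are assumptions. Lowering the cutoff catches more typos but, as the slide warns, quickly returns too many candidates.

```python
import difflib

# Words gathered from the site's index (hypothetical sample).
vocabulary = ["shears", "sheets", "shoes", "shipping", "returns", "warranty"]

def suggest(word, cutoff=0.8, limit=3):
    """Suggest indexed words close to a possibly misspelled query word."""
    word = word.lower()
    if word in vocabulary:
        return []  # already an indexed word; nothing to suggest
    return difflib.get_close_matches(word, vocabulary, n=limit, cutoff=cutoff)
```

For the typo "sheers", both "shears" and "sheets" clear the cutoff, which is why a "Did you mean" list usually works better than silently rewriting the query.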
Search Form Interfaces
– Basic search field everywhere!
    – Site home page
    – Navigation area
– Simple search page
    – A few of the most useful options
    – Zones can be very helpful
    – Include help and/or tips on search pages
– Search form on the Help page
Simple Search Page
Advanced Search Forms
– Provide all available options
    – Graphic interface for multiple query elements
    – Standard field searching (title, URL, text)
    – Modification date (often inaccurate)
    – File types
– Metadata, XML tags, or catalog fields
    – Keywords and descriptions
    – Size, materials, color, etc.
Advanced Search Page
Commerce Search Issues
– Product records tend to lack detail
– Index all text fields
    – Product name, description, color, material
– Terms need synonyms
– Pictures in results are very useful
– Better to find too much than too little
Relevance Ranking
– Real relevance
    – How well a page answers the underlying question
– Search relevance
    – How well the words on a page match the query
– Algorithms vary: test with real data
– Use search engine weighting options
– Recommendations for special searches
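To make "weighting options" concrete, here is one common textbook scoring scheme, TF-IDF with per-field weights. It is not the algorithm of any particular product: the 3x title weight, the log-based IDF, and the sample pages are all assumptions. Rarer words count for more, and a match in the title outweighs repeated mentions in the body.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical weighting option: a title match counts three times a body match.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

def rank(docs, query):
    """Score docs by field-weighted TF-IDF and return (score, url) pairs, best first."""
    terms = set(tokenize(query))
    n = len(docs)
    # Document frequency: how many docs contain each term in any field.
    df = {t: sum(1 for d in docs if t in tokenize(d["title"] + " " + d["body"])) for t in terms}
    results = []
    for d in docs:
        score = 0.0
        for field, weight in FIELD_WEIGHTS.items():
            words = tokenize(d[field])
            for t in terms:
                if df[t]:
                    idf = math.log(1 + n / df[t])  # rarer terms score higher
                    score += weight * words.count(t) * idf
        if score > 0:
            results.append((round(score, 3), d["url"]))
    return sorted(results, key=lambda r: (-r[0], r[1]))

docs = [
    {"url": "/returns.html", "title": "Return policy", "body": "How to return an order"},
    {"url": "/faq.html", "title": "FAQ", "body": "Questions about shipping, return labels and return policy"},
    {"url": "/about.html", "title": "About us", "body": "Our company history"},
]
ranking = [url for _, url in rank(docs, "return policy")]
```

Testing with real queries, as the slide advises, is how you would tune a weight like `FIELD_WEIGHTS["title"]`.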
Results Pages
– Conform to web conventions
– Provide site context
    – Use site colors and images
    – Include site navigation options
– Show search metadata
    – Search box with options
    – Number of results
    – Location within the results list
– Language localization
Results Items - Basic
– For each page or record
    – Title or product name
    – Link and possibly URL
– Optional
    – Size
    – Date modified
    – File format
– Hit highlighting
    – Emphasize match terms
Results Items - Description
– Page description
    – Meta description tag
    – Properties description field
– Text extract
    – Top of page (gets navigation)
    – Or first tag
– Context
    – Snippet shows text around word match
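A context snippet with hit highlighting might be built like this: take a window of characters around the first match, fall back to the top of the text when nothing matches, and wrap matched words in `<b>`. The window width and the `snippet` name are assumptions. Each piece is HTML-escaped separately so page text cannot break the results page, while the added `<b>` tags stay intact.

```python
import html
import re

def snippet(text, terms, width=30):
    """HTML snippet of text around the first matched term, with matches in <b>."""
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(text[:2 * width])
    # One capturing group, so re.split() keeps the matched words at odd indices.
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        start, end = 0, min(len(text), 2 * width)  # fall back to the top of the text
    else:
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
    parts = pattern.split(text[start:end])
    body = "".join(
        f"<b>{html.escape(p)}</b>" if i % 2 else html.escape(p) for i, p in enumerate(parts)
    )
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + body + suffix
```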
Results Page - Info Site (screenshot; labeled elements: Title, Description, URL, Category)
Results Page - Commerce
Results Page - Commerce with Extras
– Don’t just search products
    – People look for general information and return policies
Results Page - Not Enough Info
Results Page - Too Much Info
Too Many Results
– Common when searching large sites
– Track common searches, show recommended pages
– Sort phrase & all-word matches at the top
– Display matched terms in context
– Allow search in zones
– Consider clustering results in categories
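Sorting phrase and all-word matches to the top can be done by assigning each result a match tier and sorting on it first. This is a hypothetical sketch: tier 0 is the exact phrase, tier 1 has all the words, tier 2 has some. In practice you would keep the engine's relevance score as a secondary sort key instead of the URL used here.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_tier(text, query):
    """0 = exact phrase, 1 = all words, 2 = some words, None = no match."""
    q = tokenize(query)
    words = tokenize(text)
    if not q:
        return None
    if f" {' '.join(q)} " in f" {' '.join(words)} ":
        return 0
    present = [t in words for t in q]
    if all(present):
        return 1
    if any(present):
        return 2
    return None

def tiered_results(pages, query):
    """Return matching URLs, phrase matches first, then all-word, then some-word."""
    tiers = []
    for url, text in pages.items():
        tier = match_tier(text, query)
        if tier is not None:
            tiers.append((tier, url))
    return [url for _, url in sorted(tiers)]

pages = {
    "/a.html": "Shears for garden use",
    "/b.html": "Our best garden shears",
    "/c.html": "Garden hoses",
    "/d.html": "Company history",
}
```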
Clustered Results (screenshot; labeled elements: Titles, Cat. Matches, Category, Word Matches)
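A minimal way to cluster results, assuming each ranked result already carries a category label (for example, a catalog category or site section). The grouping keeps rank order within each cluster and orders clusters by their best-ranked member. The data and function names are hypothetical.

```python
def cluster_by_category(results):
    """Group ranked (url, category) pairs by category, preserving rank order."""
    clusters = {}
    for url, category in results:
        # dicts keep insertion order, so a category's position reflects its best hit
        clusters.setdefault(category, []).append(url)
    return clusters

def cluster_headings(clusters):
    """Category headings with match counts, e.g. for a results-page sidebar."""
    return [f"{category} ({len(urls)})" for category, urls in clusters.items()]

results = [
    ("/shears.html", "Garden tools"),
    ("/returns.html", "Policies"),
    ("/rakes.html", "Garden tools"),
]
clusters = cluster_by_category(results)
```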
Why Searches Fail
– Common reasons
    – Topic out of scope for the site
    – Vocabulary mismatch (car vs. auto)
    – Misspellings or typos
    – Complex search requirements not met
    – Search syntax error
    – Server errors (should be rare!)
– Provide good no-matches pages
    – Include site context & navigation
Zero Matches Page: Bad Example
Zero Matches Page: Good Example
Usability Testing
– Define test suites
    – Evaluate relevance & interface
    – Add problems as encountered
– Examine syntax issues
    – Evaluate default options carefully
– Informal testing is OK
    – Five people are a good start
– Watch for surprises
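A relevance test suite can be as small as a table of queries and the page each one should find, rerun after every change to settings or content. In this sketch `search` is any function that returns a ranked list of URLs. The fake index, the cases, and the top-N rule are assumptions for illustration.

```python
def run_test_suite(search, cases, top_n=5):
    """Return the queries whose expected URL is missing from the top N results.

    search(query) must return a ranked list of URLs; cases maps query -> expected URL.
    """
    failures = []
    for query, expected_url in cases.items():
        if expected_url not in search(query)[:top_n]:
            failures.append(query)
    return failures

# Hypothetical stand-in for a real engine's ranked results.
fake_index = {
    "return policy": ["/returns.html", "/faq.html"],
    "shears": ["/rakes.html", "/shears.html"],
    "auto": [],
}
cases = {"return policy": "/returns.html", "shears": "/shears.html", "auto": "/cars.html"}
failures = run_test_suite(lambda q: fake_index.get(q, []), cases, top_n=1)
```

Here "auto" fails because the site says "car", which is the kind of vocabulary mismatch a synonym entry would fix. Add each new problem query to `cases` as you find it.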
Search Log Analysis
– Store basic search data
    – Query, number of results, date/time, IP address or session ID
    – Integrate with web log referral pages
– Market research, for free!
    – Top searches
    – No matches
    – New topics and trends
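Once each search is logged with its query and result count, the top-searches and no-matches reports are simple counts. The log records below are hypothetical, in an assumed (query, number of results, timestamp, session ID) layout. Queries are normalized so "Return policy" and "return policy" count together.

```python
from collections import Counter

# Hypothetical log records: (query, number_of_results, timestamp, session_id)
log = [
    ("Return policy", 4, "2001-03-01T10:02", "s1"),
    ("return policy", 4, "2001-03-01T11:15", "s2"),
    ("shears", 12, "2001-03-01T11:20", "s3"),
    ("uticaria", 0, "2001-03-01T12:01", "s4"),
    ("auto parts", 0, "2001-03-02T09:30", "s5"),
    ("uticaria", 0, "2001-03-02T14:45", "s6"),
]

def summarize(log, top=3):
    """Return the most frequent queries and the most frequent zero-result queries."""
    queries = Counter(q.strip().lower() for q, n, *_ in log)
    no_matches = Counter(q.strip().lower() for q, n, *_ in log if n == 0)
    return queries.most_common(top), no_matches.most_common(top)

top_searches, top_failures = summarize(log)
```

The zero-result list is often the most useful part: here a misspelling and an out-of-vocabulary term each point to a spelling suggestion or thesaurus entry to add.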
Effective Site Search
– Index everything and keep it fresh
– Add synonyms and spell checking
– Tweak relevance until it works for you
– Customize results pages
– Provide help for search failure
– Watch your search logs for guidance
– Check out SearchTools.com or call us for help