1
2 Google Session
1. About MIT’s Google Search Appliance (GSA)
2. Adding Google search to your web site
3. Customizing search results
4. Tips on improving a site’s rankings
5. Q&A – actually, ask questions anytime!
3 MIT's Google Configuration
MIT license is for 3M documents
Two collections of 1.5M documents each
MIT has over 1M web pages on 1,000 web servers
Google follows links from the MIT Home Page
web.mit.edu – crawled three times a week
Other MIT web servers – crawled twice a week
4 MIT Google does
Perform twice as well as Inktomi in a “blind test”
Index 220 different file formats
Provide control over our own crawling schedule
Allow user customization of search results format
Index certificate-restricted content (not implemented yet)
5 MIT Google does NOT
Cache old pages
Index image files (our decision)
Index image ALT tags (Google’s decision)
Allow us to fiddle with the relevancy algorithm
Tell you “who’s linking to my page” because the GSA does not share that information across collections.
When your pages move, we recommend using a 301 redirect.
6 MIT Google does NOT index
Java, Perl, Python documentation
Debian, GNU/Linux mirrors
URLs containing these strings: sipb.mit.edu, dev.mit.edu, net.mit.edu, lees.mit.edu, ops.mit.edu, classics.mit.edu, hypermail, pipermail
Certificate-protected pages
No robots sites, no index pages
Dynamically generated pages containing ‘?’ except by request
URLs containing cgi-bin
URLs containing /afs/
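The URL-based exclusion rules above amount to substring checks. A minimal Python sketch, assuming plain substring matching (the appliance's actual pattern syntax is not shown in these slides); the function name `is_excluded` and the sample URLs are hypothetical:

```python
# Substrings taken from the slide. Matching them with `in` is an assumption,
# not the appliance's actual pattern engine.
EXCLUDED_SUBSTRINGS = [
    "sipb.mit.edu", "dev.mit.edu", "net.mit.edu", "lees.mit.edu",
    "ops.mit.edu", "classics.mit.edu", "hypermail", "pipermail",
    "cgi-bin", "/afs/",
]


def is_excluded(url: str) -> bool:
    """Return True if the URL matches one of the slide's URL-based exclusions."""
    # Dynamically generated pages are skipped unless specifically requested.
    if "?" in url:
        return True
    return any(s in url for s in EXCLUDED_SUBSTRINGS)


print(is_excluded("http://web.mit.edu/newsoffice/index.html"))  # False
print(is_excluded("http://web.mit.edu/cgi-bin/search"))         # True
```

A check like this can help predict whether a page is likely to be missing from the index before asking for it to be added by request.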
7 Telling Google not to index
No robots in server
No robots in locker/directory
No robots in html file
No index, follow
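To see how a server-level "no robots" rule behaves, the standard robots exclusion format can be checked with Python's `urllib.robotparser`. This is only a sketch: the robots.txt content, the crawler name `gsa-crawler`, and the `example.mit.edu` URLs are assumptions for illustration, not MIT's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a server that keeps all crawlers
# out of one directory tree.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical crawler name and URLs, used only for illustration.
blocked = parser.can_fetch("gsa-crawler", "http://example.mit.edu/private/notes.html")
allowed = parser.can_fetch("gsa-crawler", "http://example.mit.edu/public/index.html")
print(blocked)  # False
print(allowed)  # True

# Per-page alternative: a robots meta tag in the HTML file's <head>.
# "noindex, follow" asks crawlers to skip the page but still follow its links.
NOINDEX_FOLLOW = '<meta name="robots" content="noindex, follow">'
```

The robots.txt approach covers a whole server or directory, while the meta tag applies to a single HTML file.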
8 Avg. daily views, January 2005
Total queries in January: 340,656
9 Google search forms
10 Simple search form
11 Sample search code Doc
12 Restrict to one directory tree
name='as_sitesearch' value=' '
Use web.mit.edu/newsoffice, not web/newsoffice
The slash / matters:
web.mit.edu/newsoffice to include sub-directories
web.mit.edu/newsoffice/ to exclude sub-directories
as_sitesearch allows you to specify one directory (and all its sub-directories) as the domain to be searched. You cannot specify multiple disparate directories using this option.
If you want the search feature on your site to search the entire MIT web site, delete this parameter.
Doc
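The effect of `as_sitesearch` can be illustrated by building the query string a search form would submit. A minimal Python sketch: the endpoint `search.example.edu`, the query parameter name `q`, and the helper `build_search_url` are assumptions, since the real server address is not given in these slides.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; the real MIT-Google server address is not
# given in these slides.
SEARCH_URL = "http://search.example.edu/search"


def build_search_url(query, directory=None):
    """Build a search URL, optionally restricted to one directory tree."""
    params = {"q": query}  # "q" as the query field name is an assumption
    if directory is not None:
        # Full host name required: web.mit.edu/newsoffice, not web/newsoffice.
        params["as_sitesearch"] = directory
    return SEARCH_URL + "?" + urlencode(params)


# No trailing slash: sub-directories are included.
print(build_search_url("robotics", "web.mit.edu/newsoffice"))
# Trailing slash: sub-directories are excluded.
print(build_search_url("robotics", "web.mit.edu/newsoffice/"))
# Parameter omitted: the entire MIT web site is searched.
print(build_search_url("robotics"))
```

Note that `urlencode` percent-encodes the slash in the directory value, which is what a browser does when it submits a form field.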
13 Restrict to multiple directories or servers
Doc
Contact and we will create a subcollection for
A subcollection is a list of URL patterns that can be referred to by a single name, such as "Library".
14 Advanced search example
15 Google Custom Results
You can customize the look and feel of Google’s search results by providing a stylesheet.
16 Site-wide MIT template
17 IS&T custom results
18 IS&T Search
19 IS&T Custom Results
20 Customizing results
You provide the header and footer (HTML) wrapper, and any desired content formatting
Google provides the raw data (XML)
[Diagram labels: Google Results Data; Your HTML header/footer]
21 Results content “title” only
22 How customization works
The form points to an XSLT stylesheet
Google returns results to query in XML
An XSLT document translates the XML into your custom HTML
[Diagram labels: MIT-Google Index; Search Query; Search Results; Stylesheet; HTML Results]
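The flow above (XML results in, your HTML wrapper out) can be sketched in Python as a stand-in for the XSLT step. This is not XSLT and not the MIT stylesheet: the XML shape below is a simplified assumption, and the header, footer, and `render_results` helper are hypothetical. The official XML specification is the place to check the real element names.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, assumed shape of the XML results; the real structure is
# defined in the official XML specification.
SAMPLE_XML = """\
<GSP>
  <RES>
    <R><U>http://web.mit.edu/</U><T>MIT Home Page</T></R>
    <R><U>http://web.mit.edu/newsoffice/</U><T>News &amp; Events</T></R>
  </RES>
</GSP>
"""

HEADER = "<html><body><h1>Search results</h1><ul>"  # your HTML header
FOOTER = "</ul></body></html>"                       # your HTML footer


def render_results(xml_text):
    """Wrap each result title, linked to its URL, in the header and footer."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        title = result.findtext("T", default=url)
        items.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        )
    return HEADER + "".join(items) + FOOTER


print(render_results(SAMPLE_XML))
```

An XSLT stylesheet does the same job declaratively: templates match the result elements and emit the surrounding HTML.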
23 Notes
It is not necessary to customize the results.
–You can place a search form on your site, and Google will use the site-wide MIT XSLT stylesheet.
Updates to the Google service may require you to make changes in your stylesheet.
–Subscribe to
WCS will provide fee-based production services for custom search results.
24 How to customize the results
Plan how you want the results to look
Copy the MIT Google XSLT stylesheet
Save it to web-readable space, naming it google-mysite.xsl
25 Point to your XSL
Update your search form to point the MIT-Google server to your custom XSLT stylesheet.
<input type='hidden' name='proxystylesheet' value='
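As an illustration of what a completed hidden field might look like, the Python sketch below generates one. The stylesheet URL on `example.mit.edu` is hypothetical, as is the `hidden_input` helper; substitute the web-readable address of your own stylesheet file.

```python
from html import escape


def hidden_input(name, value):
    """Return a hidden form field in the single-quoted style used on the slide."""
    return (
        f"<input type='hidden' name='{escape(name, quote=True)}' "
        f"value='{escape(value, quote=True)}'>"
    )


# Hypothetical stylesheet location, for illustration only.
STYLESHEET_URL = "http://example.mit.edu/google-mysite.xsl"

print(hidden_input("proxystylesheet", STYLESHEET_URL))
```

Escaping the attribute values keeps the form valid even if a URL contains quotes or ampersands.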
26 Step-by-step customization See
27 Documentation
(Includes the “official” Google documentation, including their XML specification; also XSLT tips.)
Search Engine Submission Tips SS for an Effective SEO Campaign
28 Support
The MIT Google team will help you create a Google search form and answer queries sent to
WCS offers fee-based production services for custom search results
29 Q&A