Google Search Appliance November 2, 2010 Susan Fagan.

2 Why Google Search Appliance? A different approach to search at EPA Smarter ranking Improved indexing Easier operations A future We're going to call it GSA from here on in

3 How GSA ranks documents It's a secret, but we know some things –Page rank –Self-learning We can control some things –Date biasing –Source biasing –Metadata biasing –Best bets We're going to let it do its thing before we tune it too much

4 How GSA ranks documents: Page Rank Who links to your pages? Who links to pages that link to your pages? How does everybody link? –What does it say in the link text? –Is the link always the primary URL (because if it isn't, you don't get any points)? A primary URL is a URL that contains no aliases that are not primary. Primary as defined by what you put in the TSSMS Alias Tool.
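
The link questions on this slide are the ones the published PageRank algorithm answers. GSA's actual ranking is not public, so what follows is only a minimal sketch of textbook PageRank by power iteration. The page names and the damping value of 0.85 are illustrative assumptions, not GSA settings. The example compares two inbound links split between a primary URL and an alias with the same two links both pointing at the primary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (not GSA's private algorithm).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical pages: two inbound links split between a primary URL and an alias...
split = pagerank({"a": ["primary"], "b": ["alias"], "primary": [], "alias": []})
# ...versus both inbound links pointing at the primary URL.
merged = pagerank({"a": ["primary"], "b": ["primary"], "primary": []})

print("primary rank with split links: ", round(split["primary"], 3))
print("primary rank with merged links:", round(merged["primary"], 3))
```

In this toy graph the primary page ranks higher when every inbound link uses the primary URL, which is the point the slide makes about aliases.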

5 How GSA Ranks Documents: Things We Can Control Date biasing –Newer is better –We control how much better Source biasing –Boost or decrease chunks of our website –Regions are slightly decreased for Agency search Metadata biasing –We control how much each metadata field counts –We can turn up the bias as metadata quality improves
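
As a rough illustration of how date and source biasing could combine, here is a small sketch. It is not how GSA computes scores; on the appliance these biases are configured rather than coded. The function `biased_score`, the half-life decay, the weights, and the `example.gov` URLs are all assumptions made for the example. Metadata biasing is left out to keep it short.

```python
from datetime import date


def biased_score(base, doc_date, url, today,
                 date_weight=0.5, half_life_days=365, source_bias=None):
    """Apply hypothetical date and source biases to a base relevance score."""
    # Date biasing: newer is better, and date_weight controls how much better.
    age_days = max((today - doc_date).days, 0)
    freshness = 0.5 ** (age_days / half_life_days)  # 1.0 for a brand-new page
    date_factor = 1.0 + date_weight * freshness

    # Source biasing: boost or decrease whole chunks of the site by URL prefix.
    source_factor = 1.0
    for prefix, factor in (source_bias or {}).items():
        if url.startswith(prefix):
            source_factor = factor
            break

    return base * date_factor * source_factor


today = date(2010, 11, 2)
# Hypothetical setting: regional pages are slightly decreased.
bias = {"http://www.example.gov/region": 0.9}

print(biased_score(1.0, date(2010, 10, 1), "http://www.example.gov/air/", today, source_bias=bias))
print(biased_score(1.0, date(2005, 10, 1), "http://www.example.gov/air/", today, source_bias=bias))
print(biased_score(1.0, date(2010, 10, 1), "http://www.example.gov/region1/", today, source_bias=bias))
```

Raising `date_weight` corresponds to "we control how much better", and the prefix table corresponds to boosting or decreasing chunks of the website.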

6 How GSA Ranks Documents: More Things We Can Control Best Bets –Like buying keywords from Google.com –Specific pages for specific keywords or phrases –Always featured at the top –Take effect immediately
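
Conceptually, Best Bets behave like a lookup table consulted before the ranked results are shown. The sketch below is only an illustration of that idea; `apply_best_bets`, the keyword table, and the URLs are hypothetical, and GSA manages its own equivalent internally.

```python
def apply_best_bets(query, organic_results, best_bets):
    """Put the featured pages for a keyword or phrase ahead of the ranked results."""
    # Normalise case and whitespace so "Air  Quality" matches "air quality".
    key = " ".join(query.lower().split())
    featured = best_bets.get(key, [])
    # Avoid listing a featured page a second time further down.
    rest = [url for url in organic_results if url not in featured]
    return featured + rest


# Hypothetical Best Bets table: specific pages for specific phrases.
best_bets = {"air quality": ["http://www.example.gov/air/"]}
organic = ["http://www.example.gov/news/", "http://www.example.gov/air/"]

print(apply_best_bets("Air Quality", organic, best_bets))
```

Because the table is consulted at query time, a change to it takes effect on the next search, with no re-crawl needed.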

7 How GSA Indexes Documents Continuous crawl Learns by experience Crawl rates tunable by host and time Requires some starting points (seeds) Restricted by Do Not Crawl list A manually maintained list, in the GSA Admin UI, of URL patterns that the crawler should not visit. Respects robots.txt (in its own way)
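
The two restrictions on this slide, the Do Not Crawl list and robots.txt, can be pictured as two checks made before a URL is fetched. The sketch below uses Python's standard `urllib.robotparser` and shell-style wildcards. Both are stand-ins: GSA has its own pattern syntax and its own robots.txt handling ("in its own way"), and the agent name, patterns, and `example.gov` URLs are assumptions made for the example.

```python
from fnmatch import fnmatchcase
from urllib.robotparser import RobotFileParser


def may_crawl(url, do_not_crawl, robots, agent="gsa-crawler"):
    """Return True only if the URL passes both the Do Not Crawl list and robots.txt."""
    # Check 1: the manually maintained list of URL patterns.
    if any(fnmatchcase(url, pattern) for pattern in do_not_crawl):
        return False
    # Check 2: the site's robots.txt rules.
    return robots.can_fetch(agent, url)


# Hypothetical Do Not Crawl entries (shell-style wildcards, not GSA syntax).
DO_NOT_CRAWL = ["http://www.example.gov/oldair/*"]

# Parse robots.txt rules from memory so the example needs no network access.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

for url in ["http://www.example.gov/air/index.html",
            "http://www.example.gov/oldair/page.html",
            "http://www.example.gov/private/x.html"]:
    print(url, may_crawl(url, DO_NOT_CRAWL, robots))
```

Checking the Do Not Crawl list first keeps the site-wide policy in one place, while robots.txt stays under the control of each web server's owner.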

8 How EPA is implementing GSA Same Java webapp on the same servers Your search form will stay the same Area search won't change much Your XML search application may change (most won't) Smart, fast indexing, with some help Only indexing primary URLs

9 Implementing GSA: Your search form will stay the same Implemented Northern Light via an object-oriented Java application –We get to keep our code this time –6 weeks to change it, instead of 6 months –Nothing changes for client pages Two Model 7007 Google Search Appliances –Primary –Hot spare for failover –Parallel indexes 2,000,000 document license

10 Implementing GSA: Your search form URL is the same All common elements work the same Some obscure elements go away –weighted_search, search_crumbs Custom result templates work the same Advanced search works the same

11 Implementing GSA: Area Search Area search is here for now If you search by TSSMS –We will translate it on the fly to URL –We will only translate TSSMS to primary alias If you search by URL –Nothing changes… –… but aliases are your problem Contact Peter to test your area search

12 Implementing GSA: Your XML search app Parameters and templates are unchanged GSA response packet automatically transformed to original NL format Only 1,000 results are available for a single query 3 applications have been observed exceeding that limit

13 Implementing GSA: Smart, fast indexing Continuous crawl – scans the website at least daily for new links If it's not linked, it won't be found Librarian looks daily for new content If all this doesn't work (quickly), tell the librarian Notes databases do not require Verity Views

14 Implementing GSA: Indexing your primary URL Search engines think different URLs are different documents This means duplicates in search results All non-primary aliases are being placed in the Do Not Crawl list
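
The duplicate problem can be shown with a small sketch that maps every alias to its primary URL before results are compared. The alias table, the helper names `to_primary` and `dedupe`, and the `example.gov` URLs are hypothetical. In the deployment described here the same effect comes from keeping aliases out of the crawl rather than from post-processing results.

```python
def to_primary(url, aliases):
    """Rewrite a URL that starts with a known alias prefix to its primary form."""
    for alias_prefix, primary_prefix in aliases.items():
        if url.startswith(alias_prefix):
            return primary_prefix + url[len(alias_prefix):]
    return url


def dedupe(urls, aliases):
    """Keep one entry per document, identified by its primary URL."""
    seen = set()
    unique = []
    for url in urls:
        primary = to_primary(url, aliases)
        if primary not in seen:
            seen.add(primary)
            unique.append(primary)
    return unique


# Hypothetical alias table: alias prefix -> primary prefix.
ALIASES = {"http://www.example.gov/oldair/": "http://www.example.gov/air/"}

results = [
    "http://www.example.gov/air/index.html",
    "http://www.example.gov/oldair/index.html",  # same document under an alias
    "http://www.example.gov/water/index.html",
]
print(dedupe(results, ALIASES))
```

A search engine without the alias table sees the first two URLs as different documents, which is why the non-primary aliases are excluded from the crawl.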

15 What will our customers see? The same thing… at first. Breadcrumbs are gone… what were they, anyway? Folders replaced by Related Searches FAQ will come back Best Bets for top documents The document they're looking for!

16 What do we have to do? Plan our November 19 public access implementation Test (with your help) Implement Make it better

17 What do you have to do? Keep working on ROT Keep working on metadata Don't change your search form… … Area search will work, if you want it Tell us what you think

18 What are we leaving out … for now? EPA thesaurus –Contains only general terms –We will add EPA vocabulary Google's spellchecker –We'll use our own for now –We'll compare and use the winner RSS presentation – delivers only raw XML in search results, for now Recent searches

19 What's in our future? Marketplace of One Box modules –Faceted search? –Contextual search? –Business intelligence? More social media OneEPA integration Web CMS integration Advanced analytics Special collections Geographic search? GSA for intranet

20 Contact: Susan Fagan
