Published by Gregory Reed. Modified over 8 years ago.
1
Department of Information Technology e-Michigan Web Development
2
Inktomi - A little history. called which is owned by which is now owned by Yahoo.
3
Issues we had with the search
Multiple records for the same piece of content.
Limited advanced search functionality.
Assets are stored in the same document directory.
All documents are searched regardless of which agency site you are searching from.
4
What we did to fix the problems
Changed Inktomi settings to index only one URL for content that has the same Title and Body.
Added/enhanced an Advanced Search form.
Created a new collection for MI News Wire because the changes to Inktomi were excluding MI News Wire.
Planned enhancements for Advanced Search: separation of documents by agency.
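The "one URL per identical Title and Body" rule can be illustrated with a short sketch. This is not Inktomi's configuration or API. It only assumes each crawled record is a dict with hypothetical `url`, `title`, and `body` keys, and that the first URL seen for a given Title and Body is the one kept.

```python
def dedupe_by_title_and_body(records):
    """Keep one URL per distinct (title, body) pair.

    Assumption: the first URL encountered for a pair wins. The slides do
    not say which URL Inktomi keeps.
    """
    kept = {}
    for rec in records:
        key = (rec["title"], rec["body"])
        # setdefault stores the URL only if this pair has not been seen yet
        kept.setdefault(key, rec["url"])
    return list(kept.values())


records = [
    {"url": "http://a.example/page", "title": "Permits", "body": "How to apply."},
    {"url": "http://b.example/page", "title": "Permits", "body": "How to apply."},
    {"url": "http://a.example/fees", "title": "Fees", "body": "Fee schedule."},
]
unique_urls = dedupe_by_title_and_body(records)
```

Here the second record shares its Title and Body with the first, so only two URLs remain in `unique_urls`.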
5
Spider collections
We have three collections that are used when crawling the State of Michigan websites:
Crawl all sites that do not contain a query.
Crawl all sites that use queries.
Crawl MI News Wire.
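As a rough illustration of how a URL might be routed to one of the three collections, the sketch below checks for a query string with Python's standard `urllib.parse`. The collection names and the `NEWS_WIRE_HOST` value are hypothetical; the slides do not say how MI News Wire URLs are actually identified.

```python
from urllib.parse import urlsplit

# Hypothetical host name, used only for illustration.
NEWS_WIRE_HOST = "newswire.example"


def pick_collection(url):
    """Return the name of the collection a URL would belong to."""
    parts = urlsplit(url)
    if parts.hostname == NEWS_WIRE_HOST:
        return "mi_news_wire"
    # A non-empty query component means the URL "uses queries".
    return "with_query" if parts.query else "no_query"
```

For example, `pick_collection("http://www.example.gov/page?id=1")` returns `"with_query"`, while the same URL without `?id=1` returns `"no_query"`.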
6
Time of Crawl
The search engine crawls every night from 5:00 PM to 7:30 AM the next day. There are times when the crawl does not complete in this time period. The crawl then picks up where it left off when the crawl process runs the next evening.
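The crawl window crosses midnight, so a simple "start <= now < end" test does not work. The sketch below shows one way to check the window and to leave unfinished work for the next run. It assumes the 7:30 AM end is exclusive, and it uses a hypothetical `max_docs` limit as a stand-in for running out of time.

```python
from collections import deque
from datetime import time

CRAWL_START = time(17, 0)   # 5:00 PM
CRAWL_END = time(7, 30)     # 7:30 AM the next day (assumed exclusive)


def in_crawl_window(now):
    """True if `now` (a datetime.time) falls inside the overnight window."""
    # The window wraps past midnight: evening hours OR early-morning hours.
    return now >= CRAWL_START or now < CRAWL_END


def crawl_session(pending, max_docs):
    """Crawl up to `max_docs` URLs from the front of `pending`.

    Whatever remains in `pending` is where the next evening's run resumes.
    """
    done = []
    while pending and len(done) < max_docs:
        done.append(pending.popleft())
    return done


pending = deque(["a", "b", "c"])
first_night = crawl_session(pending, 2)
```

After the first session, `pending` still holds the uncrawled URL, which models the crawl picking up where it left off.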
7
Crawl Intervals
There are 6 revisit queues. The revisit queues, which are based on time intervals, determine when to re-crawl a document.
Minimum document revisit interval: 2 days. After being placed in the search index, a document is placed in a queue to be revisited in 2 days. If the document has not changed, it is placed in the next revisit queue.
Maximum document revisit interval: 10 days. A document will never go more than 10 days without being revisited.
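The revisit queues can be modeled as a small state machine. The slides give only the number of queues and the minimum and maximum intervals, so two things in this sketch are assumptions: the intermediate intervals are spaced evenly between 2 and 10 days, and a changed document returns to the first queue.

```python
NUM_QUEUES = 6
MIN_DAYS = 2
MAX_DAYS = 10


def revisit_interval_days(queue_index):
    """Interval for a queue, assuming even spacing from MIN_DAYS to MAX_DAYS."""
    return MIN_DAYS + (MAX_DAYS - MIN_DAYS) * queue_index / (NUM_QUEUES - 1)


def next_queue(queue_index, changed):
    """Queue a document moves to after a revisit."""
    if changed:
        # Assumption: a changed document starts over at the shortest interval.
        return 0
    # Unchanged documents move to the next queue, capped at the last one,
    # so the interval never exceeds MAX_DAYS.
    return min(queue_index + 1, NUM_QUEUES - 1)
```

Under these assumptions a document that never changes moves through all six queues and is then revisited every 10 days.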
8
Weight of a document
Index Weights: the importance of text relative to the body text of a document. They determine where a document will appear in the search results, based on the text that is being searched.
Title is weighted 8.
Keywords are weighted 4.
Description is weighted 4.
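To show how such weights could affect ranking, here is a minimal scoring sketch. It is not Inktomi's ranking formula. It assumes body text has a baseline weight of 1, that a score is the weighted count of exact word matches per field, and that documents are dicts with hypothetical `title`, `keywords`, `description`, and `body` keys.

```python
# Weights from the slide; the body weight of 1 is an assumed baseline.
WEIGHTS = {"title": 8, "keywords": 4, "description": 4, "body": 1}


def score(doc, term):
    """Sum of weight * number of exact word matches in each field."""
    term = term.lower()
    total = 0
    for field, weight in WEIGHTS.items():
        words = doc.get(field, "").lower().split()
        total += weight * words.count(term)
    return total


in_title = {"title": "Fishing license", "body": "Buy one online"}
in_body = {"title": "Outdoor recreation", "body": "Buy a fishing license online"}
```

With these weights, `in_title` scores 8 for the term "fishing" and `in_body` scores 1, so a title match would rank well above a body-only match.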