Meet the crawlers May 4, 2005 Matias Cuenca-Acuna Research Scientist Teoma Search Development.

1 Meet the crawlers May 4, 2005 Matias Cuenca-Acuna Research Scientist Teoma Search Development

2 Ask Jeeves Properties

3 Ask Jeeves, Inc.
January 2005
– #7 U.S. Web Property; #8 Global
– #4 Ranked Search Property
– 25% Reach - Active U.S. Audience
– 41M Domestic & 130M Global Unique Users
– 6% Share - U.S. Searches
– Over 4 Billion Page Views/Month
January 2004
– #26 U.S. Web Property
– #5 Ranked Search Property
– 10% Reach - Active U.S. Audience
– 16M Domestic & 30M Global Unique Users
– 2% Share - U.S. Searches
– 900M Page Views/Month
Source: Comscore Media Metrix, Jan '05

4 InterActiveCorp (IAC) News
– IAC announced a plan to acquire Ask Jeeves on March 23, 2005
– IAC operates more than 40 specialized consumer brands in industries including travel, retailing, media, financial services and real estate
– IAC's businesses collectively reach 44 million U.S. unique users monthly
– Ask Jeeves will remain an independent brand with operations headquartered in Oakland, California
– The acquisition is expected to close late in the second quarter or early in the third quarter of 2005 and is subject to customary closing conditions

5 Search Engine Overview
[Diagram: Crawler, Indexing, and Ranking Algorithm combine to produce Search Results]

6 Our Crawling Goals
Politeness
– Follow the robots.txt standard
– 'Crawl-delay' directive to specify the minimum download interval
– NOARCHIVE – don't cache this page
– NOINDEX – don't index this page
– NOFOLLOW – index but don't follow links
– Use compression to save your bandwidth
    up to 75% savings with gzip
Freshness
– Variable rates of crawling
Completeness
– Support multiple file formats
    HTML, PDF, Flash, MS-Office file types, XML
– Content partnerships help (structured & unstructured)
    The New York Times, Bloglines, Wikipedia, etc.
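As a rough illustration of the politeness rules on this slide, the sketch below shows how a crawler might read a robots.txt file and pick up both the disallowed paths and the Crawl-delay value. It uses Python's standard urllib.robotparser module. The user-agent name ExampleBot, the /calendar/ path, and the example.com URLs are assumptions made up for the example; they are not taken from the presentation and do not describe the Teoma crawler itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: "ExampleBot" and the
# "/calendar/" path are assumptions, not names from the presentation.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /calendar/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Minimum number of seconds the site asks this crawler to wait between fetches.
delay = rules.crawl_delay("ExampleBot")

# Whether the crawler may fetch a given URL under these rules.
allowed_home = rules.can_fetch("ExampleBot", "http://www.example.com/index.html")
allowed_calendar = rules.can_fetch("ExampleBot", "http://www.example.com/calendar/3001/01")

print(delay, allowed_home, allowed_calendar)
```

A crawler built along these lines would sleep for at least the returned delay between requests to the same host and skip any URL for which can_fetch returns False.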

7 Making the best use of us
Simplify site organization and navigation
– It ensures that crawlers and your users can reach every part of your site
– Use site maps
Date your content
– Fewer than 20% of pages ever change!
– Put 'last modified on' within the page
– Provide an HTTP Last-Modified header
Watch out for infinite pages
– Calendars
    Don't serve pages for year 3001
    Block calendars altogether
– Session IDs in URLs
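The advice to provide an HTTP Last-Modified header can be sketched from the server's side as follows. This is a minimal example, not the behavior of any particular web server: the function name respond and its return shape are hypothetical, and it only covers the common case where a crawler sends back the date it saw earlier in an If-Modified-Since header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(last_modified: datetime,
            if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a page last changed at `last_modified` (UTC).

    Hypothetical helper: 304 means "not modified, no body needed",
    200 means "send the full page".
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable date: fall back to a full response.
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers
    return 200, headers


page_time = datetime(2005, 5, 4, 12, 0, 0, tzinfo=timezone.utc)
first_status, first_headers = respond(page_time, None)
repeat_status, _ = respond(page_time, first_headers["Last-Modified"])
print(first_status, first_headers, repeat_status)
```

Since most pages rarely change, answering repeat visits with a 304 lets a crawler confirm freshness without downloading the page again.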

8 FAQs
Can I submit my site for indexing?
– We discontinued paid Site Submission. We rely on finding sites organically, and it's been effective.
My site's pages are not in the index yet?
– Patience, please! Different crawl cycles take varying amounts of time to process pages. Further, the quality of a page is assessed before inclusion in the Teoma index.
Our spider FAQ: http://sp.ask.com/docs/about/tech_crawling.html
