Published by Cecil Montgomery, modified over 9 years ago
1 Datamining the Internet: Alexa
Brewster Kahle, President, Alexa Internet
brewster@alexa.com
2 To Answer Any Question...
- Know a lot
- Know what is important
- Be right enough
Alexa: the web navigation service that learns from people
3 Know a Lot: Other Repositories
- Library of Alexandria: 800 GB (400k scrolls at 2 MB each)
- Library of Congress: 20 TB (20M books, ASCII)
- Dialog Information Service: 3-5 TB
- Video store: 8 TB (5k videos, 1 GB/hr)
- Public branch library: 3 TB (300k scanned books)
- Radio station: 1 TB (15k hours of music)
- ... Alexa’s Internet Archive: 10 TB
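The repository sizes above are all "count times average item size" estimates. A short sketch that reproduces the two totals the slide fully specifies and backs out the per-item sizes the other rows imply; the 1 MB per book for the Library of Congress and all the "implied" values are derived from the slide's totals, not stated on it.

```python
# Back-of-envelope check: collection size = item count x average item size.
# Decimal units throughout (1 TB = 10**12 bytes).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

def collection_size(count: int, bytes_per_item: float) -> float:
    """Total bytes for `count` items of `bytes_per_item` each."""
    return count * bytes_per_item

# Library of Alexandria: 400k scrolls at 2 MB each (stated on the slide)
alexandria = collection_size(400_000, 2 * MB)
# Library of Congress: 20M ASCII books; 1 MB/book is implied by 20 TB / 20M
congress = collection_size(20_000_000, 1 * MB)

print(f"Alexandria: {alexandria / GB:.0f} GB")  # Alexandria: 800 GB
print(f"Congress:   {congress / TB:.0f} TB")    # Congress:   20 TB

# Per-item sizes implied by the remaining rows (derived, not stated)
implied = {
    "scanned book (3 TB / 300k)": 3 * TB / 300_000,
    "hour of music (1 TB / 15k hrs)": 1 * TB / 15_000,
    "video (8 TB / 5k)": 8 * TB / 5_000,
}
for name, size in implied.items():
    print(f"{name}: {size / MB:.1f} MB")
```

The implied figures come out to about 10 MB per scanned book, 66.7 MB per hour of music, and 1.6 GB per video (roughly 1.6 hours at the slide's 1 GB/hr).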
4 Know a Lot: Gathering
- Web snapshot over a T3 line in 20 days
- Users’ paths are essential as well
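To put "a Web snapshot on a T3 in 20 days" in perspective, the sketch below computes the most data a T3 line could carry in that time. It assumes the standard T3/DS3 rate of 44.736 Mbit/s and a link saturated around the clock, so it is an upper bound; real crawls lose capacity to protocol overhead and idle time.

```python
# Upper bound on data moved over a T3 line in 20 days.
# Assumptions: standard T3/DS3 rate, link fully busy 24 hours a day.
T3_BITS_PER_SEC = 44.736e6
SECONDS_PER_DAY = 24 * 60 * 60

def bytes_transferred(bits_per_sec: float, days: float) -> float:
    """Bytes moved at a constant bit rate for the given number of days."""
    return bits_per_sec / 8 * SECONDS_PER_DAY * days

total = bytes_transferred(T3_BITS_PER_SEC, 20)
print(f"{total / 1e12:.2f} TB in 20 days")  # 9.66 TB in 20 days
```

The ceiling of roughly 9.7 TB is in the same range as the archive sizes quoted elsewhere in this deck.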
5 8 Terabytes so far
6 Web Stats
- 1 million sites, doubling every 6 months (millions of authors)
- More video, dynamic pages, Java, etc.
- About 15 links per page
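"Doubling every 6 months" is exponential growth, N(t) = N0 · 2^(t/6) with t in months. A minimal sketch, taking the slide's 1 million sites as N0 and assuming, purely for illustration, that the rate holds constant:

```python
# Exponential growth under the slide's claim: doubling every 6 months.
# Holding the rate constant into the future is an illustrative assumption.
def projected_sites(start: int, months: float, doubling_months: float = 6.0) -> float:
    """start * 2 ** (months / doubling_months)."""
    return start * 2 ** (months / doubling_months)

for months in (0, 6, 12, 24):
    print(f"after {months:2d} months: {projected_sites(1_000_000, months):,.0f} sites")
```

At this rate the Web grows fourfold each year, reaching 16 million sites after two years.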
7 Storage
A snapshot of the Web on a tape jukebox costs $80k
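This slide gives a price but not a capacity. The sketch below turns the $80k into a cost per gigabyte under two assumed capacities taken from other slides in the deck (8 TB collected so far and the 10 TB archive figure); which one actually applies to this jukebox is not stated.

```python
# Storage cost per gigabyte for the tape jukebox.
# The $80k is from the slide; the capacities tried here are assumptions.
def cost_per_gb(total_cost_usd: float, capacity_tb: float) -> float:
    """Dollars per gigabyte, decimal units (1 TB = 1000 GB)."""
    return total_cost_usd / (capacity_tb * 1000)

for capacity_tb in (8, 10):
    print(f"{capacity_tb} TB: ${cost_per_gb(80_000, capacity_tb):.2f} per GB")
```

Either way the cost lands at roughly $8 to $10 per gigabyte.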
8 Knowing What Is Important: Mining the WWW for Quality
- Content: 100 million pages
- Link structure: 750 million links
- Usage paths: many hundreds of millions of hits
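The slide names link structure as one quality signal but does not say how it is scored. One simple way to use it (not necessarily what Alexa did) is to count how many distinct other sites link to each page, so a site cannot boost its own pages. The URLs below are hypothetical, using the reserved `.example` domain.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def site(url: str) -> str:
    """Host part of a URL, used here as a stand-in for 'site'."""
    return urlsplit(url).netloc.lower()

def external_inlink_counts(links: list[tuple[str, str]]) -> dict[str, int]:
    """For each target page, count distinct *other* sites linking to it.

    Same-site links are ignored, and repeated links from one site count once.
    """
    linking_sites = defaultdict(set)
    for source, target in links:
        src_site = site(source)
        if src_site != site(target):
            linking_sites[target].add(src_site)
    return {target: len(sites) for target, sites in linking_sites.items()}

# Hypothetical link graph: (source page, target page)
links = [
    ("http://fan1.example/mustang", "http://ford.example/mustang"),
    ("http://fan2.example/cars", "http://ford.example/mustang"),
    ("http://fan2.example/more", "http://ford.example/mustang"),  # same site again
    ("http://ford.example/", "http://ford.example/mustang"),      # internal link
    ("http://ford.example/", "http://fan1.example/mustang"),
]
ranked = sorted(external_inlink_counts(links).items(), key=lambda kv: -kv[1])
print(ranked)
```

Counting sites rather than raw links is a design choice that damps link spam from a single author; a fuller approach would weight each link by the quality of the linking page, as PageRank-style methods do.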
9 Be Right Enough: Being Useful
- Competition
  - Directories: the biggest links to less than 1% of Web pages
  - Search engines: return thousands of hits (sometimes millions)
- Trends
  - Move to "channels" with less content, but good content
  - Limited crawling (50M pages and holding)
10 Be Right Enough: Alexa
- Where am I?
- Where do I want to go?
Alexa:
- "Can I trust this information?"
- What should I look at next?
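Answering "What should I look at next?" from usage paths can be sketched as counting which site people visit right after the current one and suggesting the most frequent successors. This first-order transition count is an illustrative technique, not Alexa's documented method; the function names and the site names in `paths` are hypothetical.

```python
from collections import Counter, defaultdict

def build_next_site_model(paths: list[list[str]]) -> dict[str, Counter]:
    """For each site, count which sites users visited immediately after it."""
    transitions = defaultdict(Counter)
    for path in paths:
        for current, nxt in zip(path, path[1:]):
            if current != nxt:  # skip reloads / staying on the same site
                transitions[current][nxt] += 1
    return transitions

def suggest_next(model: dict[str, Counter], current: str, k: int = 3) -> list[str]:
    """Top-k most frequent next sites after `current`; empty if never seen."""
    return [s for s, _ in model.get(current, Counter()).most_common(k)]

# Hypothetical browsing sessions, one list of visited sites per user session
paths = [
    ["travel-agent.example", "magazine.example", "airline.example"],
    ["travel-agent.example", "magazine.example"],
    ["travel-agent.example", "airline.example"],
    ["magazine.example", "hotel.example"],
]
model = build_next_site_model(paths)
print(suggest_next(model, "travel-agent.example"))
```

Because the model is built only from where people actually went, it "learns from people" in the sense of the tagline on slide 2; it needs no understanding of page content.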
11-13 (slides without text)
14 Travel Agents
15 Condé Nast Travel
16 Ford Vehicles Homepage
17 Ford’s Mustang Page
18 Independent Mustang Page
19 Surrealism Page
20 Women Surrealists
21 Archive in Action
22 Alexa Conclusion