1
Search Engine Developments
Jon Warbrick
University of Cambridge Computing Service
jon.warbrick@ucs.cam.ac.uk
2
web-search.cam.ac.uk
● Indexes 400 more-or-less "official" web servers
● Provides 'packaged searches' for Ucam websites
● Uses 'Verity Ultraseek'
● Currently a single search engine accessed by internal and external users
3
Current architecture
[Diagram: CUDN & cam.ac.uk; Rest of the world]
4
Current architecture
8
The Problem
● The search engine is inside the CUDN boundary
● ... and so can see 'ucam-only' material
● ... which it will tell external users about
● ... so we advise webmasters not to index it
● ... but that means that internal users can't find it
● It's also tricky to get the restrictions right
9
Future architecture
17
Things for you to do
● Consider giving the internal search engine access to material currently excluded from indexing
– Its User-Agent is 'Ultraseek (internal search; webmaster@ucs.cam.ac.uk)'
● Set up 192.153.213.0 – 192.153.213.255 as being 'outside the University'
– The external search engine will use one of these addresses and will NOT have a 'cam.ac.uk' name
● Details at http://www.cam.ac.uk/cs/web-search/developments.html
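As a rough illustration of the address rule on this slide, the Python sketch below treats 192.153.213.0 – 192.153.213.255 (that is, 192.153.213.0/24) as external before consulting any list of internal networks. The function name `is_internal_client` and the `internal_networks` parameter are assumptions made for this example. The network used in the usage note is a placeholder, not the University's real address space, and real access control would normally live in the web server configuration rather than in application code.

```python
from ipaddress import ip_address, ip_network

# The range the slide asks webmasters to treat as 'outside the University'.
EXTERNAL_SEARCH_RANGE = ip_network("192.153.213.0/24")


def is_internal_client(client_ip, internal_networks):
    """Return True if client_ip should see 'ucam-only' material.

    internal_networks is a list of CIDR strings supplied by the caller
    (hypothetical here; substitute whatever ranges your site treats as internal).
    """
    addr = ip_address(client_ip)
    # Check the carve-out first, so it wins even if an internal
    # network in the list happens to contain it.
    if addr in EXTERNAL_SEARCH_RANGE:
        return False
    return any(addr in ip_network(net) for net in internal_networks)
```

For example, with the placeholder list `["192.153.212.0/22"]` as the internal networks, `is_internal_client("192.153.213.10", ...)` is false even though that address lies inside the /22, because the carve-out is checked first.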
18
Other possible enhancements
● Spell checking
● Passage-based summaries
● Wild card queries
● Thesaurus and smart 'no hits' page
● Quick links
● Following links in JavaScript
● Page expert
19
The Raven question
● Should the search engine be able to index Raven-protected pages?
20
Any more questions?