Search Engine Developments
Jon Warbrick
University of Cambridge Computing Service
web-search.cam.ac.uk
● Indexes 400 more-or-less "official" web servers
● Provides 'packaged searches' for Ucam websites
● Uses 'Verity Ultraseek'
● Currently a single search engine, accessed by both internal and external users
Current architecture
[Diagram: CUDN & cam.ac.uk | Rest of the world]
The Problem
● The search engine is inside the CUDN boundary
● ... and so can see 'ucam-only' material
● ... which it will tell external users about
● ... so we advise webmasters not to index it
● ... but that means that internal users can't find it
● It's also tricky to get the restrictions right
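To make the problem concrete, here is a minimal sketch of a hostname-based 'ucam-only' restriction, assuming a web server that treats a client as internal when its reverse-DNS name is `cam.ac.uk` or ends in `.cam.ac.uk`. The function names are hypothetical, and `web-search.cam.ac.uk` is used only as an illustrative client name; real servers usually express this in their configuration rather than in code.

```python
from typing import Optional


def is_ucam_client(hostname: Optional[str]) -> bool:
    """True if the client's reverse-DNS name is cam.ac.uk or lies under it."""
    if not hostname:
        # No reverse-DNS name at all: treat the client as outside the University.
        return False
    name = hostname.lower().rstrip(".")  # normalise case and any trailing root dot
    # Match on a whole label boundary so that e.g. "notcam.ac.uk" is rejected.
    return name == "cam.ac.uk" or name.endswith(".cam.ac.uk")


def may_serve(page_is_ucam_only: bool, client_hostname: Optional[str]) -> bool:
    """Public pages go to anyone; 'ucam-only' pages only to cam.ac.uk clients."""
    return not page_is_ucam_only or is_ucam_client(client_hostname)


if __name__ == "__main__":
    # A search engine inside the CUDN has a cam.ac.uk name, so it passes the
    # check and can fetch (and then index) ucam-only pages...
    print(may_serve(True, "web-search.cam.ac.uk"))  # True
    # ...whereas an ordinary visitor from outside is refused.
    print(may_serve(True, "host.example.com"))      # False
```

The page itself stays protected, but because a single engine passes the check and then answers queries from anyone, its results can still reveal to external users that the page exists. A production check should also forward-confirm the reverse-DNS name, since whoever controls an address controls its PTR record.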
Future architecture
Things for you to do
● Consider giving the internal search engine access to material currently excluded from indexing
  – User Agent is Ultraseek (internal search;
● Set up – as being 'outside the University'
  – the external search engine will use one of these addresses and will NOT have a 'cam.ac.uk' name
● Details at
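One way a webmaster might act on the first point is a robots.txt that lets a crawler identifying itself as Ultraseek into an area that other robots are asked to skip. This is a sketch under assumptions: the slide does not say how material is currently excluded, so we assume robots.txt; the `/ucam-only/` path and the URLs are hypothetical; and the exact User-Agent string should be checked against the service's own details. Python's standard `urllib.robotparser` is used to show how a compliant crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a robot identifying as "Ultraseek" has nothing
# disallowed (empty Disallow); every other robot is asked to skip /ucam-only/.
ROBOTS_TXT = """\
User-agent: Ultraseek
Disallow:

User-agent: *
Disallow: /ucam-only/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

restricted = "https://www.example.cam.ac.uk/ucam-only/report.html"  # hypothetical URL
public = "https://www.example.cam.ac.uk/index.html"                 # hypothetical URL

if __name__ == "__main__":
    print(parser.can_fetch("Ultraseek", restricted))         # True
    print(parser.can_fetch("SomeOtherBot/1.0", restricted))  # False
    print(parser.can_fetch("SomeOtherBot/1.0", public))      # True
```

robots.txt is only a request that well-behaved crawlers honour; it is not access control. If the external engine also announced itself as Ultraseek (the slide does not say), it would be the server's 'ucam-only' restriction that kept it out, because that engine will not have a 'cam.ac.uk' name.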
Other possible enhancements
● Spell checking
● Passage-based summaries
● Wild card queries
● Thesaurus and smart 'no hits' page
● Quick links
● Following links in JavaScript
● Page expert
The Raven question
● Should the search engine be able to index Raven-protected pages?
Any more questions?