Search Engine Developments Jon Warbrick University of Cambridge Computing Service

1 Search Engine Developments Jon Warbrick University of Cambridge Computing Service jon.warbrick@ucs.cam.ac.uk

2 web-search.cam.ac.uk
● Indexes 400 more-or-less "official" web servers
● Provides 'packaged searches' for Ucam websites
● Uses 'Verity Ultraseek'
● Currently a single search engine accessed by internal and external users

3 Current architecture
[Diagram labels: CUDN & cam.ac.uk; Rest of the world]

4 Current architecture

8 The Problem
● The search engine is inside the CUDN boundary
● ... and so can see 'ucam-only' material
● ... which it will tell external users about
● ... so we advise webmasters not to index it
● ... but that means that internal users can't find it
● It's also tricky to get the restrictions right
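
The chain of consequences on this slide can be sketched in a few lines of Python. This is only an illustrative model, not how Ultraseek works: the `Page` type, the `crawl` and `search` functions, and the example URLs are all hypothetical names introduced for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str          # hypothetical example URL
    text: str
    ucam_only: bool   # True if the page is restricted to University users


# Hypothetical sample data, invented for illustration only.
SAMPLE_PAGES = [
    Page("http://www.example.cam.ac.uk/public/timetable.html",
         "Public lecture timetable", ucam_only=False),
    Page("http://www.example.cam.ac.uk/internal/timetable.html",
         "Internal exam timetable", ucam_only=True),
]


def crawl(pages, exclude_restricted):
    """Build a single shared index, as seen by a crawler inside the CUDN.

    A crawler inside the boundary can read every page, so restricted pages
    are only left out if webmasters explicitly exclude them from indexing.
    """
    if exclude_restricted:
        return [p for p in pages if not p.ucam_only]
    return list(pages)


def search(index, query):
    """Return matching URLs; the single index cannot tell who is asking."""
    return [p.url for p in index if query.lower() in p.text.lower()]


# Indexing everything: external users are told about the restricted page.
leaky_hits = search(crawl(SAMPLE_PAGES, exclude_restricted=False), "timetable")

# Excluding restricted pages: nothing leaks, but internal users
# can no longer find the internal page either.
safe_hits = search(crawl(SAMPLE_PAGES, exclude_restricted=True), "timetable")
```

With one index serving everyone, either choice for `exclude_restricted` disadvantages one group of users, which is the dilemma the slide describes.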

9 Future architecture

17 Things for you to do
● Consider giving the internal search engine access to material currently excluded from indexing
– User Agent is Ultraseek (internal search; webmaster@ucs.cam.ac.uk)
● Set up 192.153.213.0 – 192.153.213.255 as being 'outside the University'
– the external search engine will use one of these addresses and will NOT have a 'cam.ac.uk' name
● Details at http://www.cam.ac.uk/cs/web-search/developments.html
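
As a rough sketch of the address rule above, the following Python shows one way a site's access check might treat 192.153.213.0 – 192.153.213.255 as external. Only that range comes from the slide. The `is_internal` function and the `INTERNAL_NETS` list are hypothetical, and the 131.111.0.0/16 entry is an assumed placeholder for whatever networks a given site currently treats as internal.

```python
import ipaddress

# From the slide: the external search engine crawls from this range,
# so it must be treated as 'outside the University'.
EXTERNAL_CRAWLER_NET = ipaddress.ip_network("192.153.213.0/24")

# Assumption: placeholder for the networks a site treats as internal.
# Replace with the site's real list.
INTERNAL_NETS = [ipaddress.ip_network("131.111.0.0/16")]


def is_internal(client_ip, internal_nets=INTERNAL_NETS):
    """Return True if client_ip should see 'ucam-only' material.

    The external crawler range is checked first, so it is rejected even
    if a broader internal network happens to contain it.
    """
    addr = ipaddress.ip_address(client_ip)
    if addr in EXTERNAL_CRAWLER_NET:
        return False
    return any(addr in net for net in internal_nets)
```

Checking the excluded range before the internal list keeps the rule correct regardless of how broadly the internal networks are defined. Sites that restrict by hostname rather than address are covered by the slide's other point: the external search engine will not have a 'cam.ac.uk' name.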

18 Other possible enhancements
● Spell checking
● Passage-based summaries
● Wild card queries
● Thesaurus and smart 'no hits' page
● Quick links
● Following links in JavaScript
● Page expert

19 The Raven question
● Should the search engine be able to index Raven-protected pages?

20 Any more questions?

