When worlds collide Metasearching meets central indexes Mike Taylor – Index Data –

Presentation on theme: "When worlds collide Metasearching meets central indexes Mike Taylor – Index Data –"— Presentation transcript:

1 When worlds collide Metasearching meets central indexes Mike Taylor – mike@indexdata.com Index Data – http://indexdata.com/

2 Search When worlds collide : metasearching and central indexes Mike Taylor – mike@indexdata.com

3 Search

4 Search
Data

5 Search
Data Problem solved!

6 Search
Data ??

7 Metasearch
Magic box Data Searching

8 Metasearch
Magic box Data Searching 360 Search EHIS (EBSCO) MetaLib

9 Metasearch
Magic box Data Searching 360 Search EHIS (EBSCO) MetaLib Pazpar2 (Open source)

10 Metasearch
Magic box Data Searching

11 Metasearch
Magic box Data A.K.A. federated search Searching

12 Metasearch
Magic box Data A.K.A. federated search A.K.A. distributed search Searching

13 Metasearch
Magic box Data A.K.A. federated search A.K.A. broadcast search A.K.A. distributed search Searching ?

14 Back to the sad searcher
Data ??

15 Central index
Data Fat database Harvesting

16 Central index
Data Fat database Harvesting Summon WorldCat Primo Central

17 Central index
Data Fat database Harvesting Summon WorldCat Primo Central MasterKey

18 Central index
Data Fat database Harvesting A.K.A. local index

19 Central index
Data Fat database Harvesting A.K.A. local index A.K.A. discovery services

20 Central index
Data Fat database Harvesting A.K.A. local index A.K.A. vertical search A.K.A. discovery services ?

21 We need a controlled vocabulary!
Metasearch = Federated search = Distributed search = Broadcast search
Central index = Local index = Discovery services = Vertical search (if you ever heard anything so dumb)

22 Which approach is better?
Central indexing compared with metasearching:
- requires harvesting infrastructure
- requires lots of local storage
- requires co-operation from services to be harvested
- does not have access to all searchable data
- will always be somewhat out of date
- is faster at search time (or SHOULD be)
- allows data to be normalised (e.g. dates extracted)
- allows for better relevance ranking
- can provide pre-baked facets
- may have access to some data that is not searchable

23 Which approach is better?

24 Which approach is better?

25 Which approach is better?

26 Which approach is better?
Let's do both!

27 Magic box Data Searching Data Fat database Harvesting ! Integrated Search

28 Magic box Data Searching Data Fat database Harvesting ! Integrated Search

29 Magic box Data Searching Data Fat database Harvesting ! Integrated Search

30 Magic box Data Searching Data Fat database Harvesting ! Integrated Search

31 Metasearch hides the complexity
Magic box Data Searching

32 Metasearch Nine tenths under the surface
Magic box Data Searching

33 Metasearch What you see looks beautiful
Magic box Data Searching

34 Problems that need solving
A. Problems with pure metasearching
B. How those problems change when you add a central index

35 Problems with metasearching
Examples based on Index Data's suite:
Pazpar2 is a free metasearching engine with a stupid name http://indexdata.com/pazpar2/
MasterKey is a non-open suite that wraps it http://indexdata.com/masterkey/
MasterKey is only one way to use Pazpar2
Also integrated into other vendors' UIs.

36 Problems with metasearching #1: No data server at all!
Data is often only in a user-facing Web UI
Must be made available via a standard protocol

37 Problems with metasearching #1: No data server at all!
Data is often only in a user-facing Web UI
Must be made available via a standard protocol
Option 1: build a gateway in Perl http://indexdata.com/simpleserver/

38 Problems with metasearching #1: No data server at all!
Data is often only in a user-facing Web UI
Must be made available via a standard protocol
Option 1: build a gateway in Perl http://indexdata.com/simpleserver/
Option 2: MasterKey Connect (non-open) http://indexdata.com/connector-framework

39 Problems with metasearching #2: data server is crap^H^H^H^Hsuboptimal
Catalogs searchable using ANSI/NISO Z39.50
Support is very nominal in some cases

40 Problems with metasearching #2: data server is crap^H^H^H^Hsuboptimal
Catalogs searchable using ANSI/NISO Z39.50
Support is very nominal in some cases
IRSpy probes behaviour http://irspy.indexdata.com
MasterKey target profiles describe behaviour

41 Problems with metasearching #3: Data servers don't support relevance

42 Problems with metasearching #3: Data servers don't support relevance
Pazpar2 does its own relevance ranking
(Part of merging/deduplication)

43 Problems with metasearching #4: Data servers don't return facets

44 Problems with metasearching #4: Data servers don't return facets
Pazpar2 calculates its own facets

45 There is a lot of magic in the magic box
Searching, Sorting, Merging, Deduplication, Relevance, Facet generation, Time travel...
Magic box Data

46 There is a lot of magic in the magic box
Searching, Sorting, Merging, Deduplication, Relevance, Facet generation, Time travel...
Pazpar2 Data
Remember, our engine is free: http://indexdata.com/pazpar2/

47 Magic box Data Searching Data Fat database Harvesting ! What happens when we add a central index?

48 Problems with integrated search #1: No data server at all!
Data is often only in a user-facing Web UI

49 Problems with integrated search #1: No data server at all!
Data is often only in a user-facing Web UI

50 Problems with integrated search #1: No data server at all!
Data is often only in a user-facing Web UI
You can't harvest Google

51 Problems with integrated search #1: No data server at all!
Data is often only in a user-facing Web UI
You can't harvest Google
You just can't

52 Problems with integrated search #2: data server is crap^H^H^H^Hsuboptimal
Repositories harvestable using OAI-PMH (an even worse name than pazpar2)
Support is very nominal in some cases

53 Problems with integrated search #2: data server is crap^H^H^H^Hsuboptimal
Repositories harvestable using OAI-PMH (an even worse name than pazpar2)
Support is very nominal in some cases
OAI-PMH client must be very tolerant
Extensive data-cleaning is usually required
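
As one illustration of the data-cleaning that slide 53 says harvested records usually need, here is a minimal Python sketch that pulls a plausible publication year out of a messy date field. The function name `extract_year`, the accepted year range (1500 to 2099), and the sample inputs are all assumptions made for this example; the slides do not say how Index Data's harvester cleans dates.

```python
import re
from typing import Optional

# Hypothetical rule: accept a standalone four-digit year from 1500 to 2099.
# The lookarounds reject digits that are part of a longer number.
_YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def extract_year(raw: Optional[str]) -> Optional[int]:
    """Return the first plausible year found in a messy date string, or None."""
    if not raw:
        return None
    match = _YEAR.search(raw)
    return int(match.group(1)) if match else None


# Illustrative inputs of the kind a tolerant harvester might meet.
for sample in ["c1978.", "[1981?]", "1979-05-02T00:00:00Z", "n.d."]:
    print(repr(sample), "->", extract_year(sample))
```

Returning `None` instead of raising keeps the harvest running when a record has no usable date, which is in the spirit of a tolerant client.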

54 Problems with integrated search #3: Central index does support relevance
Returned records carry relevance scores
Must be merged with records scored by engine
Requires score normalisation into same range
Existing ordering may be used in merge
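
The points on slide 54 can be sketched in Python as follows. This is only an illustration under stated assumptions: min-max rescaling into [0, 1] is one simple way to bring scores into the same range, and the slides do not say which normalisation Pazpar2 or MasterKey actually uses. The names `normalise` and `merge_ranked` and the sample records are hypothetical.

```python
import heapq
from typing import List, Tuple

Scored = Tuple[str, float]  # (record id, raw relevance score)


def normalise(records: List[Scored]) -> List[Scored]:
    """Min-max rescale scores into [0, 1]; the input order is preserved."""
    if not records:
        return []
    scores = [score for _, score in records]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        # All scores equal: treat every record as equally relevant.
        return [(rid, 1.0) for rid, _ in records]
    return [(rid, (score - lo) / span) for rid, score in records]


def merge_ranked(*sources: List[Scored]) -> List[Scored]:
    """Merge result lists that are each already sorted best-first.

    heapq.merge relies on the existing ordering of each list, so the
    combined list never has to be re-sorted from scratch.
    """
    normalised = [normalise(src) for src in sources]
    return list(heapq.merge(*normalised, key=lambda rec: rec[1], reverse=True))


# Hypothetical data: central-index scores and engine scores on different scales.
central = [("c1", 12.0), ("c2", 9.0), ("c3", 3.0)]
engine = [("m1", 0.9), ("m2", 0.5)]
for rid, score in merge_ranked(central, engine):
    print(rid, round(score, 3))
```

Min-max rescaling gives the top hit of every source a score of 1.0, which is crude; a real system would probably weight sources differently.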

55 Problems with integrated search #3: Central index does support relevance
Diagram labels: Unranked #1, Ranked #1, Ranked #2, Solr, Sort, Merged, Unranked #2, Sort

56 Problems with integrated search #4: Central index does return facets
Lists of field values with occurrence counts:
Author: Kernighan 27, Pike 13, Ritchie 7, Thompson 4
Title: C 7, Unix 35, Programming 16
Date: 1977 5, 1978 4, 1979 2, 1981 2

57 Problems with integrated search #4: Central index does return facets
Lists are returned or calculated for each server:
Server 1 (central index) (all facets from 2000 hits): Cat 68, Dinosaur 162, Fish 145, Frog 19
Server 2 (metasearch) (1000 hits, 100 records): Cat 7, Dog 10, Dinosaur 87, Fish 23

58 Problems with integrated search #4: Central index does return facets
Metasearched counts normalised by total hit-count
Server 1 (central index) (all facets from 2000 hits): Cat 68, Dinosaur 162, Fish 145, Frog 19
Server 2 (metasearch) (normalised to 1000 hits): Cat 70, Dog 100, Dinosaur 870, Fish 230

59 Problems with integrated search #4: Central index does return facets
Facet lists are merged
Servers 1+2 (integrated) (as though for all records in result sets):
Cat 68+70 = 138
Dog 0+100 = 100
Dinosaur 162+870 = 1032
Fish 145+230 = 375
Frog 19+0 = 19

60 Problems with integrated search #4: Central index does return facets
Fringe benefit: facet-count normalisation is also useful when doing pure metasearching.
Servers 1+2 (as though for all records in result sets):
Cat 68+70 = 138
Dog 0+100 = 100
Dinosaur 162+870 = 1032
Fish 145+230 = 375
Frog 19+0 = 19
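
The facet arithmetic on slides 57 to 59 can be reproduced with a short Python sketch. The numbers come from the slides; the function name `normalise_counts` and the use of `collections.Counter` are assumptions made for illustration, not a description of how Pazpar2 implements this.

```python
from collections import Counter
from typing import Dict


def normalise_counts(counts: Dict[str, int], records_seen: int, total_hits: int) -> Counter:
    """Scale facet counts from the records actually fetched up to the full hit-count."""
    if records_seen <= 0:
        return Counter()
    factor = total_hits / records_seen
    return Counter({term: round(n * factor) for term, n in counts.items()})


# Server 1 (central index): facets already cover all 2000 hits.
central = Counter({"Cat": 68, "Dinosaur": 162, "Fish": 145, "Frog": 19})

# Server 2 (metasearch): 1000 hits, but facets computed from only 100 records.
metasearch = normalise_counts(
    {"Cat": 7, "Dog": 10, "Dinosaur": 87, "Fish": 23},
    records_seen=100,
    total_hits=1000,
)

# Adding two Counters sums the counts term by term.
merged = central + metasearch
for term, count in sorted(merged.items()):
    print(term, count)
```

The scaled counts are estimates: they assume the 100 fetched records are representative of all 1000 hits.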

61 Summary of search issues
Issue | Metasearch solution | Central index solution
No data server | Build gateways; MasterKey Connect | ---
Bad data server | Probe capabilities; Profile targets | Tolerant harvester; Data-cleaning
Relevance scores | Magic engine; Normalise scores | Ingest from server
Facets | Magic engine; Normalise counts | Ingest from server

62 Magic box Data Searching Data Fat database Harvesting

63 When worlds collide Metasearching meets central indexes Mike Taylor – mike@indexdata.com Index Data – http://indexdata.com/

