Published by Jayden Couden. Modified over 10 years ago.
1
When worlds collide: Metasearching meets central indexes
Mike Taylor – mike@indexdata.com
Index Data – http://indexdata.com/
2
Search When worlds collide : metasearching and central indexes Mike Taylor – mike@indexdata.com
4
Search
[Diagram labels: Data]
5
Search
[Diagram labels: Data; Problem solved!]
6
Search
[Diagram labels: Data; ??]
7
Metasearch
[Diagram labels: Magic box; Data; Searching]
8
Metasearch
[Diagram labels: Magic box; Data; Searching]
Examples: 360 Search, EHIS (EBSCO), MetaLib
9
Metasearch
[Diagram labels: Magic box; Data; Searching]
Examples: 360 Search, EHIS (EBSCO), MetaLib, Pazpar2 (open source)
10
Metasearch
[Diagram labels: Magic box; Data; Searching]
11
Metasearch
[Diagram labels: Magic box; Data; Searching]
- A.K.A. federated search
12
Metasearch
[Diagram labels: Magic box; Data; Searching]
- A.K.A. federated search
- A.K.A. distributed search
13
Metasearch
[Diagram labels: Magic box; Data; Searching; ?]
- A.K.A. federated search
- A.K.A. broadcast search
- A.K.A. distributed search
14
Back to the sad searcher
[Diagram labels: Data; ??]
15
Central index
[Diagram labels: Data; Fat database; Harvesting]
16
Central index
[Diagram labels: Data; Fat database; Harvesting]
Examples: Summon, WorldCat, Primo Central
17
Central index
[Diagram labels: Data; Fat database; Harvesting]
Examples: Summon, WorldCat, Primo Central, MasterKey
18
Central index
[Diagram labels: Data; Fat database; Harvesting]
- A.K.A. local index
19
Central index
[Diagram labels: Data; Fat database; Harvesting]
- A.K.A. local index
- A.K.A. discovery services
20
Central index
[Diagram labels: Data; Fat database; Harvesting; ?]
- A.K.A. local index
- A.K.A. vertical search
- A.K.A. discovery services
21
We need a controlled vocabulary!
Metasearch = Federated search = Distributed search = Broadcast search
Central index = Local index = Discovery services = Vertical search (if you ever heard anything so dumb)
22
Which approach is better?
Central indexing compared with metasearching:
- requires harvesting infrastructure
- requires lots of local storage
- requires co-operation from services to be harvested
- does not have access to all searchable data
- will always be somewhat out of date
- is faster at search time (or SHOULD be)
- allows data to be normalised (e.g. dates extracted)
- allows for better relevance ranking
- can provide pre-baked facets
- may have access to some data that is not searchable
23
Which approach is better?
26
Which approach is better?
Let's do both!
27
Integrated Search
[Diagram labels: Magic box, Data, Searching; Data, Fat database, Harvesting; !]
31
Metasearch hides the complexity
[Diagram labels: Magic box; Data; Searching]
32
Metasearch: nine tenths under the surface
[Diagram labels: Magic box; Data; Searching]
33
Metasearch: what you see looks beautiful
[Diagram labels: Magic box; Data; Searching]
34
Problems that need solving
A. Problems with pure metasearching
B. How those problems change when you add a central index
35
Problems with metasearching
Examples based on Index Data's suite:
- Pazpar2 is a free metasearching engine with a stupid name: http://indexdata.com/pazpar2/
- MasterKey is a non-open suite that wraps it: http://indexdata.com/masterkey/
- MasterKey is only one way to use Pazpar2, which is also integrated into other vendors' UIs.
36
Problems with metasearching #1: No data server at all!
- Data is often only in a user-facing Web UI
- Must be made available via a standard protocol
37
Problems with metasearching #1: No data server at all!
- Data is often only in a user-facing Web UI
- Must be made available via a standard protocol
- Option 1: build a gateway in Perl: http://indexdata.com/simpleserver/
38
Problems with metasearching #1: No data server at all!
- Data is often only in a user-facing Web UI
- Must be made available via a standard protocol
- Option 1: build a gateway in Perl: http://indexdata.com/simpleserver/
- Option 2: MasterKey Connect (non-open): http://indexdata.com/connector-framework
39
Problems with metasearching #2: data server is crap^H^H^H^Hsuboptimal
- Catalogs searchable using ANSI/NISO Z39.50
- Support is very nominal in some cases
40
Problems with metasearching #2: data server is crap^H^H^H^Hsuboptimal
- Catalogs searchable using ANSI/NISO Z39.50
- Support is very nominal in some cases
- IRSpy probes behaviour: http://irspy.indexdata.com
- MasterKey target profiles describe behaviour
41
Problems with metasearching #3: Data servers don't support relevance
42
Problems with metasearching #3: Data servers don't support relevance
- Pazpar2 does its own relevance ranking (part of merging/deduplication)
43
Problems with metasearching #4: Data servers don't return facets
44
Problems with metasearching #4: Data servers don't return facets
- Pazpar2 calculates its own facets
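One way to picture engine-side facet calculation is to count field values across the records that have actually been retrieved. The sketch below is illustrative only and is not Pazpar2's implementation; the record layout, the sample data and the `count_facets` helper are all assumptions.

```python
from collections import Counter


def count_facets(records, fields):
    """Count how often each value occurs for each facet field."""
    facets = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            # A record may carry zero, one, or several values per field.
            for value in record.get(field, []):
                facets[field][value] += 1
    return facets


# Hypothetical records, as if already retrieved from several targets.
records = [
    {"author": ["Kernighan", "Ritchie"], "date": ["1978"]},
    {"author": ["Kernighan", "Pike"], "date": ["1981"]},
    {"author": ["Thompson"]},
]

facets = count_facets(records, ["author", "date"])
print(facets["author"].most_common())
```

Counts produced this way only describe the records that were fetched, not the full result set on each server.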
45
There is a lot of magic in the magic box
- Searching
- Sorting
- Merging
- Deduplication
- Relevance
- Facet generation
- Time travel...
[Diagram labels: Magic box; Data]
46
There is a lot of magic in the magic box
- Searching
- Sorting
- Merging
- Deduplication
- Relevance
- Facet generation
- Time travel...
[Diagram labels: Pazpar2; Data]
Remember, our engine is free: http://indexdata.com/pazpar2/
47
What happens when we add a central index?
[Diagram labels: Magic box, Data, Searching; Data, Fat database, Harvesting; !]
48
Problems with integrated search #1: No data server at all!
- Data is often only in a user-facing Web UI
50
Problems with integrated search #1: No data server at all!
- Data is often only in a user-facing Web UI
- You can't harvest Google
51
Problems with integrated search #1: No data server at all!
- Data is often only in a user-facing Web UI
- You can't harvest Google
- You just can't
52
Problems with integrated search #2: data server is crap^H^H^H^Hsuboptimal
- Repositories harvestable using OAI-PMH (an even worse name than Pazpar2)
- Support is very nominal in some cases
53
Problems with integrated search #2: data server is crap^H^H^H^Hsuboptimal
- Repositories harvestable using OAI-PMH (an even worse name than Pazpar2)
- Support is very nominal in some cases
- OAI-PMH client must be very tolerant
- Extensive data-cleaning is usually required
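As a small illustration of the kind of data-cleaning a harvester may need, the sketch below pulls a four-digit year out of free-form date strings. The sample values and the `extract_year` helper are assumptions made for illustration, not taken from any real repository or from Index Data's tools.

```python
import re

# A four-digit year between 1000 and 2099 that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")


def extract_year(raw):
    """Return the first plausible year in a messy date string, or None."""
    if not raw:
        return None
    match = YEAR.search(raw)
    return int(match.group(1)) if match else None


# Hypothetical date values of the sort harvested metadata might contain.
for raw in ["c1978.", "[1981?]", "2009-03-14T00:00:00Z", "n.d.", None]:
    print(repr(raw), "->", extract_year(raw))
```

Returning `None` rather than raising keeps the harvester tolerant: a record with an unusable date can still be indexed without one.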
54
Problems with integrated search #3: Central index does support relevance
- Returned records carry relevance scores
- Must be merged with records scored by engine
- Requires score normalisation into same range
- Existing ordering may be used in merge
55
Problems with integrated search #3: Central index does support relevance
[Diagram labels: Unranked #1; Ranked #1; Ranked #2; Solr; Sort; Merged; Unranked #2; Sort]
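The merge of server-scored and engine-scored records can be sketched as follows. The slides do not say which normalisation is used, so min-max rescaling into [0, 1] is an assumption here, as are the sample scores and the `normalise` and `merge_ranked` helpers.

```python
import heapq
from operator import itemgetter


def normalise(scored):
    """Rescale (record, score) pairs so that scores fall in [0, 1]."""
    if not scored:
        return []
    scores = [score for _, score in scored]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no spread to rescale.
        return [(record, 1.0) for record, _ in scored]
    return [(record, (score - lo) / (hi - lo)) for record, score in scored]


def merge_ranked(*result_lists):
    """Normalise each list, then merge them into one best-first list."""
    by_score = itemgetter(1)
    prepared = [
        sorted(normalise(results), key=by_score, reverse=True)
        for results in result_lists
    ]
    # heapq.merge relies on each input already being in best-first order.
    return list(heapq.merge(*prepared, key=by_score, reverse=True))


# Hypothetical scores: a central index on one scale, the engine on another.
central = [("A", 12.0), ("B", 7.0), ("C", 2.0)]
engine = [("X", 0.8), ("Y", 0.5), ("Z", 0.2)]

merged = merge_ranked(central, engine)
for record, score in merged:
    print(record, round(score, 2))
```

Min-max rescaling gives every source's top hit a score of 1.0, which may overstate weak sources; it is shown only because it is the simplest way to bring scores into the same range.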
56
Problems with integrated search #4: Central index does return facets
Lists of field values with occurrence counts:
Author: Kernighan 27, Pike 13, Ritchie 7, Thompson 4
Title: C 7, Unix 35, Programming 16
Date: 1977 5, 1978 4, 1979 2, 1981 2
57
Problems with integrated search #4: Central index does return facets
Lists are returned or calculated for each server:
Server 1 (central index) (all facets from 2000 hits): Cat 68, Dinosaur 162, Fish 145, Frog 19
Server 2 (metasearch) (1000 hits, 100 records): Cat 7, Dog 10, Dinosaur 87, Fish 23
58
Problems with integrated search #4: Central index does return facets
Metasearched counts normalised by total hit-count (each count from the 100 retrieved records is scaled by 1000 / 100 = 10):
Server 1 (central index) (all facets from 2000 hits): Cat 68, Dinosaur 162, Fish 145, Frog 19
Server 2 (metasearch) (normalised to 1000 hits): Cat 70, Dog 100, Dinosaur 870, Fish 230
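The normalisation step amounts to scaling each sampled count by the ratio of total hits to records actually retrieved. A minimal sketch, using the Server 2 figures from the slide; the `normalise_facet_counts` helper and its parameter names are assumptions.

```python
def normalise_facet_counts(counts, total_hits, records_seen):
    """Scale facet counts from a sample up to the full hit count."""
    if records_seen <= 0:
        return {}
    factor = total_hits / records_seen
    return {term: round(count * factor) for term, count in counts.items()}


# Server 2 (metasearch): 1000 hits, of which 100 records were retrieved.
sampled = {"Cat": 7, "Dog": 10, "Dinosaur": 87, "Fish": 23}
normalised = normalise_facet_counts(sampled, total_hits=1000, records_seen=100)
print(normalised)
# {'Cat': 70, 'Dog': 100, 'Dinosaur': 870, 'Fish': 230}
```

The result is an estimate: it assumes the retrieved records are representative of the whole result set.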
59
Problems with integrated search #4: Central index does return facets
Facet lists are merged
Servers 1+2 (integrated) (as though for all records in result sets):
- Cat 68+70 = 138
- Dog 0+100 = 100
- Dinosaur 162+870 = 1032
- Fish 145+230 = 375
- Frog 19+0 = 19
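Merging the facet lists is a per-term sum, with a term missing from one server counting as zero. A minimal sketch using the figures from the slide; `merge_facets` is a hypothetical helper name.

```python
from collections import Counter


def merge_facets(*facet_lists):
    """Sum facet counts term by term across servers."""
    merged = Counter()
    for counts in facet_lists:
        merged.update(counts)  # terms absent from a server contribute 0
    return merged


central = {"Cat": 68, "Dinosaur": 162, "Fish": 145, "Frog": 19}
metasearch = {"Cat": 70, "Dog": 100, "Dinosaur": 870, "Fish": 230}

integrated = merge_facets(central, metasearch)
for term, count in sorted(integrated.items()):
    print(term, count)
```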
60
Problems with integrated search #4: Central index does return facets
Fringe benefit: facet-count normalisation is also useful when doing pure metasearching.
61
Summary of search issues
Issue | Metasearch solution | Central index solution
No data server | Build gateways; MasterKey Connect | ---
Bad data server | Probe capabilities; Profile targets | Tolerant harvester; Data-cleaning
Relevance scores | Magic engine; Normalise scores | Ingest from server
Facets | Magic engine; Normalise counts | Ingest from server
62
[Diagram labels: Magic box, Data, Searching; Data, Fat database, Harvesting]
63
When worlds collide: Metasearching meets central indexes
Mike Taylor – mike@indexdata.com
Index Data – http://indexdata.com/