HyKSS: Hybrid Keyword and Semantic Search
Andrew Zitzelberger

Keyword Search

Form Based Search

What about?
over 8,000 meters in elevation
less than 100K miles
faster than 100 mph

HyKSS: Hybrid Keyword and Semantic Search
Semantics
– Extracted annotations
– Multiple ontologies
Keywords
– Text

Thesis Statement
HyKSS (hybrid search)
– Outperforms keyword and semantic search
– Dynamic query weighting outperforms various other hybrid search approaches
– Allows queries over multiple ontologies
– Allows pay-as-you-go improvement

Extraction Ontologies

Data Frames

Indexing Architecture
[Diagram components: Document Collection; Keyword Indexer and Semantic Indexer; Keyword Index and Semantic Index]

Indexing Architecture Implementation
[Diagram components: Document Collection; Keyword Indexer and Semantic Indexer; Keyword Index and Semantic Index; implementation labels: OntoES, Ontology Library, Sesame, Lucene]

Query Processing
[Diagram: a Free Form Query goes through two pipelines, Keyword Processing and Semantic Processing, each with the steps Pre-Process Query, Execute Query, and Post-Process Query, followed by Combine Results]

Keyword Query Pre-Processing
Remove Lucene special characters (except quotes)
Remove (inequality) comparison constraints
Remove non-phrase stopwords
Query: hondas in "excellent condition" in orem for under 12 grand
Result: hondas "excellent condition" orem
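
The three pre-processing steps above can be sketched in a few lines of Python. This is only an illustration, not the HyKSS implementation: the stopword list and the `COMPARISON` pattern are hypothetical stand-ins (the slides do not say how comparison constraints are recognized), and single `&` and `|` are stripped as a simplification of Lucene's `&&` and `||` operators.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

# Lucene query-syntax characters other than the double quote.
LUCENE_SPECIAL = r'[+\-!(){}\[\]^~*?:\\/&|]'

# Hypothetical pattern for inequality constraints such as "under 12 grand".
COMPARISON = re.compile(
    r'\b(?:under|over|less than|more than|at least|at most)\s+'
    r'\$?\d[\d,.]*\s*(?:k|grand|miles|mph|meters)?\b',
    re.IGNORECASE,
)


def preprocess_keyword_query(query: str) -> str:
    # Normalize curly quotes so phrases are delimited consistently.
    query = query.replace("“", '"').replace("”", '"')
    # Step 1: drop Lucene special characters, keeping quotes.
    query = re.sub(LUCENE_SPECIAL, " ", query)
    # Step 2: drop comparison constraints.
    query = COMPARISON.sub(" ", query)
    # Step 3: drop stopwords that are not inside a quoted phrase.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'))
```

Quoted phrases are tokenized as single units, so stopwords inside a phrase survive while bare stopwords are dropped.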

Keyword Query Execution and Post-Processing
Executed by Lucene
Empty Post-Processing step

Semantic Query Pre-Processing
Individual Ontology Scoring
Query: hondas in "excellent condition" in orem for under 12 grand

Semantic Query Pre-Processing
Ontology Set Creation
For each ontology, sorted by score:
– For each remaining ontology:
  – Add a point for each new or subsuming match
  – If added points > 0, add the ontology
Completely subsumed ontologies are removed during query generation
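
One possible reading of the ontology set creation step is a greedy pass over the ontologies in score order. The sketch below makes several assumptions that are not in the slides: a match is represented as a `(start, end)` token span in the query, a "new" match overlaps nothing already covered, and a "subsuming" match strictly contains an already covered span. The ontology names, spans, and scores in the usage lines are made up for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets in the query, end exclusive


def contains(outer: Span, inner: Span) -> bool:
    """True if `outer` covers `inner` and is strictly longer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and outer != inner


def added_points(candidate: List[Span], covered: List[Span]) -> int:
    """Count candidate matches that are new or that subsume a covered match."""
    points = 0
    for span in candidate:
        overlaps = [c for c in covered if span[0] < c[1] and c[0] < span[1]]
        is_new = not overlaps
        is_subsuming = any(contains(span, c) for c in overlaps)
        if is_new or is_subsuming:
            points += 1
    return points


def select_ontologies(matches: Dict[str, List[Span]],
                      scores: Dict[str, float]) -> List[str]:
    """Greedily keep each ontology that contributes at least one point."""
    ranked = sorted(matches, key=lambda name: scores[name], reverse=True)
    selected: List[str] = []
    covered: List[Span] = []
    for name in ranked:
        if added_points(matches[name], covered) > 0:
            selected.append(name)
            covered.extend(matches[name])
    return selected


# Hypothetical matches: token spans for a make, a city, and a price constraint.
matches = {
    "Vehicle": [(0, 1), (7, 10)],
    "Location": [(5, 6)],
    "ContractualServices": [(7, 10)],
}
scores = {"Vehicle": 2.0, "Location": 1.0, "ContractualServices": 0.5}
print(select_ontologies(matches, scores))
```

In this made-up input the third ontology only repeats a span already covered, so it earns no points and is left out.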

Semantic Query Pre-Processing
Ontology Set Creation
[Diagram: example over the Vehicle, ContractualServices, and Location ontologies with the matches Price < and US_City="orem"; score labels: Vehicle_Score, Vehicle_Score + 1, ContractualServices_Score + 1]

Semantic Query Pre-Processing
Structured Query Generation
Open world assumption
SPARQL query

Semantic Query Execution and Post-Processing
Sesame query execution
Semantic ranking:
– 1 point for each requested projection satisfied
– Normalized by # of projections requested
Query: hondas in "excellent condition" in orem for under 12 grand
– Projections on Make, Price and US_City
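
The semantic ranking rule can be written as a small function. This is a sketch under assumptions: a query result is modeled as a dictionary from concept name to extracted value, with a missing or `None` value meaning the projection was not satisfied. The example values are invented.

```python
from typing import Dict, List, Optional


def semantic_score(requested: List[str],
                   result: Dict[str, Optional[str]]) -> float:
    """One point per satisfied projection, normalized by the number requested."""
    if not requested:
        return 0.0  # assumption: no projections requested means no semantic score
    satisfied = sum(1 for concept in requested if result.get(concept) is not None)
    return satisfied / len(requested)


# Projections from the running example; the extracted values are hypothetical.
requested = ["Make", "Price", "US_City"]
document = {"Make": "Honda", "Price": "9500"}
print(semantic_score(requested, document))
```

A document that satisfies two of the three requested projections scores 2/3.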

Hybrid Query Processing
Linear interpolation:
– (kw_weight * kw_score) + (sm_weight * sm_score)
Dynamic solution:
– # keywords remaining (#kw)
– concept match score (cms) = ½ * (selections + projections)
– kw_weight = #kw/(#kw + cms)
– sm_weight = cms/(#kw + cms)
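
The dynamic weighting formulas translate directly into code. The sketch below follows the formulas as stated; the fallback to equal weights when both #kw and cms are zero is an assumption, since the slides do not cover that case, and the input numbers are hypothetical.

```python
from typing import Tuple


def dynamic_weights(num_keywords: int, selections: int,
                    projections: int) -> Tuple[float, float]:
    """Return (kw_weight, sm_weight) from the dynamic weighting formulas."""
    cms = 0.5 * (selections + projections)  # concept match score
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: equal weights when nothing matched
    return num_keywords / total, cms / total


def hybrid_score(kw_score: float, sm_score: float,
                 kw_weight: float, sm_weight: float) -> float:
    """Linear interpolation of the keyword and semantic scores."""
    return kw_weight * kw_score + sm_weight * sm_score


# Hypothetical query: 2 keywords remaining, 1 selection, 3 projections.
kw_w, sm_w = dynamic_weights(num_keywords=2, selections=1, projections=3)
print(kw_w, sm_w, hybrid_score(1.0, 0.5, kw_w, sm_w))
```

With these inputs cms is 2.0, so keywords and semantics are weighted equally; a query with more leftover keywords shifts weight toward the keyword score.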

Basic Search

Results Display

Form Based Search
Results Display

Experimental Setup – Ontology Libraries
5 Ontology Levels
– Number
– Generic Units
– Vehicle Units
– Vehicle
– Vehicle+

Experimental Setup – Query Sets
113 syntactically unique queries from database students
60 syntactically unique queries from linguistics students

Experimental Setup – Document Collection
250 vehicle advertisements (Craigslist)
– 100 training, 50 validation, 100 test
318 mountain pages (Wikipedia)
66 roller coaster pages (Wikipedia)
88 video game advertisements (Craigslist)

Experiments
1) Training queries over test vehicle documents
2) Test queries over test vehicle documents
3) Training queries over test vehicle documents + additional noise
4) Test queries over test vehicle documents + additional noise
5) 5 queries over noisy data (Generic Units only)

Experiments – Metric
Mean Average Precision
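
For reference, mean average precision can be computed as below. This is the common textbook definition; the slides do not state which variant was used, so the treatment of queries without relevant documents (scored as 0) is an assumption. The document identifiers are invented.

```python
from typing import Iterable, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant document."""
    if not relevant:
        return 0.0  # assumption: queries without relevant documents score 0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]]) -> float:
    """Average of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and relevant sets.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
print(mean_average_precision(runs))
```

Relevant documents that never appear in the ranking still count in the denominator, so missing them lowers the score.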

Experimental Results

Conclusions
Hybrid search outperforms keyword and semantic search
HyKSS's dynamic query weighting approach outperforms various other weighting techniques
Using multiple ontologies does not outperform selecting and using a single ontology