Published by Bernadette Merritt. Modified over 9 years ago.
1
CHORUS
What is « Search »? A functional view
2008-04-21, Henri Gouraud, WP2
2
Overall goal
Break search down into its essential (necessary) components
Identify the issues associated with each component
Facilitate matching of use cases against the functional overview
For a given use case, identify “critical” components
– Those for which there is no known solution
– Those for which existing solutions do not perform well enough
Identify use cases where the model breaks
– Repair/extend the model
– Identify potential « new models »
→ Prepare the gap analysis
3
This analysis aims to be « media » independent
Functions are media independent
– Document discovery
– Meta-data extraction
– User interface
– ...
The techniques needed to implement each function are media dependent...
– Text extraction
– Speech to text
– Image signatures
– ...
... and they are at varying levels of maturity and performance
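The split between media-independent functions and media-dependent techniques can be sketched as one entry point that dispatches to per-media extractors. This is a minimal illustration, not part of the original analysis: the registry, the function names (`register`, `extract_metadata`) and the MIME-style media-type keys are all hypothetical, and the audio extractor is only a stub where a real speech-to-text step would go.

```python
from typing import Callable, Dict

# Hypothetical registry: one extractor per media type. The function
# ("extract meta-data") is media independent; each technique is not.
Extractor = Callable[[bytes], Dict[str, str]]
EXTRACTORS: Dict[str, Extractor] = {}


def register(media_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers a media-dependent extractor."""
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[media_type] = fn
        return fn
    return wrap


@register("text/plain")
def extract_text(raw: bytes) -> Dict[str, str]:
    # Text extraction: decode and treat the first line as a title.
    text = raw.decode("utf-8", errors="replace")
    first_line = text.splitlines()[0] if text else ""
    return {"title": first_line, "body": text}


@register("audio/wav")
def extract_audio(raw: bytes) -> Dict[str, str]:
    # Stub: a real system would run a speech-to-text engine here.
    return {"title": "", "body": "", "note": "speech-to-text not implemented"}


def extract_metadata(media_type: str, raw: bytes) -> Dict[str, str]:
    """Media-independent entry point; dispatches to a media-dependent technique."""
    try:
        extractor = EXTRACTORS[media_type]
    except KeyError:
        raise ValueError(f"no extractor registered for {media_type!r}")
    meta = extractor(raw)
    meta["media_type"] = media_type
    return meta
```

Adding a new media type means registering one more extractor; `extract_metadata` itself does not change, which mirrors the slide's point that the function is stable while the technique varies.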
4
Top-level vision
Search engines come into play when « direct » search of the document repository fails (volume, performance, ...)
[Diagram: Documents, Indexing, Data-base, Matching, Querying]
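Why « direct » search fails at volume can be shown with a toy comparison: direct search scans every document at query time, while indexing pays that cost once, up front, and turns querying into a lookup. A minimal sketch under simplifying assumptions (whitespace tokenization, single-term queries, an in-memory dictionary standing in for the data-base); the repository contents and function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Set

# Toy repository; document ids and contents are made up for illustration.
REPO: Dict[str, str] = {
    "d1": "search engines index documents",
    "d2": "matching queries against documents",
    "d3": "crawling the web for documents",
}


def direct_search(repo: Dict[str, str], term: str) -> Set[str]:
    """Direct search: scan every document at query time (cost grows with repo size)."""
    return {doc_id for doc_id, text in repo.items() if term in text.split()}


def build_index(repo: Dict[str, str]) -> Dict[str, Set[str]]:
    """Indexing: scan once, ahead of time, to build a term -> doc ids map."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in repo.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def indexed_search(index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Querying + matching: a dictionary lookup instead of a full scan."""
    return set(index.get(term, set()))
```

Both functions return the same answers; the difference is where the scanning cost is paid, which is what makes the index worthwhile once the repository is large or queried often.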
5
At the core: matching
[Diagram: Matching, Data-base, Query-meta-data, Document-meta-data]
Matching happens between two « computer-based » chunks of data
– Query-meta-data, derived from the user input (and the user's context)
– Document-meta-data, derived from the documents being searched
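One way to picture the two « computer-based » chunks of data is as two small records reduced to a common representation. The sketch below assumes, purely for illustration, that both Q-meta-data and D-meta-data are sets of terms plus a language field standing in for context, and that matching means "same language, at least one shared term", ranked by overlap; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class QueryMetaData:
    """Q-meta-data: derived from the user input plus one (assumed) context field."""
    terms: FrozenSet[str]
    language: str


@dataclass(frozen=True)
class DocumentMetaData:
    """D-meta-data: derived from the document being searched."""
    doc_id: str
    terms: FrozenSet[str]
    language: str


def match(q: QueryMetaData, docs: List[DocumentMetaData]) -> List[Tuple[str, int]]:
    """Return (doc_id, shared-term count) for documents in the query's language, best first."""
    hits = []
    for d in docs:
        if d.language != q.language:
            continue  # context acts as a filter
        overlap = len(q.terms & d.terms)
        if overlap:
            hits.append((d.doc_id, overlap))
    # Highest overlap first; doc_id breaks ties for a stable order.
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits
```

The matcher never sees the raw query or the raw document, only their meta-data, which is why the quality of both « Transform » steps bounds the quality of matching.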
6
The matching process
Simple or Boolean
– AND, OR, NEAR, parentheses, regular expressions, ...
Exact or fuzzy
– Spelling, phonetic, « similar to », ...
Typed
– Author:xx, Title:xx, ...
Centralized/distributed
– Across a single LAN, across a WAN, peer-to-peer, ...
Issues
– New media types: algorithms
– Performance: single-query response time, query throughput
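The Boolean operators listed above (AND, OR, NEAR) can be sketched over a positional inverted index, i.e. one that records word positions so that proximity can be checked. This is a minimal, centralized, in-memory illustration under assumed whitespace tokenization; fuzzy, typed and distributed matching are not covered, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# term -> doc_id -> list of word positions in that document
PositionalIndex = Dict[str, Dict[str, List[int]]]


def build_positional_index(docs: Dict[str, str]) -> PositionalIndex:
    index: PositionalIndex = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def docs_with(index: PositionalIndex, term: str) -> Set[str]:
    # .get avoids inserting empty entries into the defaultdict
    return set(index.get(term, {}))


def match_and(index: PositionalIndex, a: str, b: str) -> Set[str]:
    return docs_with(index, a) & docs_with(index, b)


def match_or(index: PositionalIndex, a: str, b: str) -> Set[str]:
    return docs_with(index, a) | docs_with(index, b)


def match_near(index: PositionalIndex, a: str, b: str, k: int = 3) -> Set[str]:
    """Documents where a and b occur within k words of each other (either order)."""
    result = set()
    for doc_id in match_and(index, a, b):
        positions_a, positions_b = index[a][doc_id], index[b][doc_id]
        if any(abs(i - j) <= k for i in positions_a for j in positions_b):
            result.add(doc_id)
    return result
```

Supporting NEAR is what forces the index to store positions rather than just document ids, a first example of the performance trade-offs the slide lists as issues.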
7
The document side
[Diagram: Document, Crawl, Push, Pull, Content, Transform, D-meta-data, Build, Data-base, Matching]
The main issue: the « Transform » step
– Extracting useful information from the documents
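The document-side flow (acquire by push or pull, transform into D-meta-data, build the data-base) can be sketched as three small functions. The D-meta-data chosen here (a sorted term list and a length) is an assumption made for brevity, and the function names are hypothetical; the real « Transform » step is exactly where the hard, media-specific work lies.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Document:
    doc_id: str
    content: str


def pull(source: Dict[str, str]) -> Iterable[Document]:
    """Pull: the engine fetches (crawls) documents from a source it knows about."""
    for doc_id, content in source.items():
        yield Document(doc_id, content)


def transform(doc: Document) -> Dict[str, object]:
    """Transform: derive D-meta-data from a document (here only terms and a length)."""
    terms = doc.content.lower().split()
    return {"doc_id": doc.doc_id, "terms": sorted(set(terms)), "length": len(terms)}


def build(database: Dict[str, Dict[str, object]], d_meta: Dict[str, object]) -> None:
    """Build: store D-meta-data in the search-engine data-base."""
    database[str(d_meta["doc_id"])] = d_meta


database: Dict[str, Dict[str, object]] = {}
# Pull: crawl a source.
for doc in pull({"p1": "Crawled page about Search"}):
    build(database, transform(doc))
# Push: a publisher hands a document to the engine directly.
build(database, transform(Document("x1", "Pushed report")))
```

Push and pull differ only in who initiates acquisition; both feed the same transform and build steps.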
8
The document side
Document discovery
– Pull = crawling, push = OK
– Completeness, freshness
Building the SE data-base
– Scalability, reliability
– Incremental
– Distributed
Transform: elaborating D-meta-data
– Dealing with existing meta-data, multi-pass process, ...
– Dealing with a multiplicity of content types and formats
– For each type, a specific meta-data elaboration process
Issues
– Algorithms (for each media type)
– Performance (relates to document repository size and churn rate)
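The "incremental" and freshness points imply that the data-base must absorb document churn without being rebuilt from scratch. A minimal sketch, assuming an in-memory inverted index plus a forward index (document to terms) kept solely so that changed or deleted documents can be removed; the class name is hypothetical, and distribution and persistence are left out.

```python
from collections import defaultdict
from typing import Dict, Set


class IncrementalIndex:
    """Inverted index that absorbs document churn without a full rebuild."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.doc_terms: Dict[str, Set[str]] = {}  # forward index, needed for removal

    def upsert(self, doc_id: str, text: str) -> None:
        """Add a new document, or replace a changed one (freshness)."""
        self.remove(doc_id)
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for t in terms:
            self.postings[t].add(doc_id)

    def remove(self, doc_id: str) -> None:
        """Drop a deleted document's postings; prune terms left with no documents."""
        for t in self.doc_terms.pop(doc_id, set()):
            self.postings[t].discard(doc_id)
            if not self.postings[t]:
                del self.postings[t]

    def search(self, term: str) -> Set[str]:
        return set(self.postings.get(term, set()))
```

Each update touches only the terms of the affected document, so the cost scales with the churn rate rather than with the repository size; the price is the extra memory for the forward index.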
9
The user side
[Diagram: User, Query UI, Transform, Q-meta-data, Matching, Data-base, Organize, Results UI, Navigation]
The two main issues
– Transforming the user query into Q-meta-data
– Organizing the results into a manageable form
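Transforming the user query into Q-meta-data can be as simple as normalizing free terms and recognizing typed constraints such as `Author:xx`. The sketch below assumes whitespace tokenization, a tiny made-up stop-word list and a `field:value` syntax; `transform_query` and `QMetaData` are hypothetical names, and real query transformation (spelling correction, intent detection) goes much further.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "for"}
FIELD_RE = re.compile(r"^(\w+):(.+)$")


@dataclass
class QMetaData:
    terms: List[str] = field(default_factory=list)
    fields: Dict[str, str] = field(default_factory=dict)


def transform_query(user_input: str) -> QMetaData:
    """Turn raw user input into Q-meta-data: free terms plus typed field constraints."""
    q = QMetaData()
    for token in user_input.split():
        m = FIELD_RE.match(token)
        if m:
            # Typed constraint such as Author:gouraud
            q.fields[m.group(1).lower()] = m.group(2).lower()
        elif token.lower() not in STOP_WORDS:
            q.terms.append(token.lower())
    return q
```

For example, `transform_query("The history of Search Author:Gouraud")` yields the terms `["history", "search"]` and the field constraint `{"author": "gouraud"}`.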
10
The user side
Capturing the « user intent »
– The DWIM ("Do What I Mean") dream
– Providing useful hints (what is « searchable »?)
Organizing the results
– Assume multiple results, i.e., choice or refinement
Issues
– Algorithms (for each media type)
– Clustering, structuring, summarizing, ...
– User interface (for each terminal type)
– Performance (under the ½-second threshold)
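Of the techniques listed for organizing results (clustering, structuring, summarizing), one of the simplest forms of structuring is faceting: grouping results by a meta-data field so the user can pick a group or refine. A minimal sketch, assuming results are dictionaries with a `doc_id` and optional fields; faceting is a swapped-in example technique, not one prescribed by the slides, and `organize` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Result = Dict[str, str]


def organize(results: List[Result], facet: str) -> List[Tuple[str, List[str]]]:
    """Group results by one meta-data field so the user can choose or refine.

    Groups are ordered by size (largest first), then by facet value.
    Results lacking the field fall into an "unknown" group.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in results:
        groups[r.get(facet, "unknown")].append(r["doc_id"])
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

Each group label doubles as a refinement hint, which also speaks to the "what is searchable?" point above.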
11
The big picture
[Diagram: Librarian, User, Query, Transform, Q-meta-data, Matching, Data-base, D-meta-data, Build, Content, Crawl, Push, Pull, Document, Organize, Results, UI, Navigation, Intra-doc navigation]
12
The big-picture issues
On the document side, acquiring D-meta-data that will speed up the matching process
– Performance trade-off
On the document side, acquiring D-meta-data that will be relevant on the user side
– That fits « naturally » with potential user queries
– That helps organize results into a « manageable » form
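The performance trade-off can be illustrated with cosine scoring: storing each document's vector norm at build time is extra D-meta-data (more space, more build work) that saves recomputing it for every query. A minimal sketch, assuming raw term-frequency vectors and whitespace tokenization; cosine similarity is used here only as a familiar example of a scoring function, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def build(docs: Dict[str, str]) -> Dict[str, Tuple[Counter, float]]:
    """Index time: store term counts *and* the vector norm for each document."""
    db = {}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        norm = math.sqrt(sum(c * c for c in tf.values()))
        db[doc_id] = (tf, norm)
    return db


def score(db: Dict[str, Tuple[Counter, float]], query: str) -> List[Tuple[str, float]]:
    """Query time: cosine similarity from dot products and the stored norms."""
    q = Counter(query.lower().split())
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    ranked = []
    for doc_id, (tf, d_norm) in db.items():
        dot = sum(q[t] * tf[t] for t in q)  # Counter returns 0 for absent terms
        if dot and d_norm and q_norm:
            ranked.append((doc_id, dot / (q_norm * d_norm)))
    ranked.sort(key=lambda x: (-x[1], x[0]))
    return ranked
```

The stored norm only speeds up matching; whether such D-meta-data is also relevant on the user side (for organizing results) is a separate question, which is the second issue on this slide.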
13
Context, personalization
[Diagram: the big-picture components plus User context and Content context]
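One way user context can act on matching is by re-ranking results. A minimal sketch, assuming hypothetical inputs: a base score per result, a set of topics per document (standing in for content context) and a set of user interests (standing in for user context), with a fixed, arbitrary boost per shared topic. `personalize` is a hypothetical name and this is only one of many possible personalization schemes.

```python
from typing import Dict, List, Set, Tuple


def personalize(
    ranked: List[Tuple[str, float]],
    doc_topics: Dict[str, Set[str]],
    user_interests: Set[str],
    boost: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank results: each topic shared by document and user adds `boost` to the score."""
    rescored = []
    for doc_id, base in ranked:
        shared = len(doc_topics.get(doc_id, set()) & user_interests)
        rescored.append((doc_id, base + boost * shared))
    rescored.sort(key=lambda x: (-x[1], x[0]))
    return rescored
```

With `ranked = [("d1", 1.0), ("d2", 0.8)]`, `d2` tagged with "jazz" and a user interested in jazz, `d2` moves to the top with a score of about 1.3.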
14
A functional breakdown of a search engine (the reality is much more complex)
[Diagram: the big-picture components plus User context, Content context and Corpora]
15
Search vs. alerts
[Diagram: the big-picture components plus User context, Content context and Stored queries]
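The difference between search and alerts is the direction of matching: a search runs one query against many stored documents, whereas an alert runs each incoming document against many stored queries. A minimal sketch, assuming stored queries are plain term sets with AND semantics; `StoredQuery`, `AlertService` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class StoredQuery:
    query_id: str
    user: str
    terms: FrozenSet[str]  # all terms must be present (assumed AND semantics)


class AlertService:
    """Matches each new document against every stored query."""

    def __init__(self) -> None:
        self.queries: Dict[str, StoredQuery] = {}

    def subscribe(self, q: StoredQuery) -> None:
        self.queries[q.query_id] = q

    def on_new_document(self, doc_id: str, text: str) -> List[str]:
        """Return the (sorted, de-duplicated) users to notify about this document."""
        doc_terms = set(text.lower().split())
        return sorted({q.user for q in self.queries.values() if q.terms <= doc_terms})
```

This loop over all stored queries is fine for a sketch; with many subscribers, the stored queries themselves would need to be indexed, which is where alerts raise their own scalability issues.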
16
Acting on results
[Diagram: the big-picture components plus User context, Content context, Stored queries, Query UI, Results UI and Act; User as a “librarian”]
17
Some global cross-functional issues
IP, access rights, usage rights, security, privacy, ...
Business model
Architecture, APIs, standards, ...
Software engineering
Scalability
18
The research triangle for search engines
[Diagram: the big-picture components plus User context and Content context]
19
Next steps
Quantify the limits associated with each functional component
– Main driving parameter (size/churn, user population, media type, ...)
– Influence on other functional components
→ Identify the main terms of a use-case typology
Compare/describe research and industry use cases according to the proposed functional description
– Prepare for the gap analysis
– Identify expected progress at the functional level
– Identify « mismatch » cases, alternative/complementary models