CHORUS
What is « Search »? A functional view
Henri Gouraud, WP2, 2008-04-21

Overall goal  Break down search into essential (necessary) components  Identify issues associated with each component  Facilitate matching of use-cases with functional overview  For a given use-case, identify “critical” components –Those for which there is no known solution –Those for which existing solutions are not performing  Identify use-cases where the model breaks –Repair/extend model –Identify potential « new models »  ----> Prepare Gap Analysis

This analysis tries to be « media » independent
• Functions are media independent
  – Document discovery
  – Meta-data extraction
  – User interface
  – ...
• The techniques needed to implement each function are media dependent...
  – Text extraction
  – Speech to text
  – Image signatures
  – ...
• ... and they are at varying levels of maturity and performance

Top level vision  Search engines come into play when « direct » search into the document repository fails (volume, performance,...)‏ Indexing Matching Documents Data-base Querying

At the core: matching
[Diagram: Matching, Data-base, Query-meta-data, Document-meta-data]
• Matching happens between two « computer-based » chunks of data
  – Query-meta-data, derived from the user input (and the user's context)
  – Document-meta-data, derived from the documents being searched
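The Python sketch below makes the idea concrete. Both sides are reduced to the same kind of « computer-based » data, a set of lowercase terms, and matching is a similarity score between the two. Jaccard overlap is used purely as an illustrative stand-in, since the slide does not prescribe a matching function. `to_meta`, `match_score` and `rank` are hypothetical names.

```python
def to_meta(text):
    """Hypothetical transform: reduce text to a set of lowercase terms."""
    return frozenset(text.lower().split())


def match_score(q_meta, d_meta):
    """Jaccard overlap between Q-meta-data and D-meta-data, from 0.0 to 1.0."""
    if not q_meta or not d_meta:
        return 0.0
    return len(q_meta & d_meta) / len(q_meta | d_meta)


def rank(query, documents):
    """Score every document against the query; keep non-zero scores, best first."""
    q_meta = to_meta(query)
    scored = [(match_score(q_meta, to_meta(text)), doc_id)
              for doc_id, text in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


print(match_score(to_meta("functional view"),
                  to_meta("a functional view of search")))  # 0.4
```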

The Matching process  Simple or boolean –AND, OR, NEAR, Parentheses, Regular expression,...  Accurate of fuzzy –Spelling, phonetic, « similar to »,...  Typed –Author:xx, Title:xx,...  Centralized/distributed –Across single LAN, across WAN, peer 2 peer,...  Issues –New media types: algorythms –Performance single query response time query throughput

The document side
[Diagram: Document, Crawl (Pull), Push, Content, Transform, D-meta-data, Build, Data-base, Matching]
• The main issue: the « Transform » step
  – Extracting useful information from the documents
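The « Transform » step can be sketched in Python as a dispatch on content type. This assumes documents arrive as raw strings tagged with a MIME type. Only plain text and HTML are handled here, using the standard-library `html.parser`. Speech-to-text or image-signature extractors would register under their own types. The registry, the function names and the choice of a sorted term list as D-meta-data are all hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document (script/style text is not filtered here)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_plain(raw):
    return raw


def extract_html(raw):
    parser = _TextCollector()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical registry: one extraction process per content type.
EXTRACTORS = {
    "text/plain": extract_plain,
    "text/html": extract_html,
}


def transform(raw, content_type):
    """Produce D-meta-data (here simply a sorted list of distinct lowercase terms)."""
    extractor = EXTRACTORS.get(content_type)
    if extractor is None:
        raise ValueError(f"no extractor registered for {content_type}")
    text = extractor(raw)
    return sorted(set(text.lower().split()))


print(transform("<h1>Search</h1><p>functional view</p>", "text/html"))
# ['functional', 'search', 'view']
```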

The document side  Document discovery –Pull=crawling, push=OK –Completeness, freshness,  Building the SE data-base –Scalabality, reliability –Incremental –Distributed  Transform: elaborating D-meta-data –Deal with existing meta-data, multi pass process,... –Dealing with multiplicity of content type and formats –For each type, specific meta-data elaboration process  Issue –Algorythm (for each media type)‏ –Performance (relates to document repository size and churn rate)‏

The user side
[Diagram: User, Query UI, Transform, Q-meta-data, Matching, Data-base, Organize, Results UI, Navigation]
• The two main issues
  – Transforming the user query into Q-meta-data
  – Organizing the results into a manageable form
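One way to sketch the first issue, turning user input into Q-meta-data, is a small Python parser. It separates typed constraints (author:..., title:...), quoted phrases and free terms, using the standard-library `shlex.split` to honor double quotes. The set of known fields and the layout of the output dictionary are assumptions. Real intent capture goes far beyond this.

```python
import shlex

KNOWN_FIELDS = {"author", "title"}  # hypothetical: fields present in the D-meta-data


def parse_query(user_input):
    """Turn raw user input into Q-meta-data: typed constraints, phrases and free terms."""
    q_meta = {"fields": {}, "phrases": [], "terms": []}
    for token in shlex.split(user_input):
        field, sep, value = token.partition(":")
        if sep and field.lower() in KNOWN_FIELDS and value:
            # Typed constraint such as Author:xx
            q_meta["fields"].setdefault(field.lower(), []).append(value.lower())
        elif " " in token:
            # A quoted phrase survives shlex.split as a single token
            q_meta["phrases"].append(token.lower())
        else:
            q_meta["terms"].append(token.lower())
    return q_meta


print(parse_query('Author:Gouraud "functional view" search'))
```

`shlex.split` raises `ValueError` on unbalanced quotes, and a lone apostrophe (as in what's) counts as one. Free-form input would therefore need pre-processing or a more tolerant tokenizer.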

The user side  Capturing the « user intent » –The DWIM dream –Providing useful hints (what is « searchable »?)‏  Organizing the results –Assume multiple results, i.e. choice or refinement  Issues –Algorythm (for each media type)‏ –Clustering, structuring, summarizing,... –User Interface (for each terminal type)‏ –Performance (under the ½ sec threshold)‏

The big picture
[Diagram labels: Librarian, Document, Crawl, Push, Pull, Content, Transform, D-meta-data, Build, Data-base, Matching, User, Query, Q-meta-data, Organize, Results, UI, Navigation, Intra-doc navigation]

The big picture issues  On the document side, acquiring D-meta-data that will speed up the matching process –Performnce trade-off  On the document side, acquiring D-meta-data that will be relevant on the user side –That will fit « naturally » with the potential user queries –That will assist in organizing results into « manageable » form

Context, personalization
[Diagram: the big-picture pipeline (also labelled Librarian), extended with « User context » and « Content context » boxes]

A functional breakdown of a search engine (it is much more complex)
[Diagram: the big-picture pipeline with User context, Content context and Corpora; also labelled Librarian]

Search vs alerts
[Diagram: the big-picture pipeline with User context and Content context, plus an added « Stored queries » component; also labelled Librarian]
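Alerts invert the usual flow: the queries are stored, and each newly acquired document is matched against them. Below is a minimal Python sketch. It assumes stored queries are plain AND-of-terms and documents are whitespace-tokenized text. `AlertService` and its methods are hypothetical. A production system would index the stored queries themselves rather than test each one in turn.

```python
from dataclasses import dataclass, field


@dataclass
class AlertService:
    """Stored queries matched against each incoming document."""

    stored: dict = field(default_factory=dict)  # query id -> set of required terms

    def subscribe(self, query_id, query):
        self.stored[query_id] = set(query.lower().split())

    def on_new_document(self, text):
        """Return ids of stored queries whose terms all occur in the document (AND semantics)."""
        terms = set(text.lower().split())
        return sorted(qid for qid, required in self.stored.items()
                      if required and required <= terms)


svc = AlertService()
svc.subscribe("q1", "search engine")
svc.subscribe("q2", "speech")
print(svc.on_new_document("A new search engine for speech"))  # ['q1', 'q2']
```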

Acting on results
[Diagram: the big-picture pipeline with User context, Content context and Stored queries, plus an added « Act » step: the user as a “librarian”]

Some global cross-functional issues  IP, access rights, usage rights,  Security, privacy, …  Business model  Architecture, APIs, standards, …  Software engineering  Scalability

The research triangle for search engines
[Diagram: the full functional breakdown (Librarian, User context, Content context, document side, user side, Matching, Data-base)]

Next steps  Quantify limits associated with each functional component –Main driving parameter (size/churn, user population, media type,...)‏ –Influence on other functional components --> Identify main use-case typology terms  Compare/describe research and industry use-cases according to the proposed functional description –Prepare for gap analysis –Identify expected functional level progress –Identify « mismatch » cases, alternative/complementary models
