CHORUS: What is « Search » A functional view. 2008-04-21, Henri Gouraud, WP2.


CHORUS What is « Search » A functional view Henri Gouraud WP2

Overall goal
• Break down search into essential (necessary) components
• Identify issues associated with each component
• Facilitate matching of use-cases with the functional overview
• For a given use-case, identify “critical” components
  - Those for which there is no known solution
  - Those for which existing solutions perform poorly
• Identify use-cases where the model breaks
  - Repair/extend the model
  - Identify potential « new models »
• ----> Prepare Gap Analysis

This analysis tries to be « media » independent
• Functions are media independent
  - Document discovery
  - Meta-data extraction
  - User interface
  - ...
• Techniques necessary to implement each function are media dependent...
  - Text extraction
  - Speech to text
  - Image signatures
  - ...
• ... and are at varying levels of maturity and performance

Top level vision
• Search engines come into play when « direct » search into the document repository fails (volume, performance, ...)
[Diagram labels: Indexing, Matching, Documents, Data-base, Querying]

At the core: matching
[Diagram labels: Matching, Data-base, Query-meta-data, Document-meta-data]
• Matching happens between two « computer based » chunks of data
  - Query-meta-data, derived from the user input (and the user's context)
  - Document-meta-data, derived from the documents being searched
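The idea that matching only ever sees two derived chunks of data can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names, the `to_terms` transform (lowercase and split on whitespace) and the overlap score are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DocMetaData:
    """Hypothetical D-meta-data: a document id plus a bag of terms."""
    doc_id: str
    terms: set = field(default_factory=set)


@dataclass
class QueryMetaData:
    """Hypothetical Q-meta-data: the terms derived from the user input."""
    terms: set = field(default_factory=set)


def to_terms(text):
    """Toy 'transform' step: lowercase the text and split on whitespace."""
    return set(text.lower().split())


def match(query, database):
    """Return ids of documents sharing at least one term, best overlap first."""
    scored = [(len(query.terms & doc.terms), doc.doc_id) for doc in database]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


database = [
    DocMetaData("d1", to_terms("speech to text for radio archives")),
    DocMetaData("d2", to_terms("image signatures for photo search")),
]
query = QueryMetaData(to_terms("Image search"))
hits = match(query, database)
```

The matcher never touches the original documents or the raw user input, only the two meta-data structures, which is the separation the slide describes.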

The Matching process
• Simple or boolean
  - AND, OR, NEAR, parentheses, regular expressions, ...
• Accurate or fuzzy
  - Spelling, phonetic, « similar to », ...
• Typed
  - Author:xx, Title:xx, ...
• Centralized/distributed
  - Across a single LAN, across a WAN, peer-to-peer, ...
• Issues
  - New media types: algorithms
  - Performance: single-query response time, query throughput
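The matching modes listed above could be sketched as follows for plain text terms. This is a rough illustration only: the function names are hypothetical, fuzzy matching is approximated with the standard library's `difflib.get_close_matches`, and the 0.8 cutoff is an arbitrary assumption.

```python
import difflib


def match_and(terms, doc_terms):
    """Boolean AND: every query term must appear in the document."""
    return all(term in doc_terms for term in terms)


def match_or(terms, doc_terms):
    """Boolean OR: at least one query term must appear in the document."""
    return any(term in doc_terms for term in terms)


def match_fuzzy(term, doc_terms, cutoff=0.8):
    """Fuzzy match: tolerate spelling variants above a similarity cutoff."""
    close = difflib.get_close_matches(term, list(doc_terms), n=1, cutoff=cutoff)
    return bool(close)


def match_typed(field_name, value, doc_fields):
    """Typed match (e.g. Author:xx): compare one named field, ignoring case."""
    return doc_fields.get(field_name, "").lower() == value.lower()


doc_terms = {"algorithm", "search", "engine"}
doc_fields = {"author": "Gouraud", "title": "What is Search"}

exact_hit = match_and(["search", "engine"], doc_terms)
fuzzy_hit = match_fuzzy("algorythm", doc_terms)
typed_hit = match_typed("author", "gouraud", doc_fields)
```

NEAR, regular expressions and distributed matching are left out; they would need positional information or a network layer that this sketch does not model.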

The document side
[Diagram labels: Matching, Data-base, Content, Build, Crawl, Push, Pull, D-meta-data, Document, Transform]
• The main issue: the « Transform » step
  - Extracting useful information from the documents

The document side
• Document discovery
  - Pull = crawling, push = OK
  - Completeness, freshness
• Building the SE data-base
  - Scalability, reliability
  - Incremental
  - Distributed
• Transform: elaborating D-meta-data
  - Deal with existing meta-data, multi-pass process, ...
  - Dealing with the multiplicity of content types and formats
  - For each type, a specific meta-data elaboration process
• Issues
  - Algorithm (for each media type)
  - Performance (relates to document repository size and churn rate)
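As a rough sketch of the « Transform » and « Build » steps with incremental updates, the Python below keeps an in-memory inverted index that can absorb pushed or re-crawled documents one at a time. The `InvertedIndex` class and its whitespace-based `transform` are assumptions for illustration; a real engine would use a media-specific transform and a distributed, persistent store.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy search-engine data-base: term -> ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.doc_terms = {}               # doc id -> terms currently indexed

    def transform(self, text):
        """Hypothetical text transform: lowercase and split on whitespace."""
        return set(text.lower().split())

    def add(self, doc_id, text):
        """Index a new document, or re-index a changed one (freshness)."""
        self.remove(doc_id)
        terms = self.transform(text)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        """Drop a document without rebuilding the whole index."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def lookup(self, term):
        """Return a copy of the ids of documents containing the term."""
        return set(self.postings.get(term.lower(), set()))


index = InvertedIndex()
index.add("d1", "speech to text")
index.add("d2", "text extraction")
before_update = index.lookup("text")
index.add("d1", "image signatures")  # the document changed and was re-crawled
after_update = index.lookup("text")
```

Only the postings of the changed document are touched on update, which is what keeps the cost tied to the churn rate rather than to the repository size.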

The user side
[Diagram labels: User, Results, Transform, UI, Query, UI, Matching, Data-base, Q-meta-data, Organize, Navigation]
• The two main issues
  - Transforming the user query into Q-meta-data
  - Organizing the results into manageable form

The user side
• Capturing the « user intent »
  - The DWIM (« Do What I Mean ») dream
  - Providing useful hints (what is « searchable »?)
• Organizing the results
  - Assume multiple results, i.e. choice or refinement
• Issues
  - Algorithm (for each media type)
  - Clustering, structuring, summarizing, ...
  - User interface (for each terminal type)
  - Performance (under the ½ sec threshold)
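To make « organizing the results into manageable form » concrete, here is a small Python sketch. Note that it swaps in a simple grouping by a known field for real clustering, and that the `media_type` and `score` fields, the `organize` function and the cap of two results per group are all assumptions rather than anything the slides specify.

```python
from collections import defaultdict


def organize(results, key="media_type", per_group=2):
    """Group scored results by `key`, keeping the best `per_group` in each group."""
    groups = defaultdict(list)
    for result in results:
        groups[result.get(key, "unknown")].append(result)
    return {
        name: sorted(items, key=lambda r: r["score"], reverse=True)[:per_group]
        for name, items in sorted(groups.items())
    }


results = [
    {"doc_id": "d1", "media_type": "video", "score": 0.4},
    {"doc_id": "d2", "media_type": "text", "score": 0.9},
    {"doc_id": "d3", "media_type": "video", "score": 0.7},
    {"doc_id": "d4", "media_type": "video", "score": 0.1},
]
organized = organize(results)
```

Presenting a few top results per group gives the user a choice to refine, instead of one long flat list.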

The big picture
[Diagram labels: Librarian, Intra-doc navigation, User, Results, Transform, Query, Navigation, Matching, Data-base, Content, Transform, Build, Crawl, Push, Pull, Document, Organize, Q-meta-data, D-meta-data, UI]

The big picture issues
• On the document side, acquiring D-meta-data that will speed up the matching process
  - Performance trade-off
• On the document side, acquiring D-meta-data that will be relevant on the user side
  - That will fit « naturally » with the potential user queries
  - That will assist in organizing results into « manageable » form

Context, personalization
[Diagram labels: Librarian, User context, Content context, Intra-doc navigation, User, Results, Transform, Query, Matching, Data-base, Content, Transform, Build, Crawl, Push, Pull, Document, Organize, Q-meta-data, D-meta-data, UI, Navigation]

A functional breakdown of a search engine (it is much more complex)
[Diagram labels: Librarian, User context, Content context, Intra-doc navigation, User, Results, Query, Navigation, Matching, Data-base, Content, Transform, Build, Crawl, Push, Pull, Document, Organize, Q-meta-data, D-meta-data, UI, Transform, Corpora]

Search vs Alerts
[Diagram labels: Librarian, User context, Content context, Intra-doc navigation, User, Results, Transform, Query, Stored queries, Matching, Data-base, Content, Transform, Build, Crawl, Push, Pull, Document, Organize, Q-meta-data, D-meta-data, UI, Navigation]
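One possible reading of the « Stored queries » box is that an alert reverses the usual direction: queries are kept, and each newly arriving document is matched against them. The Python sketch below illustrates that reading only; the `AlertService` class, its method names and the all-terms-must-match rule are assumptions, not something the slide spells out.

```python
class AlertService:
    """Toy alerting: stored queries are matched against each incoming document."""

    def __init__(self):
        self.stored = {}  # alert name -> set of required terms

    def subscribe(self, name, query_text):
        """Store a query; it will be re-evaluated for every new document."""
        self.stored[name] = set(query_text.lower().split())

    def on_new_document(self, text):
        """Return the names of stored queries fully satisfied by this document."""
        terms = set(text.lower().split())
        return sorted(name for name, query in self.stored.items() if query <= terms)


alerts = AlertService()
alerts.subscribe("video-news", "video search")
alerts.subscribe("speech", "speech recognition")
triggered = alerts.on_new_document("New peer-to-peer video search prototype")
```

In a search, one query meets many stored documents; in an alert, one document meets many stored queries, but the same matching function can serve both.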

Acting on results
[Diagram labels: Librarian, User context, Content context, Intra-doc navigation, User, Results, Transform, UI, Query, UI, Stored queries, Matching, Data-base, Content, Transform, Build, Crawl, Push, Pull, Document, Organize, Act, User as a “librarian”, Q-meta-data, D-meta-data, Navigation]

Some global cross-functional issues
• IP, access rights, usage rights
• Security, privacy, …
• Business model
• Architecture, APIs, standards, …
• Software engineering
• Scalability

The Research triangle for Search Engines
[Diagram labels: Librarian, User context, Content context, Intra-doc navigation, User, Results, Query, Navigation, Matching, Data-base, Content, Transform, Build, Crawl, Push, Pull, Document, Organize, Q-meta-data, D-meta-data, UI, Transform]

Next steps
• Quantify limits associated with each functional component
  - Main driving parameter (size/churn, user population, media type, ...)
  - Influence on other functional components
  --> Identify main use-case typology terms
• Compare/describe research and industry use-cases according to the proposed functional description
  - Prepare for gap analysis
  - Identify expected functional level progress
  - Identify « mismatch » cases, alternative/complementary models