“Artificial Intelligence” in Database Querying Dept. of CSE Seung-won Hwang.

Presentation transcript:

“Artificial Intelligence” in Database Querying Dept. of CSE Seung-won Hwang

Why do you need to ace this class?
- AI: “producing machines to automate tasks requiring intelligent behavior” (Wikipedia)
- AI techniques are highly relevant to many research fields, including databases

More obvious applications

But…

Crash course on DB
- SQL queries:
    select * from cars where color='red' and type='convertible' and brand='hyundai'

Crash course on DB
- Deciding the most efficient execution plan among:
  - hyundai -> red -> convertible?
  - red -> convertible -> hyundai?
  - convertible -> hyundai -> red?
  - ...
- The choice depends on data structures (B+-tree), data distributions, ...
- However, all of this effort is wasted if no object qualifies
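The plan choice above can be sketched in a few lines. This is only an illustration, assuming a toy in-memory table and a simple heuristic (apply the most selective predicate first); the table contents and the helper names `selectivity`, `order_predicates`, and `execute` are hypothetical, and a real DBMS optimizer uses indexes and statistics rather than full scans.

```python
# Hypothetical toy table; column names follow the slide's SQL example.
CARS = [
    {"color": "red", "type": "sedan", "brand": "hyundai"},
    {"color": "red", "type": "convertible", "brand": "bmw"},
    {"color": "black", "type": "sedan", "brand": "hyundai"},
    {"color": "red", "type": "sedan", "brand": "kia"},
    {"color": "blue", "type": "sedan", "brand": "hyundai"},
    {"color": "red", "type": "suv", "brand": "hyundai"},
]


def selectivity(rows, column, value):
    """Fraction of rows satisfying column = value (lower means more selective)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r[column] == value) / len(rows)


def order_predicates(rows, predicates):
    """Sort equality predicates so the most selective one is applied first."""
    return sorted(predicates.items(),
                  key=lambda kv: selectivity(rows, kv[0], kv[1]))


def execute(rows, predicates):
    """Apply predicates in the chosen order, shrinking the candidates each step."""
    for column, value in order_predicates(rows, predicates):
        rows = [r for r in rows if r[column] == value]
        if not rows:  # nothing qualifies, so the remaining predicates are skipped
            break
    return rows


query = {"color": "red", "type": "convertible", "brand": "hyundai"}
plan = [column for column, _ in order_predicates(CARS, query)]
result = execute(CARS, query)
print(plan, result)
```

On this toy data the `type` predicate is applied first because only one row is a convertible, and the final result is still empty: the ordering saved work but could not produce an answer.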

Our strength

- Internet shopping, web bulletin boards, Cyworld, ...
- You are sending SQL queries without knowing it (at least until you see DB errors)
- The DBMS is optimizing your queries for you without your knowing it

Our weakness
- But do you use a DBMS to manage your Word files, photos, etc.? What do you use?
  - File system (browsing)
  - Google Desktop (searching)
- SQL semantics are too strict
  - No red Hyundai convertible! Or too many red Hyundai Elantras?
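Both failure modes are easy to reproduce. The following sketch uses Python's built-in `sqlite3` module with an in-memory table; the schema, the `model` column, and the rows are made-up illustrations, not data from the talk.

```python
import sqlite3

# In-memory toy database; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table cars (color text, type text, brand text, model text)")
conn.executemany(
    "insert into cars values (?, ?, ?, ?)",
    [
        ("red", "sedan", "hyundai", "elantra"),
        ("red", "sedan", "hyundai", "elantra"),
        ("red", "sedan", "hyundai", "elantra"),
        ("red", "convertible", "bmw", "model_x"),
        ("orange", "convertible", "hyundai", "model_y"),
    ],
)

# Overspecified query: exact-match semantics return nothing at all.
empty = conn.execute(
    "select * from cars "
    "where color='red' and type='convertible' and brand='hyundai'"
).fetchall()

# Underspecified query: every match comes back, unranked.
too_many = conn.execute(
    "select * from cars "
    "where color='red' and brand='hyundai' and model='elantra'"
).fetchall()

print(len(empty), len(too_many))
```

SQL gives no hint about which near-miss would have been acceptable in the first case, and no ordering among the identical-looking answers in the second.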

While Google makes $$$ for

Giving “Artificial Intelligence”
- What intelligent behaviors are expected?
  - Suggesting alternatives: red Hyundai, red convertible, orange convertible
- What can be automated?
  - Deciding that red Hyundai < red convertible
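One simple way to suggest alternatives is to relax the query: drop as few predicates as possible until something qualifies. The sketch below assumes this drop-a-predicate strategy and a toy table; the function names `matches` and `relax` are hypothetical, and it does not decide which alternative is better, which is the ranking problem the rest of the talk addresses.

```python
from itertools import combinations

# Hypothetical toy table.
CARS = [
    {"color": "red", "type": "sedan", "brand": "hyundai"},
    {"color": "red", "type": "convertible", "brand": "bmw"},
    {"color": "orange", "type": "convertible", "brand": "hyundai"},
]


def matches(row, predicates):
    """True if the row satisfies every equality predicate."""
    return all(row.get(col) == val for col, val in predicates.items())


def relax(rows, predicates):
    """Keep as many predicates as possible while still returning some rows.

    Returns a list of (kept_predicates, matching_rows) pairs, all of which
    keep the same (maximal) number of predicates.
    """
    for keep in range(len(predicates), 0, -1):
        found = []
        for cols in combinations(predicates, keep):
            subset = {col: predicates[col] for col in cols}
            hits = [r for r in rows if matches(r, subset)]
            if hits:
                found.append((subset, hits))
        if found:
            return found
    return []


query = {"color": "red", "type": "convertible", "brand": "hyundai"}
alternatives = relax(CARS, query)
for kept, hits in alternatives:
    print(kept, "->", hits)
```

With this data the full query is empty, and dropping one predicate at a time yields a red convertible, a red Hyundai, and an orange Hyundai convertible as candidate alternatives.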

But how? Any ideas?
- The GAP: underspecified/overspecified queries

[S1] Borrowing wisdom from data (as Google does)
- Useful both when there are too many results and when there are none

Text ranking
- tf (term frequency): how often a query term appears in a document
- idf (inverse document frequency): how rare a query term is in the document collection
- Example (from the slide figure): the document “hyundai red convertible red convertible” has high tf; on cars.com, “red” has low idf
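A minimal sketch of these two quantities, assuming raw counts for tf and log(N / df) for idf (one common variant among several); the tiny document collection is made up for illustration.

```python
import math


def tf(term, doc_tokens):
    """Term frequency: raw count of the term in one document."""
    return doc_tokens.count(term)


def idf(term, collection):
    """Inverse document frequency: log(N / df); 0.0 for unseen terms."""
    df = sum(1 for doc in collection if term in doc)
    return math.log(len(collection) / df) if df else 0.0


def score(query_terms, doc_tokens, collection):
    """Sum of tf * idf over the query terms."""
    return sum(tf(t, doc_tokens) * idf(t, collection) for t in query_terms)


# Hypothetical collection of tokenized documents.
docs = [
    "hyundai red convertible red convertible".split(),
    "red sedan".split(),
    "red truck".split(),
    "blue convertible".split(),
]

query = ["red", "convertible"]
scores = [score(query, d, docs) for d in docs]
best = max(range(len(docs)), key=lambda i: scores[i])
print(scores, best)
```

Here "red" appears in three of four documents and so carries little weight, while the first document wins because it repeats both query terms.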

Applying to database

brand     idf     color     idf
hyundai   0.5     black     0.1
BMW       0.8     red       0.4
kia       0.3     purple    0.9

- Red hyundai = 0.9
- Red honda = 0.4
- Black hyundai = 0.8
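One plausible reading of these scores is that a tuple's score is the sum of the idf values of its attribute values, with values missing from the table contributing 0. That combination rule is an assumption, not something the slide states: it reproduces the slide's figures for red hyundai and red honda, but a plain sum gives 0.6 rather than 0.8 for black hyundai, so the slide may combine values differently there.

```python
# idf values copied from the slide's table.
IDF = {
    "brand": {"hyundai": 0.5, "BMW": 0.8, "kia": 0.3},
    "color": {"black": 0.1, "red": 0.4, "purple": 0.9},
}


def tuple_score(car):
    """Assumed rule: sum the idf of each attribute value; unknown values add 0."""
    return sum(IDF[attr].get(value, 0.0)
               for attr, value in car.items() if attr in IDF)


cars = [
    {"color": "red", "brand": "hyundai"},
    {"color": "red", "brand": "honda"},   # honda is not in the idf table
    {"color": "purple", "brand": "BMW"},
]

ranked = sorted(cars, key=tuple_score, reverse=True)
for car in ranked:
    print(car, round(tuple_score(car), 2))
```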

What is the assumption?
- Rare items are preferred
- Can you think of exceptions?
  - ‘purple pony’ vs. ‘purple lexus’
- How can we handle this problem?

[S2] Borrowing wisdom of other users

Query frequency
- Keyword frequency in prior queries
  - E.g., car=‘BMW’ appearing in 50% of prior queries
- Summing up, we can rank highly the cars that were heavily queried before and are rare in stock
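A sketch of combining the two signals, assuming the score of a value is its query frequency multiplied by its idf in the stock table. The multiplicative combination, the function names, and the data are illustrative assumptions; the slide only says that both signals should count.

```python
import math
from collections import Counter


def idf_by_value(rows, attr):
    """idf of each value of attr in the stock table: log(N / count)."""
    counts = Counter(r[attr] for r in rows)
    return {value: math.log(len(rows) / c) for value, c in counts.items()}


def qf_by_value(prior_queries, attr):
    """Fraction of prior queries that asked for each value of attr."""
    counts = Counter(q[attr] for q in prior_queries if attr in q)
    return {value: c / len(prior_queries) for value, c in counts.items()}


def score(row, attr, qf, idf):
    """Assumed combination: popular in queries AND rare in stock ranks high."""
    value = row[attr]
    return qf.get(value, 0.0) * idf.get(value, 0.0)


# Hypothetical stock and query log.
stock = [{"brand": "hyundai"}, {"brand": "hyundai"},
         {"brand": "hyundai"}, {"brand": "BMW"}]
prior_queries = [{"brand": "BMW"}, {"brand": "BMW"},
                 {"brand": "hyundai"}, {"brand": "kia"}]

qf = qf_by_value(prior_queries, "brand")
idf = idf_by_value(stock, "brand")
ranked = sorted(stock, key=lambda r: score(r, "brand", qf, idf), reverse=True)
print(qf, ranked[0])
```

In this toy setup BMW appears in 50% of prior queries and only once in stock, so the single BMW row is ranked first.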

[S3] Borrowing wisdom from domain knowledge

Example 1: color (figure with panels (a) to (e))

Example 2: shape=‘retro’

[S4] Borrowing wisdom from a specific user
- The notion of similarity differs significantly across users
- Shape? (figure: items A, B, C)

You cannot expect users to give (or machines to understand) an explicit explanation such as: “I want a photo of a building similar to the Eiffel Tower in terms of shape, but not in terms of the overall shape, but in terms of the shape of the steel material...”

Mindreader? (mediabakery.com)

In our car search example
- You can show ‘red bmw’ and ‘hyundai sedan’
- Based on the user's response (or clicks), you can figure out which factor is more important, e.g., color
- Then you can show more red cars to learn more about the user's preference on brands
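The feedback loop above can be sketched as a per-attribute weight update. This is a deliberately simple heuristic chosen for illustration, not the learning method used in the research the talk refers to; the update rule, the step size, and all names are assumptions.

```python
def matched_attrs(query, item):
    """Attributes on which the item agrees with the query."""
    return {a for a, v in query.items() if item.get(a) == v}


def update_weights(weights, query, clicked, skipped, step=1.0):
    """Reward attributes matched by the clicked item; penalize attributes
    matched only by items the user skipped."""
    new = dict(weights)
    clicked_match = matched_attrs(query, clicked)
    for a in clicked_match:
        new[a] += step
    for item in skipped:
        for a in matched_attrs(query, item) - clicked_match:
            new[a] -= step
    return new


def rank(items, query, weights):
    """Order items by the total weight of the query attributes they match."""
    return sorted(items,
                  key=lambda it: sum(weights[a] for a in matched_attrs(query, it)),
                  reverse=True)


query = {"color": "red", "type": "convertible", "brand": "hyundai"}
red_bmw = {"color": "red", "type": "sedan", "brand": "bmw"}
hyundai_sedan = {"color": "black", "type": "sedan", "brand": "hyundai"}

weights = {"color": 1.0, "type": 1.0, "brand": 1.0}
# The user clicks the red BMW and ignores the Hyundai sedan.
weights = update_weights(weights, query, clicked=red_bmw, skipped=[hyundai_sedan])

candidates = [
    {"color": "black", "type": "sedan", "brand": "hyundai"},
    {"color": "red", "type": "sedan", "brand": "kia"},
]
next_page = rank(candidates, query, weights)
print(weights, next_page[0])
```

After one click on the red BMW, color outweighs brand, so the next page leads with another red car; showing red cars of different brands would then reveal the brand preference.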

Summing up
- You need to bridge the gap between SQL and ideal results by collecting and analyzing as much information as is available from the data, prior users, the user himself/herself, ...
- Implicitly and automatically

Another kind of implicit info to think about
- Tagging: frequency ranking / automatic classification?

Summary
- Networks enable access to a large amount of user-created content/info: “Web 2.0”
  - (interesting Web 2.0 video)
- Intelligent retrieval techniques are the key in the new era
  - Ranking
  - Classification
- I will then show how AI techniques (that you already know!) got me a PhD in intelligent retrieval research
  - Rank formulation: machine learning
  - Rank/classification processing: best-first search, hill climbing

Q&A