Presentation transcript:

Kevin C. Chang

About the collaboration -- Cazoodle. Coming next week: Vacation Rental Search

How do you greet people in your culture? What have you been searching lately?

The university and areas of Kevin Chang? The … of Marc Snir? Customer service phone number of Amazon? What profs are doing databases at UIUC? The papers and presentations of SIGMOD 2007? Due date of SIGMOD 2008? Sale price of “Canon PowerShot A400”? “Hamlet” books available at bookstores?

The Web is a big library. A huge supermarket!

Queries can be any things, too! (Slide figure: a search engine.)

Are there certain “regularities” to exploit?

Let’s try it out…

Survey 1: How likely is a query to follow a pattern? 9 out of 10 samples share a pattern with others!

Survey 2: How likely are queries in a domain to follow patterns w.r.t. pre-specified attributes? Over 28,000 manually labeled queries: some domains have upwards of 90% patterned queries.

Survey 3: How many patterns are there? Hundreds of patterns are needed to cover 80% of queries.

Simple concept: what is a query template?
(This paper) A sequence of keywords and attributes:
- #celebrity affairs
- #category jobs in #location
- #movie showtimes in #zipcode
- …
(In general) Patterns that can be induced from queries, e.g., regular expressions.
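As a small illustration of the "patterns as regular expressions" view (my example, not from the slides), a template such as "#category jobs in #location" can be compiled into a regex whose named groups capture the attribute values; the one-word-per-slot assumption is purely for illustration:

```python
import re

# Hypothetical: compile the template "#category jobs in #location" into a regex.
# Each attribute slot captures a single word here; a real system would constrain
# slots with domain vocabularies rather than \w+.
template_re = re.compile(r"^(?P<category>\w+) jobs in (?P<location>\w+)$")

m = template_re.match("accounting jobs in chicago")
print(m.groupdict())   # {'category': 'accounting', 'location': 'chicago'}
```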

How would such templates be useful?

We advocate Rich Query Interpretation.
t = “#category jobs in #location” for Job
q = “accounting jobs in chicago”
By matching query q to template t:
1) Intent Classifier: recognize the intended domain.  q → Job
2) Query Parser: recognize the associated attributes.  #category = “accounting”, #location = “chicago”
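To make this matching step concrete, here is a minimal Python sketch (not the paper's implementation; the toy vocabularies, template list, and single-token slots are assumptions for illustration) that plays the roles of both intent classifier and query parser:

```python
# A minimal sketch of rich query interpretation, assuming toy vocabularies.
CATEGORY = {"accounting", "marketing", "nursing"}
LOCATION = {"chicago", "boston", "seattle"}
VOCAB = {"#category": CATEGORY, "#location": LOCATION}

TEMPLATES = {
    "Job": ["#category jobs in #location", "jobs in #location"],
}

def match(query, template):
    """Return attribute bindings if the query matches the template, else None."""
    q_tokens, t_tokens = query.lower().split(), template.split()
    if len(q_tokens) != len(t_tokens):
        return None
    bindings = {}
    for q_tok, t_tok in zip(q_tokens, t_tokens):
        if t_tok.startswith("#"):          # attribute slot: check the vocabulary
            if q_tok not in VOCAB[t_tok]:
                return None
            bindings[t_tok] = q_tok
        elif q_tok != t_tok:               # literal keyword must match exactly
            return None
    return bindings

def interpret(query):
    """Intent classifier + query parser: find a domain and attribute bindings."""
    for domain, templates in TEMPLATES.items():
        for t in templates:
            bindings = match(query, t)
            if bindings is not None:
                return domain, t, bindings
    return None

print(interpret("accounting jobs in chicago"))
# ('Job', '#category jobs in #location', {'#category': 'accounting', '#location': 'chicago'})
```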

Rich query interpretation is useful. Tailored responses by query patterns:
- Finding results directly: no longer 10 blue links.
- Ranking results: relevant to the attributes desired.
- Dispatching verticals: bring verticals into search.
- Matching ads: more likely to be clicked.

Query: Finding flights

Query: Finding movie showtimes

Query: Finding weather

But many more patterns can be leveraged!

Now, how to systematically discover such templates?

Problem: Query Template Discovery
Given:
- Query log L (e.g., we use the MSN query log).
- Domain schema D (e.g., (#category, #location, #title) with vocabulary). Incomplete schemas can be handled, too.
- Seed knowledge: queries, sites, templates, or a mix (e.g., 5 queries, or 2 sites, or 2 templates).
Output: “good” templates T* = { t1, t2, … }
- t1 = #location jobs
- t2 = #location #category positions
- …
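As a concrete, entirely hypothetical instance of the inputs and output (the log entries, schema, and seeds below are made up for illustration), a Python sketch:

```python
# Hypothetical instance of the Query Template Discovery problem (illustration only).

# Query log L: (query, clicked site) pairs sampled from a search log.
L = [
    ("jobs in chicago", "monster.com"),
    ("marketing jobs in boston", "monster.com"),
    ("401k plans", "us401k.com"),
]

# Domain schema D: attribute names, each with a (possibly incomplete) vocabulary.
D = {
    "#category": {"marketing", "accounting"},
    "#location": {"chicago", "boston"},
    "#title": {"manager", "engineer"},
}

# Seed knowledge: a few queries, sites, or templates known to belong to the domain.
seeds = {"queries": ["jobs in chicago"], "sites": ["monster.com"], "templates": []}

# Desired output: "good" templates for the domain, e.g.:
T_star = ["#location jobs", "#location #category positions"]
```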

Step 1: Define quality metrics.

How do we measure the quality of templates?
Some templates are more “popular”: “#city1 #city2”, “#make #model”.
Some templates are more “accurate”: “#city1 #city2 flights”, “#location #make used cars”.
This motivates two quality metrics for a template: Precision and Recall.
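The formulas on this slide did not survive transcription. A plausible set-based reconstruction (an assumption on my part, writing $Q(t)$ for the log queries matching template $t$ and $Q_D$ for the log queries belonging to the target domain $D$) is:

$$P(t) = \frac{|Q(t) \cap Q_D|}{|Q(t)|}, \qquad R(t) = \frac{|Q(t) \cap Q_D|}{|Q_D|}.$$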

Step 2: From seeds, infer templates with good quality.

1) Can P and R be “inferred” (or estimated)? The slide introduces probabilistic versions of Recall and Precision.
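The probabilistic formulas were likewise lost in transcription. One plausible reading (again an assumption, not necessarily the paper's exact formulation), for a query $q$ drawn at random from the log:

$$P(t) = \Pr(q \in D \mid q \text{ matches } t), \qquad R(t) = \Pr(q \text{ matches } t \mid q \in D).$$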

2) What relationships can we use to infer? The query log induces a Query-Site-Template (“Quest”) graph:
Sites S: s1 = monster.com, s2 = motorola.com, s3 = us401k.com
Queries Q: q1 = jobs in chicago, q2 = jobs in boston, q3 = jobs in microsoft, q4 = jobs in motorola, q5 = marketing jobs in motorola, q6 = 401k plans, q7 = illinois employment statistics
Templates T: t1 = jobs in #location, t2 = jobs in #company, t3 = #category jobs in #company, t4 = #location employment statistics
(Slide figure: the graph connecting each query to the sites it clicks to and the templates it matches.)
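A minimal sketch of how such a graph might be represented (an illustration, not the paper's data structures): queries connect to the sites they were clicked on in the log and to the templates they match; the click pairs and matching rule below are assumptions.

```python
from collections import defaultdict

# Toy Quest graph: nodes are queries, sites, and templates; edges are
# query-site clicks (from the log) and query-template matches.
clicks = [
    ("jobs in chicago", "monster.com"),
    ("jobs in boston", "monster.com"),
    ("jobs in motorola", "motorola.com"),
    ("401k plans", "us401k.com"),
]
templates = ["jobs in #location", "jobs in #company"]
vocab = {"#location": {"chicago", "boston"}, "#company": {"motorola", "microsoft"}}

def matches(query, template):
    q, t = query.split(), template.split()
    return len(q) == len(t) and all(
        (tt.startswith("#") and qt in vocab[tt]) or qt == tt for qt, tt in zip(q, t)
    )

graph = defaultdict(set)          # undirected adjacency over all three node types
for q, s in clicks:
    graph[q].add(s)
    graph[s].add(q)
for q, _ in clicks:
    for t in templates:
        if matches(q, t):
            graph[q].add(t)
            graph[t].add(q)

print(sorted(graph["jobs in #location"]))   # queries that match this template
```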

3) How do we infer on this graph? Duality of random walks: when we walk back and forth, we are inferring precision and recall, respectively. R(t) is a forward random walk from the seeds; P(t) is a backward random walk to the seeds.

Recall is a forward random walk from the seeds. (Slide figure: the walk from the seed distribution R0(x) through queries q to templates t.) Recall is just like (personalized) PageRank.

Precision is a backward random walk to the seeds. (Slide figure: the mirror-image walk, from templates t back through queries q to the seed distribution P0(x).) Precision is harmonic energy minimization.
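To make the forward direction concrete, here is a personalized-PageRank-style sketch over an adjacency map like the one above; the toy graph, seed set, damping factor, and iteration count are illustrative assumptions rather than the paper's algorithm:

```python
def forward_walk(graph, seeds, damping=0.85, iters=50):
    """Personalized-PageRank-style walk: score mass spreads forward from the seeds."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        new = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                continue
            share = damping * score[n] / len(neighbors)
            for m in neighbors:
                new[m] += share
        score = new
    return score

# Toy graph in the spirit of the Quest graph above; all edges are illustrative.
graph = {
    "jobs in chicago":   {"monster.com", "jobs in #location"},
    "jobs in boston":    {"monster.com", "jobs in #location"},
    "monster.com":       {"jobs in chicago", "jobs in boston"},
    "jobs in #location": {"jobs in chicago", "jobs in boston"},
}
scores = forward_walk(graph, seeds={"jobs in chicago"})
print(round(scores["jobs in #location"], 3))   # score the template receives from the seed
```

The backward direction for precision would, in the same spirit, hold scores fixed at the seeds and solve for the interior node values (the harmonic energy-minimization view the slide mentions) rather than spreading mass outward; this sketch covers only the forward (recall) direction.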

Experimental results: Quest is effective at finding templates via inferred P and R, achieving 90% on actual F-measure. Top results: (table shown on the slide).

Thank You! And they did the real work… Ganesh Agarwal and Govind Kabra.