FACT: A Learning Based Web Query Processing System Hongjun Lu, Yanlei Diao Hong Kong U. of Science & Technology Songting Chen, Zengping Tian Fudan University.

Presentation transcript:

FACT: A Learning Based Web Query Processing System Hongjun Lu, Yanlei Diao Hong Kong U. of Science & Technology Songting Chen, Zengping Tian Fudan University

Demonstration, SIGMOD

Outline
- Introduction
- Learning Based Web Query Processing
- FACT: A Prototype System
- Preliminary System Evaluation
- Conclusions

How Do We Query the Web?
- Use a search engine
- Form query keywords
- An example: find room rates of hotels in Hong Kong
  - used search engine
  - keywords: Hong Kong+hotel

[Screenshot; visible labels: Hotel 1, Hotel 2, 3, done, forward] Look at the number!

Query the Web: Current Situation
- Search engines return a long list of URLs; the user must browse the Web pages to find the information.
- The required information is often not on the returned page; navigation through hyperlinks is often required, and those links may or may not be obvious.
- The target information comes in different forms (paragraphs, lists, tables, ...).
- A lot of Web pages have to be browsed.
Are we happy with this?

Efforts to Improve the Situation
- Search engines: better indexes, improved precision/recall, metasearch engines, better presentation of results, ...
- IR techniques applied to the Web: document clustering/indexing, better models, similarity functions, document ranking, ...
- Intelligent agents: user profiling, hyperlink recommendation, ...
- Database approach: wrappers, query languages, ...

Our Dream
- Querying the Web as easily as querying a relational database
- An SQL query returns a table of hotel prices:
  SELECT room rates FROM web.hotel WHERE city = "hong kong"
- May remain a dream for a while :-(

A Practical Goal
- Use keywords to express query requirements
  - simple, no need to know the schema of the data
  - inaccurate
- Relieve users from tedious browsing as much as possible
  - not URLs, not Web sites, not even Web pages
- Present query results to users as accurately and concisely as possible
  - tables, lists, paragraphs, ... containing the information the user requires

Query Results: Queried Segments
- Return query results as accurately and concisely as possible.
- Basic idea:
  - break a Web page into segments: a row in a table, a table, an item in a list, a list, a paragraph, ...
  - return only queried segments to users
- Queried segments: segments that contain the information the user is interested in.
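The segment idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FACT's implementation: the tag set `SEGMENT_TAGS`, the class `SegmentParser`, and the keyword filter in `queried_segments` are hypothetical names introduced here, and simple keyword matching stands in for the learned knowledge the system actually uses to pick segments.

```python
from html.parser import HTMLParser

# Tags treated as segment boundaries (an assumption based on the slide's examples:
# tables, table rows, lists, list items, paragraphs).
SEGMENT_TAGS = {"table", "tr", "ul", "ol", "li", "p"}


class SegmentParser(HTMLParser):
    """Collects a (tag, text) pair for every segment-level element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open segments: [tag, list of text pieces]
        self.segments = []  # finished segments, in the order they were closed

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self.stack.append([tag, []])

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Text belongs to every enclosing open segment (a row and its table).
            for entry in self.stack:
                entry[1].append(text)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            seg_tag, pieces = self.stack.pop()
            self.segments.append((seg_tag, " ".join(pieces)))


def split_into_segments(html):
    """Break a page into segments such as rows, tables, list items, paragraphs."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    return parser.segments


def queried_segments(html, keywords):
    """Keep only segments whose text mentions every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [(tag, text) for tag, text in split_into_segments(html)
            if all(k in text.lower() for k in wanted)]
```

For a page with one paragraph and a two-row rate table, `split_into_segments` yields the paragraph, each row, and the whole table as separate segments, so a caller can return the smallest segment that matches rather than the full page.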

Learning Based Query Processing
- The fundamental difficulties in Web query processing:
  - the Web is a huge, ever-growing, heterogeneous, semi-structured data source
  - most Web users are naive users issuing ad hoc queries
- Learn the knowledge for query processing from the user!

A Learning Based Technique
- Learn from the user while the user browses the first few URLs:
  - how to navigate through the Web pages
  - how to identify the required information in a Web page
- Process the remaining URLs automatically and retrieve queried segments

[Screenshot; visible labels: Hotel 1, Hotel 2, 3, done, forward] User browses it!

[Screenshot; visible label: Back] User clicks here!

[Screenshot; visible label: Room information] User marks it!

[Screenshot; visible label: back] FACT starts here!

[Screenshot; visible label: roomrates] FACT chooses it!

[Screenshot; visible label: xxx] FACT finds it!

A Query Processing System
A learning based query processing system:
- User Interface: accepts user queries and presents query results; includes a browser capable of capturing user actions
- Query Analyzer: analyzes and transforms user queries
- Session Controller: coordinates learning and locating
- Learner: generates knowledge from captured user actions
- Locator: applies knowledge and locates query results
- Crawler & Parser: retrieves pages and parses them into trees
- Knowledge Base: stores learned knowledge

Reference Architecture
[Diagram; component labels as extracted: Session Controller, Locator, Search Engine, Web, User Interface, Knowledge Base, Learner, Query Analyzer, Crawler & Parser, User]

A Query Session
[Diagram; labels as extracted: Session Controller, Training Strategy, Segment Graph, Result Buffer, Knowledge Base, User Actions, Query results, Checking URLs, Locating Process, Locator, Query Result Presenter, Learning Process, Learner, Browser Scripts]

Training Strategies
- Sequential
  - first n sites: user browses and system learns
  - next N - n sites: system processes
- Random
  - randomly choose n sites: user browses and system learns
  - the system processes the rest
- Interleaved
  - first n0 sites: user browses and system learns
  - next n - n0 sites: system makes decisions; for incorrect ones, user browses and system re-learns
  - next N - n sites: system processes
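The three strategies differ only in which sites are used for training and when. A minimal sketch, assuming sites are held in a plain list of length N; the function names and the `learn`, `decide`, and `is_correct` callbacks are hypothetical stand-ins for the Learner, the Locator, and the user's judgment, not names taken from FACT.

```python
import random


def sequential_split(sites, n):
    """Sequential: the first n sites train the system, the other N - n are processed."""
    return sites[:n], sites[n:]


def random_split(sites, n, seed=None):
    """Random: n randomly chosen sites train the system, the rest are processed."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(sites)), n))
    train = [s for i, s in enumerate(sites) if i in chosen]
    rest = [s for i, s in enumerate(sites) if i not in chosen]
    return train, rest


def interleaved_session(sites, n0, n, learn, decide, is_correct):
    """Interleaved: train on n0 sites, then verify the next n - n0 decisions.

    learn(site), decide(site) and is_correct(site, decision) are hypothetical
    callbacks. Returns the sites that triggered re-learning and the decisions
    made on the final N - n sites.
    """
    for site in sites[:n0]:
        learn(site)                  # user browses, system learns
    relearned = []
    for site in sites[n0:n]:
        if not is_correct(site, decide(site)):
            learn(site)              # user browses the site, system re-learns
            relearned.append(site)
    results = [decide(site) for site in sites[n:]]  # fully automatic phase
    return relearned, results
```

In the interleaved variant the user is only asked to browse a site when the system's decision on it was wrong, which is why it can reach the same knowledge with fewer full training sessions.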

System Evaluation
- Functionality
- Performance
  - precision, recall, correctness
  - efficiency: within a site, how many pages the system visits to find a result
  - training efficiency: how many training samples are needed
- User interface

System Evaluation: Effectiveness
- Given a set of keywords, the system makes N decisions, where N = N1 + N2 + N3 + N4
- Precision = N1 / (N1 + N3)
- Recall = N1 / (number of relevant sites)
- Correctness = (N1 + N2) / N
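The three measures are simple ratios and can be computed as below. The transcript does not preserve the definitions of N1 to N4; the formulas suggest that N1 counts correct results returned, N3 counts incorrect results returned, and N2 counts other correct decisions, but that reading is an assumption. The function name `effectiveness` and its parameter names are likewise illustrative.

```python
def effectiveness(n1, n2, n3, n4, relevant_sites):
    """Compute (precision, recall, correctness) from the four decision counts.

    The meaning of each count is assumed, not taken from the slides:
    n1 and n2 are treated as correct decisions, n3 and n4 as incorrect ones.
    """
    n = n1 + n2 + n3 + n4
    if n == 0:
        raise ValueError("the system made no decisions")
    precision = n1 / (n1 + n3) if (n1 + n3) else 0.0
    recall = n1 / relevant_sites if relevant_sites else 0.0
    correctness = (n1 + n2) / n
    return precision, recall, correctness
```

For example, with N1 = 6, N2 = 2, N3 = 2, N4 = 0 and 8 relevant sites, precision is 6/8, recall is 6/8, and correctness is 8/10.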

System Evaluation: Efficiency
- How efficiently does the system find a queried segment in a site?
- Level of a queried segment = the length of the shortest path to find it
- Absolute path length = number of crawled pages
- Relative path length = number of crawled pages / level of the queried segment
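A rough sketch of these measures, assuming a site is represented as a dictionary mapping each page to the pages it links to. The slides do not say whether path length counts pages or links; this sketch counts pages, so a segment on the entry page has level 1 and the relative path length is always defined. The names `segment_level` and `relative_path_length` are hypothetical.

```python
from collections import deque


def segment_level(links, start, target):
    """Number of pages on the shortest path from start to target (breadth-first).

    Assumption: the entry page itself is level 1. Returns None if unreachable.
    """
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        page, level = queue.popleft()
        if page == target:
            return level
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, level + 1))
    return None


def relative_path_length(crawled_pages, level):
    """Crawled pages divided by the level of the queried segment."""
    if level is None or level <= 0:
        raise ValueError("level must be a positive number of pages")
    return crawled_pages / level
```

A relative path length of 1.0 means the system visited only the pages on a shortest path; larger values indicate detours through pages that did not lead to the queried segment.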

Basic Performance
- Q11: Hong Kong Hotel Room Rate
- Q12: Hong Kong Hotel
- Sequential training

Query Q12: Effects of Training Strategies

Improved Performance
- Interleaved training

Conclusions
- Proposed and implemented learning based Web query processing with the following features:
  - returns succinct results: segments of pages
  - needs no a priori knowledge or preprocessing; suited for ad hoc queries
  - exploits page formatting and linkage information simultaneously
- The preliminary results are promising

Future Work
- Better knowledge
  - key factor that affects system performance
- Dynamic Web pages?
- Integrating results from another project
- System evaluation
- Prototype -> product -> dot com company $$$ ???