Learning Based Web Query Processing Yanlei Diao Computer Science Department Hong Kong U. of Science & Technology.

Presentation transcript:

Learning Based Web Query Processing Yanlei Diao Computer Science Department Hong Kong U. of Science & Technology

Slide 2: Outline (MPhil Thesis, Yanlei Diao)
- Background
- Learning Based Web Query Processing
- FACT: A Prototype System
- Preliminary System Evaluation
- Conclusions
- Demonstration

Slide 3: Searching the Web
Want to find a piece of information on the Web?
- Huge size
- Heterogeneity
- Lack of structure
- Diversified user bases
- Ever-changing

Slide 4: Search Engines
- Maintain indices, accept keyword input, match the input keywords against the indices, and return relevant documents.
- Problems
  - Large hit lists with low precision. Users find relevant documents by browsing.
  - URLs are returned rather than the required information. Users must read the pages to find it.

Slide 5: Web Information Retrieval
- IR: vector-space model, search and browse capabilities
- Web IR: Web navigation, indexing, query languages, query-document matching, output ranking, user relevance feedback
- Recent improvements: hierarchical classification, better presentation of results, hypertext study, metasearching, ...

Slide 6: Web IR for Query Processing: Problems
- A list of URLs or documents is returned. Users must browse a lot to find the information.
- Users are asked for precise query requirements, which is hard for casual users.
- A well-defined underlying model is lacking. The vector-space model does not convey as much as hypertext.
- Hit lists are large, precision is low, and results depend heavily on the input query.

Slide 7: Intelligent Agents
The agents learn user profiles/models from users' search behavior and use that knowledge to predict URLs of interest to the user.
- Some rely on search engines and heuristics to find targets of a specific type, e.g. papers or homepages.
- Some help users in an interactive mode: they learn while users are browsing.
- Some adaptive agents work autonomously: they use heuristics, recommend pages of interest, and take user feedback to improve.

Slide 8: Agents for Query Processing: Problems
- They recommend pages of interest, not information of interest to the user.
- They use the vector-space model or convert HTML to text documents.
- They require a priori knowledge, such as user profiles, or use heuristics for a particular domain.
- They are not well suited to ad hoc queries.

Slide 9: Database Approaches
- The Web is a directed graph: nodes are Web pages and edges are hyperlinks between pages.
- Query languages: the 1st generation combines content-based and structure-based queries. The 2nd generation accesses the structure of Web objects and creates complex objects.
- Wrappers and mediators: they present an integrated view of the resources.

Slide 10: DB Approaches for Query Processing: Problems
- Wrapper generation is only feasible for a limited number of sites in a domain, and the Web is growing very fast.
- Web query languages require knowledge of the Web sites (content and linkage) and of the language syntax. They are hard to use.
- The approaches are not scalable: good for Web site management, but not for queries on the entire Web.

Slide 11: Our Goal
A Web query processing system for any Web user that
- processes ad hoc queries on HTML pages;
- automatically extracts succinct and precise query results (a result may take the form of a table, a list or a paragraph).
Learn the knowledge for query processing from the user!

Slide 12: Proposed Approach
An approach with learning capabilities:
- Takes keyword input (probably not precise)
- Search engines return a URL list
- During browsing, learns from users
  - to navigate through the Web pages
  - to identify the required information on a Web page
- Processes the remaining URLs automatically
- Returns succinct and precise results

Slide 13: Unique Features
- Returns succinct and precise results, i.e. segments of pages.
- Needs no a priori knowledge or preprocessing, so it is suited to ad hoc queries.
- Exploits page formatting and linkage information simultaneously, making good use of the rich information conveyed by HTML.

Slide 14: Benefits from Learning
- Bridging the gap between keyword input and real query requirements
- Capable of navigating in the neighborhoods of documents returned by search engines
- Automating the processing of all possibly relevant documents in one query
- Almost imperceptible to users, user-friendly


Slide 16: Modeling a Web Page
- Segment: a group of tag-delimited elements and the unit of query processing, e.g. a paragraph, table, or list. Segments nest, from atomic segments up to the whole document, forming a Segment Tree.
- Attributes of a segment
  - content: the text in the scope of the segment
  - description: a summary of the content
- Hyperlink: represented as a segment so that links and segments are comparable
  - content: the URL
  - description: the anchor text
  - associated with the parent segment
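
The segment model on this slide can be sketched as a small tree type. This is only an illustration: the class name `Segment`, its field names, and the sample hotel values are assumptions made here, not FACT's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Segment:
    """A tag-delimited unit of a page; hyperlinks are modeled the same way."""
    kind: str                 # e.g. "document", "paragraph", "table", "list", "link"
    content: str = ""         # text in scope; for a link, the URL
    description: str = ""     # summary of the content; for a link, the anchor text
    children: List["Segment"] = field(default_factory=list)
    parent: Optional["Segment"] = field(default=None, repr=False)

    def add(self, child: "Segment") -> "Segment":
        """Attach a child segment and remember its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Segment"]:
        """Yield this segment and all nested segments, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# A tiny segment tree with hypothetical values.
doc = Segment("document", description="Hotel")
para = doc.add(Segment("paragraph", description="1999 Room Rates"))
table = para.add(Segment("table",
                         content="Room Type Single/Double (HK$) Standard 1000",
                         description="Room Type"))
link = para.add(Segment("link", content="ac01a.html", description="Guest Room"))
kinds = [s.kind for s in doc.walk()]
```

Storing a link as a `Segment` whose content is the URL and whose description is the anchor text keeps links and segments in one list, which is what later allows them to be scored side by side.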

Slide 17: A Sample
[Figure: a sample hotel page and its segment tree. Node labels: Document, Paragraph, Table, List, Link. Page text: "Hotel", "1999 Room Rates", "Guest Room", "Executive Suite", "Special Promotion". Link contents: ac01a.html, ac02a.html. Table content: "Room Type Single/Double (HK$) Standard 1000 Executive Suite 2750". Other annotations: "Special Promotion" & the content of the child table; contents of child paragraph and table.]

Slide 18: Modeling a Web Site
- A Web site is modeled as hyperlink-connected segment trees, called a Segment Graph.
- Backward links, links pointing to the page itself, and links leading outside the site are ignored.
- Notation: S_ijk denotes a segment, L_m a hyperlink.
[Figure: a segment graph with segments S1, S11, S12, S13, S131, S2, S21, S3, S31, S32, S4, S41 and hyperlinks L1, L2, L3, L4.]
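
One way to read the link-filtering rule when building a segment graph is sketched below. The function name `keep_link` and the `hotel.example` URLs are hypothetical, and treating a "backward link" as a link to an already visited page is an assumption made for this sketch.

```python
from typing import Optional, Set
from urllib.parse import urldefrag, urljoin, urlparse


def keep_link(page_url: str, href: str, visited: Set[str]) -> Optional[str]:
    """Return the absolute target URL if the link belongs in the graph, else None."""
    current = urldefrag(page_url).url
    target = urldefrag(urljoin(page_url, href)).url   # resolve and drop #fragment
    if urlparse(target).netloc != urlparse(current).netloc:
        return None      # link leads outside the site
    if target == current:
        return None      # link points to the page itself
    if target in visited:
        return None      # assumed meaning of "backward link"
    return target


home = "http://hotel.example/index.html"
kept = keep_link(home, "rates.html", set())
```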

Slide 19: Knowledge for the Locating Task
The locating task is to find a segment in the Segment Graph of a site as the query result.
1) Exhaustive search would simplify the task, but is impractical.
2) Navigation in the graph should terminate when a segment answers the query well enough or when the site can be concluded to be irrelevant.
On each page, a decision must be made either to follow a link or to choose a segment, so the segments and links on a page should be comparable.

Slide 20: Two Types of Knowledge
A link conveys only a description of the page it points to, while a queried segment contains both a description and the result itself. Segments and links on a page are therefore not comparable by content, and two types of knowledge are needed:
- One concerns only descriptive information and helps find the navigational path.
- The other checks whether a segment meets the query requirements on both the descriptive information and the result.

Slide 21: Navigation Knowledge
- Concerns descriptive information and helps find the navigational path
- A set of (term, weight) pairs
  - Term: a selected word from the description of the segments and links on the navigational path
  - Weight: indicates the importance of the term in leading to the queried segment

Slide 22: Learning Navigation Knowledge
- Navigational path: (link ->)* segment, e.g. L2 -> L4 -> S41.
- Extended navigational path: ((segment ->)* link ->)* ((segment ->)* segment), e.g. (S1 -> S11 -> L2) -> (S3 -> S31 -> L4) -> (S4 -> S41).
- Step 1. Assign a weight to each component on the path, e.g. L2, S31, S41. The closer a component is to the target, the higher its weight.
- Step 2. Assign a weight to each term in the description of a component on the path. The weight of a term can be summed over navigational paths.
- The set of (term, weight) pairs is stored in the navigation knowledge base.
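
The two steps can be sketched as follows. The slide does not give the actual weighting function, so the linear position weight `(i + 1) / n` is an assumption, as are the function name `learn_navigation` and the whitespace tokenizer.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def learn_navigation(paths: List[List[str]],
                     knowledge: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Accumulate (term, weight) pairs from navigational paths.

    Each path lists the descriptions of its components, ordered from the
    starting page to the queried segment.
    """
    knowledge = defaultdict(float) if knowledge is None else knowledge
    for path in paths:
        n = len(path)
        for i, description in enumerate(path):
            # Step 1: components closer to the target get a higher weight
            # (assumed scheme: linear in the position on the path).
            component_weight = (i + 1) / n
            # Step 2: each distinct term inherits the component weight,
            # summed over all paths seen so far.
            for term in set(description.lower().split()):
                knowledge[term] += component_weight
    return knowledge


kb = learn_navigation([["Hotel Reservation",
                        "single hk$ double hk$ standard room"]])
```

With this scheme the terms of the anchor text "Hotel Reservation" each receive 0.5 and the terms of the queried segment each receive 1.0; further paths add to the same entries.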

Slide 23: Classification Knowledge
- Checks whether a segment meets the query requirements on both the descriptive information and the result.
- Cast in the Bayesian learning framework.
- A set of triples (feature, NP, NN)
  - Feature: word, integer, real, symbol, ..., date, time, address, ..., contained in a segment
  - NP: number of occurrences of the feature in positive samples
  - NN: number of occurrences of the feature in negative samples

Slide 24: Learning Classification Knowledge
- The queried segment is a positive sample. All other segments on the same page are negative samples.
- The content of each segment is parsed into a set of features, of either simple or complex types.
- NP and NN are counted cumulatively for each feature over all samples.
- All triples (feature, NP, NN) are stored in the classification knowledge base.
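
A minimal sketch of this counting step is shown below. The feature extractor is a deliberately crude stand-in (words plus generic integer and real markers), and counting a feature once per sample is an assumption; the names `extract_features` and `learn_classification` are illustrative only.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


def extract_features(text: str) -> Set[str]:
    """Parse segment content into simple features (assumed, simplified types)."""
    features = set()
    for token in re.findall(r"\d+(?:\.\d+)?|[a-z$]+", text.lower()):
        if re.fullmatch(r"\d+", token):
            features.add("<integer>")
        elif re.fullmatch(r"\d+\.\d+", token):
            features.add("<real>")
        else:
            features.add(token)
    return features


def learn_classification(segments: List[str], queried_index: int,
                         kb: Optional[Dict[str, List[int]]] = None) -> Dict[str, List[int]]:
    """Update feature -> [NP, NN] counts from one page the user has marked."""
    kb = defaultdict(lambda: [0, 0]) if kb is None else kb
    for i, text in enumerate(segments):
        slot = 0 if i == queried_index else 1   # 0: positive (NP), 1: negative (NN)
        for feature in extract_features(text):
            kb[feature][slot] += 1
    return kb


page = ["Standard room single HK$ 1000",
        "Holiday Inn Golden Mile is your number one choice",
        "Tel 2861 1000"]
counts = learn_classification(page, queried_index=0)
```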

Slide 25: Query Processing Using Learned Knowledge
- After a Web page is retrieved, its segment graph is built.
- For each segment and link, a score is computed by applying the navigation knowledge (ApplyNavigation).
- Segments and links are sorted by score.
- If a link has the highest score, the system navigates through the link.
- If a segment has the highest score, all segments on the page are checked to see whether one of them is the queried segment.
- The process is repeated until either a queried segment is found or it can be concluded that the site does not contain the queried information.
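
The loop described above might look roughly like this. It is a simplified sketch: pages are plain dictionaries, the two kinds of knowledge are passed in as callables, there is no backtracking, and the `max_pages` budget used to conclude irrelevancy is an assumption. All names and the toy site are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A page: {"segments": [(description, content)], "links": [(anchor_text, url)]}
Page = Dict[str, list]


def locate(site: Dict[str, Page], start_url: str,
           nav_score: Callable[[str], float],
           confidence: Callable[[str], float],
           threshold: float = 0.5,
           max_pages: int = 5) -> Tuple[Optional[str], List[str]]:
    """Return (content of the queried segment or None, list of visited pages)."""
    visited: List[str] = []
    url: Optional[str] = start_url
    while url is not None and len(visited) < max_pages:
        visited.append(url)
        page = site[url]
        components = [("segment", d, c) for d, c in page["segments"]]
        components += [("link", a, t) for a, t in page["links"]
                       if t in site and t not in visited]
        # ApplyNavigation: score every component by its description, best first.
        components.sort(key=lambda comp: nav_score(comp[1]), reverse=True)
        url = None
        segments_checked = False
        for kind, _description, payload in components:
            if kind == "link":
                url = payload            # follow the best remaining link
                break
            if not segments_checked:
                # ApplyClassification: check all segments on the page at once.
                segments_checked = True
                best = max(page["segments"], key=lambda s: confidence(s[1]))
                if confidence(best[1]) >= threshold:
                    return best[1], visited
    return None, visited


site = {
    "index.html": {"segments": [("Welcome", "Located in the hub of Wanchai")],
                   "links": [("Dining", "dining.html"), ("Hotel Rates", "rates.html")]},
    "dining.html": {"segments": [("Dining", "Restaurants and bars")], "links": []},
    "rates.html": {"segments": [("Room Type", "Standard room HK$ 1000")], "links": []},
}
nav_kb = {"rates": 1.0, "room": 2.0, "hotel": 0.5}


def nav_score(description: str) -> float:
    return max((nav_kb.get(t, 0.0) for t in description.lower().split()), default=0.0)


def confidence(content: str) -> float:
    return 1.0 if "hk$" in content.lower() else 0.0


result = locate(site, "index.html", nav_score, confidence)
```

In the toy site the "Hotel Rates" link outscores every other component on the entry page, so the sketch follows it and finds the room-rate segment on the second page it visits.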

Slides 26-27: Locating Algorithm
On each page, if the result has not been found, choose the unprocessed component with the highest score:
- if a link is chosen, navigate through the link;
- if a segment is chosen, check the segments on the page (ApplyClassification).
[Figure: the segment graph with segments S1, S11, S12, S13, S131, S2, S21, S3, S31, S32, S4, S41 and hyperlinks L1, L2, L3, L4, where S_ijk denotes a segment and L_m a hyperlink.]

Slide 28: Applying Learned Knowledge
- Application of navigation knowledge:
  - extracts the terms in the description of a link/segment
  - reads the weights of the terms and assigns a score to the link/segment by a certain function (currently max)
  - sorts all links and segments by their scores
- Application of classification knowledge:
  - computes the confidence C of classifying a segment as the queried result
  - chooses the segment on a page with the largest C. If the largest C is over a threshold, returns the segment
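
Both applications can be sketched in a few lines. The navigation score uses max, as the slide states. The confidence formula is an assumption: a naive Bayes posterior with add-one smoothing computed from (feature, NP, NN) triples, since the slides only say the classifier is cast in the Bayesian framework. Function names and the sample counts are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple


def navigation_score(description: str, weights: Dict[str, float]) -> float:
    """Score a link/segment as the maximum weight among its description terms."""
    return max((weights.get(t, 0.0) for t in description.lower().split()), default=0.0)


def confidence(features: Set[str], kb: Dict[str, Tuple[int, int]],
               n_pos: int, n_neg: int) -> float:
    """Posterior that a segment is the queried result (assumed naive Bayes).

    kb maps feature -> (NP, NN); n_pos and n_neg are the numbers of positive
    and negative training samples and must both be greater than zero.
    """
    log_pos = math.log(n_pos / (n_pos + n_neg))
    log_neg = math.log(n_neg / (n_pos + n_neg))
    for f in features:
        np_count, nn_count = kb.get(f, (0, 0))
        log_pos += math.log((np_count + 1) / (n_pos + 2))   # add-one smoothing
        log_neg += math.log((nn_count + 1) / (n_neg + 2))
    shift = max(log_pos, log_neg)                            # avoid underflow
    pos, neg = math.exp(log_pos - shift), math.exp(log_neg - shift)
    return pos / (pos + neg)


def choose_segment(segments: List[Set[str]], kb: Dict[str, Tuple[int, int]],
                   n_pos: int, n_neg: int, threshold: float = 0.5) -> Optional[int]:
    """Return the index of the segment with the largest C if C exceeds the threshold."""
    best_c, best_i = max((confidence(f, kb, n_pos, n_neg), i)
                         for i, f in enumerate(segments))
    return best_i if best_c > threshold else None


kb = {"room": (3, 0), "hk$": (3, 0), "holiday": (0, 4)}
segments = [{"holiday"}, {"room", "hk$"}]
chosen = choose_segment(segments, kb, n_pos=3, n_neg=9)
```

Working in log space keeps the product of many small probabilities from underflowing, which matters because confidences as small as 1e-7 appear in the example slides.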

Slide 29: [Screenshot: a list of hotel sites (Hotel 1, Hotel 2, ...) in the browser.] User browses it!

Slide 30: [Screenshot] User clicks here!

Slide 31: [Screenshot: room information.] User marks it!

Slide 32: Generating Navigation Knowledge
- The navigation path looks like: Hotel Reservation -> single hk$ double hk$ standard room deluxe room + executive room
- By our weighting scheme, a weight is assigned to each term.

Slide 33: Generating Classification Knowledge
- Training samples
  - Positive: single hk$ double hk$ standard room , deluxe room 1, , executive room 1, ,
  - Negative: Holiday Inn Golden Mile In the heart of Tsim Sha Tsui - Kowloon, Holiday Inn Golden Mile is your number one choice for accommodation, dining, meetings and banquets. Ideally situated in the heart of...
- Occurrences of each feature are counted.

Slide 34: [Screenshot] FACT starts here!

Slide 36: Applying Navigation Knowledge
The page contains:
- Links: Main Features & Services Dining and Banqueting Hotel Rates Reservation ...
- Paragraph: Lockhart Road, Wanchai, Hong Kong, SAR, PRC
- Paragraph: Located in the hub of Wanchai, the Wharney Hotel is within walking distance of the Hong Kong Arts Centre, Convention and Exhibition Centre, busy commercial complexes and shopping malls. ...
- Paragraph: TEL: (852) FAX: (852)
[Figure panel: "Navigation knowledge shows".]

Slide 37: [Screenshot] Navigation knowledge assigns scores to the components of the current page. FACT chooses it!

Slide 38: [Screenshot: visited page and current page.] Navigation knowledge assigns scores: Table: , Paragraph: 3.0, Paragraph: 0.25, List: 0.25.

Slide 39: [Screenshot] Classification knowledge is applied to all segments and computes their confidence: C=6.3e-008, C= , C=2.5e-007, C=1.0, C= .

Slide 40: [Screenshot] FACT finds it!


Slide 42: A Query Processing System
A learning based query processing system:
- User Interface: accepts user queries and presents query results; a browser capable of capturing user actions
- Query Analyzer: analyzes and transforms user queries
- Session Controller: coordinates learning and locating
- Learner: generates knowledge from captured user actions
- Locator: applies knowledge and locates query results
- Retriever & Parser: retrieves pages and parses them into trees
- Knowledge Base: stores learned knowledge

Slide 43: Reference Architecture
[Figure: architecture diagram connecting User, User Interface, Query Analyzer, Session Controller, Learner, Locator, Knowledge Base, Retriever & Parser, Search Engine, and the Web.]

Slide 44: A Query Session
[Figure: query session diagram. Components: Session Controller (Training Strategy, Segment Graph, Result Buffer), Learner (Learning Process), Locator (Locating Process), Knowledge Base, Browser, Query Result Presenter. Flows: User Actions, Scripts, Checking URLs, Query results.]

Slide 45: Training Strategies
- Sequential
  - First n sites: the user browses and the system learns
  - Next N - n sites: the system processes
- Random
  - n randomly chosen sites: the user browses and the system learns
  - The system processes the rest
- Interleaved
  - First n0 sites: the user browses and the system learns
  - Next n - n0 sites: the system makes decisions. For incorrect ones, the user browses and the system re-learns
  - Next N - n sites: the system processes
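
The three strategies differ only in which sites are used for training, which the following sketch makes explicit. The function names and the callback parameters `learn`, `decide`, and `is_correct` are placeholders standing in for the user browsing a site, the system processing it, and the user judging the outcome.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sequential(sites: Sequence[str], n: int) -> Tuple[List[str], List[str]]:
    """Train on the first n sites; the system processes the remaining N - n."""
    return list(sites[:n]), list(sites[n:])


def random_split(sites: Sequence[str], n: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Train on n randomly chosen sites; the system processes the rest."""
    chosen = set(random.Random(seed).sample(range(len(sites)), n))
    train = [s for i, s in enumerate(sites) if i in chosen]
    rest = [s for i, s in enumerate(sites) if i not in chosen]
    return train, rest


def interleaved(sites: Sequence[str], n0: int, n: int,
                learn: Callable[[str], None],
                decide: Callable[[str], object],
                is_correct: Callable[[str, object], bool]) -> List[str]:
    """Train on the first n0 sites, then re-learn only where the system errs.

    Returns the N - n sites left for fully automatic processing.
    """
    for site in sites[:n0]:
        learn(site)
    for site in sites[n0:n]:
        if not is_correct(site, decide(site)):
            learn(site)      # the user browses the site and the system re-learns
    return list(sites[n:])


sites = list("abcdefgh")
learned: List[str] = []
remaining = interleaved(sites, n0=2, n=4, learn=learned.append,
                        decide=lambda s: s.upper(),
                        is_correct=lambda s, d: s != "c")
```

In the usage lines the system's decision on site "c" is judged incorrect, so "c" is added to the training samples while "d" is not.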


Slide 47: System Evaluation
- System capabilities
- Performance
  - Effectiveness: precision, recall, correctness
  - Efficiency: within a site, how many pages the system visits to find a result or to recognize irrelevancy
  - Training efficiency: how many training samples are needed
- Key issues
  - Effectiveness of the knowledge
  - Effectiveness of the training strategies
- Tests on a range of queries

Slide 48: A System Output Sample
[Screenshot]

Slide 49: System Capabilities
- The system returns segments of Web pages.
- The segments may not contain any input keyword, yet they meet the requirement of room rates.
  - The system learned the query requirement from the user!
- Segments can come from pages whose URLs are not directly returned by Yahoo!.
  - The system learned how to follow the hyperlinks to the queried segment!

Slide 50: System Evaluation - Effectiveness
Given a set of URLs in a query session, the system makes N decisions, where N = N1 + N2 + N3 + N4.
- Precision = N1 / (N1 + N3)
- Recall = N1 / (number of sites that contain results)
- Correctness = (N1 + N2) / N
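
The three formulas translate directly into code. The transcript does not preserve the definitions of N1 to N4, so the reading used in the comments (N1: correct results returned, N2: correct "no result" conclusions, N3: incorrect results returned, N4: results missed) is an assumption inferred from the formulas; the function name and the sample counts are made up for illustration.

```python
from typing import Dict


def effectiveness(n1: int, n2: int, n3: int, n4: int,
                  sites_with_results: int) -> Dict[str, float]:
    """Compute precision, recall and correctness from the four decision counts.

    Assumed meaning: n1 correct results returned, n2 correct "no result"
    conclusions, n3 incorrect results returned, n4 results missed.
    """
    n = n1 + n2 + n3 + n4
    return {
        "precision": n1 / (n1 + n3) if n1 + n3 else 0.0,
        "recall": n1 / sites_with_results if sites_with_results else 0.0,
        "correctness": (n1 + n2) / n if n else 0.0,
    }


metrics = effectiveness(n1=8, n2=5, n3=2, n4=1, sites_with_results=10)
```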

Slide 51: System Evaluation - Efficiency
How efficiently does the system find a queried segment in a site?
- Level of a queried segment = the length of the shortest path to find it
- Absolute path length = number of visited pages
- Relative path length = number of visited pages / level of the queried segment
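
A small sketch of these measures follows. It assumes the level is counted in pages, including the entry page, so that a relative path length of 1 means the system took a shortest path; the slide does not state the unit. The function names and the toy link structure are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def level(links: Dict[str, List[str]], start: str, target: str) -> Optional[int]:
    """Pages on the shortest path from the entry page to the target page (BFS)."""
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        page, depth = queue.popleft()
        if page == target:
            return depth
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None              # the target is not reachable from the entry page


def relative_path_length(visited_pages: int, segment_level: int) -> float:
    """Absolute path length divided by the level of the queried segment."""
    return visited_pages / segment_level


links = {"index": ["dining", "rates"], "rates": ["suite"]}
suite_level = level(links, "index", "suite")
ratio = relative_path_length(4, suite_level)
```

Here the queried segment sits on the third page of the shortest path, so a system that visits four pages before finding it has a relative path length of 4/3.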

Slide 52: Basic Performance
[Table: results under sequential training for two queries, Q11: "Hong Kong Hotel Room Rate" and Q12: "Hong Kong Hotel".]

Slide 53: Effectiveness of Knowledge
Two other systems were implemented for comparison.
- Classification Knowledge Only: links and segments are treated the same by the Bayes classifier.
  - Learning:

    Action          | Positive    | Negative
    click a link    | the link    | other links on the page
    mark a segment  | the segment | other segments on the page

  - Locating: classify all segments and links. If a link has the highest confidence, follow the link. If a segment has the highest confidence and passes the threshold, return it.

Slide 54: Effectiveness of Knowledge
- Navigation Knowledge Only: only the descriptive information of links and segments is checked.
  - Learning: navigational path -> navigation knowledge
  - Locating: assign scores to all links and segments using the navigation knowledge. If a link has the highest score, follow the link. If a segment has the highest score, return it.

Slide 55: Effectiveness of Knowledge
[Table of comparison results, annotated with the following callouts.]
- Only works for results on the first page
- Bad filtering capability!
- Navigation only checks the description and is barely workable
- Poor navigation capability!

Slide 56: Effects of Training Strategies
[Figure: results for query Q12 with training sizes 3-10.]

Slide 57: Effects of Training Strategies
- Random training performs badly and is low in recall.
- As the training size increases, interleaved training outperforms sequential training.
- The best accuracy reaches or exceeds 90% in all metrics when the interleaved training strategy is used.
- Enlarging the training size for random and sequential training is not effective.

Slide 58: Improved Performance
[Table: results under interleaved training.]

Slide 59: A Range of Queries
- Hotel room rates: the target is prices, which are easy to identify.
- Admission requirements for graduate students: includes items such as degree, GPA, GRE, etc. that are not easy to specify in keywords but are easy to show by marking.
- Data mining researcher: a subjective concept, with evidence including research interests, projects, professional activities, etc.

Slide 60: Results of a Range of Queries
[Table: results under interleaved training. Annotation: "More precise".]

Slide 61: Performance for the Queries
- Effectiveness
  - First 4 queries: accuracy is 80% to above 90%.
  - Last query: the system is still capable of filtering out irrelevant sites.
- Efficiency
  - The relative path length to locate a queried segment is close to 1.
  - The absolute path length to conclude irrelevancy is no more than 2.5 pages.
- The performance is not affected much by how precise the keyword query is. The system learns the query requirements.


Slide 63: Conclusions
- Proposed and implemented learning based Web query processing with the following features:
  - returns succinct results, i.e. segments of pages;
  - needs no a priori knowledge or preprocessing, so it is suited to ad hoc queries;
  - exploits page formatting and linkage information simultaneously.
- The preliminary results are promising.

Slide 64: Future Work
- Better segmentation for HTML documents
- Better knowledge, the key factor that affects system performance
  - other weighting schemes for navigation knowledge
  - other implementations of classification knowledge
- More system evaluation
- Dynamic Web pages


Slide 66: Demonstration