Web queries classification: Nguyen Viet Bang, WING group meeting, June 9th 2006.



What does “classifying queries by their goals” mean? A taxonomy by [Rose & Levinson]:
– Navigational: locate a specific website. Example: “Stanford University”
– Informational: find out about a topic. Example: “European history”
– Resource: obtain a resource. Example: “download Beatles lyrics”
Note: there are further sub-categories. [Broder] proposes a similar taxonomy.

What is this research about? An outline following [Rose and Levinson]:
(i) Determine a framework for classifying queries by their goals.
(ii) Given queries, find a way to associate each query with the goals determined in (i).
(iii) Using the classifications from (ii), exploit that information to improve current search engines.

Outline: Problem (i)
(i) Determine a classification framework based on the goals of users’ queries (the taxonomy by [Rose and Levinson]).
(ii) Given queries, find a way to associate each query with the goals determined in (i).
(iii) Using the classifications from (ii), exploit that information to improve current search engines.

Outline: Problem (ii), associating the goals with the queries
(i) Determine a classification framework based on the goals of users’ queries.
(ii) Given queries, find a way to associate each query with the goals determined in (i).
(iii) Using the classifications from (ii), exploit that information to improve current search engines.

Outline: Problem (ii), associating the goals with the queries
(1) Manually ask users (present a user interface)
(2) Automated classification
2.1. Use extra information (other than the queries themselves)
– Clickthrough data (user click history) [Lee, Liu and Cho]
– Links (anchor-text distribution) [Lee, Liu and Cho]
– Many other features: distribution of queries, PageRank, mutual information
2.2. Machine learning
2.3. What about looking at the queries only?

An example: click distribution
Intuition: for a navigational query, users tend to click on a single result.
Algorithm:
– Sort the results of a search in descending order of number of clicks (this yields a distribution).
– Compute a summary statistic of the distribution (e.g. the mean).
– If the statistic passes some threshold (indicating that the clicks are concentrated on one result), classify the query as navigational.
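The slide leaves the statistic open. As a minimal sketch, assume the statistic is the click-weighted mean position in the sorted list: it stays close to 1 when almost every click lands on one result, so in this version a query counts as navigational when the mean falls below the threshold. The function names and the threshold value of 1.5 are hypothetical, not taken from [Lee, Liu and Cho].

```python
from typing import Sequence


def mean_click_position(clicks: Sequence[int]) -> float:
    """Click-weighted mean position (1-based) after sorting results by clicks, descending."""
    ordered = sorted(clicks, reverse=True)  # the click distribution from the slide
    total = sum(ordered)
    if total == 0:
        raise ValueError("query has no clicks")
    return sum(pos * c for pos, c in enumerate(ordered, start=1)) / total


def is_navigational(clicks: Sequence[int], threshold: float = 1.5) -> bool:
    # Hypothetical threshold: a mean near 1 means nearly all clicks hit one result.
    return mean_click_position(clicks) <= threshold
```

For clicks `[2, 95, 3]` the mean position is 1.07 (navigational); for an even spread such as `[30, 25, 20, 15, 10]` it is 2.5 (not navigational).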

Automated classification (contd.)
Combining features yields higher accuracy [Lee, Liu and Cho].
Machine learning:
– Unsupervised (clustering)
– Supervised (training data may be lacking)
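One simple way to combine features and to use a small amount of labelled data is sketched below. It assumes each feature has already been turned into a score in [0, 1] where higher means “more navigational” (for example, the share of clicks on the top result). The equal weighting and the exhaustive threshold search are illustrative choices, not the method of [Lee, Liu and Cho].

```python
from typing import Sequence


def combined_score(click_score: float, anchor_score: float, w: float = 0.5) -> float:
    """Linear blend of two per-query navigational scores; w is a hypothetical weight."""
    return w * click_score + (1 - w) * anchor_score


def learn_threshold(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Supervised step: pick the cut-off that best separates labelled training queries.

    A query is predicted navigational when its score >= threshold.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("need equally long, non-empty scores and labels")
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        # Fraction of training queries whose prediction matches the label.
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```

With only a handful of labelled queries, a single learned threshold like this is far less data-hungry than a full classifier, which matters given the lack of training data noted on the slide.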

Problem (iii): retrieving results after classification
Each category needs a different retrieval strategy [Kang and Kim].
Information to analyze:
– Content information (the web page itself)
– Link information (the topology of links in the web)
– URL information (e.g. to decide whether a web page is a “root”, i.e. a site entry page)
More techniques: Boolean combination (“and” or “or”).
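A minimal sketch of how URL information and a Boolean combination might feed a category-specific strategy, assuming results arrive as (url, content_score) pairs. The list of default index file names, the `boost` value, and all function names are hypothetical; [Kang and Kim] describe their own scoring, which this does not reproduce.

```python
from typing import List, Set, Tuple
from urllib.parse import urlparse

# Hypothetical default index pages that still count as a site entry ("root").
INDEX_PATHS = {"", "/index.html", "/index.htm", "/index.php", "/default.asp"}


def is_root_url(url: str) -> bool:
    """True if the URL has no path beyond the host (or only a default index page)."""
    parsed = urlparse(url)
    return parsed.path.rstrip("/") in INDEX_PATHS and not parsed.query


def rerank(results: List[Tuple[str, float]], query_type: str,
           boost: float = 1.0) -> List[Tuple[str, float]]:
    """Navigational queries favour root pages; other types keep the content ranking."""
    def score(item: Tuple[str, float]) -> float:
        url, content_score = item
        if query_type == "navigational" and is_root_url(url):
            return content_score + boost
        return content_score
    return sorted(results, key=score, reverse=True)


def combine(content_hits: Set[str], link_hits: Set[str], mode: str) -> Set[str]:
    # Boolean combination of two result sets: "and" = intersection, "or" = union.
    return content_hits & link_hits if mode == "and" else content_hits | link_hits
```

For the query “Stanford University” classified as navigational, `http://www.stanford.edu/` would move above a deeper page with a higher content score; for an informational query the content ranking is left unchanged.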

Our challenge
Try to achieve accurate classification by looking at features of the queries only:
– POS (part-of-speech) tags
– Relationships between queries
– Features of the URLs returned by search engines (Meurlin?)
Enhance search retrieval.
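To make the “queries only” idea concrete, here is a sketch of surface features computed from the query string alone, plus toy rules that map them onto the three goals. A real POS tagger is replaced by a crude capitalisation ratio, and the cue words, regular expression, and rules are all hypothetical; they merely happen to label the three example queries from the taxonomy slide correctly.

```python
import re
from dataclasses import dataclass

# Hypothetical cue words suggesting a resource goal; not from any published list.
RESOURCE_CUES = {"download", "lyrics", "mp3", "free", "buy", "play"}
DOMAIN_RE = re.compile(r"\b[\w-]+\.(com|org|net|edu|gov)\b", re.IGNORECASE)


@dataclass
class QueryFeatures:
    num_terms: int
    has_resource_cue: bool
    looks_like_domain: bool
    capitalized_ratio: float  # crude stand-in for proper-noun POS tags


def extract_features(query: str) -> QueryFeatures:
    terms = query.split()
    caps = sum(1 for t in terms if t[:1].isupper())
    return QueryFeatures(
        num_terms=len(terms),
        has_resource_cue=any(t.lower() in RESOURCE_CUES for t in terms),
        looks_like_domain=bool(DOMAIN_RE.search(query)),
        capitalized_ratio=caps / len(terms) if terms else 0.0,
    )


def guess_goal(f: QueryFeatures) -> str:
    # Toy rules, checked in order; a learned classifier would replace these.
    if f.looks_like_domain:
        return "navigational"
    if f.has_resource_cue:
        return "resource"
    if f.capitalized_ratio == 1.0 and f.num_terms <= 3:
        return "navigational"  # short all-proper-noun query, e.g. an organisation name
    return "informational"
```

Usage: `guess_goal(extract_features("download Beatles lyrics"))` gives `"resource"`, while “Stanford University” and “European history” come out navigational and informational respectively.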