CS347 Review Slides (IR Part II) June 6, 2001 ©Prabhakar Raghavan

Overview of topics
Clustering
–Agglomerative
–k-means
Classification
–Rule based
–Support Vector Machines
–Naive Bayes
Finding communities (aka Trawling)
Summarization
Recommendation systems

Supervised vs. unsupervised learning
Unsupervised learning:
–Given a corpus, infer the structure implicit in the docs, without prior training.
Supervised learning:
–Train the system to recognize docs of a certain type (e.g., docs in Italian, or docs about religion).
–Decide whether or not new docs belong to the class(es) trained on.

Why cluster documents?
Given a corpus, partition it into groups of related docs
–Applied recursively, this can induce a tree of topics.
Given the set of docs returned by a search (say, jaguar), partition them into groups of related docs
–semantic disambiguation (e.g., jaguar the animal vs. the car)

Agglomerative clustering
Given a target number of clusters k.
Initially, each doc is viewed as its own cluster
–start with n clusters.
Repeat:
–while there are > k clusters, find the “closest pair” of clusters and merge them.
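A minimal sketch of this merge loop in Python. The NumPy doc vectors and the single-link (minimum pairwise) distance are assumptions; the slide doesn't fix a representation or a linkage:

```python
import numpy as np

def agglomerative(docs, k):
    """Merge the closest pair of clusters until only k remain.

    docs: (n, d) array of doc vectors; k: target cluster count.
    Single-link distance between clusters is an assumption here.
    """
    clusters = [[i] for i in range(len(docs))]   # start with n singleton clusters
    while len(clusters) > k:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(docs[i] - docs[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a].extend(clusters[b])          # merge the closest pair
        del clusters[b]
    return clusters
```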

k-means
At the start of an iteration, we have k centroids.
–Need not be docs, just some k points.
–Axes could be terms, links, etc.
Loop:
–Each doc is assigned to the nearest centroid.
–All docs assigned to the same centroid are averaged to compute a new centroid; thus we have k new centroids.
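The loop above, sketched directly (Lloyd's iteration). The random seeding with k docs and the fixed iteration count are assumptions added to make the example runnable:

```python
import numpy as np

def kmeans(docs, k, iters=20, seed=0):
    """k-means as described on the slide.

    docs: (n, d) array; returns (centroids, labels).
    """
    rng = np.random.default_rng(seed)
    centroids = docs[rng.choice(len(docs), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each doc to its nearest centroid
        dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # average the docs in each cluster to get k new centroids
        for c in range(k):
            if (labels == c).any():
                centroids[c] = docs[labels == c].mean(axis=0)
    return centroids, labels
```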

Classification
Given one or more topics, decide which one(s) a given document belongs to.
Applications:
–Classification into a topic taxonomy
–Intelligence analysts
–Routing to help desks/customer service
Choice of “topic” must be unique.

Accuracy measurement
Confusion matrix: rows are actual topics, columns are topics assigned by the classifier; the (i, j) entry counts docs actually in topic i that the classifier put in topic j (e.g., an entry of 53).
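A small sketch of building the matrix and reading overall accuracy off its diagonal; the example counts are made up:

```python
import numpy as np

def confusion_matrix(actual, assigned, n_topics):
    """M[i, j] = number of docs actually in topic i that the classifier put in topic j."""
    M = np.zeros((n_topics, n_topics), dtype=int)
    for i, j in zip(actual, assigned):
        M[i, j] += 1
    return M

M = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 2], 3)
print(M.trace() / M.sum())   # diagonal / total = overall accuracy -> 0.8
```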

Explicit queries
Topic queries can be built up from other topic queries.

Classification by exemplary docs
Feed the system exemplary docs on the topic (training)
–positive as well as negative examples.
The system builds its model of the topic.
Subsequent test docs are evaluated against the model
–which decides whether each test doc is a member of the topic.

Vector Spaces
Each training doc is a point (vector) labeled by its topic.
Hypothesis: docs of the same topic form a contiguous region of space.
Define surfaces to delineate topics in space.
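A minimal sketch of placing docs in term space. Plain term-frequency axes are an assumption; tf-idf weights or link features would be drop-in alternatives, as the k-means slide hints:

```python
from collections import Counter

def to_vectors(docs):
    """Map each doc (a string) to a point whose axes are terms.

    Returns (list of term-count vectors, vocabulary). Plain counts only;
    other weightings would slot in here.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0] * len(vocab)
        for term, count in Counter(toks).items():
            v[index[term]] = count
        vectors.append(v)
    return vectors, vocab
```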

Support Vector Machine (SVM)
–Support vectors: the training samples lying on two parallel hyperplanes; the decision function is fully specified by them.
–Maximize the margin between the hyperplanes.
–Finding this maximum-margin separator is a quadratic programming problem.
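A minimal usage sketch; it hands the quadratic program to scikit-learn's LinearSVC rather than solving it by hand, which is a convenience assumption, not something the slides prescribe:

```python
from sklearn.svm import LinearSVC

# X: doc vectors (e.g., term frequencies), y: +1 / -1 topic labels (toy data)
X = [[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]]
y = [1, 1, -1, -1]

clf = LinearSVC(C=1.0)             # C trades margin width against training errors
clf.fit(X, y)
print(clf.predict([[1.8, 0.2]]))   # should land in class +1
```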

Naive Bayes
Training:
–Use class frequencies in the training data for Pr[c_i].
–Estimate word frequencies for each word and each class to get Pr[w | c_i].
Test doc d:
–Use the Pr[w | c_i] values to estimate Pr[d | c_i] for each class c_i.
–Determine the class c_j for which Pr[c_j | d] is maximized.
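A from-scratch sketch of both steps. The add-one smoothing and the tiny training set are assumptions added to make it runnable; the slide only says "estimate":

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}  # Pr[c_i]
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, c in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[c].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            lp = math.log(self.prior[c])      # log Pr[c_i]
            for w in doc.lower().split():     # add log Pr[w | c_i] per word
                lp += math.log((self.word_counts[c][w] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = NaiveBayes().fit(["the pope in rome", "the goal scored"], ["religion", "sports"])
print(nb.predict("rome pope"))   # -> religion
```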

Content + neighbors’ classes
Naive Bayes gives Pr[c_j | d] based on the words in d.
Now consider Pr[c_j | N], where N is the set of labels of d’s neighbors. (Can separate N into in- and out-neighbors.)
Can combine the conditional probabilities for c_j from text- and link-based evidence.
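One way to do the combination is a renormalized product of the two posteriors; this particular rule is an assumption, since the slide doesn't specify how the evidence is combined:

```python
def combined_posterior(p_text, p_link):
    """Combine text- and link-based class posteriors by renormalized product.

    p_text, p_link: dicts mapping class -> probability.
    """
    scores = {c: p_text[c] * p_link[c] for c in p_text}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

print(combined_posterior({"sports": 0.6, "religion": 0.4},
                         {"sports": 0.9, "religion": 0.1}))
# sports dominates once both kinds of evidence agree
```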

Finding communities on the web
–not easy, since the web is huge
–what is a “dense subgraph”?
Define an (i, j)-core: a complete bipartite subgraph with i nodes, all of which point to each of j others.
Example: a (2, 3)-core.
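A brute-force sketch of core enumeration on a toy graph; it only illustrates the definition, since trawling at web scale needs aggressive pruning:

```python
from itertools import combinations

def find_ij_cores(edges, i, j):
    """Find (i, j)-cores: every set of i nodes that all point to the same j nodes."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
    cores = []
    for fans in combinations(sorted(out), i):
        common = set.intersection(*(out[u] for u in fans))  # targets shared by all i fans
        for centers in combinations(sorted(common), j):
            cores.append((fans, centers))
    return cores

edges = [("a", "x"), ("a", "y"), ("a", "z"),
         ("b", "x"), ("b", "y"), ("b", "z")]
print(find_ij_cores(edges, 2, 3))   # the (2, 3)-core on {a, b} x {x, y, z}
```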

Document Summarization
–Lexical chains: look for terms appearing in consecutive sentences.
–For each sentence S in the doc:
f(S) = a*h(S) - b*t(S)
where h(S) = total score of all chains starting at S, and t(S) = total score of all chains covering S but not starting at S.
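A small sketch of the scoring formula, with chains represented as (start sentence, end sentence, score) triples; the weights a and b and the chain scores are illustrative assumptions:

```python
def sentence_scores(chains, n_sentences, a=1.0, b=0.5):
    """Score each sentence by f(S) = a*h(S) - b*t(S).

    chains: list of (start, end, score) lexical chains, indices inclusive.
    """
    f = []
    for s in range(n_sentences):
        h = sum(score for start, end, score in chains if start == s)
        t = sum(score for start, end, score in chains
                if start < s <= end)          # covers S but doesn't start at S
        f.append(a * h - b * t)
    return f

# three chains over a 5-sentence doc; the highest f(S) marks summary sentences
print(sentence_scores([(0, 2, 3.0), (1, 4, 2.0), (3, 3, 1.0)], 5))
```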

Recommendation Systems
Recommend docs to a user based on the user’s context (besides the docs’ content).
Other applications:
–Re-rank search results.
–Locate experts.
–Targeted ads.