ALADDIN Workshop on Graph Partitioning in Vision and Machine Learning Jan 9-11, 2003 Welcome! [Organizers: Avrim Blum, Jon Kleinberg, John Lafferty, Jianbo Shi, Eva Tardos, Ramin Zabih]

Graph partitioning Coming up recently in Vision (image segmentation, image cleaning, ...) and Machine Learning (learning from labeled & unlabeled data, clustering). Central problem in Algorithms (max-flow min-cut, balanced separators)

Goals of the Workshop Exchange of ideas among people in 3 areas. Understanding of similarities and differences in problems, objectives. Formulate good questions. This is supposed to be informal!

Thanks to our sponsor ALADDIN Center: NSF-supported center on ALgorithms, ADaptation, Dissemination, and INtegration. Supports work/interaction between algorithms and application areas. More announcements at the break

Graph partitioning for Machine Learning from Labeled and Unlabeled data Avrim Blum, CMU

Combining Labeled and Unlabeled Data Hot topic in recent years. Many applications have lots of unlabeled data, but labeled data is rare or expensive. E.g., –Web page, document classification –OCR, Image classification –Text extraction Can we use the unlabeled data to help? [lots of relevant references omitted here]

How can unlabeled data help? Unlabeled data + assumptions ⇒ reduce space of “reasonable” distinctions. –E.g., OCR data might cluster. We hope each digit is one or more clusters. –Assumptions about world add a “self-consistency” component to optimization. In the presence of other kinds of info, can provide ability to bootstrap (co-training). –e.g., video, word-sense disambiguation.

Unlabeled data + assumptions ⇒ reasonableness criteria Suppose we are looking for a linear separator. We believe one should exist with large separation (SVM).

Unlabeled data + assumptions ⇒ reasonableness criteria Suppose we are looking for a linear separator. We believe one should exist with large separation. Transductive SVM [Joachims].

Unlabeled data + assumptions ⇒ reasonableness criteria Suppose we believe that in general, similar examples have the same label. –Suggests Nearest Neighbor or locally-weighted voting alg for the standard problem. –Why not extend to an objective function over unlabeled data too?

Unlabeled data + assumptions ⇒ reasonableness criteria Suppose we believe that in general, similar examples have the same label. –Given a set of labeled and unlabeled data, classify the unlabeled data to minimize penalty = # pairs of similar examples with different labels.

The good, the bad, and the ugly…

The good Suggests natural alg approach along the lines of [GPS,BVZ,SVZ,KT]: 1. Define graph with edges between similar examples (perhaps weighted). 2. Solve for labeling that minimizes weight of bad edges.

The good Much of ML is just 2-class problems, so (2) becomes just a minimum cut. S. Chawla will discuss some experimental results and design issues.
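
A minimal sketch of this two-step recipe (my own illustration, not code from the talk), assuming a precomputed symmetric similarity matrix sim, labels y[i] in {+1, -1, 0} with 0 meaning unlabeled, and an illustrative edge threshold; it reduces the 2-class problem to a minimum s-t cut via networkx:

```python
import networkx as nx
import numpy as np

def mincut_label(sim, y, threshold=0.5):
    """Label unlabeled examples via a minimum s-t cut on the similarity graph.
    Assumes at least one positive and one negative labeled example."""
    n = len(y)
    G = nx.DiGraph()
    # Step 1: edges between "similar" examples (here: similarity above a threshold).
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                G.add_edge(i, j, capacity=float(sim[i, j]))
                G.add_edge(j, i, capacity=float(sim[i, j]))
    # Tie labeled examples to a source (+) / sink (-) with infinite capacity,
    # so no cut can separate an example from its known label.
    for i in range(n):
        if y[i] == +1:
            G.add_edge('s', i, capacity=float('inf'))
        elif y[i] == -1:
            G.add_edge(i, 't', capacity=float('inf'))
    # Step 2: the minimum s-t cut is the labeling that minimizes the total
    # weight of "bad" edges (similar pairs given different labels).
    _, (source_side, _) = nx.minimum_cut(G, 's', 't')
    return np.array([+1 if i in source_side else -1 for i in range(n)])
```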

The good Another view: if we created graph by connecting each to nearest neighbor, this is the labeling s.t. NN would have smallest leave-one-out error. [see also Joachims’ talk]
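
To make that view concrete (a sketch of my own, not from the slides): on a graph linking each point to its single nearest neighbor, the cut penalty of a labeling corresponds to the number of leave-one-out 1-NN disagreements it induces, which can be counted directly:

```python
import numpy as np

def nn_loo_disagreements(X, labels):
    # labels: proposed +/-1 labels for all points, labeled and unlabeled alike.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)      # a point is not its own neighbor
    nn = d.argmin(axis=1)            # index of each point's nearest other point
    return int(np.sum(labels != labels[nn]))
```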

The bad Is this really what we want to do? Assumptions swamp our evidence?

The ugly 1. Who defined “similar” anyway? 2. Given a distance metric, how should we construct the graph? 3. Given the graph, several possible objectives. Will skip 1 but see Murphy, Dietterich, Lafferty talks.

2: Given d, how to create G? Weird issue: for just labeled data, kNN (k=1,3,5) makes more sense than a fixed radius because of unevenness of the distribution. (I.e., for each test point you want to grow the radius until you hit k labeled points.) But for unlabeled data, fixed k has problems.

Given d, how to create G? Say we connect each example to its nearest nbr. Weighting edges as w(u,v) = f(d(u,v)) at least has the property that the graph gets more connected…

Given d, how to create G? [BC]: use an unweighted graph, with an edge between any pair at distance < ε (some threshold). Is there a “correct” way? GW-moats?
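
Both constructions are easy to sketch; here, as an illustration (names and parameter values are mine, using scikit-learn's neighbor-graph helpers), a kNN graph and an unweighted ε-ball graph:

```python
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

def knn_graph(X, k=3):
    # Symmetrized k-nearest-neighbor graph: keep an edge if either endpoint
    # lists the other among its k nearest neighbors.
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    return ((A + A.T) > 0).astype(int)

def epsilon_graph(X, eps=1.0):
    # Unweighted edge between any pair within distance eps (the [BC]-style graph).
    return radius_neighbors_graph(X, radius=eps, mode='connectivity')
```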

Given G, several natural objectives Say f(v) = fractional label of v. Mincut: minimize Σ_{(u,v)∈E} w(u,v)·|f(u)−f(v)| with f(v) ∈ {0,1}. [GZ]: minimize Σ_{(u,v)∈E} w(u,v)·(f(u)−f(v))² with f(v) ∈ [0,1]; nice random walk / electrical networks interpretation.
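
The relaxed [GZ]-style objective has a closed-form solution via the graph Laplacian; here is a small sketch under my own notation (W a symmetric weight matrix, `labeled` a boolean mask, `fl` the 0/1 labels of the labeled nodes):

```python
import numpy as np

def harmonic_labels(W, labeled, fl):
    # Minimize sum_{u,v} w(u,v)*(f(u)-f(v))^2 with f fixed to fl on labeled nodes.
    # Assumes every connected component contains a labeled node, so L_UU is invertible.
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    U = ~labeled
    # Setting the gradient w.r.t. the unlabeled entries to zero gives the
    # linear system  L_UU f_U = W_UL f_L  (the "harmonic" solution).
    f_u = np.linalg.solve(L[np.ix_(U, U)], W[np.ix_(U, labeled)] @ fl)
    f = np.zeros(len(W), dtype=float)
    f[labeled] = fl
    f[U] = f_u
    return f          # fractional labels in [0, 1]; threshold at 1/2 to predict
```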

Given G, several natural objectives (When) is one better than the other? Optimize other fns too?

Given G, several natural objectives If we view G as an MRF, then mincut is finding the most likely configuration. A cut of size k has prob ∝ e^{−λk} (for some constant λ). Instead, ask for the Bayes-optimal prediction on each individual example? Nice open problem: efficiently sample from this distribution? (extend [JS]?) Hack: repeatedly add noise to edges and solve.
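
A sketch of that hack (my own choice of noise model and parameters; it reuses mincut_label from the earlier sketch): perturb the edge weights, re-solve the cut many times, and read off per-example vote fractions as rough confidences:

```python
import numpy as np

def randomized_mincut(sim, y, trials=100, noise=0.1, seed=None):
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(y), dtype=float)
    for _ in range(trials):
        jitter = rng.uniform(0.0, noise, size=sim.shape)
        jitter = (jitter + jitter.T) / 2       # keep the perturbed matrix symmetric
        votes += (mincut_label(sim + jitter, y) == +1)
    return votes / trials    # fraction of cuts labeling each example positive
```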

More questions Tradeoff between assumptions over unlabeled data and evidence from labeled data? Especially if non-uniform. Hypergraphs? Find labeling to minimize the number of points that differ from the majority vote over their k nearest nbrs? See [VK,RZ].

More questions … (we’ll see over the next 2.5 days)