SI/EECS 767 Yang Liu Apr 2, 2010

 A minimum cut is the smallest cut that disconnects a graph into two disjoint subsets.
 Applications:
  Graph partitioning
  Data clustering
  Graph-based machine learning

 Cut
  A cut C = (S, T) is a partition of the vertex set V of a graph G = (V, E).
  An s-t cut C = (S, T) of a network N = (V, E) is a cut of N such that s ∈ S and t ∈ T, where s and t are the source and the sink of N respectively.
  The cut-set of a cut C = (S, T) is the set {(u, v) ∈ E | u ∈ S, v ∈ T}.
  The size of a cut C = (S, T) is the number of edges in the cut-set. If the edges are weighted, the value of the cut is the sum of the weights of the edges in the cut-set.

 Minimum cut
  A cut is minimum if its size is not larger than the size of any other cut.
 Max-flow min-cut theorem
  The maximum flow between two vertices is always equal to the size of the minimum cut times the capacity of a single pipe (when every edge is a pipe of the same capacity).
  The theorem also applies to weighted networks in which individual pipes can have different capacities.

 The max-flow min-cut theorem is very useful because there are simple computer algorithms that can calculate maximum flows quite quickly (in polynomial time) for any given network.
 We can use these same algorithms to quickly calculate the size of a minimum cut set.
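
A small illustration of the theorem (not from the original slides): the example network and its capacities below are made up, and networkx's maximum_flow_value and minimum_cut_value stand in for the "simple computer algorithms" mentioned above.

```python
# Hedged sketch: check max-flow = min-cut on a tiny made-up network.
import networkx as nx

G = nx.DiGraph()
# Each edge is a "pipe" with a capacity.
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

max_flow = nx.maximum_flow_value(G, "s", "t")
min_cut = nx.minimum_cut_value(G, "s", "t")
print(max_flow, min_cut)  # both are 5, as the theorem guarantees
```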

 Basic idea (augmenting paths):
  First find a path from the source s to the sink t using breadth-first search;
  then find another path from s to t among the remaining capacity and repeat until no more augmenting paths can be found.
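
The procedure sketched above is essentially the Ford-Fulkerson method with breadth-first search (Edmonds-Karp). Below is a minimal, self-contained Python sketch of that idea; the capacity-dictionary representation and the example network are illustrative choices, not code from the slides.

```python
# Hedged sketch of BFS-based augmenting-path max flow (Edmonds-Karp).
from collections import deque

def max_flow(capacity, s, t):
    """capacity: dict of dicts, capacity[u][v] = capacity of edge (u, v)."""
    # Build a residual network so reverse edges can carry cancelled flow.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u in capacity:
        for v in capacity[u]:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for an s-t path with positive residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:              # no augmenting path is left
            return flow
        # Find the bottleneck along the path, then push that much flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Same tiny network as above: the maximum flow from s to t is 5.
caps = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow(caps, "s", "t"))
```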

 Allow fluid to flow simultaneously both ways down an edge of the network. [Figure from Mark Newman's textbook (preprint version)]

Graph Clustering and Minimum Cut Trees (Flake et al 2004)

 Clustering data into disjoint groups.
 Data sets can be represented as weighted graphs:
  Nodes = entities to be clustered
  Edges = a similarity measure between entities
 The paper presents a new clustering algorithm based on maximum flow (in particular, on minimum cut trees).

 Also known as the Gomory–Hu tree.
 A weighted tree whose edges represent the minimum s-t cuts for all pairs of vertices in the graph.
 For every undirected graph, a min-cut tree always exists.
 See [Gomory and Hu 61] for details and the algorithm for calculating min-cut trees.

 In the expanded graph, an artificial sink t is connected to every node by an edge of weight α.
 As α→0, we get the trivial cut ({t}, V).
 As α→∞, we get n trivial clusters, all singletons.
 The exact value of α to use depends on the structure of G and the distribution of the weights over the edges.
 Because the algorithm finds all clusters either in increasing or decreasing order, we can stop as soon as a desired clustering has been found.
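
To make the role of α concrete, here is a hedged sketch of the cut-clustering idea of Flake et al: connect an artificial sink to every node with weight α, compute the minimum-cut (Gomory-Hu) tree, remove the sink, and read the clusters off the remaining connected components. The use of networkx's gomory_hu_tree and the toy two-triangle graph are my choices for illustration, not the authors' code.

```python
# Hedged sketch of cut clustering via a minimum-cut tree (after Flake et al 2004).
import networkx as nx

def cut_clustering(G, alpha, weight="weight"):
    expanded = G.copy()
    sink = "__artificial_sink__"          # assumes this label is not already a node
    for v in G.nodes():
        expanded.add_edge(v, sink, **{weight: alpha})
    # The Gomory-Hu tree encodes the minimum s-t cuts for all vertex pairs.
    T = nx.gomory_hu_tree(expanded, capacity=weight)
    T.remove_node(sink)
    # Each connected component left after removing the sink is one cluster.
    return list(nx.connected_components(T))

# Toy example: two triangles joined by one weak edge; for a moderate alpha the
# weak 3-4 edge is the natural place to cut.
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 1), (2, 3, 1), (1, 3, 1),
                           (4, 5, 1), (5, 6, 1), (4, 6, 1),
                           (3, 4, 0.1)])
print(cut_clustering(G, alpha=0.5))
```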

 Once a clustering is produced, contract the clusters into single nodes and apply the same algorithm to the resulting graph.
 When contracting a set of nodes, they are replaced by a single new node; any resulting self-loops are deleted and parallel edges are combined into a single edge whose weight is the sum of their weights.
 break if ((clusters returned are of the desired number and size) or (clustering failed to create nontrivial clusters))

 CiteSeer
  Citation network (documents as nodes, citations as edges)
  [Figures: clusterings at a low level and a high level of the hierarchy]

 Minimum cut trees, computed on expanded graphs, provide a means for producing quality clusterings and for extracting heavily connected components.
 A single parameter, α, serves as a strict bound on the expansion of the clustering while simultaneously bounding the intercluster weight.

Bipartite Graph Partitioning and Data Clustering (Zha et al 2001)

 Bipartite graph
  Two kinds of vertices: one kind representing the original vertices and the other representing the groups to which they belong.
  Examples: terms and documents, authors and the articles they write.
 The paper adapts partitioning criteria for undirected graphs to bipartite graphs and thereby solves the bi-clustering problem.

 Bipartite graph G(X, Y, W)
 In the context of document clustering:
  X represents the set of terms
  Y represents the set of documents
  W = (w_ij) gives the frequency of term i in document j

 The plain minimum-cut criterion tends to produce unbalanced clusters.
 Normalizing the cut by the weight of each side turns the task into a balanced (normalized-cut style) optimization problem, which Zha et al solve approximately via the singular value decomposition of the scaled term-document matrix.
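
For concreteness, here is a hedged sketch of the SVD-based relaxation used for this kind of bipartite-cut objective (in the spirit of Zha et al 2001 and the closely related spectral co-clustering of Dhillon): scale the term-document matrix by the square roots of the term and document degrees, take the second singular vectors, and split by sign. The toy matrix and the sign-based rounding for a 2-way split are illustrative simplifications, not the paper's exact procedure.

```python
# Hedged sketch: 2-way bipartite partitioning via the SVD relaxation.
import numpy as np

def bipartite_bipartition(W):
    # Scale W by the square roots of the term and document degrees.
    d1 = W.sum(axis=1)                  # term degrees
    d2 = W.sum(axis=0)                  # document degrees
    Wn = (W / np.sqrt(d1)[:, None]) / np.sqrt(d2)[None, :]
    # The second left/right singular vectors give the relaxed partition indicators.
    U, _, Vt = np.linalg.svd(Wn, full_matrices=False)
    term_side = np.sign(U[:, 1] / np.sqrt(d1))
    doc_side = np.sign(Vt[1, :] / np.sqrt(d2))
    return term_side, doc_side

# Toy term-by-document matrix: terms 0-1 occur mostly in documents 0-1,
# terms 2-3 mostly in documents 2-3.
W = np.array([[3.0, 2.0, 0.0, 1.0],
              [2.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 3.0, 2.0],
              [1.0, 0.0, 2.0, 3.0]])
terms, docs = bipartite_bipartition(W)
print(terms, docs)   # signs split terms and documents into two co-clusters
```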

 Computational complexity: generally linear in the number of documents to be clustered.

 Experiments: 20 Newsgroups data set.

Learning from Labeled and Unlabeled Data using Graph Mincuts (Blum & Chawla 2001)

 Many application domains suffer from not having enough labeled training data for learning.
 Large amounts of unlabeled examples are usually available.
 Question: how can unlabeled data be used to aid classification?

 A set L of labeled examples
 A set U of unlabeled examples
 Binary classification:
  L+ denotes the set of positive examples
  L- denotes the set of negative examples

 Construct a weighted graph G = (V, E), where V = L ∪ U ∪ {v+, v-} and each edge e ∈ E has a weight w(e). v+, v-: classification vertices; all other vertices: example vertices.
 w(v, v+) = ∞ for all v ∈ L+ and w(v, v-) = ∞ for all v ∈ L-.
 Edges between example vertices are assigned weights based on some relationship (similarity / distance) between the examples.

 Determine a minimum (v+, v-) cut for the graph, i.e. the minimum total weight set of edges whose removal disconnects v+ and v- (using a max-flow algorithm in which v+ is the source and v- is the sink).
 Assign a positive label to all unlabeled examples on the v+ side of the cut (V+) and a negative label to all unlabeled examples on the v- side (V-).
 *Edges between examples which are similar to each other should be given a high weight.
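
A minimal sketch of this construction and cut. Assumptions: an RBF similarity is used as the edge weight, which is one reasonable choice rather than the paper's specific weighting, and networkx's minimum_cut performs the max-flow computation; the node names and tiny dataset are illustrative.

```python
# Hedged sketch of semi-supervised classification with a graph mincut.
import networkx as nx
import numpy as np

def mincut_classify(X_labeled, y_labeled, X_unlabeled, sigma=1.0):
    G = nx.Graph()
    v_plus, v_minus = "v+", "v-"                     # classification vertices
    n_l, n_u = len(X_labeled), len(X_unlabeled)
    X = np.vstack([X_labeled, X_unlabeled])

    # Infinite-capacity edges tie labeled examples to their classification vertex.
    for i, y in enumerate(y_labeled):
        G.add_edge(i, v_plus if y == 1 else v_minus, capacity=float("inf"))

    # Similarity-weighted edges between all pairs of example vertices.
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            w = np.exp(-np.sum((X[i] - X[j]) ** 2) / (2 * sigma ** 2))
            G.add_edge(i, j, capacity=w)

    # Minimum (v+, v-) cut; examples on the v+ side get the positive label.
    _, (source_side, _) = nx.minimum_cut(G, v_plus, v_minus)
    return [1 if (n_l + k) in source_side else 0 for k in range(n_u)]

# Two labeled points and three unlabeled points on a line.
X_l, y_l = np.array([[0.0], [4.0]]), [1, 0]
X_u = np.array([[0.5], [1.0], [3.5]])
print(mincut_classify(X_l, y_l, X_u))   # e.g. [1, 1, 0]
```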

 If there are few labeled examples, mincut may assign all of the unlabeled examples to one class or the other.
 If the graph is too sparse, it could fall apart into a number of disconnected components.
 Therefore it is important to use a proper weighting function.

 Datasets: UCI, 2000
 The mincut algorithm has many degrees of freedom in terms of how the edge weights are defined:
  Mincut-3: each example is connected to its nearest labeled example and to the two nearest other examples overall.
  Mincut-δ: two nodes are connected if they are closer than δ.
  Mincut-δ0: the largest δ for which the graph has a cut of value 0.
  Mincut-δ1/2: the δ at which the size of the largest connected component in the graph is half the number of datapoints.
  Mincut-δopt: the value of δ that corresponds to the least classification error in hindsight.

 The basic idea of this algorithm is to build a graph on all the data, with edges between examples that are sufficiently similar,
 and then to partition the graph into a positive and a negative set in a way that
  (a) agrees with the labeled data, and
  (b) cuts as few edges as possible.

Semi-supervised Learning using Randomized Mincuts (Blum et al 2004)

 Drawbacks of the graph mincut approach:
  A graph may have many minimum cuts, and the mincut algorithm produces just one, typically the "leftmost" one when standard network-flow algorithms are used.
  The output is a single joint labeling rather than per-node probabilities.
 These can be improved by averaging over many small cuts.

 Repeatedly add artificial random noise to the edge weights.
 Solve for the minimum cut in each resulting graph.
 Output a fractional label for each example, corresponding to the fraction of the time it ended up on one side or the other.

 Given a graph G, produce a collection of cuts by repeatedly adding random noise to the edge weights and then solving for the minimum cut in the perturbed graph.
 Sanity check: remove cuts that are highly unbalanced (in this paper, any cut with less than 5% of the vertices on one side).
 Predict based on a majority vote over the remaining cuts.
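
A hedged sketch of this procedure on a toy graph. The uniform noise scale, the balance threshold, and the line-graph example (which matches the illustration on the next slide) are my choices, not the paper's exact settings.

```python
# Hedged sketch of randomized mincut: perturb weights, cut, filter, vote.
import random
import networkx as nx

def randomized_mincut(G, v_plus, v_minus, rounds=50, noise=0.5, balance=0.05):
    votes = {v: 0 for v in G if v not in (v_plus, v_minus)}
    kept = 0
    for _ in range(rounds):
        H = G.copy()
        for u, v, attr in H.edges(data=True):
            if attr["capacity"] != float("inf"):       # keep labeled ties hard
                attr["capacity"] += random.uniform(0, noise)
        _, (plus_side, minus_side) = nx.minimum_cut(H, v_plus, v_minus)
        # Sanity check: skip cuts with too few vertices on one side.
        if min(len(plus_side), len(minus_side)) < balance * G.number_of_nodes():
            continue
        kept += 1
        for v in votes:
            votes[v] += 1 if v in plus_side else 0
    # Fraction of kept cuts that put each node on the positive side (a confidence).
    return {v: votes[v] / max(kept, 1) for v in votes}

# Toy example: a line of nodes with a positive label at one end and a negative
# label at the other; plain mincut would cut at one extreme, randomization
# yields graded confidences along the line.
G = nx.Graph()
G.add_edge("v+", 0, capacity=float("inf"))
G.add_edge("v-", 5, capacity=float("inf"))
for i in range(5):
    G.add_edge(i, i + 1, capacity=1.0)
print(randomized_mincut(G, "v+", "v-"))
```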

 This overcomes some of the limitations of the plain mincut algorithm.
 Consider a graph which simply consists of a line, with a positively labeled node at one end, a negatively labeled node at the other end, and the rest unlabeled.
  Plain mincut: the cut will be the leftmost or rightmost one.
  Randomized mincut: the cuts end up spread over the middle of the line, with confidence that increases linearly out to the endpoints.

 The graph should either be connected or at least have the property that a small number of connected components cover nearly all the examples.
 It is also good to create a graph that at least has some small balanced cuts.

 MST: simply construct a minimum spanning tree on the entire dataset.
 δ-MST: connect two points with an edge if they are within a radius δ; then view the components produced as super-nodes and connect them via an MST.
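
For concreteness, a hedged sketch of both constructions using networkx. The Euclidean distance, the toy points, and the choice to connect super-nodes through their closest inter-component pairs are my reading of the description, not necessarily the paper's implementation.

```python
# Hedged sketch of the MST and delta-MST graph constructions.
import itertools
import networkx as nx
import numpy as np

def mst_graph(X):
    # MST over the complete Euclidean-distance graph on all points.
    G = nx.Graph()
    for i, j in itertools.combinations(range(len(X)), 2):
        G.add_edge(i, j, weight=float(np.linalg.norm(X[i] - X[j])))
    return nx.minimum_spanning_tree(G, weight="weight")

def delta_mst_graph(X, delta):
    # Connect points within radius delta, then join the resulting components
    # by an MST over their closest inter-component pairs.
    G = nx.Graph()
    G.add_nodes_from(range(len(X)))
    dist = {(i, j): float(np.linalg.norm(X[i] - X[j]))
            for i, j in itertools.combinations(range(len(X)), 2)}
    for (i, j), d in dist.items():
        if d <= delta:
            G.add_edge(i, j, weight=d)
    comps = list(nx.connected_components(G))
    H = nx.Graph()                       # super-node graph: one node per component
    for a, b in itertools.combinations(range(len(comps)), 2):
        d, pair = min((dist[tuple(sorted((i, j)))], (i, j))
                      for i in comps[a] for j in comps[b])
        H.add_edge(a, b, weight=d, endpoints=pair)
    for _, _, attr in nx.minimum_spanning_tree(H, weight="weight").edges(data=True):
        i, j = attr["endpoints"]
        G.add_edge(i, j, weight=dist[tuple(sorted((i, j)))])
    return G

# Toy points on a line: two tight groups joined by one bridge edge.
X = np.array([[0.0], [0.4], [1.0], [5.0], [5.3]])
print(sorted(tuple(sorted(e)) for e in delta_mst_graph(X, delta=0.7).edges()))
```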

 Handwritten digits
 20 Newsgroups
 Various UCI datasets

 Randomized mincut improves performance when the number of labeled examples is small.
 It also provides a confidence score for accuracy-coverage curves.

A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on Minimum Cuts (Pang & Lee 2004)

 A machine-learning method that applies text-categorization techniques to determine sentiment polarity: positive ("thumbs up") or negative ("thumbs down").

 Previous approaches focused on selecting indicative lexical features.
 Their approach:
  Label each sentence as either subjective or objective and keep the subjective extract;
  apply a standard machine-learning polarity classifier to the resulting extract.

 There are n items x_1, ..., x_n to divide into two classes C_1 and C_2.
 Individual scores ind_j(x_i): non-negative estimates of each x_i's preference for being in C_j based on the features of x_i alone.
 Association scores assoc(x_i, x_k): non-negative estimates of how important it is that x_i and x_k be in the same class.
 Minimize the partition cost:
  Σ_{x ∈ C_1} ind_2(x) + Σ_{x ∈ C_2} ind_1(x) + Σ_{x_i ∈ C_1, x_k ∈ C_2} assoc(x_i, x_k)

 Build an undirected graph G with vertices {v_1, ..., v_n, s, t}; the last two are, respectively, the source and the sink.
 Add n edges (s, v_i), each with weight ind_1(x_i), and n edges (v_i, t), each with weight ind_2(x_i).
 Finally, add edges (v_i, v_k), each with weight assoc(x_i, x_k).
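
A minimal sketch of this graph construction and the resulting cut, using networkx. The placeholder ind and assoc scores stand in for the Naive Bayes subjectivity probabilities and proximity-based association weights used in the paper; the 4-sentence example is made up.

```python
# Hedged sketch of the minimum-cut formulation for subjectivity extraction.
import networkx as nx

def subjective_extract(ind1, ind2, assoc):
    """ind1[i], ind2[i]: preference of sentence i for the subjective / objective
    class; assoc[(i, k)]: weight for keeping sentences i and k together."""
    n = len(ind1)
    G = nx.Graph()
    for i in range(n):
        G.add_edge("s", i, capacity=ind1[i])          # source = subjective class
        G.add_edge(i, "t", capacity=ind2[i])          # sink   = objective class
    for (i, k), w in assoc.items():
        G.add_edge(i, k, capacity=w)
    _, (subj_side, _) = nx.minimum_cut(G, "s", "t")
    return sorted(v for v in subj_side if v != "s")   # indices of subjective sentences

# Placeholder scores for a 4-sentence document; sentence 2 is pulled toward the
# subjective side by its association with sentence 1.
ind1 = [0.9, 0.8, 0.45, 0.1]
ind2 = [0.1, 0.2, 0.55, 0.9]
assoc = {(1, 2): 0.3, (2, 3): 0.05}
print(subjective_extract(ind1, ind2, assoc))   # e.g. [0, 1, 2]
```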

 Task: classifying movie reviews as either positive or negative.
 The correct label can be extracted automatically from rating information (number of stars).

 The source s and the sink t correspond to the classes of subjective and objective sentences respectively.
 Each internal node v_i corresponds to the document's i-th sentence s_i.
 Set ind_1(s_i) to the Naive Bayes estimate of the probability that sentence s_i is subjective (and ind_2(s_i) to one minus that estimate).

 NB as a subjectivity detector in conjunction with an NB document-level polarity classifier: 86.4% accuracy vs. 82.8% without extraction.
 SVM: 87.15% vs. 86.4%.
 Using the sentences labeled as objective as input instead: 71% for NB and 67% for SVMs.
 Taking just the N most subjective sentences: the 5 most subjective sentences are almost as informative as the full review while containing only about 22% of the source words.

 Subjectivity detection can compress reviews into much shorter extracts that still retain polarity information.
 The minimum-cut framework leads to the development of efficient algorithms for sentiment analysis.

 Questions?