SI/EECS 767 Yang Liu Apr 2, 2010.

1 SI/EECS 767 Yang Liu Apr 2, 2010

2  A minimum cut is the smallest cut that will disconnect a graph into two disjoint subsets.  Application:  Graph partitioning  Data clustering  Graph-based machine learning

3  Cut  A cut C = (S,T) is a partition of the vertex set V of a graph G = (V, E) into two disjoint subsets S and T.  An s-t cut C = (S,T) of a network N = (V, E) is a cut of N such that s ∈ S and t ∈ T, where s and t are the source and the sink of N respectively.  The cut-set of a cut C = (S,T) is the set {(u,v) ∈ E | u ∈ S, v ∈ T}.  The size of a cut C = (S,T) is the number of edges in the cut-set. If the edges are weighted, the value of the cut is the sum of the weights of the edges in the cut-set.
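
The definitions on this slide can be checked on a toy example. The sketch below is illustrative only: the edge list, the vertex names, and the helper names `cut_set` and `cut_value` are assumptions, not part of the lecture. It follows the directed reading of the cut-set definition (edges going from S to T).

```python
def cut_set(edges, S, T):
    """Return the edges (u, v, w) with u in S and v in T."""
    return [(u, v, w) for (u, v, w) in edges if u in S and v in T]


def cut_value(edges, S, T):
    """Sum of the weights of the edges in the cut-set."""
    return sum(w for _, _, w in cut_set(edges, S, T))


# Hypothetical weighted network with source "s" and sink "t".
edges = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]
S = {"s", "a"}
T = {"b", "t"}

crossing = cut_set(edges, S, T)   # edges leaving S for T
size = len(crossing)              # unweighted size of the cut
value = cut_value(edges, S, T)    # weighted value of the cut
```

Here the cut-set contains the edges s→b, a→b and a→t, so the size counts three edges while the value adds their weights.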

4  Minimum cut  A cut is minimum if the size of the cut is not larger than the size of any other cut.  Max-flow-min-cut theorem  The maximum flow between two vertices is always equal to the size of the minimum cut times the capacity of a single pipe.  Also applies to weighted networks in which individual pipes can have different capacities.

5  The max-flow min-cut theorem is very useful because there are simple computer algorithms that can calculate maximum flows quickly (in polynomial time) for any given network.  We can use these same algorithms to quickly calculate the size of a minimum cut set.

6  Basic idea:  First find a path from source s to sink t using breadth-first search;  Then find another path from s to t among the remaining edges, and repeat this procedure until no more paths can be found.

7  Allow fluid to flow simultaneously both ways down an edge in the network. (Mark Newman’s textbook, preprint version)
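
The augmenting-path idea from the last two slides can be sketched as follows. This is a minimal version of the standard breadth-first augmenting-path method (usually called Edmonds-Karp), not code from the lecture or from Newman's book; the function name `max_flow` and the example network are assumptions. The reverse residual entries are what let flow be "pushed back" along an edge, which corresponds to allowing flow both ways.

```python
from collections import deque, defaultdict


def max_flow(capacity, s, t):
    """capacity: {(u, v): c} for directed edges.

    Returns (flow value, set of vertices on the source side of a minimum cut).
    """
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction, used only through the residual graph

    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No path left: vertices still reachable from s form the S side.
            return total, set(parent)

        # Recover the path and its bottleneck capacity.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)

        # Push flow forward and open the same amount in the reverse direction.
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck


# Hypothetical network; an undirected edge could be given as two directed ones.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
flow, source_side = max_flow(capacity, "s", "t")
```

When the search fails, the edges leaving `source_side` are all saturated, so their total capacity equals the flow value, which is the max-flow min-cut theorem in action.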

8 Graph Clustering and Minimum Cut Trees (Flake et al 2004)

9  Clustering data into disjoint groups  Data sets can be represented as weighted graphs  Nodes = entities to be clustered  Edges = a similarity measure between entities  Presents a new clustering algorithm based on maximum flow (in particular, minimum cut trees).

10  Also known as a Gomory–Hu tree  A weighted tree whose edges represent the minimum s-t cuts for all pairs of vertices in the graph  For every undirected graph, there always exists a min-cut tree.  See [Gomory and Hu 61] for details and the algorithm for calculating min-cut trees.


12  As α→0, the minimum cut is the trivial cut ({t}, V)  As α→∞, the result is n trivial clusters, all singletons  The exact value of α depends on the structure of G and the distribution of the weights over the edges.  The algorithm finds all clusters in either increasing or decreasing order, so we can stop it as soon as a desired cluster has been found.

13  Once a clustering is produced, contract the clusters into single nodes and apply the same algorithm to the resulting graph.  When contracting a set of nodes, they get replaced by a single new node; possible loops get deleted and parallel edges are combined into a single edge with weight equal to the sum of their weights.  break if  ((clusters returned are of desired number and size) or (clustering failed to create nontrivial clusters))
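
The contraction step described on this slide can be sketched as below. The function name `contract`, the dictionary-based edge representation, and the example graph are assumptions made for illustration; only the rule itself (loops deleted, parallel edges merged with summed weights) comes from the slide.

```python
from collections import defaultdict


def contract(edges, cluster_of):
    """edges: {(u, v): weight} for an undirected graph.
    cluster_of: maps each node to the id of the cluster that replaces it.
    """
    merged = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            continue  # edge inside one cluster becomes a loop and is deleted
        key = (cu, cv) if cu <= cv else (cv, cu)  # canonical undirected key
        merged[key] += w  # parallel edges are combined by summing weights
    return dict(merged)


# Hypothetical graph: nodes a and b were clustered together as "A".
edges = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 3.0,
         ("c", "d"): 4.0, ("b", "d"): 0.5}
cluster_of = {"a": "A", "b": "A", "c": "C", "d": "D"}
contracted = contract(edges, cluster_of)
```

The edge a-b disappears, and the two edges from the cluster to c become a single edge of weight 5.0. The same clustering algorithm can then be run on `contracted` for the next level of the hierarchy.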


15  CiteSeer  Citation network (documents as nodes, citations as edges)  [Figure: low-level and high-level clusters]

16  Minimum cut trees, based on expanded graphs, provide a means for producing quality clusterings and for extracting heavily connected components.  A single parameter, α, can be used as a strict bound on the expansion of the clustering while simultaneously serving to bound the intercluster weight as well.

17 Bipartite Graph Partitioning and Data Clustering (Zha et al 2001)

18  Bipartite graph  Two kinds of vertices  One representing the original vertices and the other representing the groups to which they belong  Examples: terms and documents, authors and articles  Adapt criteria for undirected graph partitioning to bipartite graph partitioning, and thereby solve the bi-clustering problem.

19  Bipartite graph G(X, Y, W)  In the context of document clustering  X represents the set of terms  Y represents the set of documents  W = (w_ij), where w_ij is the frequency of term i in document j.
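
As a small illustration of the weight matrix W, the sketch below builds raw term frequencies from a toy corpus. The tokenization (lowercasing and whitespace splitting), the function name `term_document_weights`, and the example documents are assumptions; the paper may use a different weighting.

```python
from collections import Counter


def term_document_weights(documents):
    """Return (terms, W) where W[i][j] is the count of term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    terms = sorted(set().union(*counts))  # X: the set of terms, in fixed order
    W = [[c[term] for c in counts] for term in terms]  # rows: terms, cols: docs
    return terms, W


# Hypothetical three-document corpus (Y is the set of documents).
documents = ["graph cut graph", "cut tree", "tree tree flow"]
terms, W = term_document_weights(documents)
```

Each nonzero entry W[i][j] corresponds to an edge between term vertex i and document vertex j in the bipartite graph.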

20  Minimum cut tends to produce unbalanced clusters  The problem becomes the following optimization problem

21  Computational complexity: generally linear in the number of documents to be clustered

22  20 newsgroups

23 Learning from Labeled and Unlabeled Data using Graph Mincuts (Blum & Chawla 2001)

24  Many application domains suffer from not having enough labeled training data for learning.  Large amounts of unlabeled examples  How unlabeled data can be used to aid classification

25  A set L of labeled examples  A set U of unlabeled examples  Binary classification  L+ denotes the set of positive examples  L- denotes the set of negative examples

26  Construct a weighted graph G = (V, E), where V = L ∪ U ∪ {v+, v-} and each edge e ∈ E has a weight w(e). v+, v-: classification vertices; other vertices: example vertices;  w(v, v+) = ∞ for all v ∈ L+ and w(v, v-) = ∞ for all v ∈ L-  Edges between example vertices are assigned weights based on some relationship (similarity/distance) between the examples

27  Determine a minimum (v+, v-) cut for the graph, i.e. the minimum-total-weight set of edges whose removal disconnects v+ and v- (using a max-flow algorithm in which v+ is the source and v- is the sink).  Removing these edges splits the vertices into two sets: V+, containing v+, and V-, containing v-.  Assign a positive label to all unlabeled examples in V+ and a negative label to all unlabeled examples in V-.  *Edges between examples that are similar to each other should be given a high weight
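
The construction and labeling steps above can be sketched end to end on a toy graph. This is a minimal illustration, not the authors' implementation: the function names, the example similarities, and the use of a large finite constant `BIG` in place of ∞ are all assumptions. Integer weights are used so that the residual arithmetic is exact.

```python
from collections import deque, defaultdict


def source_side_of_min_cut(weights, s, t):
    """weights: {(u, v): w} for undirected edges. Returns the s side of a min cut."""
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), w in weights.items():
        residual[(u, v)] += w  # an undirected edge has capacity w both ways
        residual[(v, u)] += w
        adj[u].add(v)
        adj[v].add(u)
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue:  # breadth-first search in the residual graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # everything still reachable from s
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push


def mincut_labels(positives, negatives, similarity):
    """Label every unlabeled example by the side of the (v+, v-) minimum cut."""
    BIG = sum(similarity.values()) + 1  # assumption: finite stand-in for infinity
    weights = dict(similarity)
    for v in positives:
        weights[("v+", v)] = BIG
    for v in negatives:
        weights[(v, "v-")] = BIG
    v_plus_side = source_side_of_min_cut(weights, "v+", "v-")
    examples = {n for edge in similarity for n in edge}
    unlabeled = examples - set(positives) - set(negatives)
    return {n: "+" if n in v_plus_side else "-" for n in unlabeled}


# Hypothetical data: p1 is labeled positive, n1 negative, u1..u3 are unlabeled.
similarity = {("p1", "u1"): 3, ("u1", "u2"): 1, ("u2", "n1"): 3, ("u3", "u2"): 2}
labels = mincut_labels(["p1"], ["n1"], similarity)
```

The weakest link between the two labeled examples is the u1-u2 edge, so the cut falls there: u1 joins the positive side and u2, u3 the negative side. Because `BIG` exceeds the total similarity weight, no minimum cut ever removes an edge to a classification vertex.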

28  If there are few labeled examples, mincut can end up assigning nearly all the unlabeled examples to a single class  If the graph is too sparse, it could have a number of disconnected components  Therefore it is important to use a proper weighting function

29  Datasets: UCI, 2000  The mincut algorithm has many degrees of freedom in terms of how the edge weights are defined.  Mincut-3: each example is connected to its nearest labeled example and two other nearest examples overall  Mincut-δ: if two nodes are closer than δ, they are connected  Mincut-δ_0: the maximum δ for which the graph has a cut of value 0  Mincut-δ_1/2: the δ at which the largest connected component in the graph contains half the datapoints  Mincut-δ_opt: the value of δ that gives the least classification error in hindsight


31  The basic idea of this algorithm is to build a graph on all the data with edges between examples that are sufficiently similar  then to partition the graph into a positive and a negative set in a way that  (a) agrees with the labeled data  (b) cuts as few edges as possible

32 Semi-supervised Learning using Randomized Mincuts (Blum et al 2004)

33  The drawbacks of the graph mincut approach:  A graph may have many minimum cuts and the mincut algorithm produces just one, typically the “leftmost” one using standard network flow algorithms.  Produced based on joint labeling rather than per-node probabilities.  Can be improved by averaging over many small cuts.

34  Repeatedly add artificial random noise to the edge weights  Solve for the minimum cut in the resulting graphs  Output a fractional label for each example corresponding to the fraction of the time it was on one side or the other

35  Given a graph G, produce a collection of cuts by repeatedly adding random noise to the edge weights and then solving for the minimum cut in the perturbed graph.  Sanity check: remove those that are highly unbalanced (any cut with less than 5% of the vertices on one side in this paper)  Predict based on a majority vote

36  Overcomes some of the limitations of the plain mincut algorithm.  Consider a graph which simply consists of a line with a positively labeled node at one end and a negatively labeled node at the other end, with the rest being unlabeled.  Plain mincut: the cut will be the leftmost or rightmost one  Randomized mincut: ends up using the middle of the line, with confidence that increases linearly out to the endpoints
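
The line example can be simulated directly, because on a path the minimum cut between the two ends is simply the lightest edge. The sketch below relies on that shortcut rather than a general max-flow solver; the function name, the uniform noise model, and the parameter values are assumptions, and the balance check from the previous slide is omitted for brevity.

```python
import random


def randomized_mincut_on_line(n, trials, noise=0.1, seed=0):
    """Nodes 0..n-1 form a path; node 0 is labeled +, node n-1 is labeled -.

    Returns, for each node, the fraction of trials in which it fell on the
    positive side of the minimum cut of the perturbed graph.
    """
    rng = random.Random(seed)
    positive_counts = [0] * n
    for _ in range(trials):
        # Unit weights plus artificial random noise; edge i joins nodes i, i+1.
        weights = [1.0 + rng.uniform(0.0, noise) for _ in range(n - 1)]
        # On a path, the minimum cut removes the single lightest edge.
        cut_edge = min(range(n - 1), key=weights.__getitem__)
        for node in range(cut_edge + 1):  # nodes left of the cut are positive
            positive_counts[node] += 1
    return [count / trials for count in positive_counts]


fractions = randomized_mincut_on_line(n=11, trials=2000)
```

Since every edge is equally likely to be the lightest, the fractional label should fall off roughly linearly from the positive end to the negative end, with nodes near the middle close to one half, which matches the behavior described on the slide.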


38  The graph should either be connected or at least have the property that a small number of connected components cover nearly all the examples.  Good to create a graph that at least has some small balanced cuts.

39  MST: simply construct a minimum spanning tree on the entire dataset  δ-MST: connect two points with an edge if they are within a radius δ. Then view the components produced as super nodes and connect them via an MST.

40  Handwritten digits  20 newsgroups  Various UCI datasets


42  Improves performance when the number of labeled examples is small  Provides a confidence score for accuracy-coverage curves.

43 A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on Minimum Cuts (Pang & Lee 2004)

44  A machine-learning method that applies text-categorization techniques to determine the sentiment polarity: positive (“thumbs up”) or negative (“thumbs down”)

45  Previous approaches focused on selecting indicative lexical features  Their approach:  Label the sentences as either subjective or objective  Apply a standard machine-learning classifier to the resulting extract.


47  n items x_1, ..., x_n to divide into two classes C_1 and C_2  Individual scores ind_j(x_i): non-negative estimates of each x_i’s preference for being in C_j based on just the features of x_i alone;  Association scores assoc(x_i, x_k): non-negative estimates of how important it is that x_i and x_k be in the same class.  Minimize the partition cost: $\sum_{x \in C_1} \mathrm{ind}_2(x) + \sum_{x \in C_2} \mathrm{ind}_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} \mathrm{assoc}(x_i, x_k)$

48  Build an undirected graph G with vertices {v_1, ..., v_n, s, t}; the last two are, respectively, the source and sink.  Add n edges (s, v_i), each with weight ind_1(x_i), and n edges (v_i, t), each with weight ind_2(x_i).  Finally, add edges (v_i, v_k), each with weight assoc(x_i, x_k).
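
To make the objective concrete: putting x_i in C_1 keeps v_i on the source side, which cuts its edge to t (cost ind_2), putting it in C_2 cuts its edge to s (cost ind_1), and every association edge between the two classes is cut as well. The sketch below evaluates that cost and minimizes it by brute force on three items. The scores, the function names, and the exhaustive search are assumptions for illustration only; the point of the graph construction is that a minimum cut finds the same optimum in polynomial time.

```python
from itertools import product


def partition_cost(in_c1, ind1, ind2, assoc):
    """in_c1[i] is True if item i is placed in C_1, False if in C_2."""
    cost = 0.0
    for i, in_first in enumerate(in_c1):
        # An item in C_1 pays its C_2 preference, and vice versa.
        cost += ind2[i] if in_first else ind1[i]
    for (i, k), a in assoc.items():
        if in_c1[i] != in_c1[k]:
            cost += a  # association edge between separated items is cut
    return cost


def best_partition(ind1, ind2, assoc):
    """Exhaustive search over all 2^n assignments (only viable for tiny n)."""
    n = len(ind1)
    return min(product([True, False], repeat=n),
               key=lambda p: partition_cost(p, ind1, ind2, assoc))


# Hypothetical scores: item 1 mildly prefers C_2 but is strongly tied to item 0.
ind1 = [0.9, 0.4, 0.1]
ind2 = [0.1, 0.6, 0.9]
assoc = {(0, 1): 1.0, (1, 2): 0.1}

best = best_partition(ind1, ind2, assoc)
best_cost = partition_cost(best, ind1, ind2, assoc)
```

In this example the strong association pulls item 1 into C_1 alongside item 0, even though its individual scores alone favor C_2.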


50  Classifying movie reviews as either positive or negative  The correct label can be extracted automatically from rating information (number of stars)

51  The source s and sink t correspond to the classes of subjective and objective sentences, respectively  Each internal node v_i corresponds to the document’s i-th sentence s_i  Set ind_1(s_i) to $\Pr^{NB}_{sub}(s_i)$: the Naive Bayes estimate of the probability that sentence s_i is subjective

52  NB as a subjectivity detector in conjunction with an NB document-level polarity classifier: 86.4% accuracy vs. 82.8% without extraction  SVM: 87.15% vs. 86.4%  Sentences labeled as objective as input: 71% for NB and 67% for SVMs  Taking just the N most subjective sentences: the 5 most subjective sentences are almost as informative as the full review while containing only about 22% of the source words.

53  Subjectivity detection can compress reviews into shorter extracts that still retain polarity information  The minimum-cut framework results in the development of efficient algorithms for sentiment analysis

54  Questions?

