JJE: INEX XML Competition Bryan Clevenger James Reed Jon McElroy.



Introduction
• Cope with the sheer size of the internet by using better categorization techniques.
• Goal: reduce search time by grouping pages into clusters.
• Wikipedia is the data source.

Problem
• Take the Wikipedia data and create an algorithm that clusters it.
• Clustering reduces the search space for related information.

Solution
• If documents contain several of the same links, they probably contain similar data.
• We focused on the link data set. Link data:

Overall solution
• Determine sub-communities in the graph using Max-Flow/Min-Cut community discovery.
• Heuristics are used to find relevant seeds.

Max Flow – Min Cut
• Edge capacity: similar to an edge weight. It represents the “amount” of information that can be pushed along the edge.
• Flow: the total amount pushed from one node to another, summed over the paths used, where each path carries at most the minimum capacity of its edges.
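The definitions of capacity and flow can be made concrete with a small sketch. The code below is not the presenters' implementation; it is a minimal Edmonds-Karp max-flow routine in Python, assuming a graph given as a dict of dicts (node to neighbor to capacity). The function name `max_flow` and the toy graph are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp sketch: augment along shortest residual paths until none remain."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Copy the capacities into a residual graph and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # A path can carry only as much as its smallest-capacity edge.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push that amount along the path and record it on the reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy graph: node -> {neighbor: capacity}
toy = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow(toy, "s", "t"))  # 5
```

In the toy graph the three augmenting paths carry 2, 2, and 1 units, each limited by its smallest edge, for a total of 5.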

Max Flow – Min Cut (cont.)
• The flow between two nodes in the same cluster should be larger than the flow between two nodes in separate clusters.
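This claim can be checked on a toy example. The sketch below is an illustration under stated assumptions, not the presenters' code: links are treated as undirected with unit capacity, the graph has two hypothetical three-node clusters joined by a single link, and `max_flow_and_cut` is a made-up helper that returns the flow value together with the nodes still reachable from the source once no augmenting path remains (the source side of a minimum cut).

```python
def max_flow_and_cut(edges, source, sink):
    """Return (flow value, set of nodes on the source side of a minimum cut)."""
    # Assumption: every link can be used in both directions with the same capacity.
    residual = {}
    for u, v, cap in edges:
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v][u] = residual[v].get(u, 0) + cap

    def reachable(start):
        # Depth-first search over edges with spare capacity; maps node -> predecessor.
        parent = {start: None}
        stack = [start]
        while stack:
            u = stack.pop()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    stack.append(v)
        return parent

    value = 0
    while True:
        parent = reachable(source)
        if sink not in parent:
            # No augmenting path is left: the reachable nodes form the source side.
            return value, set(parent)
        # Walk back from the sink to collect the edges of the augmenting path.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        value += bottleneck


# Two hypothetical clusters {a, b, c} and {x, y, z} joined by the single link c-x.
edges = [("a", "b", 1), ("a", "c", 1), ("b", "c", 1),
         ("x", "y", 1), ("x", "z", 1), ("y", "z", 1),
         ("c", "x", 1)]
same_cluster, _ = max_flow_and_cut(edges, "a", "b")
cross_cluster, side = max_flow_and_cut(edges, "a", "x")
print(same_cluster, cross_cluster, sorted(side))  # 2 1 ['a', 'b', 'c']
```

The flow inside a cluster (2) exceeds the flow across the bridge (1), and the source side of the cut recovers the cluster containing `a`.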

Max Flow – Min Cut (cont.)

Max-Flow Community Discovery

Implementation

Implementation (Parsing)
• Links are parsed into a graph, stored as a HashMap from document ID to a HashMap from link ID to capacity.
• A Links structure was also created:
Links[0] = 3244,2645,791
Links[1] = 10293,432,2,
Links[max] = 1012
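A rough sketch of the two structures follows, written in Python with dicts standing in for the HashMaps. Several details are assumptions not stated on the slide: the input is taken to be one "source target" ID pair per line, every link gets unit capacity, and the Links structure is read as a map from link count to the documents having that many links. The function names are hypothetical.

```python
from collections import defaultdict


def parse_links(lines, capacity=1):
    """Build document ID -> {linked document ID: capacity} from 'src dst' lines."""
    graph = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that are not a pair of IDs (assumed input format)
        src, dst = int(parts[0]), int(parts[1])
        graph[src][dst] = capacity
        graph.setdefault(dst, {})  # make sure link targets also appear as nodes
    return dict(graph)


def bucket_by_link_count(graph):
    """Group document IDs by their number of outgoing links (assumed reading of Links)."""
    buckets = defaultdict(list)
    for doc_id, links in graph.items():
        buckets[len(links)].append(doc_id)
    return dict(buckets)


sample = ["1 2", "1 3", "2 3"]
graph = parse_links(sample)
print(graph)                        # {1: {2: 1, 3: 1}, 2: {3: 1}, 3: {}}
print(bucket_by_link_count(graph))  # {2: [1], 1: [2], 0: [3]}
```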

Implementation (Initialization of Community Seeds)
• Using the Links structure, a percentage of the nodes with the most links is chosen as seeds.
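The seed heuristic could look roughly like the sketch below. The slide does not give the percentage or the tie-breaking rule, so the `fraction` parameter, its default, and breaking ties by document ID are all assumptions, and `choose_seeds` is a hypothetical name.

```python
import math


def choose_seeds(graph, fraction=0.01):
    """Pick the given fraction of nodes with the most outgoing links as seeds."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank by link count, highest first; ties are broken by document ID (an assumption).
    ranked = sorted(graph, key=lambda doc_id: (-len(graph[doc_id]), doc_id))
    if not ranked:
        return []
    # Always return at least one seed for a non-empty graph.
    count = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:count]


# Hypothetical graph: document ID -> {linked document ID: capacity}
graph = {1: {2: 1, 3: 1, 4: 1}, 2: {3: 1}, 3: {}, 4: {1: 1, 2: 1}}
print(choose_seeds(graph, fraction=0.5))  # [1, 4]
```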

Implementation (Finding Communities)
• Idea; why didn’t it work?
• Robots

Implementation (Visualization)
• Walrus is an interactive 3D visualization tool for large directed graphs.
• Input and output parsing.
• Clusters are grouped by color.

Results
• The INEX links data consisted of 54,000 nodes and 15 million links.
• Average running time to retrieve one cluster on a Dell dual-core 2.0 GHz Pentium laptop was 5.9 hours.
• Cluster size is between K

Results
• Visual images of clusters

Conclusion
• It worked... kinda.
• Looks great!
• See pretty pictures.


Questions?
• O really?