Overlapping Matrix Pattern Visualization: a Hypergraph Approach Ruoming Jin Kent State University Joint with Yang Xiang, David Fuhry, and Feodor F. Dragan.

Presentation transcript:


The Problem
Given a set of discovered submatrices, how can we reorder the rows and columns of the data matrix to best display these submatrices and their relationships?

Motivation: Overlapping Bicluster Visualization
- Gene expression profiles (rows: genes, columns: conditions, matrix entries: expression levels)
- Biclustering: homogeneous submatrices (genes × conditions)
- Biclustering visualization problem [GMM06, KG07]

Motivation: Transactional Data Visualization
- Shopping-basket data (rows: transactions, columns: items, binary matrix)
- Transactional data summarization using a set of dense submatrices [CK07, WK06, XJFD08]
- Summarization cost = 8 + 8 + 5 = 21

Roadmap
- Problem Definition
  - Visualization cost
- Hardness of the visualization problem
  - Hypergraph ordering problem
  - Minimum linear arrangement (MLA)
- Algorithm
  - Leveraging MLA and local convergence
- Experimental Results

Submatrix Visualization Cost
[Figure: two displays of the same matrix (rows t1 to t8, columns i1 to i9), highlighting the submatrix {t1, t2, t7, t8} × {i1, i2, i8, i9}]
Given a display of the matrix (a fixed row order and column order), how can we measure how well a submatrix is visualized? Why is the second display intuitively better than the first?

Submatrix Visualization Cost
[Figure: the submatrix {t1, t2, t7, t8} × {i1, i2, i8, i9} displayed under different row and column orders]
Area: 8×8, 6×6, 4×4, 4×4
Perimeter: 8+8, 6+6, 4+4, 4+4
Given a row order and a column order, the visualization cost of a submatrix is the sum of
- the difference between the positions of its first and last rows w.r.t. the row order, and
- the difference between the positions of its first and last columns w.r.t. the column order.
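The cost definition above translates directly into code. This is a minimal sketch, assuming a display order is given as a list of row (or column) labels; the helper names `span` and `submatrix_cost` and both example orders are hypothetical, not taken from the slides' figure.

```python
def span(positions):
    """Distance between the first and last position (max - min)."""
    return max(positions) - min(positions)


def submatrix_cost(rows, cols, row_order, col_order):
    """Visualization cost of the submatrix rows x cols.

    row_order and col_order list the row and column labels in display
    order; a label's position is its index in that list.
    """
    row_pos = {r: i for i, r in enumerate(row_order)}
    col_pos = {c: j for j, c in enumerate(col_order)}
    return span([row_pos[r] for r in rows]) + span([col_pos[c] for c in cols])


# Hypothetical orders for the running submatrix {t1,t2,t7,t8} x {i1,i2,i8,i9}.
rows = ["t1", "t2", "t7", "t8"]
cols = ["i1", "i2", "i8", "i9"]
natural_rows = [f"t{k}" for k in range(1, 9)]
natural_cols = [f"i{k}" for k in range(1, 10)]
grouped_rows = ["t1", "t2", "t7", "t8", "t3", "t4", "t5", "t6"]
grouped_cols = ["i1", "i2", "i8", "i9", "i3", "i4", "i5", "i6", "i7"]

print(submatrix_cost(rows, cols, natural_rows, natural_cols))  # 7 + 8 = 15
print(submatrix_cost(rows, cols, grouped_rows, grouped_cols))  # 3 + 3 = 6
```

Because the cost counts last minus first, a contiguous 4 × 4 block costs 3 + 3 here, while the perimeter figures on the slide count rows and columns inclusively (4 + 4). The two differ by the constant 2 per submatrix, so they rank row and column orders identically.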

Matrix Visualization Cost
Given a row order, a column order, and a set of submatrices, the matrix visualization cost is the sum of the visualization costs of these submatrices.
Matrix Optimal Visualization Problem:
- Find the row order and column order that minimize the matrix visualization cost.
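To make the optimization problem concrete, the sketch below sums submatrix costs and, for a tiny hypothetical matrix, finds the optimal orders by trying every row and column permutation. Exhaustive search is used only for illustration (it is exponential and is not the method proposed in these slides); the function names and the example data are assumptions.

```python
from itertools import permutations


def span(positions):
    return max(positions) - min(positions)


def matrix_cost(submatrices, row_order, col_order):
    """Sum of the visualization costs of all (row set, column set) pairs."""
    row_pos = {r: i for i, r in enumerate(row_order)}
    col_pos = {c: j for j, c in enumerate(col_order)}
    return sum(
        span([row_pos[r] for r in sub_rows]) + span([col_pos[c] for c in sub_cols])
        for sub_rows, sub_cols in submatrices
    )


def optimal_orders_bruteforce(submatrices, rows, cols):
    """Try every row order and column order; only feasible for tiny matrices."""
    best = None
    for row_order in permutations(rows):
        for col_order in permutations(cols):
            cost = matrix_cost(submatrices, row_order, col_order)
            if best is None or cost < best[0]:
                best = (cost, row_order, col_order)
    return best


# Hypothetical 4x3 matrix with two overlapping submatrices (they share r3 and c3).
rows = ["r1", "r2", "r3", "r4"]
cols = ["c1", "c2", "c3"]
submatrices = [({"r1", "r3"}, {"c1", "c3"}), ({"r3", "r4"}, {"c2", "c3"})]

print(matrix_cost(submatrices, rows, cols))                   # 6
print(optimal_orders_bruteforce(submatrices, rows, cols)[0])  # 4
```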

Roadmap
- Problem Definition
  - Visualization cost
- Hardness of the visualization problem
  - Hypergraph ordering problem
  - Minimum linear arrangement (MLA)
- Algorithm
  - Leveraging MLA and local convergence
- Experimental Results

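Hypergraph Ordering
Hypergraph HG = (V, X):
- V is the set of vertices
- X = {x1, x2, …} is the set of hyperedges, where each hyperedge is a subset of V
Hyperedge cost and hypergraph cost: under a vertex order π, the cost of a hyperedge x is the distance between its first and last vertices, cost(x, π) = max_{v∈x} π(v) − min_{v∈x} π(v), and the hypergraph cost is the sum over all hyperedges, cost(HG, π) = Σ_{x∈X} cost(x, π).
Hypergraph Ordering Problem: find a vertex order π that minimizes cost(HG, π).
Example: hyperedge {0, 2, 3, 4} has cost 4; hyperedge {1, 3, 5} has cost 4; the example hypergraph in the figure has cost 16.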
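The hyperedge and hypergraph costs can be computed as follows. This sketch assumes that in the slide's example each vertex label equals its position (vertex k at position k), which reproduces the two hyperedge costs of 4; the figure's remaining hyperedges are not reproduced, so the total here is not the slide's 16.

```python
def hyperedge_cost(hyperedge, pos):
    """Distance between the first and last vertex of the hyperedge under pos."""
    return max(pos[v] for v in hyperedge) - min(pos[v] for v in hyperedge)


def hypergraph_cost(hyperedges, order):
    """Sum of hyperedge costs; order lists the vertices from first to last."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(hyperedge_cost(x, pos) for x in hyperedges)


order = [0, 1, 2, 3, 4, 5]  # assumed: vertex k sits at position k
print(hypergraph_cost([{0, 2, 3, 4}], order))             # 4
print(hypergraph_cost([{1, 3, 5}], order))                # 4
print(hypergraph_cost([{0, 2, 3, 4}, {1, 3, 5}], order))  # 8
```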

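The Link between Matrix Visualization and Hypergraph Ordering
Relationship between matrix visualization cost and hypergraph cost: the rows are the vertices of one hypergraph whose hyperedges are the row sets of the submatrices, and the columns likewise form a second hypergraph. Since a submatrix's cost is its row span plus its column span, the matrix visualization cost equals the cost of the row hypergraph under the row order plus the cost of the column hypergraph under the column order.
Finding the minimum visualization (or hypergraph) cost is NP-hard.
[Figure: an example matrix and its two hypergraphs, HG 1 and HG 2]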
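The decomposition can be checked numerically. The sketch below builds one hypergraph over the rows and one over the columns from a hypothetical set of submatrices and confirms that the matrix cost equals the sum of the two hypergraph costs; the names and data are illustrative assumptions.

```python
def span(positions):
    return max(positions) - min(positions)


def hypergraph_cost(hyperedges, order):
    pos = {v: i for i, v in enumerate(order)}
    return sum(span([pos[v] for v in x]) for x in hyperedges)


def matrix_cost(submatrices, row_order, col_order):
    row_pos = {r: i for i, r in enumerate(row_order)}
    col_pos = {c: j for j, c in enumerate(col_order)}
    return sum(
        span([row_pos[r] for r in sub_rows]) + span([col_pos[c] for c in sub_cols])
        for sub_rows, sub_cols in submatrices
    )


# Hypothetical submatrices, given as (row set, column set) pairs.
submatrices = [({"r1", "r3"}, {"c1", "c3"}), ({"r3", "r4"}, {"c2", "c3"})]
row_hypergraph = [sub_rows for sub_rows, _ in submatrices]  # vertices = rows
col_hypergraph = [sub_cols for _, sub_cols in submatrices]  # vertices = columns

row_order = ["r2", "r1", "r3", "r4"]
col_order = ["c1", "c2", "c3"]
total = matrix_cost(submatrices, row_order, col_order)
split = hypergraph_cost(row_hypergraph, row_order) + hypergraph_cost(col_hypergraph, col_order)
print(total, split)  # 5 5
```

Because the two hypergraphs share no vertices, the row order and the column order can be optimized independently.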

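Hypergraph Ordering Problem is a Generalization of MLA
Graph cost w.r.t. a vertex order π: cost(G, π) = Σ_{(u,v)∈E} |π(u) − π(v)|. This is exactly the hypergraph cost when every hyperedge has two vertices.
MLA (Minimum Linear Arrangement): find a vertex order that minimizes the graph cost.
Graph cost = 2+2+2* = 16
Graph cost = 2+4+2* = 18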
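For a two-vertex hyperedge {u, v}, max − min equals |π(u) − π(v)|, so the graph cost below is the hypergraph cost of a graph viewed as a hypergraph. The sketch evaluates the MLA objective on a hypothetical 4-cycle and solves MLA by exhaustive search; that solver is a stand-in for illustration only, not one of the MLA algorithms the slides refer to.

```python
from itertools import permutations


def graph_cost(edges, order):
    """MLA objective: sum over edges (u, v) of |pi(u) - pi(v)|."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def mla_bruteforce(vertices, edges):
    """Exact MLA by enumerating every order; exponential, tiny graphs only."""
    return min(permutations(vertices), key=lambda order: graph_cost(edges, order))


# Hypothetical example: the 4-cycle a-b-c-d-a.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

print(graph_cost(edges, ["a", "b", "c", "d"]))             # 1 + 1 + 1 + 3 = 6
print(graph_cost(edges, ["a", "c", "b", "d"]))             # 2 + 1 + 2 + 3 = 8
print(graph_cost(edges, mla_bruteforce(vertices, edges)))  # 6
```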

Roadmap
- Problem Definition
  - Visualization cost
- Hardness of the visualization problem
  - Hypergraph ordering problem
  - Minimum linear arrangement (MLA)
- Algorithm
  - Leveraging MLA and local convergence
- Experimental Results

Basic Idea for Hypergraph Ordering
There is a large body of existing work on the MLA problem (heuristics and approximation algorithms with bounds). Instead of working from scratch on the hypergraph ordering problem, can we somehow leverage the MLA algorithms?
- The answer is YES!

Basic Procedure
Given the hypergraph HG = (V, X), start with a random vertex order π:
Step 1: Transform the hypergraph HG into a graph G = (V, E) based on the vertex order π.
- cost(HG, π) = cost(G, π)
Step 2: Run an MLA algorithm on G to produce a new, ideally optimal, vertex order π′.
- cost(G, π) ≥ cost(G, π′)
Step 3: If the new order improves the hypergraph cost, i.e., cost(HG, π) > cost(HG, π′), use π′ as the new order (π = π′) and repeat Steps 1 and 2.
- cost(G, π′) ≥ cost(HG, π′)
Hence cost(HG, π) = cost(G, π) ≥ cost(G, π′) ≥ cost(HG, π′).
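The three steps can be sketched as a loop. This is an illustration under stated assumptions: Step 1 uses the hyperedge-to-path conversion, the MLA solver is a hypothetical exhaustive-search stand-in (exact but exponential), and the example hypergraph and fixed starting order (used instead of a random one for reproducibility) are made up.

```python
from itertools import permutations


def hypergraph_cost(hyperedges, order):
    """Sum over hyperedges of (last position - first position)."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(max(pos[v] for v in x) - min(pos[v] for v in x) for x in hyperedges)


def graph_cost(edges, order):
    """Sum over edges of |pi(u) - pi(v)|; repeated edges count repeatedly."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def to_path_graph(hyperedges, order):
    """Step 1: replace each hyperedge by a path through its vertices in pi-order."""
    pos = {v: i for i, v in enumerate(order)}
    edges = []
    for x in hyperedges:
        chain = sorted(x, key=pos.__getitem__)
        edges.extend(zip(chain, chain[1:]))
    return edges


def mla_solver(vertices, edges):
    """Stand-in MLA solver (assumption): exhaustive search, exact but exponential."""
    return list(min(permutations(vertices), key=lambda o: graph_cost(edges, o)))


def order_hypergraph(vertices, hyperedges, order):
    """Repeat Steps 1-3 until the MLA order no longer lowers the hypergraph cost."""
    order = list(order)
    while True:
        edges = to_path_graph(hyperedges, order)  # Step 1
        # Sanity check: cost(G, pi) == cost(HG, pi) for the path conversion.
        assert graph_cost(edges, order) == hypergraph_cost(hyperedges, order)
        new_order = mla_solver(vertices, edges)  # Step 2
        if hypergraph_cost(hyperedges, new_order) < hypergraph_cost(hyperedges, order):
            order = new_order  # Step 3: accept and repeat
        else:
            return order  # no improvement: local convergence


vertices = list(range(6))
hyperedges = [{0, 2, 3, 4}, {1, 3, 5}]  # hypothetical example
result = order_hypergraph(vertices, hyperedges, vertices)
print(result, hypergraph_cost(hyperedges, result))
```

With a heuristic MLA solver in place of the exhaustive one, Step 2 may fail to lower cost(G, π); the loop then simply stops, since the acceptance test in Step 3 is on the hypergraph cost.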

(Step 1) Transformation: Hyperedge -> Path
Each hyperedge is replaced by a path that visits its vertices in the order π, so hyperedge cost = path cost!
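A small check of the transformation: under the order π used to build the path, the path cost equals the hyperedge cost, while under another order it can only be larger. The helper names and the second order `pi_new` are hypothetical.

```python
def hyperedge_cost(x, pos):
    """Span of the hyperedge: last position minus first position."""
    return max(pos[v] for v in x) - min(pos[v] for v in x)


def hyperedge_to_path(x, pos):
    """Path through the vertices of x, visited in increasing position under pos."""
    chain = sorted(x, key=pos.__getitem__)
    return list(zip(chain, chain[1:]))


def path_cost(edges, pos):
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


pi = {v: i for i, v in enumerate([0, 1, 2, 3, 4, 5])}      # order used in Step 1
pi_new = {v: i for i, v in enumerate([3, 1, 0, 5, 2, 4])}  # some other order

x = {0, 2, 3, 4}
path = hyperedge_to_path(x, pi)
print(path)                                                # [(0, 2), (2, 3), (3, 4)]
print(path_cost(path, pi), hyperedge_cost(x, pi))          # 4 4
print(path_cost(path, pi_new), hyperedge_cost(x, pi_new))  # 11 5
```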

Step 1 -> Step 2
Step 1 (Hypergraph -> Graph): cost(G, π) = 2+2+2* = 16 = cost(HG, π)
Step 2 (MLA): cost(G, π′) = 1+2+2* = 13 < cost(G, π)

Step 1 -> Step 2 -> Step 3
Step 1 (Hypergraph -> Graph): cost(G, π) = cost(HG, π) = 16
Step 2 (MLA): cost(G, π′) = 13 < cost(G, π)
With the new ordering, hyperedge cost ≤ path cost!
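The inequality "hyperedge cost ≤ path cost" holds for every order, not only the one in this example. Write a hyperedge as x = {v_1, …, v_k}, numbered in the order π used in Step 1, so its path P_x has edges (v_i, v_{i+1}); the cost of x under an order π′ is max_{v∈x} π′(v) − min_{v∈x} π′(v). For any π′, let v_a and v_b be the vertices of x with the smallest and largest π′ positions and assume a < b (otherwise swap their roles). A short derivation:

```latex
\begin{aligned}
\mathrm{cost}(P_x,\pi')
  &= \sum_{i=1}^{k-1} \bigl|\pi'(v_{i+1}) - \pi'(v_i)\bigr|
   \;\ge\; \sum_{i=a}^{b-1} \bigl|\pi'(v_{i+1}) - \pi'(v_i)\bigr| \\
  &\ge \bigl|\pi'(v_b) - \pi'(v_a)\bigr|
   = \max_{v \in x}\pi'(v) - \min_{v \in x}\pi'(v)
   = \mathrm{cost}(x,\pi').
\end{aligned}
```

The second inequality is the triangle inequality. When π′ = π the positions increase along the path, so every inequality is an equality, which is Step 1's claim; summing over all hyperedges gives cost(G, π′) ≥ cost(HG, π′).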

Step 1 -> Step 2 -> Step 3
Step 1 (Hypergraph -> Graph): cost(G, π) = cost(HG, π) = 16
Step 2 (MLA): cost(G, π′) = 13 < cost(G, π)
Step 3: cost(HG, π′) = 10 < cost(G, π′) = 13
Cost(HG, π) = cost(G, π) > cost(G, π′) > cost(HG, π′)

Run Iteratively and Local Convergence
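Why the iteration stops: the hypergraph cost is cost(HG, π) = Σ_{x∈X} (max_{v∈x} π(v) − min_{v∈x} π(v)), and each term is an integer between 0 and |V| − 1, so cost(HG, π) is a non-negative integer no larger than |X|(|V| − 1). Step 3 only accepts a new order when this integer strictly decreases. Writing π_0 for the starting order and π_t for the order accepted at iteration t:

```latex
|X|\,(|V|-1) \;\ge\; \mathrm{cost}(HG,\pi_0) > \mathrm{cost}(HG,\pi_1) > \cdots > \mathrm{cost}(HG,\pi_T) \;\ge\; 0
\quad\Longrightarrow\quad T \le \mathrm{cost}(HG,\pi_0) \le |X|\,(|V|-1).
```

This bounds only the number of accepted iterations; the order at which the procedure stops is a local optimum of the procedure, not necessarily an optimal ordering.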

Other conversions of a hyperedge
- Converting a hyperedge to a cycle
- Converting a hyperedge to multicycles
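One possible reading of the cycle conversion (an assumption; the slides do not spell out the construction) closes the π-ordered path with an extra edge from the last vertex back to the first. The cycle then costs exactly twice the hyperedge cost under π and at least twice under any other order, because it must travel between the two extreme vertices and back, so the Step 1 to Step 3 chain still holds up to a factor of 2. The multicycle conversion is not sketched here.

```python
def hyperedge_cost(x, pos):
    return max(pos[v] for v in x) - min(pos[v] for v in x)


def hyperedge_to_cycle(x, pos):
    """Assumed construction: the pi-ordered path plus a closing edge last -> first."""
    chain = sorted(x, key=pos.__getitem__)
    return list(zip(chain, chain[1:])) + [(chain[-1], chain[0])]


def edge_cost(edges, pos):
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


pi = {v: i for i, v in enumerate([0, 1, 2, 3, 4, 5])}      # order used to build the cycle
pi_new = {v: i for i, v in enumerate([3, 1, 0, 5, 2, 4])}  # hypothetical new order

x = {1, 3, 5}
cycle = hyperedge_to_cycle(x, pi)                               # [(1, 3), (3, 5), (5, 1)]
print(edge_cost(cycle, pi), 2 * hyperedge_cost(x, pi))          # 8 8
print(edge_cost(cycle, pi_new), 2 * hyperedge_cost(x, pi_new))  # 6 6
```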

Roadmap
- Problem Definition
  - Visualization cost
- Hardness of the visualization problem
  - Hypergraph ordering
- Algorithm
  - Minimum linear arrangement (MLA)
  - Leveraging MLA and local convergence
- Experimental Results

Visualization effects

Visualization effects (continued)

Cost and running time

Conclusion
- We found an interesting link between the matrix visualization problem and a well-known graph-theoretic problem: the minimum linear arrangement (MLA) problem.
- Theoretically, we introduce a generalization of the MLA problem to hypergraphs and develop a novel local convergence algorithm.
- Our method can be incorporated into an interactive visualization environment that allows users to focus on different parts of the data and patterns.

Thanks!!