Take a Walk and Cluster Genes: A TSP-based Approach to Optimal Rearrangement Clustering Sharlee Climer and Weixiong Zhang This research was supported in.

Presentation transcript:

Take a Walk and Cluster Genes: A TSP-based Approach to Optimal Rearrangement Clustering Sharlee Climer and Weixiong Zhang This research was supported in part by NDSEG and Olin Fellowships and by NSF grants IIS and ITR/EIA

Sharlee Climer, Washington University in St. Louis

Overview
Introduction
Example
Results
Conclusion

Introduction
Rearrangement clustering
• Rearrange the rows of a matrix
• Minimize the sum of the differences between adjacent rows
• $\min \sum_i d(i, i+1)$
• Rows correspond to objects
• Columns correspond to features
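
As a rough illustration of the objective above, the following Python sketch scores a row ordering by summing the distance between each pair of adjacent rows. The slides do not say which difference measure d is used here, so the Manhattan distance, the function names, and the tiny 0/1 matrix are all assumptions made for the example.

```python
def manhattan(u, v):
    """Assumed difference measure d: sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(u, v))


def adjacent_row_cost(matrix, order, dist):
    """Sum of dist(row_i, row_{i+1}) over consecutive rows in the given order."""
    return sum(dist(matrix[a], matrix[b]) for a, b in zip(order, order[1:]))


# Hypothetical data: rows are objects, columns are features.
matrix = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

original = adjacent_row_cost(matrix, [0, 1, 2, 3], manhattan)
rearranged = adjacent_row_cost(matrix, [0, 2, 1, 3], manhattan)
print(original, rearranged)
```

Placing similar rows next to each other lowers the score, which is what rearrangement clustering tries to achieve.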

Introduction
Applications
• Information retrieval
• Manufacturing
• Software engineering

Example
Bond Energy Algorithm (BEA)
• Introduced in 1972 (McCormick, Schweitzer, White)
• Approximate solution
• Still widely used
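
The slides do not spell out how BEA works. The sketch below is a simplified, row-only version of the greedy insertion idea usually associated with it: the "bond" between two rows is taken to be their dot product, and at each step the row and position giving the largest increase in total bond are chosen. The function names and the sample matrix are hypothetical, and this should not be read as the original 1972 procedure in full.

```python
def bond(u, v):
    """Assumed bond strength between two rows: their dot product."""
    return sum(x * y for x, y in zip(u, v))


def bea_row_order(matrix):
    """Greedy BEA-style ordering; returns a list of row indices."""
    remaining = list(range(len(matrix)))
    order = [remaining.pop(0)]  # arbitrary seed row
    while remaining:
        best = None  # (gain, row, position)
        for row in remaining:
            for pos in range(len(order) + 1):
                left = matrix[order[pos - 1]] if pos > 0 else None
                right = matrix[order[pos]] if pos < len(order) else None
                gain = 0
                if left is not None:
                    gain += bond(left, matrix[row])
                if right is not None:
                    gain += bond(matrix[row], right)
                if left is not None and right is not None:
                    gain -= bond(left, right)  # the old neighbours are split
                if best is None or gain > best[0]:
                    best = (gain, row, pos)
        _, row, pos = best
        order.insert(pos, row)
        remaining.remove(row)
    return order


def total_bond(matrix, order):
    """Sum of bonds between consecutive rows in the given order."""
    return sum(bond(matrix[a], matrix[b]) for a, b in zip(order, order[1:]))


# Hypothetical 0/1 matrix: rows are objects, columns are features.
matrix = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
order = bea_row_order(matrix)
print(order, total_bond(matrix, order))
```

Because each placement is chosen greedily and never revisited, the result is only an approximate solution in general, which matches the slide's description.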

Example
Optimal solution
Lenstra (1974) observed equivalence to the Traveling Salesman Problem (TSP)
• Given n cities and the distance between each pair
• Find the shortest cycle visiting every city
• NP-hard problem

Example
Transform into a TSP
• Each object corresponds to a city
• The distance between two cities equals the difference between the corresponding objects
• A dummy city is added to the problem; costs from the dummy city to all other cities equal a constant
• The location of the dummy city indicates where to cut the cycle into a path
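
The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the exhaustive search stands in for a real TSP solver and is only usable on tiny inputs, and the Manhattan distance, the dummy cost of 0, and all function names are assumptions.

```python
from itertools import permutations


def manhattan(u, v):
    """Assumed difference measure between two objects (rows)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def build_tsp_matrix(rows, dist, dummy_cost=0):
    """Distance matrix over the n objects plus one dummy city (last index)."""
    n = len(rows)
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            d[i][j] = dist(rows[i], rows[j])
        d[i][n] = d[n][i] = dummy_cost  # constant cost to and from the dummy
    return d


def brute_force_tour(d):
    """Shortest cycle by exhaustive search, starting at the dummy city."""
    size = len(d)
    start = size - 1
    best_cost, best_tour = None, None
    for perm in permutations(range(size - 1)):
        tour = (start,) + perm
        cost = sum(d[tour[i]][tour[(i + 1) % size]] for i in range(size))
        if best_cost is None or cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour


def optimal_row_order(rows, dist):
    """Solve the TSP, then cut the cycle at the dummy city to get a path."""
    cost, tour = brute_force_tour(build_tsp_matrix(rows, dist))
    return list(tour[1:]), cost


# Hypothetical data: rows are objects, columns are features.
rows = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
order, cost = optimal_row_order(rows, manhattan)
print(order, cost)
```

With a dummy cost of 0, the length of the optimal cycle equals the length of the optimal path, so the returned cost is the rearrangement objective itself.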

Example
TSP solvers were extremely slow, even for small problems, in the 1970s
Massive research efforts to solve the TSP over the last three decades
Current solvers
• Concorde (Applegate, Bixby, Chvatal, Cook, 2001): solved a 15,112-city TSP

Example
BEA and TSP offer approximate and optimal solutions, respectively
We have observed a flaw in the objective function when the objects form natural clusters
The objective minimizes the sum of the distances between every pair of adjacent rows
Inter-cluster distances tend to be significantly larger than intra-cluster distances
The summation is dominated by inter-cluster distances

Example
TSPCluster addresses this flaw
Add k dummy cities
• k clusters are specified by the output
TSP solver ignores inter-cluster distances
• Minimizes the sum of intra-cluster distances
Use a sufficiently small constant for distances to/from dummy cities
• Dummy cities are never adjacent to each other
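
A toy version of the k-dummy-city idea is sketched below. It is not the TSPCluster code: an exhaustive search replaces the TSP solver, the data are six made-up points on a line, and every distance to or from a dummy city (including dummy to dummy) is assumed to be the same constant, here 0. With that constant smaller than every real distance, separating two dummies always saves a real edge, so the optimal tour never places them side by side.

```python
from itertools import permutations


def tsp_cluster(points, k, dist, dummy_cost=0):
    """Split points into k ordered clusters via a TSP with k dummy cities."""
    n = len(points)
    size = n + k  # indices n .. n+k-1 are dummy cities

    def d(a, b):
        if a >= n or b >= n:
            return dummy_cost  # assumed constant for any edge touching a dummy
        return dist(points[a], points[b])

    start = n  # fix one dummy city as the start of the cycle
    others = [c for c in range(size) if c != start]
    best_cost, best_tour = None, None
    for perm in permutations(others):  # exhaustive search: tiny inputs only
        tour = (start,) + perm
        cost = sum(d(tour[i], tour[(i + 1) % size]) for i in range(size))
        if best_cost is None or cost < best_cost:
            best_cost, best_tour = cost, tour

    # Each dummy city marks a cluster boundary.
    clusters, current = [], []
    for city in best_tour[1:]:
        if city >= n:
            clusters.append(current)
            current = []
        else:
            current.append(city)
    clusters.append(current)
    return clusters, best_cost


# Hypothetical 1-D data with two obvious groups.
points = [0, 1, 2, 10, 11, 12]
clusters, cost = tsp_cluster(points, 2, lambda a, b: abs(a - b))
print(clusters, cost)
```

The large gap between 2 and 10 is never paid for, because a dummy city sits there; only the intra-cluster distances contribute to the cost. With k = 1 the procedure reduces to the single-dummy transformation.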

Results
Arabidopsis
• 499 genes
• 25 conditions
Comparison with BEA
• Used BEA similarity measure
• BEA score: 447,070
• TSPCluster score: 452,109 (k = 1)

Results
[Figure panels: BEA, TSPCluster]

Results
Compared with Cluster (Eisen et al., 1998) and k-ary (Bar-Joseph et al., 2003)
Used Pearson correlation coefficient
Cluster: 398
k-ary: 427
TSPCluster: 436 (k = 1)
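
The slides do not state how the scores above are computed from the Pearson correlation coefficient. One plausible reading, assumed here purely for illustration, is the sum of the correlations between adjacent rows of the final ordering, where a higher total is better. The function names and the four expression profiles below are made up.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    norm_u = sqrt(sum((x - mean_u) ** 2 for x in u))
    norm_v = sqrt(sum((y - mean_v) ** 2 for y in v))
    return cov / (norm_u * norm_v)


def adjacent_similarity(rows, order):
    """Assumed score: sum of correlations between consecutive rows."""
    return sum(pearson(rows[a], rows[b]) for a, b in zip(order, order[1:]))


# Hypothetical profiles: two rising and two falling.
rows = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [8, 6, 4, 2],
]
grouped = adjacent_similarity(rows, [0, 1, 2, 3])
interleaved = adjacent_similarity(rows, [0, 2, 1, 3])
print(grouped, interleaved)
```

To feed a similarity like this to a TSP solver, which minimizes, it would first have to be turned into a distance, for example 1 minus the correlation; that conversion is also an assumption rather than something the slides specify.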

Results
[Figure panels: Cluster, k-ary, TSPCluster]

Results
TSPCluster run with k from 2 to 50
How many clusters?
Average inter-cluster distances
BEA local peaks:
• 6, 13, 19, 26, 29, 35, 40, 47
Pearson correlation coefficient local peaks:
• 3, 9, 12, 21, 26, 40
Computation time varied
• Less than half a minute to ~3 minutes
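
One way to read the "local peaks" above is as values of k whose average inter-cluster distance is higher than at both neighbouring values of k. The sketch below finds such peaks; the numbers in it are invented for illustration and are not the Arabidopsis results, and the function name is hypothetical.

```python
def local_peaks(values_by_k):
    """Return the k values whose score exceeds both neighbouring k values."""
    ks = sorted(values_by_k)
    peaks = []
    for prev, cur, nxt in zip(ks, ks[1:], ks[2:]):
        if values_by_k[cur] > values_by_k[prev] and values_by_k[cur] > values_by_k[nxt]:
            peaks.append(cur)
    return peaks


# Made-up average inter-cluster distances for k = 2 .. 8.
avg_inter_cluster = {2: 5.0, 3: 7.5, 4: 6.0, 5: 6.2, 6: 6.9, 7: 6.1, 8: 5.8}
peaks = local_peaks(avg_inter_cluster)
print(peaks)
```

Each peak is a candidate answer to "How many clusters?"; the endpoints of the range are never reported because they have only one neighbour.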

Results
[Figure panels: k = 26, k = 40]

Conclusion
Most problems have errors in their data
The error introduced by approximation algorithms can’t be expected to “undo” this error
Computers are cheap
Computers and solvers are sophisticated
We don’t always have to resort to approximate solutions, even for NP-hard problems

Conclusion
Rearrangement clustering provides a linear ordering
Linear ordering is inherent to many applications
• Information retrieval
• Manufacturing
• Software engineering

Conclusion
Gene data are arranged in linear order to examine the data
Linear ordering is not necessarily essential to gene clustering problems
Current work
• Optimally solve subproblems in clustering algorithms

Questions?