Comp. Genomics Recitation 7 Clustering and analysis of microarrays.


Exercise 1: A microarray that contains probes for all N metabolic enzymes of the bacterium D. Angerous was used for the following time-series experiment. The bacterial population was exposed to a drug, and gene expression was measured every hour for M hours. The expression values were discretized to {-1, 0, 1}.

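Exercise 1 (continued): Find the longest expression pattern that is common to at least k enzymes. Each enzyme may start the pattern at a different time. [Table: discretized expression values of enzymes E1 to E6 (rows) at time points T1 to T7 (columns); cell values not preserved. The example uses k=3.]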

Solution: Treat each expression vector as a string. Create a generalized suffix tree in O(MN) time. Find the longest substring that is common to at least k of the strings.
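The idea can be tried on small inputs without building a suffix tree. The sketch below is a simpler substitute, not the suffix-tree solution from the slide: it binary-searches the pattern length and counts, with hash sets, how many vectors contain each window of that length. This takes roughly O(N M^2 log M) time instead of O(MN). The function name and the sample vectors are illustrative assumptions.

```python
def longest_k_common_substring(vectors, k):
    """Return the longest contiguous pattern found in at least k vectors.

    Substitute for the suffix-tree approach: binary search on the length,
    using sets of windows. Assumes a non-empty list of vectors.
    """
    def common_of_length(length):
        counts = {}
        for vec in vectors:
            # Distinct windows of this length; each vector counts once per pattern.
            windows = {tuple(vec[i:i + length])
                       for i in range(len(vec) - length + 1)}
            for pat in windows:
                counts[pat] = counts.get(pat, 0) + 1
        for pat, count in counts.items():
            if count >= k:
                return pat
        return None

    lo, hi, best = 0, max(len(v) for v in vectors), ()
    # If a pattern of length L is k-common, so is its prefix of length L-1,
    # which makes the binary search on the length valid.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        pat = common_of_length(mid)
        if pat is not None:
            best, lo = pat, mid
        else:
            hi = mid - 1
    return list(best)


# Hypothetical discretized expression vectors, one per enzyme.
example = [
    [1, 0, -1, 1],
    [0, 1, 0, -1],
    [-1, 1, 0, -1],
    [0, 0, 0, 0],
]
print(longest_k_common_substring(example, 3))
```

Because each enzyme may start the pattern at a different time, the windows are compared by content only, not by position.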

Exercise 2: The expression of N genes was measured under a certain condition using a microarray. No discretization was performed. Give a polynomial-time algorithm for clustering these genes into exactly k clusters. The objective function, to be minimized, is the sum over all clusters of the largest distance d(Gi,Gj) between two genes in the cluster.

Pictorially: [Figure: genes G1 to G6 placed along an expression-level axis.] If {G3,G4,G5} is a cluster, its contribution to the objective function is d(G3,G5).

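Solution: Index the genes in order of increasing expression. Create a weighted directed graph in which every gene is a node and the edge from i to j has weight d(i,j-1) if i’s expression is lower than j’s (otherwise ∞). Each node on a path marks the lowest gene of a cluster. [Figure: genes G1 to G6 along the expression axis.] The path that corresponds to the clustering {G1,G2}, {G3,G4,G5}, {G6} is G1 → G3 → G6. The value of the objective function is d(G1,G2)+d(G3,G5)+0.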

Solution, next step: Find the shortest path that visits exactly k nodes, using dynamic programming. Start from k, because if l < k then P_l(k-1) = ∞.
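A minimal sketch of such a dynamic program follows. The slide's exact recurrence is not preserved, so the table definition used here (best[c][j] is the cheapest way to split the j lowest genes into exactly c clusters) is an assumption, as is taking d to be the absolute difference of expression levels. Each cluster of consecutive sorted genes contributes the distance between its lowest and highest gene, which matches the edge weights of the graph described above.

```python
def cluster_by_expression(expr, k):
    """Split expression levels into exactly k clusters of consecutive values,
    minimizing the sum of (highest - lowest) over clusters.

    Assumes 1 <= k <= len(expr) and d(i, j) = |expr[i] - expr[j]|.
    """
    xs = sorted(expr)
    n = len(xs)
    inf = float("inf")
    # best[c][j]: minimum cost of splitting xs[0:j] into exactly c clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[c][j]: start index of the last cluster in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):          # need at least c genes for c clusters
            for i in range(c - 1, j):      # last cluster is xs[i:j]
                cand = best[c - 1][i] + (xs[j - 1] - xs[i])
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = i
    # Walk the cut table backwards to recover the clusters.
    clusters, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters


# Hypothetical expression levels for six genes.
cost, clusters = cluster_by_expression([4.2, 1.0, 9.0, 4.0, 1.5, 5.0], 3)
print(cost, clusters)
```

The three nested loops give O(k N^2) time after sorting, which is polynomial as the exercise requires.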

Exercise 3: A microarray experiment with N genes and M conditions was conducted. Describe a polynomial-time algorithm that determines whether the genes can be clustered into 2 clusters such that the maximum distance d(Gi,Gj) within each cluster is < W.

Illustration: W=2. [Figure: genes G1, G2, G3, G4.]

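Solution: Create a graph with a node for every gene. Add an edge (i,j) if d(i,j) ≥ W, that is, if genes i and j cannot share a cluster. Check whether the resulting graph is bipartite: run BFS, and if you discover an edge (u,v) to a gray node where the depths of u and v are both even or both odd, answer “no”. Otherwise answer “yes”; the even-depth and odd-depth nodes form the two clusters.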
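The bipartiteness test can be sketched as follows. This is an illustrative version, not the slide's exact pseudocode: it stores a side (0 or 1) per node instead of a BFS depth, which is the same as depth parity, and it treats d ≥ W as a conflict because a cluster requires every distance to be < W. The function name and the distance-matrix input are assumptions.

```python
from collections import deque


def two_clusters(dist, W):
    """Return two lists of gene indices whose within-cluster distances are
    all < W, or None if no such split into 2 clusters exists.

    dist is a symmetric matrix of pairwise distances (assumed input format).
    """
    n = len(dist)
    # Conflict graph: an edge joins genes that cannot share a cluster.
    conflict = [[j for j in range(n) if j != i and dist[i][j] >= W]
                for i in range(n)]
    side = [None] * n
    for start in range(n):                 # handle every connected component
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in conflict[u]:
                if side[v] is None:
                    side[v] = 1 - side[u]  # opposite parity of BFS depth
                    queue.append(v)
                elif side[v] == side[u]:
                    return None            # odd cycle: graph is not bipartite
    return ([i for i in range(n) if side[i] == 0],
            [i for i in range(n) if side[i] == 1])


# Hypothetical genes at positions 0, 1, 3, 4 on a line, with W = 2.
points = [0, 1, 3, 4]
dist = [[abs(a - b) for b in points] for a in points]
print(two_clusters(dist, 2))
```

Building the conflict graph costs O(N^2) distance lookups, and the BFS is linear in the size of that graph.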

Solution (illustration): [Figure: an example graph that is not bipartite.]

Exercise 4: We are given a microarray with N genes and M experiments. We want to cluster the genes into k clusters such that the distance between any two genes in the same cluster is < W. Can you give a polynomial-time algorithm that solves this problem?

Solution: Probably not. More specifically, if we could solve this problem in polynomial time, we could solve a large class of problems that are widely believed to be unsolvable in polynomial time.

Solution: How can we show that we probably cannot find a polynomial-time solution? We take a problem for which this has already been shown and construct a polynomial-time reduction from it to our problem. Then, if our problem could be solved efficiently, the “hard” problem could also be solved efficiently.

Graph description: The following graph can describe our problem. [Figure: graph on nodes G1 to G6.] There is an edge (Gi,Gj) if the distance between Gi and Gj is less than W, so a legal cluster is a set of nodes that are pairwise connected.

Graph description: Clustering with k=3. [Figure: the graph with its nodes partitioned into 3 clusters.]

3COL (3-Colorability): Given a graph G, can we color its vertices with 3 different colors such that no two adjacent nodes have the same color?

Comparing the problems: What is common to both problems? In both we “cluster” the nodes. What are the differences? First, in 3COL there are only 3 clusters instead of k. Second, elements that belong to the same group in 3COL must not have edges between them, whereas genes in the same cluster must all be joined by edges.

Reduction: Now that we understand the differences, we can take a graph G that is an input to 3COL and transform it into a graph G’ and a constant k that are the input to the k-clustering problem. We assume that we have a polynomial-time k-clustering algorithm, apply it to (G’,k), and translate the solution back to 3COL.

Reduction: Given the first difference that we noted, what should the value of k be? We set k to 3, i.e. the algorithm should find exactly 3 clusters. How do we change G to get G’? G’ has the same nodes as G and exactly the complement edges: (u,v) is an edge of G’ if and only if it is not an edge of G.
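The transformation is small enough to write out directly. In this sketch the function name, the numbering of nodes from 0 to n-1, and the edge-list input are illustrative assumptions. It runs in O(n^2) time, so the reduction is polynomial.

```python
from itertools import combinations


def reduce_3col_to_clustering(n, edges):
    """Map a 3COL instance (n nodes, edge list) to a clustering instance.

    Returns the edge set of the complement graph G' and the constant k = 3.
    Edges are stored as frozensets so that (u, v) and (v, u) coincide.
    """
    in_g = {frozenset(e) for e in edges}
    all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
    return all_pairs - in_g, 3


# Hypothetical input: the path 0 - 1 - 2.
comp_edges, k = reduce_3col_to_clustering(3, [(0, 1), (1, 2)])
print(sorted(tuple(sorted(e)) for e in comp_edges), k)
```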

Example

Proof (⇒): Suppose that G is 3-colorable. Let V1, V2, V3 be the three color classes, i.e. the sets of nodes that receive the same color. There are no edges in G between any pair of nodes in V1, so in G’ every pair of nodes in V1 is connected, and V1 forms a legal cluster in G’. Similarly, the nodes of V2 and V3 form clusters. Since V1 ∪ V2 ∪ V3 contains all the nodes, all the genes are clustered in the 3 corresponding clusters.

Proof, second direction (⇐): Suppose that G’ has a clustering into 3 legal clusters. Every pair of nodes within a cluster is connected in G’, so these clusters correspond to 3 node sets in G such that within each set there are no edges between pairs of nodes. Therefore, assigning a different color to every set is a 3-coloring of G.
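Both directions can be sanity-checked by brute force on tiny graphs. The sketch below is an exhaustive check that takes exponential time, not a polynomial algorithm. It assumes nodes numbered from 0 and lets a color class or cluster be empty. The function names are illustrative.

```python
from itertools import combinations, product


def has_3_coloring(n, edges):
    """Brute force: does some assignment of 3 colors separate every edge of G?"""
    return any(all(colors[u] != colors[v] for u, v in edges)
               for colors in product(range(3), repeat=n))


def has_3_clustering(n, comp_edges):
    """Brute force: can the nodes be split into 3 groups that are cliques in G'?"""
    for labels in product(range(3), repeat=n):
        if all(frozenset((u, v)) in comp_edges
               for u, v in combinations(range(n), 2)
               if labels[u] == labels[v]):
            return True
    return False


def complement(n, edges):
    """Edge set of G' as frozensets."""
    in_g = {frozenset(e) for e in edges}
    return {frozenset(p) for p in combinations(range(n), 2)} - in_g


# A 5-cycle is 3-colorable; the complete graph K4 is not.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
k4 = list(combinations(range(4), 2))
for n, edges in [(5, cycle5), (4, k4)]:
    print(has_3_coloring(n, edges), has_3_clustering(n, complement(n, edges)))
```

On each input the two answers agree, as the proof predicts.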