Review: Graph Theory in Bioinformatics Yunkai Liu Assistant Professor Computer Science Department University of South Dakota.

Presentation transcript:


Graph Theory
In mathematics and computer science, graph theory studies the properties of graphs. A graph is written G = (V, E), with vertex set V and edge set E, or G = (V, A) for a directed graph with arc set A.
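As a minimal sketch of the definition above, a graph can be stored as an adjacency map. The function name `build_graph` and the sample vertices are illustrative assumptions, not part of the original slides.

```python
def build_graph(vertices, edges, directed=False):
    """Return an adjacency-set map for G = (V, E) or, if directed, G = (V, A)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)          # arc u -> v
        if not directed:
            adj[v].add(u)      # undirected edges are stored in both directions
    return adj


# Hypothetical example data
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
undirected = build_graph(V, E)
directed = build_graph(V, E, directed=True)
```

In the undirected case each edge appears in both endpoints' neighbor sets; in the directed case only the tail of each arc records it.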

Graph Theory - History
Leonhard Euler's paper on the “Seven Bridges of Königsberg” was published in 1736.

Graph Theory
• “The traveling salesman problem”: A traveling salesman has to visit a number of cities. How should the trip be planned so that every city is visited exactly once and the whole trip is as short as possible?
• “The Chinese postman problem”: A postman has to deliver mail along a number of streets. How can the postman cover every street while keeping the total distance walked to a minimum?

Graph Theory
In 1852 Francis Guthrie posed the “four color problem”, which asks whether any map of countries can be colored with only four colors so that no two bordering countries have the same color. The problem was solved more than a century later, in 1976, by Kenneth Appel and Wolfgang Haken. Work on it drove much of the development of graph theory.

Graph Algorithm in Bioinfo
1. Understand the biological problem.
2. Represent biological data as mathematical objects (strings, sets, graphs, permutations, …), map biological relations into mathematical relations, and formulate the biological question as an optimization or feasibility problem.
3. Study computational complexity: Polynomial? NP-hard?
4. Develop efficient algorithms.
   • If in P, find fast and memory-efficient exact algorithms.
   • If NP-hard, find practical exact algorithms and/or algorithms with provable approximation guarantees.
5. Validate algorithms on biological data.

Multiple Sequence Alignment
Multiple sequence alignment (MSA) can be seen as a generalization of pairwise sequence alignment: instead of aligning two sequences, k sequences are aligned simultaneously, where k is any number greater than two. Tools: Clustal, COFFEE

Multiple Sequence Alignment
Biological motivation:
• Representing protein families
• Repetitive sequences in DNA
Computer science challenge: MSA is an NP-complete problem

MSA Using Traveling Salesman Problem Approach
Model: Take each sequence as a vertex and assign the pairwise similarity as the weight of the edge between two sequences (undirected graph).
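The model above can be sketched as follows. This is only an illustration under stated assumptions: edit distance stands in for the (unspecified) similarity measure, the tour is built with a simple nearest-neighbor heuristic rather than an exact TSP solver, and the names `edit_distance`, `sequence_graph` and `nearest_neighbor_order` are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, used here as an assumed stand-in for dissimilarity."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def sequence_graph(seqs):
    """Complete undirected graph: one vertex per sequence, edge weight = distance."""
    return {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}


def nearest_neighbor_order(seqs, start=0):
    """Heuristic tour: repeatedly move to the closest unvisited sequence."""
    weights = sequence_graph(seqs)

    def dist(i, j):
        return weights[(min(i, j), max(i, j))]

    order = [start]
    unvisited = set(range(len(seqs))) - {start}
    while unvisited:
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist(order[-1], j), j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


# Hypothetical toy sequences
seqs = ["ACGT", "ACGA", "TTTT", "TTTA"]
order = nearest_neighbor_order(seqs)
```

The resulting order places similar sequences next to each other, which is the property a TSP-based alignment order relies on; the heuristic gives no optimality guarantee.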

MSA Using Traveling Salesman Problem Approach
Challenges:
• Similarity / distance measure: relative entropy, likelihood ratio, …
• Cluster number
• …

Other Graph Algorithms in Sequence Comparison
• Suffix tree
• EM algorithm
Reference book: Dan Gusfield, “Algorithms on Strings, Trees, and Sequences”, 1997

Shotgun Sequencing
• Cover the region with reads at ~7-fold redundancy.
• Overlap the reads and extend them to reconstruct the original genomic region.

Change into TSP
• Define overlap(s_i, s_j) as the length of the longest prefix of s_j that matches a suffix of s_i.
• Construct a graph with n vertices representing the n strings s_1, s_2, …, s_n. Insert a directed edge from s_i to s_j with length -overlap(s_i, s_j), so that a larger overlap means a shorter edge.
• Find the shortest path that visits every vertex exactly once. This is the Traveling Salesman Problem (TSP).
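A small sketch of this construction is given here. It computes overlap(s_i, s_j) as defined on the slide, builds the directed overlap graph, and searches all orderings for the path with the largest total overlap, which is the same as the shortest path when edge length is taken as the negated overlap. The brute-force search and the names `overlap_graph` and `best_path` are illustrative assumptions and only practical for a handful of reads.

```python
from itertools import permutations


def overlap(si, sj):
    """Length of the longest prefix of sj that matches a suffix of si."""
    for k in range(min(len(si), len(sj)), 0, -1):
        if si.endswith(sj[:k]):
            return k
    return 0


def overlap_graph(strings):
    """Directed graph on string indices; edge (i, j) carries overlap(s_i, s_j)."""
    return {(i, j): overlap(a, b)
            for i, a in enumerate(strings)
            for j, b in enumerate(strings) if i != j}


def best_path(strings):
    """Exhaustively find the vertex order that maximizes total overlap."""
    w = overlap_graph(strings)
    return max(permutations(range(len(strings))),
               key=lambda p: sum(w[(p[k], p[k + 1])] for k in range(len(p) - 1)))


# Hypothetical toy reads
reads = ["ATGC", "GCAT", "CATT"]
path = best_path(reads)
```

The factorial search makes the cost of the exact TSP formulation visible, which is why practical assemblers rely on heuristics instead.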

Shortest Superstring
• Given: set of strings s_1, s_2, …, s_n
• Find: shortest string s containing each s_i as a substring
• Example: Set of strings: 000, 001, 010, 011, 100, 101, 110, 111 Superstring:
• NP-complete
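Because the problem is NP-complete, a common workaround is a greedy heuristic that repeatedly merges the two strings with the largest overlap. The sketch below shows that heuristic only; it is not an exact algorithm and is not taken from the slides, and the function names are assumptions.

```python
def overlap(a, b):
    """Length of the longest prefix of b that matches a suffix of a."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy merge heuristic: returns a superstring, not necessarily the shortest."""
    pool = list(dict.fromkeys(strings))  # drop duplicates, keep input order
    # Strings already contained in another string need no separate handling.
    pool = [s for s in pool if not any(s != t and s in t for t in pool)]
    while len(pool) > 1:
        best = (-1, 0, 1)
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = pool[i] + pool[j][k:]  # glue pool[j] onto pool[i], sharing k characters
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)] + [merged]
    return pool[0] if pool else ""


# The set of strings from the example above
strings = ["000", "001", "010", "011", "100", "101", "110", "111"]
superstring = greedy_superstring(strings)
```

The output always contains every input string, but its length can exceed the optimum.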

Hamiltonian Cycle Problem
• Hamiltonian Cycle Problem: Find a cycle that visits every vertex exactly once
• NP-complete
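For small graphs the problem can still be solved by exhaustive backtracking, as in the following sketch. The adjacency-list input format and the name `hamiltonian_cycle` are assumptions made for illustration; the running time is exponential in the worst case, consistent with the NP-completeness noted above.

```python
def hamiltonian_cycle(adj):
    """Backtracking search; intended for simple graphs with at least 3 vertices.

    adj maps each vertex to a list of its neighbors.
    Returns a vertex list that starts and ends at the same vertex, or None.
    """
    vertices = list(adj)
    if not vertices:
        return None
    start = vertices[0]
    path = [start]
    visited = {start}

    def extend():
        if len(path) == len(vertices):
            # All vertices used: a cycle exists only if we can return to start.
            return start in adj[path[-1]]
        for v in adj[path[-1]]:
            if v not in visited:
                visited.add(v)
                path.append(v)
                if extend():
                    return True
                visited.remove(v)  # undo the choice and try the next neighbor
                path.pop()
        return False

    return path + [start] if extend() else None


# Hypothetical examples: a 4-cycle has a Hamiltonian cycle, a 3-vertex path does not
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
line = {0: [1], 1: [0, 2], 2: [1]}
```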

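Hamiltonian Path Approach
Spectrum S = {ATG, AGG, TGC, TCC, GTC, GGT, GCA, CAG}
The path H visits every VERTEX exactly once: ATG → TGC → GCA → CAG → AGG → GGT → GTC → TCC
Reconstructed sequence: ATGCAGGTCC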
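The approach on this slide can be sketched as follows, reading the l-mers shown there as the 3-mers ATG, AGG, TGC, TCC, GTC, GGT, GCA, CAG. Each l-mer is a vertex, an edge joins p to q when the last l-1 letters of p equal the first l-1 letters of q, and a Hamiltonian path spells the sequence. The function name `reconstruct` and the backtracking search are illustrative assumptions.

```python
def reconstruct(spectrum):
    """Rebuild a sequence from its l-mer spectrum via a Hamiltonian path search."""
    kmers = list(spectrum)
    n = len(kmers)
    # Edge i -> j when the suffix of kmers[i] matches the prefix of kmers[j].
    succ = {i: [j for j in range(n)
                if j != i and kmers[i][1:] == kmers[j][:-1]]
            for i in range(n)}

    def extend(path, used):
        if len(path) == n:
            return path
        for j in succ[path[-1]]:
            if j not in used:
                found = extend(path + [j], used | {j})
                if found:
                    return found
        return None

    for start in range(n):
        path = extend([start], {start})
        if path:
            # The first l-mer in full, then one new letter per later vertex.
            return kmers[path[0]] + "".join(kmers[i][-1] for i in path[1:])
    return None


spectrum = ["ATG", "AGG", "TGC", "TCC", "GTC", "GGT", "GCA", "CAG"]
sequence = reconstruct(spectrum)
```

For this spectrum every l-mer has at most one successor, so the path is unique and the search returns ATGCAGGTCC, the sequence shown on the slide.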

Some Difficulties with SBH (Sequencing by Hybridization)
• Fidelity of hybridization: it is difficult to detect differences between probes hybridized with perfect matches and those with 1 or 2 mismatches.
• Array size: the effect of low fidelity can be decreased with longer l-mers, but array size increases exponentially in l. Array size is limited with current technology.
• Practicality: SBH is still impractical. As DNA microarray technology improves, SBH may become practical in the future.

Conclusion
• Graph algorithms always give a definite result.
• Graph algorithms are well suited to data mining and modeling.
• Graphical statistical models are powerful, for example Markov models and random forests.

Biopathway
Biological pathways represent networks of complex reactions at the molecular level in living cells. They model how biological molecules interact to accomplish a biological function and to respond to environmental stimuli. They include metabolic pathways, signal transduction pathways, protein interaction pathways, …

Biopathway
A common goal of research in the life sciences is to develop an ever-broadening library of pathway models for biological processes of many different organisms. Such pathway models can have broad impact, for example in making products for biotech applications and in drug discovery for the pharmaceutical industry.

Problems in Biopathway
• Pathway assembly
• Information overlay
• Pathway analysis

Reference
• Humberto Carrillo and David Lipman, “The Multiple Sequence Alignment Problem in Biology”, SIAM J. Appl. Math., Vol. 48, No. 5, Oct.
• Y. Zhang and M. S. Waterman, “DNA Sequence Assembly and Multiple Sequence Alignment by Eulerian Path Approach”, Cold Spring Harbor Symposia on Quantitative Biology, Volume LXVIII.
• Purvi Saraiya, Chris North, Karen Duca, “Visualizing biological pathways: requirements analysis, systems evaluation and research agenda”, Information Visualization (2005), 1–15.
• aphsDNAseq.pdf