Published by Ashley Flynn. Modified over 8 years ago.
1
Review: Graph Theory in Bioinformatics
Yunkai Liu, Assistant Professor, Computer Science Department, University of South Dakota
2
Graph Theory
In mathematics and computer science, graph theory studies the properties of graphs. An undirected graph is written G = (V, E), with vertex set V and edge set E; a directed graph is written G = (V, A), with arc set A.
3
Graph Theory - History
Leonhard Euler's paper on the “Seven Bridges of Königsberg” was published in 1736.
4
Graph Theory
“The traveling salesman problem”: a traveling salesman has to visit a number of cities. How should the trip be planned so that every city is visited exactly once and the whole trip is as short as possible?
“The Chinese postman problem”: a postman has to deliver mail along a number of streets. How should the route be planned so that every street is covered and the total distance walked is as short as possible?
5
Graph Theory
In 1852 Francis Guthrie posed the “four color problem”, which asks whether any map of countries can be colored with only four colors so that no two bordering countries share a color. The problem was solved only in 1976, more than a century later, by Kenneth Appel and Wolfgang Haken, and it stimulated much of the development of graph theory.
6
Graph Algorithms in Bioinformatics
1. Understand the biological problem.
2. Represent biological data as mathematical objects (strings, sets, graphs, permutations, …), map biological relations into mathematical relations, and formulate the biological question as an optimization or feasibility problem.
3. Study the computational complexity: polynomial? NP-hard?
4. Develop efficient algorithms:
   - If in P, find fast and memory-efficient exact algorithms.
   - If NP-hard, find practical exact algorithms and/or algorithms with provable approximation guarantees.
5. Validate the algorithms on biological data.
7
Multiple Sequence Alignment
Multiple sequence alignment (MSA) can be seen as a generalization of pairwise sequence alignment: instead of aligning two sequences, k sequences are aligned simultaneously, where k is any number greater than two.
Tools: Clustal, COFFEE
8
Multiple Sequence Alignment
Biological motivation:
- Representing protein families
- Repetitive sequences in DNA
Computer science challenge:
- NP-complete problem
9
MSA Using the Traveling Salesman Problem Approach
Model: take each sequence as a vertex and assign the pairwise similarity as the weight of the edge between two sequences (an undirected graph).
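The model above can be sketched in a few lines. This is a minimal illustration, not the method used by any particular MSA tool: it assumes edit distance as the edge weight (the slide does not fix a measure), and it orders the sequences with a nearest-neighbor heuristic instead of an exact TSP solver. The names `edit_distance`, `build_graph`, and `nearest_neighbor_tour` and the four toy sequences are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an assumed dissimilarity measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def build_graph(seqs):
    """Complete undirected graph: one vertex per sequence, weight = distance."""
    weights = {}
    for i, j in combinations(range(len(seqs)), 2):
        d = edit_distance(seqs[i], seqs[j])
        weights[(i, j)] = weights[(j, i)] = d
    return weights


def nearest_neighbor_tour(n, weights, start=0):
    """Heuristic tour: repeatedly move to the closest unvisited vertex."""
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # Ties are broken by the smaller vertex index.
        nxt = min(unvisited, key=lambda v: (weights[(last, v)], v))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


# Hypothetical toy input.
seqs = ["ACGTACGT", "ACGTTCGT", "TTTTACGA", "ACGAACGT"]
weights = build_graph(seqs)
tour = nearest_neighbor_tour(len(seqs), weights)
print([seqs[i] for i in tour])
```

The resulting order places similar sequences next to each other, which is the order in which a progressive aligner could then merge them. The heuristic gives no optimality guarantee.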
10
MSA Using the Traveling Salesman Problem Approach
Challenges:
- Similarity / distance measure: relative entropy, likelihood ratio, …
- Number of clusters
- …
11
Other Graph Algorithms in Sequence Comparison
- Suffix tree
- EM algorithm
Reference book: Dan Gusfield, “Algorithms on Strings, Trees, and Sequences”, 1997
12
Shotgun Sequencing
- Cover the region with ~7-fold redundancy.
- Overlap the reads and extend them to reconstruct the original genomic region.
13
Change into TSP
- Define overlap(s_i, s_j) as the length of the longest prefix of s_j that matches a suffix of s_i.
- Construct a graph with n vertices representing the n strings s_1, s_2, …, s_n.
- Insert a directed edge from s_i to s_j weighted by overlap(s_i, s_j).
- Find the path that visits every vertex exactly once and has the largest total overlap. Equivalently, take the negative overlap as the edge length and find the shortest such path.
This is the Traveling Salesman Problem (TSP).
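The construction above can be tried on a handful of strings. The sketch below is only an illustration under stated assumptions: it scores a path by its total overlap, so the best path is the one with the largest total (the shortest one when each edge length is taken as the negative overlap); it enumerates all vertex orders, which is feasible only for very small n; and it assumes no string is a substring of another. The function names and the four reads are hypothetical.

```python
from itertools import permutations


def overlap(si: str, sj: str) -> int:
    """Length of the longest prefix of sj that matches a suffix of si."""
    for k in range(min(len(si), len(sj)), 0, -1):
        if si.endswith(sj[:k]):
            return k
    return 0


def overlap_graph(strings):
    """Directed graph: edge (i, j) weighted by overlap(s_i, s_j), i != j."""
    n = len(strings)
    return {(i, j): overlap(strings[i], strings[j])
            for i in range(n) for j in range(n) if i != j}


def best_path(strings):
    """Exhaustive search over all vertex orders for the largest total overlap."""
    graph = overlap_graph(strings)
    best_order, best_total = None, -1
    for order in permutations(range(len(strings))):
        total = sum(graph[(a, b)] for a, b in zip(order, order[1:]))
        if total > best_total:
            best_order, best_total = order, total
    return best_order, best_total


def merge(strings, order):
    """Spell the superstring obtained by following the path."""
    result = strings[order[0]]
    for a, b in zip(order, order[1:]):
        result += strings[b][overlap(strings[a], strings[b]):]
    return result


# Hypothetical reads.
reads = ["ATGCA", "GCAGG", "AGGTC", "GTCC"]
order, total = best_path(reads)
print(merge(reads, order), total)  # prints: ATGCAGGTCC 9
```

The total length of the merged string is the sum of the read lengths minus the total overlap, which is why maximizing overlap minimizes the superstring spelled along the path.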
14
Shortest Superstring
Given: a set of strings s_1, s_2, …, s_n
Find: the shortest string s that contains each s_i as a substring
Example:
Set of strings: 000, 001, 010, 011, 100, 101, 110, 111
Superstring: 0001110100
Complexity: NP-complete
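The example on this slide can be checked directly. The sketch below confirms that 0001110100 contains all eight strings and compares its length with a simple lower bound: n distinct strings of equal length k each need their own start position, so any superstring has at least n + k - 1 characters. The helper names are hypothetical, and the bound only applies under that equal-length, all-distinct assumption.

```python
def is_superstring(s: str, strings) -> bool:
    """True when every string occurs in s as a contiguous substring."""
    return all(t in s for t in strings)


def length_lower_bound(strings) -> int:
    """Lower bound n + k - 1, valid for n distinct strings of equal length k."""
    k = len(strings[0])
    if any(len(t) != k for t in strings) or len(set(strings)) != len(strings):
        raise ValueError("bound assumes distinct strings of equal length")
    return len(strings) + k - 1


# All binary strings of length 3: 000, 001, ..., 111.
strings = [format(i, "03b") for i in range(8)]
candidate = "0001110100"

print(is_superstring(candidate, strings))               # prints: True
print(len(candidate), length_lower_bound(strings))      # prints: 10 10
```

Because the candidate meets the lower bound, no shorter superstring exists for this particular set.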
15
Hamiltonian Cycle Problem
Find a cycle that visits every vertex exactly once.
Complexity: NP-complete
16
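Hamiltonian Path Approach
The path visits every VERTEX exactly once.
Spectrum: S = {ATG, AGG, TGC, TCC, GTC, GGT, GCA, CAG}
Graph H: one vertex per l-mer, with an edge from one l-mer to another when the last l-1 letters of the first match the first l-1 letters of the second.
Reconstructed sequence: ATGCAGGTCC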
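A small sketch of this approach, assuming the spectrum consists of the eight distinct 3-mers of the slide's sequence ATGCAGGTCC. Each l-mer is a vertex, an edge joins u to v when the last l-1 letters of u equal the first l-1 letters of v, and a plain backtracking search looks for a path through every vertex. The function names are hypothetical, and the search takes exponential time in the worst case, in line with the NP-completeness noted earlier.

```python
def hamiltonian_path(spectrum):
    """Backtracking search for an order of the l-mers in which each
    consecutive pair overlaps in l-1 letters. Assumes distinct l-mers."""
    n = len(spectrum)
    # Edge u -> v when the last l-1 letters of u equal the first l-1 of v.
    adj = {u: [v for v in spectrum if v != u and u[1:] == v[:-1]]
           for u in spectrum}

    def extend(path, used):
        if len(path) == n:
            return path
        for v in adj[path[-1]]:
            if v not in used:
                used.add(v)
                found = extend(path + [v], used)
                if found:
                    return found
                used.remove(v)  # undo the choice and try the next neighbor
        return None

    for start in spectrum:
        found = extend([start], {start})
        if found:
            return found
    return None


def spell(path):
    """First l-mer followed by the last letter of every later l-mer."""
    return path[0] + "".join(v[-1] for v in path[1:])


spectrum = ["ATG", "AGG", "TGC", "TCC", "GTC", "GGT", "GCA", "CAG"]
path = hamiltonian_path(spectrum)
print(" -> ".join(path))
print(spell(path))  # prints: ATGCAGGTCC
```

For this spectrum every vertex has at most one outgoing edge, so the search finds the path ATG, TGC, GCA, CAG, AGG, GGT, GTC, TCC without backtracking; larger spectra with repeated (l-1)-mers generally branch.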
17
Some Difficulties with SBH (Sequencing by Hybridization)
- Fidelity of hybridization: it is difficult to distinguish probes hybridized with perfect matches from probes hybridized with 1 or 2 mismatches.
- Array size: the effect of low fidelity can be decreased with longer l-mers, but the array size increases exponentially in l (an array of all l-mers needs 4^l probes). Array size is limited with current technology.
- Practicality: SBH is still impractical. As DNA microarray technology improves, SBH may become practical in the future.
18
Conclusion
- Graph algorithms always give a definite result.
- Graph algorithms are well suited to data mining and modeling.
- Graphical statistical models are powerful, for example Markov models and random forests.
19
Biopathway
Biological pathways represent networks of complex reactions at the molecular level in living cells. They model how biological molecules interact to accomplish a biological function and to respond to environmental stimuli. They include metabolic pathways, signal transduction pathways, protein interaction pathways, …
20
Biopathway
A common goal of research in the life sciences is to develop an ever-broadening library of pathway models for biological processes in many different organisms. Such pathways can have broad impact, for example in making products for biotech applications and in drug discovery in the pharmaceutical industry.
22
Problems in Biopathway
- Pathway assembly
- Information overlay
- Pathway analysis
23
Reference
- Humberto Carrillo and David Lipman, “The Multiple Sequence Alignment Problem in Biology”, SIAM J. Appl. Math., Vol. 48, No. 5, Oct 1988.
- Y. Zhang and M. S. Waterman, “DNA Sequence Assembly and Multiple Sequence Alignment by Eulerian Path Approach”, Cold Spring Harbor Symposia on Quantitative Biology, Volume LXVIII, 2003.
- Purvi Saraiya, Chris North, Karen Duca, “Visualizing biological pathways: requirements analysis, systems evaluation and research agenda”, Information Visualization (2005), 1–15.
- http://www.bioalgorithms.info/presentations/Ch08_GraphsDNAseq.pdf