Deconvoluting BAC-gene Relationships Using a Physical Map Y. Wu 1, L. Liu 1, T. Close 2, S. Lonardi 1 1 Department of Computer Science & Engineering 2.

Slides:



Advertisements
Similar presentations
Iterative Rounding and Iterative Relaxation
Advertisements

Shortest Vector In A Lattice is NP-Hard to approximate
1 LP Duality Lecture 13: Feb Min-Max Theorems In bipartite graph, Maximum matching = Minimum Vertex Cover In every graph, Maximum Flow = Minimum.
1 Transportation problem The transportation problem seeks the determination of a minimum cost transportation plan for a single commodity from a number.
Lecture 24 Coping with NPC and Unsolvable problems. When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm.
Bounds on Code Length Theorem: Let l ∗ 1, l ∗ 2,..., l ∗ m be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let L ∗ be.
Combinatorial Algorithms
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Computational problems, algorithms, runtime, hardness
Instructor Neelima Gupta Table of Contents Lp –rounding Dual Fitting LP-Duality.
Physical Mapping I CIS 667 February 26, Physical Mapping A physical map of a piece of DNA tells us the location of certain markers  A marker is.
Class 02: Whole genome sequencing. The seminal papers ``Is Whole Genome Sequencing Feasible?'' ``Whole-Genome DNA.
NP-Complete Problems Reading Material: Chapter 10 Sections 1, 2, 3, and 4 only.
NP-Complete Problems Problems in Computer Science are classified into
CS273a Lecture 4, Autumn 08, Batzoglou Hierarchical Sequencing.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Integer Programming Difference from linear programming –Variables x i must take on integral values, not real values Lots of interesting problems can be.
Human Genome Project. Basic Strategy How to determine the sequence of the roughly 3 billion base pairs of the human genome. Started in Various side.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Compartmentalized Shotgun Assembly ? ? ? CSA Two stated motivations? ?
Genome sequencing. Vocabulary Bac: Bacterial Artificial Chromosome: cloning vector for yeast Pac, cosmid, fosmid, plasmid: cloning vectors for E. coli.
Daniel Kroening and Ofer Strichman Decision Procedures An Algorithmic Point of View Deciding ILPs with Branch & Bound ILP References: ‘Integer Programming’
CEM 514 Modeling of Construction Operations The LP/IP hybrid method for construction time-cost trade-off analysis By Samir Abdallah Sulaiman.
1 Physical Mapping --An Algorithm and An Approximation for Hybridization Mapping Shi Chen CSE497 04Mar2004.
Ch 8.1 Numerical Methods: The Euler or Tangent Line Method
Decision Procedures An Algorithmic Point of View
Mouse Genome Sequencing
MCS312: NP-completeness and Approximation Algorithms
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
MCS 312: NP Completeness and Approximation algorithms Instructor Neelima Gupta
Trust-Aware Optimal Crowdsourcing With Budget Constraint Xiangyang Liu 1, He He 2, and John S. Baras 1 1 Institute for Systems Research and Department.
Approximation Algorithms Department of Mathematics and Computer Science Drexel University.
Steps in a genome sequencing project Funding and sequencing strategy source of funding identified / community drive development of sequencing strategy.
Biological Motivation for Fragment Assembly Rhys Price Jones Anne R. Haake.
JM - 1 Introduction to Bioinformatics: Lecture III Genome Assembly and String Matching Jarek Meller Jarek Meller Division of Biomedical.
Computational Molecular Biology Non-unique Probe Selection via Group Testing.
The Changing Face of Sequencing
Week 10Complexity of Algorithms1 Hard Computational Problems Some computational problems are hard Despite a numerous attempts we do not know any efficient.
EMIS 8373: Integer Programming NP-Complete Problems updated 21 April 2009.
Theobroma cacao Integrated Physical and Genetic Map 2 BAC Libraries 250 Genetic Markers.
1 The Theory of NP-Completeness 2 Cook ’ s Theorem (1971) Prof. Cook Toronto U. Receiving Turing Award (1982) Discussing difficult problems: worst case.
NP-Complete Problems. Running Time v.s. Input Size Concern with problems whose complexity may be described by exponential functions. Tractable problems.
Approximation Algorithms Department of Mathematics and Computer Science Drexel University.
Computational Molecular Biology Non-unique Probe Selection via Group Testing.
Lecture.6. Table of Contents Lp –rounding Dual Fitting LP-Duality.
Unique Games Approximation Amit Weinstein Complexity Seminar, Fall 2006 Based on: “Near Optimal Algorithms for Unique Games" by M. Charikar, K. Makarychev,
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
Chapter 11 Introduction to Computational Complexity Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Efficient Point Coverage in Wireless Sensor Networks Jie Wang and Ning Zhong Department of Computer Science University of Massachusetts Journal of Combinatorial.
Approximation Algorithms by bounding the OPT Instructor Neelima Gupta
Instructor: Shengyu Zhang 1. Optimization Very often we need to solve an optimization problem.  Maximize the utility/payoff/gain/…  Minimize the cost/penalty/loss/…
Genome Analysis. This involves finding out the: order of the bases in the DNA location of genes parts of the DNA that controls the activity of the genes.
Approximation Algorithms based on linear programming.
Learning Hidden Graphs Hung-Lin Fu 傅 恆 霖 Department of Applied Mathematics Hsin-Chu Chiao Tung Univerity.
National Taiwan University Department of Computer Science and Information Engineering An Approximation Algorithm for Haplotype Inference by Maximum Parsimony.
CompSci Today’s Topics Computer Science Noncomputability Upcoming Special Topic: Enabled by Computer -- Decoding the Human Genome Reading Great.
ICS 353: Design and Analysis of Algorithms NP-Complete Problems King Fahd University of Petroleum & Minerals Information & Computer Science Department.
Human Genome Project.
Chapter 7 Linear Programming
Possibilities and Limitations in Computation
Hidden Markov Models Part 2: Algorithms
SAT-Based Area Recovery in Technology Mapping
Integer Programming (정수계획법)
Integer Programming (정수계획법)
Bioinformatics, Vol.17 Suppl.1 (ISMB 2001)
Approximation Algorithms for the Selection of Robust Tag SNPs
Parsimony population haplotyping
Presentation transcript:

Deconvoluting BAC-gene Relationships Using a Physical Map Y. Wu 1, L. Liu 1, T. Close 2, S. Lonardi 1 1 Department of Computer Science & Engineering 2 Department of Botany & Plant Sciences

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Selective sequencing Many organisms are unlikely to be sequenced in the near future due to the large size and highly repetitive content of their genomesMany organisms are unlikely to be sequenced in the near future due to the large size and highly repetitive content of their genomes Selective sequencing: obtain the sequence of a small set of BAC clones that contain a specific set of genes of interestSelective sequencing: obtain the sequence of a small set of BAC clones that contain a specific set of genes of interest How do we identify these BAC clones? BAC-gene deconvolution problemHow do we identify these BAC clones? BAC-gene deconvolution problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 An illustration of the problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 An illustration of the problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 An illustration of the problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization with probes The presence of a gene in a BAC can be determined by an hybridization experiment (e.g., using a unique probe designed from it)The presence of a gene in a BAC can be determined by an hybridization experiment (e.g., using a unique probe designed from it) Given that typically BAC clones and probes could be in the order of tens of thousands, carrying out an experiment for each pair (BAC,probe) is usually unfeasibleGiven that typically BAC clones and probes could be in the order of tens of thousands, carrying out an experiment for each pair (BAC,probe) is usually unfeasible Group testing (or pooling) has to be usedGroup testing (or pooling) has to be used

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization with pools of probes Probes can be arranged into pools for group testing. However, in order to achieve exact deconvolution this strategy could be still unfeasible due to the large number of poolsProbes can be arranged into pools for group testing. However, in order to achieve exact deconvolution this strategy could be still unfeasible due to the large number of pools Question: Can we use a small number of pools (e.g., 1- or 2-decodable pool design) and still achieve accurate deconvolution?Question: Can we use a small number of pools (e.g., 1- or 2-decodable pool design) and still achieve accurate deconvolution?

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Dealing with the limitations of pooling Answer: Yes, if one compensates for the lack of information obtained by a weak pooling design with the knowledge of the overlapping structure of the BACsAnswer: Yes, if one compensates for the lack of information obtained by a weak pooling design with the knowledge of the overlapping structure of the BACs In this way, the number of pools required is reduced  less expensive/time- consumingIn this way, the number of pools required is reduced  less expensive/time- consuming

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization data h(b,p)=1 (pool p hybridizes to BAC b) –b must contain at least one of the probes/genes represented by p –positive information h(b,p)=0 (pool p does not hybridize to BAC b) –b cannot contain any of the probes/genes represented by p –negative information

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Deconvolution problem Given h(b,p) for all pairs (b,p) the deconvolution problem is to establish a one-to-many assignment between the probes p and the clones b in such a way that it satisfies the value of hGiven h(b,p) for all pairs (b,p) the deconvolution problem is to establish a one-to-many assignment between the probes p and the clones b in such a way that it satisfies the value of h 1.Basic deconvolution: uses only on information obtained from group testing 2.Improved deconvolution: also uses the physical map

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Input to the basic deconvolution h p1p1p1p1 p2p2p2p2 p3p3p3p3 p4p4p4p4 b1b1b1b11000 b2b2b2b21100 b3b3b3b30110 b4b4b4b40011 b5b5b5b50001 Hybridization table p i is a pool b j is a BAC u k is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Input to the basic deconvolution h p1p1p1p1 p2p2p2p2 p3p3p3p3 p4p4p4p4 b1b1b1b11 b2b2b2b211 b3b3b3b311 b4b4b4b411 b5b5b5b51 u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 p1p1p1p1111 p2p2p2p2111 p3p3p3p3111 p4p4p4p4111 Pool content table Hybridization table p i is a pool b j is a BAC u k is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Positive information u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b 1,p b 2,p b 2,p b 3,p b 3,p b 4,p b 4,p b 5,p p i is a pool b j is a BAC u k is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Negative information u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b1b1b1b b2b2b2b b3b3b3b b4b4b4b b5b5b5b p i is a pool b j is a BAC u k is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Combining positive & negative u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b 1,p b 2,p b 2,p b 3,p b 3,p b 4,p b 4,p b 5,p p i is a pool b j is a BAC u k is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Combining positive & negative u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b 1,p 1 11 b 2,p b 2,p 2 11 b 3,p 2 11 b 3,p 3 11 b 4,p 3 11 b 4,p b 5,p 4 11 Each row represents a constraint to be satisfiedEach row represents a constraint to be satisfied If a row contains only one “1”, then the relationship between the BAC and probe is resolved exactlyIf a row contains only one “1”, then the relationship between the BAC and probe is resolved exactly p i is a pool b j is a BAC u k is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Physical map-assisted deconvolution Basic deconvolution is not sufficientBasic deconvolution is not sufficient BACs are assembled into contigs by FPC (a contig is a set of BAC clones)BACs are assembled into contigs by FPC (a contig is a set of BAC clones) We assume the probes are unique  each probe can belong to exactly one contigWe assume the probes are unique  each probe can belong to exactly one contig Contig 1Contig 2

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Optimization problem We formulate the following optimization problemWe formulate the following optimization problem The problem is NP-complete (proof in the paper, reduction from 3SAT)The problem is NP-complete (proof in the paper, reduction from 3SAT)

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Integer Linear Programming The optimization problem can be solved via integer linear programming (ILP)The optimization problem can be solved via integer linear programming (ILP)

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 LP and randomized rounding The ILP is relaxed to the corresponding LP, then the LP is solved exactly (via the GLPK package)The ILP is relaxed to the corresponding LP, then the LP is solved exactly (via the GLPK package) Optimal solution to the LP is mapped to a valid solution to the ILP via randomized roundingOptimal solution to the LP is mapped to a valid solution to the ILP via randomized rounding We prove that our method achieves approximation ratio (1-e -1 )We prove that our method achieves approximation ratio (1-e -1 )

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results on rice genome Whole genome sequence for rice is availableWhole genome sequence for rice is available BAC library and fingerprinting data are available from AGIBAC library and fingerprinting data are available from AGI BAC-end sequences are also available from GenbankBAC-end sequences are also available from Genbank Physical map was built using FPCPhysical map was built using FPC Coordinates of the BAC on the genome were determined by BLASTing BAC-end sequences against the genomeCoordinates of the BAC on the genome were determined by BLASTing BAC-end sequences against the genome

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results on rice genome Rice unigenes are available from NCBIRice unigenes are available from NCBI Unique probes for the unigenes were designed by the Oligospawn softwareUnique probes for the unigenes were designed by the Oligospawn software Experiments focused on chromosome IExperiments focused on chromosome I Probe pools were designed following the shifted transversal design (STD)Probe pools were designed following the shifted transversal design (STD) Dataset: 2,002 probes and 2,629 BACsDataset: 2,002 probes and 2,629 BACs

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results 1-decodable pooling design

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results 2-decodable pooling design

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Findings We proposed a new method to solve the BAC-gene deconvolution problem based on integer linear programmingWe proposed a new method to solve the BAC-gene deconvolution problem based on integer linear programming Experimental results show that our method is accurate and effectiveExperimental results show that our method is accurate and effective

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Thank you FundingFunding Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Hybridization with pools of probes Probes can be arranged into pools for group testingProbes can be arranged into pools for group testing In order to achieve exact deconvolution this strategy can be still unfeasibleIn order to achieve exact deconvolution this strategy can be still unfeasible The reason: a BAC may contain several, if not tens of genes  the “decodability” of the pool design has to be high to achieve exact deconvolution  …The reason: a BAC may contain several, if not tens of genes  the “decodability” of the pool design has to be high to achieve exact deconvolution  …

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization with pools of probes …  the pool size has to be small, which implies that the number of pools will be large…  the pool size has to be small, which implies that the number of pools will be large Question: Can we use a low decodability (1- or 2-decodable) pool design and still achieve good deconvolution?Question: Can we use a low decodability (1- or 2-decodable) pool design and still achieve good deconvolution?

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Physical map-assisted deconvolution For example, if we knew that BAC and BAC b j are 80% overlapping  if a probe p belongs to BAC, it is very likely that p also belongs to b jFor example, if we knew that BAC b i and BAC b j are 80% overlapping  if a probe p belongs to BAC b i, it is very likely that p also belongs to b j On the other hand, if we knew that BAC and BAC b j are not overlapping  if a probe p belongs to BAC, then it is very unlikely that probe p also belong to BAC b jOn the other hand, if we knew that BAC b i and BAC b j are not overlapping  if a probe p belongs to BAC b i, then it is very unlikely that probe p also belong to BAC b j

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Physical map-assisted deconvolution Basic deconvolution step is not sufficientBasic deconvolution step is not sufficient The overlapping structure of the BACs is used to resolve additional relationships between BACs and probesThe overlapping structure of the BACs is used to resolve additional relationships between BACs and probes

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Sketch of the algorithm

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Perfect physical map Cut the chromosome at the points where a BAC starts or endsCut the chromosome at the points where a BAC starts or ends Let’s call the resulting pieces fragmentsLet’s call the resulting pieces fragments Each fragment is covered by a set of BACsEach fragment is covered by a set of BACs Assume the probes are unique, therefore, each probe can only belong to one fragmentAssume the probes are unique, therefore, each probe can only belong to one fragment f1f1 f2f2 f3f3 f4f4 f5f5

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Optimization problem Optimization problem is similarly formulatedOptimization problem is similarly formulated

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 ILP

Solving the optimization problem The above problem is NP-completeThe above problem is NP-complete It is solved via ILP followed by LP relaxation and randomized roundingIt is solved via ILP followed by LP relaxation and randomized rounding Similar performance guarantee can be provedSimilar performance guarantee can be proved

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Sketch of the algorithm

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results