1
Deconvoluting BAC-gene Relationships Using a Physical Map
Y. Wu (1), L. Liu (1), T. Close (2), S. Lonardi (1)
(1) Department of Computer Science & Engineering
(2) Department of Botany & Plant Sciences
2
Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007
Selective sequencing
Many organisms are unlikely to be sequenced in the near future due to the large size and highly repetitive content of their genomes
Selective sequencing: obtain the sequence of a small set of BAC clones that contain a specific set of genes of interest
How do we identify these BAC clones? This is the BAC-gene deconvolution problem
3
An illustration of the problem
4
An illustration of the problem
5
An illustration of the problem
6
Hybridization with probes
The presence of a gene in a BAC can be determined by a hybridization experiment (e.g., using a unique probe designed from the gene)
BAC clones and probes typically number in the tens of thousands each, so carrying out an experiment for each pair (BAC, probe) is usually infeasible
Group testing (or pooling) has to be used
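To make the scale concrete, here is a minimal sketch that counts experiments with and without pooling. It assumes one experiment per (BAC, probe) pair or per (BAC, pool) pair, as the slide counts them; the function names and the sizes (20,000 BACs, 20,000 probes, 200 pools) are hypothetical and not taken from the talk.

```python
def pairwise_experiments(num_bacs: int, num_probes: int) -> int:
    """Number of experiments when every (BAC, probe) pair is tested separately."""
    return num_bacs * num_probes


def pooled_experiments(num_bacs: int, num_pools: int) -> int:
    """Number of experiments when every (BAC, pool) pair is tested instead."""
    return num_bacs * num_pools


# Hypothetical sizes, chosen only to illustrate "tens of thousands".
print(pairwise_experiments(20_000, 20_000))
print(pooled_experiments(20_000, 200))
```

The saving comes entirely from the number of pools being much smaller than the number of probes, which is why the strength of the pooling design matters in the slides that follow.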
7
Hybridization with pools of probes
Probes can be arranged into pools for group testing. However, achieving exact deconvolution this way could still be infeasible due to the large number of pools required
Question: Can we use a small number of pools (e.g., a 1- or 2-decodable pool design) and still achieve accurate deconvolution?
8
Dealing with the limitations of pooling
Answer: Yes, if one compensates for the lack of information obtained from a weak pooling design with knowledge of the overlapping structure of the BACs
In this way, the number of pools required is reduced, which makes the experiments less expensive and less time-consuming
9
Hybridization data
h(b,p)=1 (pool p hybridizes to BAC b)
– b must contain at least one of the probes/genes represented by p
– positive information
h(b,p)=0 (pool p does not hybridize to BAC b)
– b cannot contain any of the probes/genes represented by p
– negative information
10
Deconvolution problem
Given h(b,p) for all pairs (b,p), the deconvolution problem is to establish a one-to-many assignment between the probes and the clones b that is consistent with the values of h
1. Basic deconvolution: uses only the information obtained from group testing
2. Improved deconvolution: also uses the physical map
11
Input to the basic deconvolution
Hybridization table h
     p1  p2  p3  p4
b1   1   0   0   0
b2   1   1   0   0
b3   0   1   1   0
b4   0   0   1   1
b5   0   0   0   1
p_i is a pool, b_j is a BAC, u_k is a probe/gene
12
Input to the basic deconvolution
Hybridization table h (only the 1 entries are shown)
     p1  p2  p3  p4
b1   1
b2   1   1
b3       1   1
b4           1   1
b5               1
Pool content table (rows: pools p1 to p4; columns: probes u1 to u9): each pool row contains three 1s (column positions not recoverable)
p_i is a pool, b_j is a BAC, u_k is a probe/gene
13
Positive information
Table (rows: hybridizing (BAC, pool) pairs; columns: probes u1 to u9): each row marks with 1 the three probes contained in its pool (column positions not recoverable)
Rows: (b1,p1), (b2,p1), (b2,p2), (b3,p2), (b3,p3), (b4,p3), (b4,p4), (b5,p4)
p_i is a pool, b_j is a BAC, u_k is a probe/gene
14
Negative information
Table (rows: BACs b1 to b5; columns: probes u1 to u9): a 0 marks a probe that the BAC cannot contain (column positions not recoverable)
Number of 0 entries per row: b1: 7, b2: 5, b3: 6, b4: 5, b5: 7
p_i is a pool, b_j is a BAC, u_k is a probe/gene
15
Combining positive & negative
Table (rows: hybridizing (BAC, pool) pairs; columns: probes u1 to u9): before elimination, each row marks with 1 the three probes contained in its pool (column positions not recoverable)
Rows: (b1,p1), (b2,p1), (b2,p2), (b3,p2), (b3,p3), (b4,p3), (b4,p4), (b5,p4)
p_i is a pool, b_j is a BAC, u_k is a probe/gene
16
Combining positive & negative
Table (rows: hybridizing (BAC, pool) pairs; columns: probes u1 to u9): after removing the probes excluded by the negative information (column positions not recoverable)
Number of 1 entries left per row: (b1,p1): 2, (b2,p1): 3, (b2,p2): 2, (b3,p2): 2, (b3,p3): 2, (b4,p3): 2, (b4,p4): 3, (b5,p4): 2
Each row represents a constraint to be satisfied
If a row contains only one "1", then the relationship between the BAC and the probe is resolved exactly
p_i is a pool, b_j is a BAC, u_k is a probe/gene
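The basic deconvolution step can be sketched in a few lines of Python. The hybridization table below is the one shown in the slides. The pool contents are an assumption: the column positions of the pool content table are not available, so the layout here was chosen only because it reproduces the per-row counts of the negative and combined tables. The function and variable names are hypothetical.

```python
# Hybridization table h from the slides: BAC -> pools that hybridize to it.
HYBRIDIZES = {
    "b1": {"p1"},
    "b2": {"p1", "p2"},
    "b3": {"p2", "p3"},
    "b4": {"p3", "p4"},
    "b5": {"p4"},
}

# ASSUMED pool contents (not from the slides): three probes per pool,
# with neighbouring pools sharing one probe.
POOL_CONTENT = {
    "p1": {"u1", "u2", "u3"},
    "p2": {"u3", "u4", "u5"},
    "p3": {"u5", "u6", "u7"},
    "p4": {"u7", "u8", "u9"},
}


def basic_deconvolution(hybridizes, pool_content):
    """Return {(bac, pool): candidate probes} after applying negative information."""
    constraints = {}
    for bac, positive_pools in hybridizes.items():
        # Negative information: probes of every pool that did NOT hybridize.
        excluded = set()
        for pool, probes in pool_content.items():
            if pool not in positive_pools:
                excluded |= probes
        # Positive information: one constraint per hybridizing pool.
        for pool in positive_pools:
            constraints[(bac, pool)] = pool_content[pool] - excluded
    return constraints


constraints = basic_deconvolution(HYBRIDIZES, POOL_CONTENT)
# A constraint with a single candidate resolves that BAC-probe pair exactly.
resolved = {key: next(iter(c)) for key, c in constraints.items() if len(c) == 1}
for key in sorted(constraints):
    print(key, sorted(constraints[key]))
print("resolved exactly:", resolved)
```

Under this assumed layout every constraint keeps two or three candidate probes, so nothing is resolved exactly, which illustrates why group testing alone can leave BAC-gene relationships ambiguous.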
17
Physical map-assisted deconvolution
Basic deconvolution is not sufficient
BACs are assembled into contigs by FPC (a contig is a set of BAC clones)
We assume the probes are unique, so each probe can belong to exactly one contig
(Figure: Contig 1, Contig 2)
18
Optimization problem
We formulate the following optimization problem
The problem is NP-complete (proof in the paper, reduction from 3SAT)
19
Integer Linear Programming
The optimization problem can be solved via integer linear programming (ILP)
20
LP and randomized rounding
The ILP is relaxed to the corresponding LP, and the LP is then solved exactly (via the GLPK package)
The optimal solution to the LP is mapped to a valid solution to the ILP via randomized rounding
We prove that our method achieves an approximation ratio of (1 - e^{-1})
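The rounding step can be illustrated with a minimal sketch. The exact ILP variables are not given here, so this assumes one fractional value per (probe, contig) pair, summing to 1 for each probe, in line with the assumption that each probe belongs to exactly one contig. The fractional values below are made up, the LP solve itself (GLPK in the talk) is not shown, and `randomized_rounding` is a hypothetical name.

```python
import random


def randomized_rounding(fractional, rng):
    """Pick exactly one contig per probe, with probability equal to its LP value."""
    assignment = {}
    for probe, weights in fractional.items():
        contigs = list(weights)
        probabilities = [weights[c] for c in contigs]
        # random.Random.choices samples proportionally to the given weights.
        assignment[probe] = rng.choices(contigs, weights=probabilities, k=1)[0]
    return assignment


# Hypothetical fractional LP solution (assumption, not data from the talk).
fractional = {
    "u1": {"contig1": 0.7, "contig2": 0.3},
    "u2": {"contig1": 1.0, "contig2": 0.0},
    "u3": {"contig1": 0.5, "contig2": 0.5},
}

rng = random.Random(0)  # fixed seed so the run is repeatable
assignment = randomized_rounding(fractional, rng)
print(assignment)
```

Because each probe receives exactly one contig, the rounded solution is always integral and respects the one-contig-per-probe assumption; values already at 0 or 1 in the LP solution are preserved.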
21
Experimental results on rice genome
The whole genome sequence of rice is available
The BAC library and fingerprinting data are available from AGI
BAC-end sequences are also available from GenBank
The physical map was built using FPC
Coordinates of the BACs on the genome were determined by BLASTing BAC-end sequences against the genome
22
Experimental results on rice genome
Rice unigenes are available from NCBI
Unique probes for the unigenes were designed by the OligoSpawn software
Experiments focused on chromosome I
Probe pools were designed following the shifted transversal design (STD)
Dataset: 2,002 probes and 2,629 BACs
23
Experimental results: 1-decodable pooling design
24
Experimental results: 2-decodable pooling design
25
Experimental results
26
Findings
We proposed a new method to solve the BAC-gene deconvolution problem based on integer linear programming
Experimental results show that our method is accurate and effective
27
Thank you
Funding
Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)
29
Hybridization with pools of probes
Probes can be arranged into pools for group testing
Achieving exact deconvolution with this strategy can still be infeasible
The reason: a BAC may contain several, if not tens of, genes, so the "decodability" of the pool design has to be high to achieve exact deconvolution …
30
Hybridization with pools of probes
… the pool size has to be small, which implies that the number of pools will be large
Question: Can we use a low-decodability (1- or 2-decodable) pool design and still achieve good deconvolution?
31
Physical map-assisted deconvolution
For example, if we know that BAC b_i and BAC b_j overlap by 80% and a probe p belongs to BAC b_i, then it is very likely that p also belongs to b_j
On the other hand, if we know that BAC b_i and BAC b_j do not overlap and a probe p belongs to BAC b_i, then it is very unlikely that p also belongs to BAC b_j
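The intuition in the two examples above can be made concrete with a small sketch. It assumes each BAC is described by (start, end) coordinates on the chromosome; under the further assumption that a probe is equally likely to sit anywhere within b_i, the fraction of b_i covered by b_j is the chance that the probe also lies in b_j. The coordinates and the function name are hypothetical.

```python
def overlap_fraction(bac_i, bac_j):
    """Fraction of bac_i's length that is also covered by bac_j."""
    start_i, end_i = bac_i
    start_j, end_j = bac_j
    # Length of the shared interval; zero when the BACs do not overlap.
    shared = max(0, min(end_i, end_j) - max(start_i, start_j))
    return shared / (end_i - start_i)


# Hypothetical coordinates in base pairs.
print(overlap_fraction((0, 100_000), (20_000, 120_000)))   # 80% overlap
print(overlap_fraction((0, 100_000), (150_000, 250_000)))  # no overlap
```

A physical map built from fingerprints does not give exact coordinates, so in practice the overlap would be an estimate rather than a measured length.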
32
Physical map-assisted deconvolution
The basic deconvolution step is not sufficient
The overlapping structure of the BACs is used to resolve additional relationships between BACs and probes
33
Sketch of the algorithm
34
Perfect physical map
Cut the chromosome at the points where a BAC starts or ends
Let's call the resulting pieces fragments
Each fragment is covered by a set of BACs
Assume the probes are unique; therefore, each probe can belong to only one fragment
(Figure: fragments f1, f2, f3, f4, f5)
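The fragment construction described above can be sketched as follows, assuming each BAC is given as (start, end) coordinates. The three BACs and their coordinates are hypothetical, picked so that the cuts produce five fragments like the f1 to f5 in the figure; `fragments_with_cover` is an illustrative name, not one from the talk.

```python
def fragments_with_cover(bacs):
    """Cut at every BAC start/end and return [(start, end, covering BAC names)]."""
    cuts = sorted({point for start, end in bacs.values() for point in (start, end)})
    result = []
    for left, right in zip(cuts, cuts[1:]):
        # A BAC covers a fragment when the fragment lies entirely inside it.
        cover = {name for name, (start, end) in bacs.items()
                 if start <= left and right <= end}
        if cover:  # skip gaps that no BAC covers
            result.append((left, right, cover))
    return result


# Hypothetical BAC coordinates (arbitrary units).
bacs = {"b1": (0, 40), "b2": (30, 70), "b3": (60, 100)}
for left, right, cover in fragments_with_cover(bacs):
    print(left, right, sorted(cover))
```

Since every probe is assumed to fall in exactly one fragment, the covering set of that fragment determines the full set of BACs containing the probe.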
35
Optimization problem
The optimization problem is formulated similarly
36
ILP
37
Solving the optimization problem
The above problem is NP-complete
It is formulated as an ILP and solved via LP relaxation and randomized rounding
A similar performance guarantee can be proved
38
Sketch of the algorithm
39
Experimental results