WGP Tomato EU-SOL meeting July 15, 2009 Antoine Janssen
Overview. Whole Genome Profiling: the concept; PoP in Arabidopsis; WGP melon; Combining WGP and WGS; WGP Tomato
Whole Genome Profiling: sequence-based physical mapping of BAC clones using the Illumina Genome Analyzer (Solexa)
The challenge: Next-generation sequencing technologies have accelerated whole-genome re-sequencing approaches and reduced their costs dramatically, but de novo construction of genomes in complex organisms is still costly. Therefore, an improved de novo draft genome sequencing strategy is needed that takes full advantage of the power of next-generation sequencing.
BAC libraries: BACs with 125 kb average insert size, covering 5-20 genome equivalents (GE). Figure labels: Chromosome; BAC1, BAC2, BAC3, BAC4, BAC5.
Restriction fragments (EcoRI / MseI): TTAA……ACTTAGTTAGCTTGGACTAACGAATTCGTAGGCATAGTGACTAGCATTG…..……TTAA
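How a sequence tag relates to a restriction site can be sketched in a few lines of Python. This is a minimal illustration, not the Keygene pipeline: it assumes that a tag is the 20 bases directly next to an EcoRI site (GAATTC) and that tags are read outward from the site on both strands. The function names are hypothetical.

```python
import re

ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
TAG_LENGTH = 20         # bases of genome sequence kept next to the site

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def ecori_tags(seq, tag_length=TAG_LENGTH):
    """Collect the tags flanking every EcoRI site, read away from the site."""
    tags = []
    for match in re.finditer(ECORI_SITE, seq):
        downstream = seq[match.end():match.end() + tag_length]
        upstream = seq[max(0, match.start() - tag_length):match.start()]
        if len(downstream) == tag_length:
            tags.append(downstream)
        if len(upstream) == tag_length:
            # Reading away from the site on the other strand.
            tags.append(reverse_complement(upstream))
    return tags

# The fragment interior shown on the slide (the elided ends are left out).
FRAGMENT = "ACTTAGTTAGCTTGGACTAACGAATTCGTAGGCATAGTGACTAGCATTG"
tags = ecori_tags(FRAGMENT)
```

For the slide fragment this yields two 20-base tags, one on each side of the single EcoRI site.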
Pooling BAC clones. Arabidopsis genome: 125 Mbp; 6144 BACs (5 GE) in 384-well plates. Each Illumina GA lane: 768 BACs, ~3 M reads; 8 lanes in total. Individual BAC target preparation is too time-consuming and costly, therefore BACs are pooled in two dimensions (2D pooling), with each pool identified by a unique sample identification tag. Row pool tags: R1 - CTACT, R2 - CAGGT, R3 - GCATC, R4 - TGCAG, R5 - TACTA, R6 - CCTAG. Further pool tags shown: TCTGT - AGACT - GAGTC - GTGCA - ATCAC - GTATC. Figure labels: wells plate = 384 BACs; column pools; row pools.
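The 2D pooling idea can be sketched as follows. This is a simplified illustration under stated assumptions: one 384-well plate is taken to have 16 rows (A to P) and 24 columns, each BAC goes into one row pool and one column pool, and a tag is assigned to a well only when it is seen in exactly one row pool and one column pool. The pool naming (`R3`, `C19`) follows the slides; the functions themselves are hypothetical, and the real deconvolution has to cope with overlapping BACs that share tags.

```python
ROWS = "ABCDEFGHIJKLMNOP"   # assumed 16 rows of a 384-well plate
N_COLUMNS = 24              # assumed 24 columns

def pools_for_well(row, column):
    """A BAC in well (row, column) is added to one row pool and one column pool."""
    if row not in ROWS or not 1 <= column <= N_COLUMNS:
        raise ValueError("well outside a 384-well plate")
    return f"R{ROWS.index(row) + 1}", f"C{column}"

def deconvolute(tag_to_pools):
    """Map each tag to a well if it occurs in exactly one row and one column pool."""
    assignments = {}
    for tag, pools in tag_to_pools.items():
        rows = [p for p in pools if p.startswith("R")]
        cols = [p for p in pools if p.startswith("C")]
        if len(rows) == 1 and len(cols) == 1:
            row_letter = ROWS[int(rows[0][1:]) - 1]
            assignments[tag] = (row_letter, int(cols[0][1:]))
    return assignments

observed = {
    "TAGTACCAAGCTTGCCATGA": {"R3", "C19"},          # unambiguous
    "GTTCCCGGGCCTTGTACACA": {"R1", "R2", "C5"},     # two row pools: ambiguous
}
wells = deconvolute(observed)
```

Tags that appear in several row or column pools are left unassigned here; this is one reason why not every read is deconvolutable.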
Illumina Genome Analyzer. Slide shows a full screen of raw 31-base sequence reads, for example: GTTGACAATTCAGTATTACTAGAGCTGCGTA TAACTCAATTCTCATCAGAAAAAATATATGA TACGCCAATTCCAAAAGATTGTAGAGTATTT GCCATCAATTCCAGTTTTGGGTTCTTCACCA …
Whole Genome Profiling sequence tags. Illumina sequence reads: TCTGT CAATTC TAGTACCAAGCTTGCCATGA; TAAGG CAATTC GTTCCCGGGCCTTGTACACA; GTCGC CAATTC CATCCAATAAATAGCTCTAT; GCATC CAATTC TAGTACCAAGCTTGCCATGA; TATTA CAATTC AATTAGAAGAAATGATATTC. Each read consists of a sample identification tag (“barcode”), the restriction site (part of the primer) and a 20-base genome sequence tag flanking the RE site. Pool labels on the slide: = pool R3, = pool C19.
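The read layout described above (5-base barcode, restriction-site remnant, 20-base tag) can be split with simple slicing. The sketch below uses the five reads from the slide. The barcode-to-pool table is partly an assumption: GCATC = R3 is taken from the pooling slide, while TCTGT = C19 is only inferred from the slide annotation and may not be the actual assignment. All function names are hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 5
SITE_REMNANT = "CAATTC"   # restriction-site part of the read, as it appears on the slide
TAG_LENGTH = 20

# GCATC -> R3 is from the slides; TCTGT -> C19 is an assumption for illustration.
POOL_BY_BARCODE = {"GCATC": "R3", "TCTGT": "C19"}

def parse_read(read):
    """Split a read into (barcode, tag); return None if the layout does not match."""
    read = read.replace(" ", "").upper()
    site_start = BARCODE_LENGTH
    tag_start = site_start + len(SITE_REMNANT)
    barcode = read[:site_start]
    site = read[site_start:tag_start]
    tag = read[tag_start:tag_start + TAG_LENGTH]
    if site != SITE_REMNANT or len(tag) != TAG_LENGTH:
        return None
    return barcode, tag

def pools_per_tag(reads):
    """Record, for every tag, the set of known pools in which it was observed."""
    seen = defaultdict(set)
    for read in reads:
        parsed = parse_read(read)
        if parsed is None:
            continue
        barcode, tag = parsed
        pool = POOL_BY_BARCODE.get(barcode)
        if pool is not None:
            seen[tag].add(pool)
    return dict(seen)

SLIDE_READS = [
    "TCTGT CAATTC TAGTACCAAGCTTGCCATGA",
    "TAAGG CAATTC GTTCCCGGGCCTTGTACACA",
    "GTCGC CAATTC CATCCAATAAATAGCTCTAT",
    "GCATC CAATTC TAGTACCAAGCTTGCCATGA",
    "TATTA CAATTC AATTAGAAGAAATGATATTC",
]
tag_pools = pools_per_tag(SLIDE_READS)
```

With this table, the tag TAGTACCAAGCTTGCCATGA is seen in one row pool and one column pool, which is the situation in which a tag can be assigned to a single BAC.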
Impact of sequence tag length: 70% of 20-mer sequence tags are unique in rice; > 85% in Arabidopsis.
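The effect of tag length on uniqueness can be explored with a small helper. This sketch assumes that "unique" means a tag sequence that occurs at exactly one position, and it measures the fraction over distinct tag sequences; the slide does not state which definition was used. The toy tags are made up for illustration.

```python
from collections import Counter

def unique_fraction(tags):
    """Fraction of distinct tag sequences that occur exactly once."""
    counts = Counter(tags)
    if not counts:
        return 0.0
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(counts)

def truncate(tags, length):
    """Shorten every tag to the given length, as a shorter read would."""
    return [tag[:length] for tag in tags]

toy_tags = ["AAAA", "AAAT", "CCCC"]          # hypothetical tags, one per site
full = unique_fraction(toy_tags)              # all three tags are distinct
short = unique_fraction(truncate(toy_tags, 3))  # AAAA and AAAT collapse to AAA
```

Shortening the tags makes two of the three toy tags collide, so the unique fraction drops.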
FingerPrinted Contigs (FPC) assembly: the physical BAC map is assembled using an adapted version of FPC. Figure labels: BAC1, BAC2.
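The principle of building contigs from shared tags can be illustrated with a small union-find sketch. This is not how FPC works internally: FPC scores overlaps statistically, whereas the fixed `min_shared` threshold below is an assumption made only to keep the example short. BAC and tag names are hypothetical.

```python
from itertools import combinations

def build_contigs(bac_tags, min_shared=2):
    """Group BACs that share at least min_shared tags; return (contigs, singletons)."""
    parent = {bac: bac for bac in bac_tags}

    def find(bac):
        # Follow parent links to the group representative, compressing the path.
        while parent[bac] != bac:
            parent[bac] = parent[parent[bac]]
            bac = parent[bac]
        return bac

    for a, b in combinations(bac_tags, 2):
        if len(bac_tags[a] & bac_tags[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for bac in bac_tags:
        groups.setdefault(find(bac), []).append(bac)

    contigs = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return contigs, singletons

example = {
    "BAC1": {"t1", "t2", "t3"},
    "BAC2": {"t2", "t3", "t4"},
    "BAC3": {"t9"},
}
contigs, singletons = build_contigs(example)
```

BAC1 and BAC2 share two tags and end up in one contig, while BAC3 shares none and stays a singleton.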
WGP Arabidopsis. Arabidopsis genome: 125 Mbp; 6144 BACs (6 GE) in 384-well plates. Each Illumina GA lane: 768 BACs, ~3 M reads; 8 lanes in total.
Results, 6 GE Arabidopsis: 4599 BACs; 65,000 tags; 234 contigs (2 to 125 BACs); 541 singletons; 85% coverage. Figure labels: FPC, BAC1, BAC2.
PoP Arabidopsis thaliana. WGP of Arabidopsis thaliana ecotype Columbia: 6144 BACs (5 GE); WGP using one Illumina GA classic run; 65,000 sequence tags. Assembly of 4599 BACs (75%): 234 contigs (2 to 125 BACs/contig). Validation on the genome sequence by BLAST analysis of the WGP sequence tags: 52,000 tags with 100% hits, covering 99% of the genome; max. gap 125 kbp; 50,000 unique hits; average 2,355 bp between tags; 86% of all EcoRI sites represented.
WGP melon: 450 Mbp estimated genome size; 47,000 BACs (EcoRI and HindIII libraries), ~13 GE in total. Available for contig building: 5 GA runs M reads; 196,000 unique sequence tags; 40,000 BACs (85%) uniquely tagged, average 33 tags/BAC.
WGP melon, results: 549 contigs; 6416 singleton BACs; median 21 BACs/contig; 78% genome coverage.
Combining WGP and WGS Roche GS FLX Titanium and Illumina Genome Analyzer II Whole Genome Profiling and Whole Genome Sequencing
WGS melon genome. GS FLX Titanium sequencing (15 X): 10 GS FLX Titanium random shotgun runs; 3 3-kb and 4 long jump paired-end GS FLX Titanium runs. Illumina GA II paired-end sequencing (30 X): 500 bp, 2 kb and 10 kb. Status: GS and GA sequencing completed; GS assembly completed; GA assembly in progress.
Combining WGP and WGS. Figure labels: EcoRI WGP BAC contigs; EcoRI; WGP sequence tags at kb distance; WGS contigs; 400 nt Titanium (paired-end); 36 nt GA II.
Combining WGP and WGS, advantages: WGP provides sequence-based anchor points for WGS. Use WGP to create a high-resolution sequence-based physical BAC map, e.g. at 10 X BAC library coverage; use WGS to generate (deep) coverage whole genome sequence. Superior assembly: the WGP map contains far fewer contigs (549) than genomes sequenced by conventional random shotgun WGS strategies (tens of thousands) and produces more accurate maps than fingerprint-based physical mapping. Cost reduction: no Sanger sequencing required. Direct access to BAC clones in regions of interest.
Status WGP Tomato: 4 types of BAC libraries
HindIII: 15360 clones, 120 kbp insert
EcoRI: 15360 clones, 120 kbp insert
MboI: 15360 clones, 120 kbp insert
  20 pools; total 5.5 Gbp / 950 Mbp = 5.7 x
Random sheared (Lucigen): 50688 clones, 90 kb insert
  16 pools; total 4.6 Gbp / 950 Mbp = 4.8 x
Total nr of clones: of which are analyzed (95%)
Approximately 85% of RE BACs deconvolutable
Approximately 60% of sheared BACs deconvolutable
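The library coverage figures follow from clones x insert size / genome size. The short sketch below redoes that arithmetic with the clone counts, insert sizes and the 950 Mbp genome size quoted on the slide; the function name is hypothetical. With these rounded inputs the sheared library comes out at about 4.8 genome equivalents and the three restriction-enzyme libraries together at about 5.8, close to the 5.7 x quoted.

```python
def genome_equivalents(n_clones, insert_kb, genome_mbp):
    """Library coverage in genome equivalents: total insert length / genome size."""
    total_mbp = n_clones * insert_kb / 1000   # kb -> Mbp
    return total_mbp / genome_mbp

TOMATO_GENOME_MBP = 950

# Three restriction-enzyme libraries (HindIII, EcoRI, MboI), 15360 clones each.
re_coverage = genome_equivalents(3 * 15360, 120, TOMATO_GENOME_MBP)

# Random sheared library.
sheared_coverage = genome_equivalents(50688, 90, TOMATO_GENOME_MBP)

print(f"RE libraries:    {re_coverage:.2f} x")
print(f"Sheared library: {sheared_coverage:.2f} x")
```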
Comparison WGP results (WGP Tomato)
                                Tomato     Melon      Arabidopsis
nr BACs tested                  92,160     47,616     6,144
deconvolutable reads            42%        50%        43%
nr unique tags                  336,258    181,254    65,734
nr tagged BACs (FPC ready)      67,742     39,935     4,599
tagged BACs (FPC ready)         74%        84%        75%
tag length (incl. restriction site): 26
Rows without surviving values: genome size (Mbp); genome equivalents BACs tested; E/M enzyme combination; nr OK reads generated (M); nr deconvolutable reads (M); average nr tags/BAC; average nr reads/tag
What next, WGP Tomato: finish last 5% (planned for next run); contiging with FPC; deliver data. EU-SOL: integrate with WGS data.
Thanks to:
Amplicon Express: Robert Bogden, Keith Stormo, Quanzhou Tao
454 Life Sciences / Roche Applied Science: Jason Affourtit, Brian Desany, Hans Lunstroo
University of Udine: Michele Morgante
CBSG / EU SOL: Willem Stiekema, Roeland van Ham, René Klein Lankhorst
BioSeeds companies: Rijk Zwaan, Enza Zaden, Vilmorin & Cie, Takii & Co
Keygene N.V.:
Upstream Research: Marcel Prins, Marjo de Ruiter, Hein van der Poel, Marjolijn Kelder, Anita Bonné, Nathalie van Orsouw, Esther Verstege, Taco Jesse
Applied Research: René Hofstede, Anker Sørensen, Richard Feron, Martin Zevenbergen, Linda de Leeuw, Alberto Maurer, Marco van Schriek, Jeroen Rombout
Bio-informatics, ICT: Jan van Oeveren, Kornelis Stol, Antoine Janssen, Harold Verstegen
Contact: Hanne Jifeng
Business Development: Jon Wittendorp, Herco van Liere, Mark van Haaren
Keygene N.V. owns patents and patent applications covering its Whole Genome technologies