Published by Jonathan Scott. Modified over 9 years ago.
1
Solanum lycopersicum Chromosome 4 Mapping and Finishing Update
SRC-UK and Wellcome Trust Sanger Institute
SOL Korea, September 2007
Wellcome Trust Medical Photographic Library
2
Tomato Physical Map

Tomato BAC libraries

Library    No. of clones   Average insert   Genome equivalents   Fingerprints
LE_HBa     129,024         117 kb           15 X                 88,000 (AGI)
SL_MboI    52,992          135 kb           7 X                  43,000 (WTSI)
SL_EcoI    72,264          95-100 kb        7 X

BACs are selected for sequencing on chromosome 4 using the physical map assembled in fpc.
The map has been assembled using fingerprinted clones from 2 BAC libraries.
Extending and gap-filling clones are identified using end sequences.
Clones are fingerprinted, entered in fpc and checked for overlaps before being selected for sequencing.
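The "genome equivalents" column can be roughly reproduced from the clone counts and insert sizes. The sketch below is illustrative only: the 950 Mb genome size is an assumption that does not come from the slides, and the 97.5 kb insert for SL_EcoI is simply the midpoint of the quoted 95-100 kb range.

```python
# Rough library coverage: clones x mean insert size / genome size.
ASSUMED_GENOME_MB = 950  # assumption, not stated on the slide

# (number of clones, mean insert in kb) taken from the table above;
# 97.5 kb is the midpoint of the 95-100 kb range quoted for SL_EcoI.
LIBRARIES = {
    "LE_HBa": (129_024, 117.0),
    "SL_MboI": (52_992, 135.0),
    "SL_EcoI": (72_264, 97.5),
}


def genome_equivalents(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Return the fold coverage of the genome provided by a clone library."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


coverage = {
    name: genome_equivalents(n, insert_kb, ASSUMED_GENOME_MB)
    for name, (n, insert_kb) in LIBRARIES.items()
}

for name, fold in coverage.items():
    print(f"{name}: {fold:.1f} X")
```

With this assumed genome size the results land near the 15 X and 7 X figures on the slide; a different genome size estimate shifts them proportionally.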
3
Map Coverage – Chromosome 4

Chromosome 4 is represented by 45 FPC contigs that cover approximately 22.2 Mb, estimated from fingerprints (5 bands/kb).
40 clones have been selected to extend the original contigs based on clone end sequence matches.
All contigs are anchored to the chromosome by SGN chromosome 4 markers.
FISH (H. de Jong, Wageningen) has confirmed the placement of some contigs on chromosome 4, but may refute the placement of 7 or more contigs. Confirmation of chromosome 4 contigs is a high priority.
142 of the 907 SGN chromosome 4 markers are missing from the current fpc build. Overgo probes are being used to screen the BAC libraries and may identify ~47 additional clones.
The Syngenta marker data will also be used to identify additional BACs.
4
FISH Data

Confirmation of chromosome location
Verification of contig and marker placement
Assessment of heterochromatin & euchromatin distribution

This image demonstrates:
- LE_HBa114C15 on short arm
- LE_HBa308B7 on heterochromatin/centromere border
- LE_HBa20F17 on long arm

FISH performed by S. B. Chang at Prof S. Stack's Laboratory, University of Colorado, USA.
5
Chromosome 4 – Distribution of contigs

[Figure: chromosome 4 with mapped markers and contigs ctg503, ctg15, ctg5716, ctg5014, ctg5252, ctg5711, ctg916, ctg1406, ctg1189 and ctg1795; FISH-confirmed contigs are marked.]

This shows that clones for sequencing have been selected from seed contigs along the length of the chromosome, including those selected from putative heterochromatic regions to try to assess the boundary domains.
6
Distribution of Chromosome 4 Contigs

This shows that clones for sequencing have been selected from seed contigs along the length of the chromosome. The ten contigs shown are from the current 45 fpc contigs on chr4, including those selected from putative heterochromatic regions to try to assess the boundary domains.

[Figure: Chr4 with centromere, mapped markers, contigs and analysed BACs; FISH-confirmed contigs are marked. Key: euchromatin, heterochromatin.]

Contigs: ctg503, ctg15, ctg5716, ctg5014, ctg5252, ctg5711, ctg916, ctg1406, ctg1189, ctg1795
Mapped markers: TG485, T0635, T0954, T1322, CT_At5g37360, T1068, TG287, TG163, P41, P74

Analysed BAC   Number of gene models
bTH8H22        4
bTH36C23       2
bTH50I18       3
bTH114C15      2
bTH308B7       0
bTH198L24      0
bTH31H5        1
bTH132O11      3
bTH53M2        5
bTH59M16       7

The number of gene models was obtained from the gene prediction training set.
7
Sequence Plot of ctg916 euchromatin
8
Sequence Plot of ctg5711 euchromatin
9
Sequence Plot of ctg15 (heterochromatic - euchromatic boundary region) Same plot as before with greyscale adjusted to view repeat features
10
Sequence Plot of ctg5014 near centromere Same plot as before with greyscale adjusted to view repeat features
11
TPF File

Tile Path Format file – tab delimited flat file

GAP        type-3           ?
?          LE_HBa-24G5      ctg145
CT990489   LE_HBa-20F17     ctg145
GAP        type-3           ?
CT990488   LE_HBa-114C15    ctg5716
?          SL_MboI-143K21   ctg5716
GAP        type-3           ?
?          LE_HBa-147F16    ctg5014
CT990558   LE_HBa-308B7     ctg5014
GAP        type-3           ?
CT990624   LE_HBa-27G19     ctg15
CT476825   LE_HBa-198L24    ctg15
CT573298   LE_HBa-119A16    ctg15
CT485992   LE_HBa-31H5      ctg15
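As an illustration of how a tile path like this might be read programmatically, here is a minimal sketch. It assumes three whitespace- or tab-separated fields per row (accession, clone, contig), with "?" standing for a clone that has no accession yet and "GAP" rows carrying a gap type; the `TpfRow` record and `parse_tpf` helper are hypothetical names, not part of any existing tool.

```python
from typing import List, NamedTuple, Optional


class TpfRow(NamedTuple):
    kind: str                 # "clone" or "gap"
    accession: Optional[str]  # None when the file shows "?"
    clone: Optional[str]
    contig: Optional[str]
    gap_type: Optional[str]


def parse_tpf(text: str) -> List[TpfRow]:
    """Parse TPF-style rows; splitting on any whitespace also handles tabs."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "GAP":
            rows.append(TpfRow("gap", None, None, None, fields[1]))
        else:
            accession, clone, contig = fields[:3]
            rows.append(
                TpfRow("clone", None if accession == "?" else accession,
                       clone, contig, None)
            )
    return rows


# First five rows of the tile path shown on the slide.
SAMPLE = """\
GAP\ttype-3\t?
?\tLE_HBa-24G5\tctg145
CT990489\tLE_HBa-20F17\tctg145
GAP\ttype-3\t?
CT990488\tLE_HBa-114C15\tctg5716
"""

rows = parse_tpf(SAMPLE)
unaccessioned = [r.clone for r in rows if r.kind == "clone" and r.accession is None]
print(unaccessioned)
```

Listing the clones without an accession is one simple way to see which tile path members are still in the sequencing pipeline.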
12
AGP File

Accessioned Golden Path – tab delimited flat file

chr4   1        50000    1    N   50000        clone    no
chr4   50001    100000   2    N   50000        clone    no
chr4   100001   150000   3    N   50000        contig   no
chr4   150001   200000   4    N   50000        clone    no
chr4   200001   360432   5    F   CT476825.1   1        160432   +
chr4   360433   370113   6    F   CT573298.1   2001     11681    +
chr4   370114   532277   7    F   CT485992.1   2001     164164   +
chr4   532278   582277   8    N   50000        contig   no
chr4   582278   632277   9    N   50000        clone    no
chr4   632278   682277   10   N   50000        contig   no

Gaps and unfinished clones are entered as 50,000 bp sections to more accurately represent the chromosome in each build.
The file gives the order and alignment of Phase 3 finished accessions.
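The AGP rows are internally checkable: each part should start where the previous one ended, gap rows should span exactly their stated length, and sequence rows should span the same number of bases on the chromosome as on the component. The sketch below is one way to run those checks on the rows from the slide; `check_agp` is a hypothetical helper, and the rows are split on whitespace although real files are tab delimited.

```python
AGP_ROWS = """\
chr4 1 50000 1 N 50000 clone no
chr4 50001 100000 2 N 50000 clone no
chr4 100001 150000 3 N 50000 contig no
chr4 150001 200000 4 N 50000 clone no
chr4 200001 360432 5 F CT476825.1 1 160432 +
chr4 360433 370113 6 F CT573298.1 2001 11681 +
chr4 370114 532277 7 F CT485992.1 2001 164164 +
chr4 532278 582277 8 N 50000 contig no
chr4 582278 632277 9 N 50000 clone no
chr4 632278 682277 10 N 50000 contig no
""".splitlines()


def check_agp(rows):
    """Return (list of problems, total bp in finished 'F' components)."""
    problems = []
    finished_bp = 0
    next_start = 1
    for expected_part, row in enumerate(rows, start=1):
        f = row.split()
        start, end, part, kind = int(f[1]), int(f[2]), int(f[3]), f[4]
        span = end - start + 1
        if start != next_start:
            problems.append(f"part {part}: starts at {start}, expected {next_start}")
        if part != expected_part:
            problems.append(f"part {part}: expected part number {expected_part}")
        if kind == "N":
            # Gap row: column 6 is the gap length.
            if int(f[5]) != span:
                problems.append(f"part {part}: gap length {f[5]} != span {span}")
        else:
            # Sequence row: columns 7 and 8 are component start and end.
            component_span = int(f[7]) - int(f[6]) + 1
            if component_span != span:
                problems.append(f"part {part}: component span {component_span} != {span}")
            if kind == "F":
                finished_bp += span
        next_start = end + 1
    return problems, finished_bp


problems, finished_bp = check_agp(AGP_ROWS)
print(problems, finished_bp)
```

For the ten rows shown, the checks pass and the three finished components contribute 332,277 bp of the 682,277 bp described; the remainder is 50,000 bp placeholder gaps.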
13
AGP View on SGN
14
PseudoGoldenPath analysis for Contig Extension and Gap Closure

A PGP viewer is being developed to visualise sequence alignments and contig positioning.
The PGP contains finished and unfinished sequence.
Unfinished clones are represented as sequence contigs.
Unmasked BES are aligned to the PGP sequence using ssaha2.
Parameters: e.g. minimum percentage identity = 95%, with a minimum of 60% of the end sequence found.
Map gaps are assigned an arbitrary 5 kb size.
Clone candidates for contig extension are checked with BLAST and fingerprinted.
The aim is to incorporate other data such as markers.
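The two example thresholds quoted for the BES alignments can be expressed as a simple filter. This is a sketch under stated assumptions: the `BesHit` record is a hypothetical layout and not the ssaha2 output format, and "60% of the end sequence found" is interpreted here as aligned read length divided by total read length.

```python
from dataclasses import dataclass


@dataclass
class BesHit:
    """Hypothetical record for one BAC end sequence alignment."""
    read: str
    read_length: int
    q_start: int       # 1-based, inclusive, on the end sequence
    q_end: int
    percent_id: float
    target: str        # name of the PGP sequence that was hit


def keep_hit(hit: BesHit, min_id: float = 95.0, min_fraction: float = 0.60) -> bool:
    """Keep hits with >= min_id identity covering >= min_fraction of the read."""
    aligned = abs(hit.q_end - hit.q_start) + 1
    return hit.percent_id >= min_id and aligned / hit.read_length >= min_fraction


# Made-up example hits; read and target names are placeholders.
hits = [
    BesHit("end_1", 700, 1, 600, 98.5, "pgp_seq_a"),   # passes both thresholds
    BesHit("end_2", 700, 1, 300, 99.0, "pgp_seq_a"),   # too little of the read aligned
    BesHit("end_3", 700, 1, 650, 93.0, "pgp_seq_b"),   # identity too low
]

kept = [h.read for h in hits if keep_hit(h)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to tighten or relax them when repeats produce too many or too few candidate hits.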
15
Closing the Map using PGP

[Figure: sequenced clones on either side of a map gap, with bridging clones identified from BES alignments to sequence.]

53 clone extensions have been identified, including 5 merges with previously unplaced contigs.
2 merges of chromosome 4 contigs have also been made.
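One way to think about finding bridging and extending clones from end-sequence alignments is sketched below. It assumes each clone end has already been reduced to a single best contig hit, which is a simplification; the clone and contig names are placeholders, and `classify_clones` is a hypothetical helper rather than the method used for the PGP viewer.

```python
from collections import defaultdict


def classify_clones(end_hits):
    """Split clones into bridging and extending candidates.

    end_hits: iterable of (clone, end_label, contig) tuples, one best hit
    per clone end. A clone whose two ends hit different contigs may bridge
    a gap; a clone with only one placed end may extend that contig.
    """
    by_clone = defaultdict(dict)
    for clone, end_label, contig in end_hits:
        by_clone[clone][end_label] = contig

    bridging, extending = {}, {}
    for clone, ends in by_clone.items():
        contigs = set(ends.values())
        if len(ends) == 2 and len(contigs) == 2:
            bridging[clone] = tuple(sorted(contigs))
        elif len(ends) == 1:
            extending[clone] = next(iter(contigs))
    return bridging, extending


# Placeholder data: clone_1 spans two contigs, clone_2 lies within one,
# clone_3 has only one end placed.
example_hits = [
    ("clone_1", "fwd", "ctgA"),
    ("clone_1", "rev", "ctgB"),
    ("clone_2", "fwd", "ctgA"),
    ("clone_2", "rev", "ctgA"),
    ("clone_3", "fwd", "ctgB"),
]

bridging, extending = classify_clones(example_hits)
print(bridging, extending)
```

Candidates found this way would still need the BLAST and fingerprint checks described for contig extension before being selected.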
16
Extender from Fosmid Library

Fosmid end sequences deposited by Cornell have been aligned to chromosome 4 sequence.
A copy of the fosmid library has been received at WTSI, and ~50,000 clones will be end sequenced by December, with the sequences deposited in the Ensembl / NCBI Trace repositories.

[Figure: potential extender.]
17
WTSI Tomato Clone Pipeline

Pipeline Stage       Number of BACs
Subcloning           34
Shotgun              21
Assembly Start       7
Auto-prefinishing    3
Finishing            11
QC Checking          4
Finished             63
Total                143

HTGS: Phase 1, Phase 2, Phase 3
18
Chromosome 4 Sequence Generated

Total Sequence Available: 10,666,227 bp
Total Unique Sequence: 10,633,995 bp
Total amount of Finished Sequence: 7,543,322 bp
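The three totals can be combined into a few derived figures. The short calculation below uses only numbers from the slides (including the approximately 22.2 Mb fingerprint-based estimate of contig coverage given earlier); the variable names are illustrative.

```python
TOTAL_AVAILABLE_BP = 10_666_227
TOTAL_UNIQUE_BP = 10_633_995
FINISHED_BP = 7_543_322
ESTIMATED_MAP_BP = 22_200_000  # ~22.2 Mb covered by the 45 FPC contigs

# Sequence shared between overlapping clones.
redundant_bp = TOTAL_AVAILABLE_BP - TOTAL_UNIQUE_BP

# Share of the unique sequence that is finished.
finished_fraction = FINISHED_BP / TOTAL_UNIQUE_BP

# Share of the estimated contig coverage that has been sequenced so far.
map_fraction_sequenced = TOTAL_UNIQUE_BP / ESTIMATED_MAP_BP

print(f"redundant: {redundant_bp:,} bp")
print(f"finished: {finished_fraction:.1%} of unique sequence")
print(f"sequenced: {map_fraction_sequenced:.1%} of estimated map coverage")
```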
19
Summary of Progress on Chromosome 4

45 map contigs have been built on chromosome 4.
Clone end sequence alignments visualised with the PGP viewer are being used to extend contigs and close gaps.
~100,000 fosmid end sequences will be generated by the end of 2007.
10.6 Mb of sequence has been generated, of which 7.5 Mb is finished.
All sequence assemblies >2 kb are deposited in the HTGS divisions of EMBL/GenBank/DDBJ.
20
Acknowledgements

Wellcome Trust Sanger Institute: Jane Rogers, Sean Humphray, Clare Riddle and Mapping Core Group, Karen McLaren and Finishing Team 46, Stuart McLaren and Pre-finishing Team 58, Christine Lloyd and QC Team 57, Karen Oliver, Matt Jones, Carol Scott
Imperial College London: Gerard Bishop, Daniel Buchan, James Abbott, Sarah Butcher
University of Nottingham: Graham Seymour
Scottish Crop Research Institute: Glenn Bryan
Cornell University: Lukas Mueller, Jim Giovannoni
MIPS/IBI Institute for Bioinformatics: Klaus Mayer, Remy Bruggmann
FISH Resources: Stephen Stack Group (Colorado), Hans de Jong (Wageningen)
FUNDING