Published by Benjamin Harmon. Modified over 9 years ago.
1
The Zebrafish Genome Sequencing Project: Bioinformatics resources
Kerstin Howe, Mario Caccamo, Ian Sealy
2
Bioinformatics resources: outline
- clone mapping, sequencing and manual annotation
- genome assemblies and automated annotation
- integrated ZF-Models data and tools
3
Clone mapping and sequencing: mapping
- libraries: 2 Tuebingen BAC libraries; 1 BAC and 1 cosmid library from a single Tuebingen double-haploid fish
- mapping data: end sequencing, RH mapping, fingerprinting
- clones are pieced together according to fingerprints, marker mapping and sequence alignment
- currently ~2,500 contigs (ctgs)
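Fingerprint-based contig building can be sketched as follows. This is a simplified stand-in, not FPC's method: FPC scores clone overlaps with the Sulston probability score, whereas here two clones are simply linked when they share at least a hypothetical `min_shared` bands matching within a tolerance `tol`, and linked clones are grouped into contigs with union-find. Clone names and band sizes are made up.

```python
from itertools import combinations


def shared_bands(a, b, tol=7):
    """Count one-to-one band matches between two fingerprints (sizes within `tol`)."""
    a, b = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def build_contigs(fingerprints, min_shared=3, tol=7):
    """Group clones into contigs; clones sharing >= min_shared bands are linked."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for x, y in combinations(fingerprints, 2):
        if shared_bands(fingerprints[x], fingerprints[y], tol) >= min_shared:
            parent[find(x)] = find(y)  # merge the two contigs

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical clones with made-up band sizes (arbitrary units).
clones = {
    "cloneA": [1020, 1510, 2300, 3105, 4000],
    "cloneB": [1515, 2296, 3100, 4800, 5200, 6100],
    "cloneC": [4805, 5205, 6095, 7000, 8200],
    "cloneD": [9900, 11000, 12500],
}
print(build_contigs(clones))  # [['cloneA', 'cloneB', 'cloneC'], ['cloneD']]
```

cloneA and cloneC share no bands, but both overlap cloneB, so all three end up in one contig; this transitive chaining is what lets overlapping clones build up long contigs.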
4
Clone mapping and sequencing: sequencing pipeline
1. select clones based on their position in the fpc contig
2. subcloning
3. sequencing
4. automatic assembly/pre-finishing (back to sequencing if necessary)
5. finishing
6. QC
7. automated analysis pipeline
8. manual annotation
9. submission to EMBL
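Selecting clones by their position in the fpc contig amounts to choosing a tile path. A minimal sketch, assuming each clone's extent in the contig is already known as a (start, end) pair: the classic greedy interval cover picks, at each step, the clone that starts inside the region covered so far and reaches furthest. Real tile-path selection also weighs overlap size and clone quality; clone names and coordinates here are hypothetical.

```python
def tile_path(clones, contig_start, contig_end):
    """Greedily choose the fewest clones covering [contig_start, contig_end].

    clones: clone name -> (start, end) position in the fpc contig.
    Raises ValueError if part of the contig is not covered by any clone.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path, covered, i = [], contig_start, 0
    while covered < contig_end:
        best = None
        # Consider every clone starting inside the covered region; keep the furthest-reaching one.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            if best is None or ordered[i][1][1] > best[1][1]:
                best = ordered[i]
            i += 1
        if best is None or best[1][1] <= covered:
            raise ValueError(f"no clone extends coverage beyond position {covered}")
        path.append(best[0])
        covered = best[1][1]
    return path


# Hypothetical clone positions in one fpc contig.
positions = {"c1": (0, 120), "c2": (30, 150), "c3": (100, 260), "c4": (140, 300), "c5": (250, 400)}
print(tile_path(positions, 0, 400))  # ['c1', 'c3', 'c5']
```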
5
Manual annotation
- unfinished and finished sequence pass through the automated analysis pipeline: RepeatMasker, CpG island prediction, Genscan, FGenesh, halfwise (Pfam), EPCR, BLAST (ESTs, cDNAs, proteins)
- manual annotation adds gene structures, remarks (gene names, function, similarities) and other features
- otter: a MySQL database in 'Ensembl style' with an acedb or Apollo front end, open to users from the 'outside'
- annotated sequence is submitted to EMBL
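CpG island prediction is commonly based on the Gardiner-Garden and Frommer criteria: a stretch of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. The sketch below applies those thresholds to sliding windows and merges overlapping passing windows; it illustrates the criteria only and is not the program used in the pipeline. Window edges make the reported island somewhat wider than the CpG-rich region itself.

```python
def cpg_islands(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """Return merged (start, end) intervals, 0-based half-open, of windows that pass.

    A window passes if GC content > min_gc and observed/expected CpG > min_obs_exp,
    where observed/expected = CpG count * window / (C count * G count).
    """
    seq = seq.upper()
    islands = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        if c == 0 or g == 0:
            continue
        gc = (c + g) / window
        obs_exp = w.count("CG") * window / (c * g)  # "CG" occurrences cannot overlap
        if gc > min_gc and obs_exp > min_obs_exp:
            if islands and start <= islands[-1][1]:
                islands[-1] = (islands[-1][0], start + window)  # extend the current island
            else:
                islands.append((start, start + window))
    return islands


# Synthetic example: a 300 bp CpG-rich stretch (positions 300-599) between AT-rich flanks.
seq = "AT" * 150 + "CG" * 150 + "AT" * 150
print(cpg_islands(seq))  # [(201, 699)]
```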
6
Annotation policy
- follows the guidelines for human annotation (HAVANA team, Sanger Institute)
- no "guesses": annotations are based solely on supporting evidence
- annotation of: CDSs and UTRs / transcripts, splice variants, pseudogenes, poly A features, transposons, repeats
- approved nomenclature (SI:clone.number)
- collaboration with ZFIN: existing ZFIN records are reported; ZFIN provides new records for newly found genes
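The SI:clone.number scheme can be sketched as a pair of helpers. This assumes that the genes on a clone are numbered consecutively from 1 and that clone names contain only letters, digits and hyphens; both are assumptions, and "DKEY-12A3" is a hypothetical clone name.

```python
import re

# Assumed shape of an approved name: "SI:" + clone name + "." + gene number.
SI_NAME = re.compile(r"^SI:(?P<clone>[A-Za-z0-9-]+)\.(?P<number>\d+)$")


def make_si_names(clone, n_genes):
    """Name the genes annotated on one clone SI:<clone>.1, SI:<clone>.2, ..."""
    return [f"SI:{clone}.{i}" for i in range(1, n_genes + 1)]


def parse_si_name(name):
    """Split an SI name into (clone, number); return None if it does not fit the scheme."""
    match = SI_NAME.match(name)
    if match is None:
        return None
    return match["clone"], int(match["number"])


print(make_si_names("DKEY-12A3", 2))  # ['SI:DKEY-12A3.1', 'SI:DKEY-12A3.2']
print(parse_si_name("SI:DKEY-12A3.2"))  # ('DKEY-12A3', 2)
```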
7
Manual annotation (screenshot): evidence tracks for DNA repeats, CpG islands, Genscan, FGenesH, proteins, ESTs and mRNAs
8
vega.sanger.ac.uk
9
Vega contigview
10
Vega geneview
11
www.sanger.ac.uk/Projects/D_rerio
13
When to use what
Go to vega.sanger.ac.uk if you need:
- highly reliable sequence
- highly reliable annotation (with your input)
- ‘your gene’ stable over time (TILLING)
Go to www.ensembl.org if you need:
- the whole genome
- comparative data
- ZF-Models microarray or insertional mutagenesis data
- complicated searches (BioMart)
14
Zebrafish Genome Project (overview diagram)
- clone mapping and sequencing: clone libraries → map (fpc ctg, BACs, tile path) → sequencing → (un)finished clones; ~8,000 finished clones (~1 Gb)
- whole genome shotgun sequencing: WGS reads → WGS assembly → contigs and supercontigs
- integration using markers (T51): clones + ctgs → assembly release (Zv5), 1.63 Gb
- automatic annotation and manual annotation
15
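WGS assembly
Phusion assembler, High Performance Assembly Group (Zemin Ning et al.):
- reads are grouped
- each read group is assembled into contigs with phrap
- a read-pair tracker orders the contigs (A, B, C) into supercontigs
- gaps between contigs within a supercontig are filled with Ns (NNNNNNNN)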
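The read-pair step can be illustrated with a generic scaffolding sketch; it is not Phusion's implementation. It assumes contig orientation is already resolved and that each read pair has been reduced to a (left contig, right contig, gap estimate) link. Links supported by fewer than a hypothetical `min_pairs` pairs are discarded, accepted links are chained, and each gap is filled with Ns of the average estimated length.

```python
from collections import defaultdict


def build_supercontigs(contigs, pair_links, min_pairs=2):
    """Chain contigs into supercontigs using read-pair links.

    contigs: contig name -> sequence.
    pair_links: one (left, right, gap_estimate) tuple per read pair, with orientation
      already resolved so that `left` precedes `right`. Assumes links form no cycles.
    """
    gaps_by_link = defaultdict(list)
    for left, right, gap in pair_links:
        gaps_by_link[(left, right)].append(gap)

    successor, has_predecessor = {}, set()
    for (left, right), gaps in gaps_by_link.items():
        # Keep well-supported links; each contig gets at most one neighbour per side.
        if len(gaps) >= min_pairs and left not in successor and right not in has_predecessor:
            successor[left] = (right, max(1, round(sum(gaps) / len(gaps))))
            has_predecessor.add(right)

    supercontigs = []
    for name in contigs:
        if name in has_predecessor:
            continue  # this contig is placed inside a chain started elsewhere
        parts, current = [contigs[name]], name
        while current in successor:
            current, gap = successor[current]
            parts.append("N" * gap + contigs[current])  # gap filled with Ns
        supercontigs.append("".join(parts))
    return supercontigs


contigs = {"A": "ACGTAC", "B": "GGATCC", "C": "TTTAAA", "D": "CCCGGG"}
links = [("A", "B", 3), ("A", "B", 5), ("B", "C", 2), ("B", "C", 2), ("C", "D", 4)]
print(build_supercontigs(contigs, links))  # ['ACGTACNNNNGGATCCNNTTTAAA', 'CCCGGG']
```

The single C-D pair falls below `min_pairs`, so D stays a separate supercontig; requiring more than one supporting pair guards against chimeric links from a single mis-mapped read.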
16
Read grouping: k-mer word hashing
- continuous base hash, k = 12: consecutive 12-base words are hashed, e.g. from ATGGCGTGCAGTCCATGTTCGGATCAATGGCGTGCAGT the words TGGCGTGCAGTC, GGCGTGCAGTCC, GCGTGCAGTCCA
- gap hash, k = 12 (4x3), for dealing with variation: words of 4 blocks of 3 bases are taken from 18-base windows, e.g. from ATGGCGTGCAGTCCATGTTCGGATCA the windows ATGGCGTGCAGTCCATGT, TGGCGTGCAGTCCATGTT, GGCGTGCAGTCCATGTTC, GCGTGCAGTCCATGTTCG
- figure: word distribution (k-mer occurrence frequency), with regions labelled "seq. errors", "~7" and "repeats"
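The two hashing schemes can be compared in a few lines. The gapped layout used here, four 3-base blocks separated by 2 skipped bases so that 12 sampled bases span 18, is inferred from the "4x3" label and the 18-base example windows; it is an assumption, not Phusion's documented layout. A single mismatch at a skipped position leaves the gapped word unchanged, while it breaks every contiguous 12-mer that covers it. The last function tallies the word distribution: how many distinct words occur once, twice, and so on.

```python
from collections import Counter


def contiguous_words(seq, k=12):
    """Continuous base hash: every overlapping k-base word."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def gapped_words(seq, block=3, n_blocks=4, skip=2):
    """Gap hash (assumed layout): n_blocks blocks of `block` bases, `skip` bases apart."""
    span = n_blocks * block + (n_blocks - 1) * skip  # 18 for the defaults
    offsets = [b * (block + skip) + j for b in range(n_blocks) for j in range(block)]
    return ["".join(seq[i + o] for o in offsets) for i in range(len(seq) - span + 1)]


def word_distribution(reads, k=12):
    """Map occurrence count -> number of distinct k-mers seen that many times."""
    counts = Counter(w for read in reads for w in contiguous_words(read, k))
    return Counter(counts.values())


read1 = "ATGGCGTGCAGTCCATGT"  # 18-base window from the slide
read2 = "ATGACGTGCAGTCCATGT"  # same window with a substitution at position 3 (a skipped position)

print(gapped_words(read1) == gapped_words(read2))  # True
print(len(set(contiguous_words(read1)) & set(contiguous_words(read2))))  # 3 (of 7 words)
print(word_distribution([read1, read1]))  # Counter({2: 7})
```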
17
Zebrafish Genome Project (overview diagram, build-up)
- clone mapping and sequencing: clone libraries → map → (un)finished clones; sequencing ~7,000 finished clones (~1 Gb)
- whole genome shotgun sequencing: WGS reads → WGS assembly
- integration using markers (T51) → assembly release (Zv5)
- automatic annotation and manual annotation
18
Integration (diagram): a Zv5 scaffold ("scaffoldn") is built by integrating an fpc contig of BACs (BX005049.6, BX005057.8, BX005153, BX005123.6) with a WGS supercontig, using markers, cDNAs and BAC ends; its pieces are labelled Zv5 scaffoldn.1, scaffoldn.3, scaffoldn.5 and scaffoldn.7.
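One common way to record how such a scaffold is assembled from clones and gaps is an AGP file. The sketch below emits AGP 2.0-style lines; the component lengths are made up, orientation is assumed to be '+', and every gap is typed as a scaffold gap with 'map' linkage evidence. Because part numbers count components and gaps together, components alternating with gaps get odd numbers (1, 3, 5); whether the scaffoldn.1/.3/.5/.7 labels follow this convention is an assumption.

```python
def agp_lines(scaffold, parts, gap_evidence="map"):
    """Emit tab-separated AGP 2.0-style lines for one scaffold.

    parts: ("component", accession, length, type_code) or ("gap", length) tuples,
    where type_code is an AGP component type such as "F" (finished) or "W" (WGS contig).
    """
    lines, pos = [], 1
    for part_number, part in enumerate(parts, start=1):
        length = part[2] if part[0] == "component" else part[1]
        end = pos + length - 1
        if part[0] == "component":
            _, accession, _, type_code = part
            # object, begin, end, part number, type, component id, component begin/end, orientation
            cols = [scaffold, pos, end, part_number, type_code, accession, 1, length, "+"]
        else:
            # object, begin, end, part number, "N", gap length, gap type, linkage, evidence
            cols = [scaffold, pos, end, part_number, "N", length, "scaffold", "yes", gap_evidence]
        lines.append("\t".join(str(c) for c in cols))
        pos = end + 1
    return lines


# Made-up lengths; accessions taken from the integration diagram.
parts = [
    ("component", "BX005049.6", 1000, "F"),
    ("gap", 100),
    ("component", "BX005057.8", 800, "F"),
    ("gap", 50),
    ("component", "BX005153", 500, "F"),
]
for line in agp_lines("scaffoldn", parts):
    print(line)
```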
19
Assemblies
                             Zv5                   Zv4                   Zv3                 Zv2
release date (dd.mm.yy)      27.05.05              12.07.04              27.11.03            03.04.03
total length [bp]            1,630,306,866         1,592,025,686         1,459,115,486       1,452,210,772
scaffolds                    16,214                21,333                58,339              83,470
finished clones              4,519 (699 Mb)        2,828 (443 Mb)        1,502 (263 Mb)      -
scaffolds in chr 1-25        1,749                 1,892                 1,490               -
scaffolds in fpc contigs     265 (chrU)            694 (chrU)            1,842               5,677
NA scaffolds (not assigned)  14,676                18,747                54,798              77,793
sum(length) chr 1-25 [bp]    1,200,129,620 (73%)   1,097,507,810 (69%)   718,270,423 (49%)   -
sum(length) ctgs [bp]        183,993,739 (11%)     176,222,396 (11%)     365,271,659 (25%)   1,143,459,008
sum(length) NAs [bp]         246,183,507 (16%)     318,295,480 (20%)     335,615,307 (23%)   308,751,764
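For Zv5 and Zv4 the three length rows (chr 1-25, fpc contigs, NA scaffolds) partition the assembly: they add up exactly to the total length. A quick check, with the numbers copied from the assemblies table, that also recomputes each category's share of the total:

```python
# (total, chr 1-25, fpc contigs, NA scaffolds) in bp, copied from the Zv5 and Zv4 columns.
LENGTHS = {
    "Zv5": (1_630_306_866, 1_200_129_620, 183_993_739, 246_183_507),
    "Zv4": (1_592_025_686, 1_097_507_810, 176_222_396, 318_295_480),
}


def adds_up(release):
    """True if the three categories sum exactly to the total length."""
    total, *parts = LENGTHS[release]
    return sum(parts) == total


def shares(release):
    """Percentage of the total length in each category."""
    total, *parts = LENGTHS[release]
    return [100 * p / total for p in parts]


for release in LENGTHS:
    print(release, adds_up(release), [f"{s:.1f}%" for s in shares(release)])
```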
20
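Automatic annotation (gene build diagram)
- zebrafish proteins and other proteins → Genewise → Genewise genes
- zebrafish cDNAs → Exonerate → aligned cDNAs; combined with Genewise genes → Genewise genes with UTRs
- Genewise genes with UTRs + supported ab initio predictions (optional) → Genebuilder → final set
- zebrafish ESTs → Exonerate → aligned ESTs → ClusterMerge → Ensembl EST genes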
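The EST branch groups aligned ESTs before building transcript models. The sketch shows only a simplified first step, clustering alignments whose genomic spans overlap on the same strand of one chromosome; ClusterMerge itself goes further and merges ESTs with compatible exon structures, which is not modelled here. EST IDs and coordinates are hypothetical.

```python
def cluster_alignments(alignments):
    """Group EST alignments on one chromosome into clusters of overlapping spans.

    alignments: (est_id, strand, start, end) tuples, 1-based inclusive coordinates.
    Returns (strand, start, end, est_ids) tuples, one per cluster.
    """
    clusters = []
    for est_id, strand, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        current = clusters[-1] if clusters else None
        if current and current[0] == strand and start <= current[2]:
            current[2] = max(current[2], end)  # extend the open cluster
            current[3].append(est_id)
        else:
            clusters.append([strand, start, end, [est_id]])
    return [tuple(c) for c in clusters]


ests = [("est1", "+", 100, 400), ("est2", "+", 350, 700), ("est3", "+", 900, 1200), ("est4", "-", 380, 500)]
for cluster in cluster_alignments(ests):
    print(cluster)
```

est4 overlaps est1 and est2 in position but lies on the other strand, so it forms its own cluster.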
21
Ensembl
22
Contigview
23
Geneview
24
Searching Ensembl
25
BioMart: start, filter, output
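BioMart's three steps (start: choose a dataset; filter; output: choose attributes) map onto the XML query that the BioMart web service accepts. A sketch that only builds the query and its URL, without sending it; the dataset, filter and attribute names (drerio_gene_ensembl, chromosome_name, ensembl_gene_id, ...) are assumed to match today's Ensembl BioMart and should be checked against the mart's own listings.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query: one dataset, its filters, and the output attributes."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")  # start
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)  # filter
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)  # output
    return ET.tostring(query, encoding="unicode")


xml = biomart_query(
    "drerio_gene_ensembl",
    {"chromosome_name": "5"},
    ["ensembl_gene_id", "external_gene_name", "start_position", "end_position"],
)
url = "https://www.ensembl.org/biomart/martservice?" + urlencode({"query": xml})
print(xml)
```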
27
Do’s and don’ts
Go elsewhere (Ensembl) if you:
- want to know about the whole genome
- need comparative data
- need ZF-Models microarray or insertional mutagenesis data
- need to do complicated searches
Go to Vega if you need:
- highly reliable sequence
- highly reliable annotation
- ‘your gene’ stable over time (TILLING)
28
DAS (Distributed Annotation System), diagram: a genome browser acting as DAS client combines the reference sequence and local storage with annotation served as XML by several remote DAS servers.
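On the client side, DAS amounts to fetching the same kind of XML document from several servers and overlaying the features on one reference. The sketch assumes DAS 1.x conventions, a `<server>/das/<source>/features?segment=<ref>:<start>,<stop>` request and a DASGFF response, and parses a made-up response offline instead of contacting a server; the server URL, source name and feature are hypothetical.

```python
import xml.etree.ElementTree as ET


def das_features_url(server, source, ref, start, stop):
    """Build a DAS 1.x features request for one segment of the reference."""
    return f"{server}/das/{source}/features?segment={ref}:{start},{stop}"


def parse_das_features(xml_text, server):
    """Extract features from a DASGFF response, tagging each with its server."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "server": server,
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "strand": feature.findtext("ORIENTATION", default="0"),
            })
    return features


def merge_tracks(responses):
    """Overlay features from several servers (server -> XML text), sorted by position."""
    merged = [f for server, text in responses.items() for f in parse_das_features(text, server)]
    return sorted(merged, key=lambda f: (f["segment"], f["start"]))


# Made-up response from a hypothetical server.
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/zfish/features">
    <SEGMENT id="5" start="1000" stop="5000">
      <FEATURE id="probe1" label="probe1">
        <TYPE id="microarray_probe">microarray_probe</TYPE>
        <START>1200</START>
        <END>1260</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

print(das_features_url("http://das.example.org", "zfish", "5", 1000, 5000))
print(merge_tracks({"serverA": SAMPLE}))
```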
29
SNPs and Indels
30
Ensembl releases