Introduction
1. Ordering of P. knowlesi contigs against P. falciparum
  methodology
  progress/status
  towards a synteny map (‘true’ scaffold)
2. Gene prediction
  generating the ‘first’ proteome
  some ‘syntenic breakpoints’ in ACT view

Overview
Contig ordering (knowlesi v falciparum): benefits and a caveat
Speeds up:
  Annotation, via generation of pseudomolecules
  Prefinishing
  Dissemination of gene models via GeneDB
  Identification of ‘syntenic breakpoints’
  Progress towards ‘species-specific genes’
  Generation of a predicted proteome
Positive impact on gene models in falciparum through identification of missed genes/exons
CAVEAT: the current methodology assumes synteny; there is no evidence for physical linkage of the contigs in the pseudomolecules
Integration of read pair data is needed to confirm linkage and generate scaffolds

Read pairs can confirm or deny physical linkage of contigs assumed by ordering
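A minimal sketch of how read pairs might be tallied against an assumed contig order. The function name and the input layout (one tuple per read pair, naming the contig each mate maps to) are assumptions made for illustration; a real check would also consider read orientation and insert size, which this ignores.

```python
from collections import Counter


def linkage_support(order, read_pairs):
    """Count read pairs spanning each pair of neighbouring contigs.

    order: contig names in their assumed order along a pseudomolecule.
    read_pairs: (contig_of_mate1, contig_of_mate2) tuples (hypothetical layout).
    """
    # Only pairs whose mates land on two different contigs say anything
    # about linkage; frozenset makes (a, b) and (b, a) equivalent.
    counts = Counter(frozenset(p) for p in read_pairs if p[0] != p[1])
    # Report support for every junction implied by the ordering.
    return {(a, b): counts[frozenset((a, b))] for a, b in zip(order, order[1:])}


if __name__ == "__main__":
    pairs = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c1", "c3")]
    print(linkage_support(["c1", "c2", "c3"], pairs))
```

A junction with zero supporting pairs is not disproved by this count alone, but it is a candidate for closer inspection before the ordering is treated as a scaffold.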

Ellen Adlam’s contig ordering script: brief methodology
Four stages:
1. The Pk contig set is filtered to remove contigs below 5 kb.
2. TBLASTX is run on sections of Pk contigs against Pf chromosomes. Contigs are split into 14 groups according to the Pf chromosome of their top hit.
3. Coordinates of hits are examined. Pk contigs are ordered relative to the ‘corresponding’ Pf chromosome.
4. Coordinates are re-examined and Ns are inserted to represent gaps, sized as expected by measurement against Pf.
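The four stages can be sketched roughly as below. This is not the actual script: the function name, the input layout and the use of a single top hit per contig (rather than all hit coordinates) are simplifying assumptions, and the TBLASTX search itself is assumed to have been run already.

```python
from collections import defaultdict

MIN_CONTIG_LEN = 5_000  # stage 1 cut-off given on the slide


def order_contigs(contigs, hits):
    """Build one N-padded pseudomolecule per Pf chromosome.

    contigs: {contig_name: sequence}
    hits: (contig_name, pf_chr, pf_start, pf_end, score) tuples
          (hypothetical layout for pre-computed TBLASTX hits).
    """
    # Stage 1: drop contigs shorter than 5 kb.
    kept = {name: seq for name, seq in contigs.items()
            if len(seq) >= MIN_CONTIG_LEN}

    # Stage 2: assign each contig to the Pf chromosome of its top hit.
    best = {}
    for name, pf_chr, pf_start, pf_end, score in hits:
        if name in kept and (name not in best or score > best[name][3]):
            best[name] = (pf_chr, pf_start, pf_end, score)

    groups = defaultdict(list)
    for name, (pf_chr, pf_start, pf_end, _score) in best.items():
        groups[pf_chr].append((pf_start, pf_end, name))

    # Stages 3 and 4: order by Pf coordinate, then pad each gap with as
    # many Ns as the distance between neighbouring hits on Pf.
    pseudomolecules = {}
    for pf_chr, placed in groups.items():
        placed.sort()
        parts, prev_end = [], None
        for pf_start, pf_end, name in placed:
            if prev_end is not None:
                parts.append("N" * max(pf_start - prev_end, 0))
            parts.append(kept[name])
            prev_end = pf_end
        pseudomolecules[pf_chr] = "".join(parts)
    return pseudomolecules
```

Because placement depends only on similarity to Pf, the output encodes assumed synteny, not demonstrated physical linkage.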

Contigs ordered against Pf chr7
Ordering tends to fail in highly variable regions:
  Subtelomeres
  Internal var arrays

Integration of data to inform gene models
BLAST and FASTA
Comparison of regions of synteny with falciparum
Gene prediction algorithms: SNAP, Projector
Integrate into ACT
Manual review
Accurate gene predictions
Proteome data
EST data

ACT visualisation of ‘synteny’ to aid annotation

Contig ordering: results/estimates/next steps

Coverage   Ordered    Average (gaps)      Gene predictions
5x         18.6 Mb    21 (980 gaps)       2300
8x         23 Mb      29 kb (280 gaps)    5100

Manually reviewed models (297) for chr 6 (estimated time scale for manual review of all gene predictions: person days, 2 to 3.5 months)
Passed on to aid in prefinishing.
Possible next steps:
1. It may be possible to manually order smaller contigs into the gaps.
2. Analyse using read pair data (sequencing and BAC end reads) to generate scaffolds (IN PROGRESS).
3. Identify BAC clones which may be telomeric/subtelomeric by mapping end reads onto the metachromosomes.

Identification of gene duplication/deletion
[Figure panels: P. falciparum chr7; P. knowlesi]

Gene finders: different types
ab initio: bases predictions on a statistical profile calculated from a training set (criteria: consensus sequences of start sites and splice junctions; sequence composition at codon and DNA level for coding, intron and non-coding sequence; intron length distribution; exon length distribution)
comparative: bases predictions on sequence similarity to coding sequence in a related organism, and uses the statistical profile from a training set to a much lesser extent

Projector
The precise alignment step of the algorithm needs a lot of memory, so it cannot go through an entire sequence
Before we can feed it the reference and query sequences we need to:
  align the corresponding chromosome contigs
  identify which gene plus surrounding sequence in the annotated genome corresponds to which section in the unannotated genome (Ellen's script and Gene Modeller can provide some hints for this)
  take the two linked regions in the unannotated and reference sequences and give these to Projector as input
It can only predict for the regions it has been told to examine
At the moment it can only be run by the person who wrote it, but it is being calibrated and is under development for wider use
It can show where it observed conservation at the sequence level in both sequences (for untranslated, exon and intron regions)

Exploring different gene finding tools for P. knowlesi
Originated from the complex and slow process of manually building a training set for an unannotated organism
Making use of an annotated relative (P. falciparum)
  SNAP: ab initio
  GENE MODELLER: comparative; sensitive BLAST, then tries to find start/stop/splice sites near BLAST hit ends; needs refinement
  PROJECTOR: comparative
Gives us a good opportunity to evaluate the strengths and weaknesses of each
Trial on an ordered contig set for knowlesi chr6, which had been annotated
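To illustrate the idea of looking for start and stop signals near the ends of a BLAST hit, here is a toy sketch. It is not Gene Modeller's algorithm: the function name, the search window and the restriction to a single-exon, forward-strand, in-frame hit are all assumptions, and splice sites are not handled at all.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def snap_to_codons(seq, hit_start, hit_end, window=300):
    """Extend an in-frame hit to the nearest start and stop codon.

    hit_start, hit_end: 0-based, half-open coordinates of the hit,
    assumed to lie in the coding frame on the forward strand.
    Returns (start, stop) with stop exclusive; either may be None.
    """
    # Walk upstream codon by codon until an ATG or a stop is met.
    start = None
    for pos in range(hit_start, max(hit_start - window, 0) - 1, -3):
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the open reading frame ends before any ATG
        if codon == "ATG":
            start = pos
            break

    # Walk downstream codon by codon until the first in-frame stop.
    stop = None
    for pos in range(hit_end, min(hit_end + window, len(seq) - 3) + 1, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            stop = pos + 3
            break
    return start, stop
```

Taking the nearest ATG is only one possible choice; picking the wrong start or stop in this way is one route by which a sensitive predictor can still produce inexact gene models.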

Sensitivity and specificity performance for single exon and multi exon genes
[Chart panels: single exon; >1 exon]

Sensitivity and specificity measured against a set of 156 manually annotated genes
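As a worked illustration of gene-level scoring, the sketch below uses the convention common in gene-finding evaluations, where sensitivity is the fraction of annotated genes predicted exactly and specificity is the fraction of predictions that exactly match an annotated gene. Whether the slides use precisely these definitions is an assumption, and the coordinates in the example are made up.

```python
def gene_level_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for exact gene-model matches.

    Each gene model is a tuple of (exon_start, exon_end) tuples, so two
    models match only if every exon boundary agrees.
    """
    predicted, annotated = set(predicted), set(annotated)
    exact = len(predicted & annotated)  # true positives
    sensitivity = exact / len(annotated) if annotated else 0.0
    specificity = exact / len(predicted) if predicted else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    annotated = [((1, 100),), ((200, 300), (400, 500))]
    predicted = [((1, 100),), ((200, 300), (400, 510)), ((600, 700),)]
    print(gene_level_accuracy(predicted, annotated))
```

Under exact matching, a prediction that finds the right gene but misplaces one exon boundary counts against both measures, which is why looser, overlap-based measures are often reported alongside.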

How well are start and stop codons predicted?

Conclusions on gene prediction performance
Specificity: Projector (26) > SNAP (6) > Gene Modeller (0)
“New” Projector: 20 % (17 %) exact specificity of the gene models made
Sensitivity: SNAP (154) > Gene Modeller (143) > Projector (128)
SNAP and Gene Modeller, although not specific, are sensitive
  Gene Modeller due to the BLAST parameters chosen (low penalties for gap opening, extension and mismatch; word size 9)
Can the strengths of Gene Modeller or SNAP be combined with the specificity of Projector?

Future work
New run of the latest contig ordering set using Projector, informed with additional data as “intervals” to improve sensitivity.