An Analysis of “Gene Finding in Novel Genomes” Michael Sneddon.

Basic Reference Information
“Gene Finding in Novel Genomes”, written by Ian Korf. BMC Bioinformatics. Published in May of

Purpose of Gene Finding
Given a genome, we would like to predict which regions actually code for proteins and which do not. This is important because we can then focus on the regions that actually code for something. It can also point us to places in the genome where we should look for unknown genes.

Gene Finding Techniques
Gene finding is very difficult to do accurately. Many current methods use Hidden Markov Models (HMMs) to discover genes. An HMM learns to recognize these patterns when it is trained on data in which we already know which regions are genes and which are not.
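When the training data are fully annotated, training an HMM can be as simple as counting. The sketch below is a minimal illustration, assuming the three state labels from the SNAP state diagram (N: intergenic, E: exon, I: intron) and a hypothetical input format in which each training sequence comes with a per-base label string. Real gene finders use much richer models (for example, separate states for codon phase and dedicated splice-site models), so this is not SNAP's training code.

```python
from collections import Counter

STATES = "NEI"  # hypothetical labels: N intergenic, E exon, I intron
BASES = "ACGT"


def train_hmm(examples, pseudocount=1.0):
    """Estimate transition and emission probabilities by counting.

    examples: list of (dna, labels) pairs of equal length, where labels[i]
    is the state (N, E or I) that produced dna[i]. The pseudocount keeps
    unseen events from getting probability zero.
    """
    trans = {s: Counter() for s in STATES}
    emit = {s: Counter() for s in STATES}
    for dna, labels in examples:
        if len(dna) != len(labels):
            raise ValueError("sequence and label string must have equal length")
        for i, (base, state) in enumerate(zip(dna.upper(), labels)):
            emit[state][base] += 1
            if i > 0:
                trans[labels[i - 1]][state] += 1

    def normalize(counts, keys):
        total = sum(counts[k] for k in keys) + pseudocount * len(keys)
        return {k: (counts[k] + pseudocount) / total for k in keys}

    transitions = {s: normalize(trans[s], STATES) for s in STATES}
    emissions = {s: normalize(emit[s], BASES) for s in STATES}
    return transitions, emissions


# Toy annotated example (made up): intergenic, exon, intron, exon, intergenic.
examples = [("ATTAGCGCGTTTTAGCGCATAT", "NNNNEEEEEIIIIIEEEENNNN")]
transitions, emissions = train_hmm(examples)
```

The same counting works for any number of annotated sequences; the problem the next slide raises is that a new genome has no such annotation to count from.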

Gene Finding in New Genomes
The problem is that genomes are being sequenced faster than they can be studied, so there are not enough training sets to build good HMMs. Currently, the best available option for finding genes in a new genome is to use a program trained on a different genome and hope that it gives a good approximation.

SNAP – Korf’s Approach
Korf argues that the current approach does not give a good approximation when finding genes in new genomes. He designed SNAP, which runs several other gene-finding programs and estimates its parameters from their results. SNAP also uses a Hidden Markov Model.
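The slide does not say exactly how SNAP turns other programs' predictions into parameters. One hypothetical way to illustrate the idea: take a per-base majority vote over several predicted label strings (N, E, I), then estimate a parameter from that consensus. The sketch below estimates an exon self-transition probability, assuming exon lengths follow a geometric distribution, where a self-transition probability p gives a mean length of 1/(1 − p), so p = 1 − 1/mean. The voting rule, the tie-break toward N, and the geometric assumption are all illustrative choices, not Korf's procedure.

```python
from collections import Counter
from statistics import mean


def consensus_labels(predictions):
    """Per-position majority vote over label strings from several gene finders.

    Ties are broken in favour of 'N' (intergenic), an arbitrary conservative choice.
    """
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more predictions covering the same sequence")
    out = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        winners = [s for s in counts if counts[s] == best]
        out.append("N" if "N" in winners else sorted(winners)[0])
    return "".join(out)


def run_lengths(labels, state):
    """Lengths of the maximal runs of `state` in a label string."""
    lengths, run = [], 0
    for s in labels:
        if s == state:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def self_transition(labels, state):
    """Self-transition probability whose geometric mean length matches the data."""
    runs = run_lengths(labels, state)  # statistics.mean raises if runs is empty
    return 1.0 - 1.0 / mean(runs)


# Three made-up predictions for the same 13-bp stretch.
predictions = ["NNEEEEIIEEENN", "NNEEEEIIIEENN", "NEEEEEIIEEENN"]
consensus = consensus_labels(predictions)
p_exon = self_transition(consensus, "E")
```

Here the vote smooths out the disagreements at single positions before any parameter is estimated.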

SNAP HMM State Diagram
[State diagram not reproduced. E: exon state; I: intron state; N: intergenic state]
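To show how a state diagram like this is used to label a sequence, here is a drastically simplified sketch: a three-state HMM with the N, E and I states and Viterbi decoding in log space. All probabilities are made up for illustration, and the allowed transitions (N to E, E to I or N, I back to E) are an assumption that mirrors the usual gene structure. SNAP's real model has more states and signal models.

```python
import math

STATES = ("N", "E", "I")  # intergenic, exon, intron

# Toy, made-up parameters for illustration only (not SNAP's).
START = {"N": 0.9, "E": 0.1, "I": 0.0}
TRANS = {
    "N": {"N": 0.95, "E": 0.05, "I": 0.0},  # never jump straight into an intron
    "E": {"N": 0.05, "E": 0.85, "I": 0.10},
    "I": {"N": 0.0, "E": 0.10, "I": 0.90},  # an intron must end in an exon
}
EMIT = {
    "N": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "E": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "I": {"A": 0.32, "C": 0.18, "G": 0.18, "T": 0.32},
}


def log(p):
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(dna):
    """Return the most probable N/E/I state path for `dna` under the toy HMM."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log(START[s]) + log(EMIT[s][dna[0]]) for s in STATES}
    backpointers = []
    for base in dna[1:]:
        best_prev, new_score = {}, {}
        for s in STATES:
            r = max(STATES, key=lambda q: score[q] + log(TRANS[q][s]))
            best_prev[s] = r
            new_score[s] = score[r] + log(TRANS[r][s]) + log(EMIT[s][base])
        backpointers.append(best_prev)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    return "".join(reversed(path))


seq = "ATTATAGCGCGGCGCCGTATATTTAAT"
print(seq)
print(viterbi(seq))
```

Working in log space avoids numerical underflow on long sequences, and zero-probability transitions become minus infinity, so forbidden paths such as N directly to I are never chosen.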

Methods of Testing
Korf used sequence from A. thaliana, O. sativa, C. elegans, and D. melanogaster, all relatively simple genomes. He compared his software with other leading gene-finding programs, including GENSCAN, Genefinder, HMMgene, and Augustus, and measured how well each program performed.
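The slide does not say which accuracy measures were used. A common choice in gene-finding comparisons is nucleotide-level sensitivity and specificity, where, by gene-finding convention, "specificity" means TP / (TP + FP), the fraction of predicted coding bases that are truly coding. The sketch below assumes per-base label strings in which "E" marks coding bases; that representation is hypothetical.

```python
def nucleotide_accuracy(annotation, prediction, coding="E"):
    """Nucleotide-level sensitivity and specificity of a gene prediction.

    Sn = TP / (TP + FN): fraction of annotated coding bases that were predicted.
    Sp = TP / (TP + FP): fraction of predicted coding bases that are annotated.
    """
    if len(annotation) != len(prediction):
        raise ValueError("annotation and prediction must cover the same sequence")
    tp = fp = fn = 0
    for a, p in zip(annotation, prediction):
        if a == coding and p == coding:
            tp += 1
        elif p == coding:
            fp += 1
        elif a == coding:
            fn += 1
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


# Made-up example: the prediction starts the first exon one base early
# and ends it one base early.
annotation = "NNEEEEIIEEENN"
prediction = "NEEEEIIIEEENN"
sn, sp = nucleotide_accuracy(annotation, prediction)
```

Exon-level and gene-level versions follow the same pattern but count whole exons or genes as correct only when their boundaries match exactly.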

Data Used in Testing
Table 1. Data set characteristics. At: Arabidopsis thaliana; Ce: Caenorhabditis elegans; Dm: Drosophila melanogaster; Os: Oryza sativa.

Genome   Sequence   Genes   GC   Single-exon genes   Mean exon   Mean intron
At       1.89 Mb    ?       ?    19.8%               230 bp      157 bp
Ce       3.02 Mb    ?       ?     2.2%               220 bp      334 bp
Dm       3.66 Mb    ?       ?    24.9%               394 bp      948 bp
Os       1.55 Mb    ?       ?    22.9%               237 bp      350 bp
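The columns of Table 1 are summary statistics of an annotated sequence. As a hypothetical illustration of how such numbers can be computed (not Korf's scripts), the sketch below assumes each gene is given as a sorted list of exon coordinates, 0-based and half-open, on a single strand. Introns are taken to be the gaps between consecutive exons of the same gene.

```python
from statistics import mean


def gc_content(dna):
    """Fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def dataset_stats(dna, genes):
    """Table-1-style characteristics for one annotated sequence.

    genes: list of genes, each a sorted list of (start, end) exons,
    0-based half-open coordinates (a hypothetical format).
    """
    exon_lengths = [end - start for gene in genes for start, end in gene]
    intron_lengths = [
        gene[i + 1][0] - gene[i][1] for gene in genes for i in range(len(gene) - 1)
    ]
    single_exon = sum(1 for gene in genes if len(gene) == 1)
    return {
        "sequence_bp": len(dna),
        "genes": len(genes),
        "gc": gc_content(dna),
        "single_exon_fraction": single_exon / len(genes),
        "mean_exon_bp": mean(exon_lengths),
        "mean_intron_bp": mean(intron_lengths) if intron_lengths else None,
    }


# Made-up 100-bp sequence with three genes (two, one and three exons).
dna = "ATGC" * 25
genes = [[(0, 10), (20, 30)], [(40, 55)], [(60, 70), (80, 85), (90, 100)]]
stats = dataset_stats(dna, genes)
```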

Performance of SNAP

Parameters taken from other species

An analysis of the parameters his program used, and a demonstration of how they would be better suited to new genomes

Next Steps
Since he used relatively simple genomes, the next step is to analyze larger genomes to see whether SNAP gives similar results. Gene finding is still very difficult, and further research is needed on how to better estimate HMM parameters.

My Opinions
The results were very clear and well organized, and the program is available free online. However, the paper needed a better explanation of how SNAP takes the results of other programs and uses that information. The program also needs better documentation, so that more people can use it and adapt it to specific genomes.