Gene Finding in Eukaryotic Genomes

DTU course #27011, 23.03.2004. Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark (nikob@cbs.dtu.dk)

Today’s plan
13.00-13.30 Lecture on gene finding: gene features, RepeatMasker, etc.
13.30-14.00 Get notebooks (building 208, secretary) + pause
14.00-16.00 Work on project (Nikolaj present 14.00-14.30, Lars present 14.45-15.15)

Practical stuff: webpage, literature, textbooks. Report writing format: specify the contribution from each student, e.g. "Lars & Dorte mainly wrote the Introduction and Methods; Lise & Jens wrote the Results and Discussion sections." RepeatMasker: http://www.repeatmasker.org/

Gene features: codon frequency/bias (organism dependent); hexamer statistics; ORFs; exon/intron length distributions; transcriptional signals (promoters/enhancers); splicing signals (donor/acceptor sites, branchpoints); translational signals (start codon context).

Codon bias is linked to tRNA availability and expression level. Because such statistics are organism dependent, gene finders are often organism specific. Coding regions are often modelled by a 5th-order Markov chain (hexamers/dicodons).
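To make the Markov-chain idea concrete, here is a minimal Python sketch, assuming a homogeneous 5th-order chain (each base conditioned on the five preceding bases, i.e. hexamer statistics) trained separately on coding and non-coding examples, and a log-odds score for a candidate region. The function names and toy training strings are hypothetical; real gene finders typically use frame-dependent (three-periodic) chains trained on many known genes from the target organism.

```python
import math
from collections import defaultdict

ORDER = 5  # 5th order: each base depends on the previous 5 bases (hexamer statistics)
ALPHABET = "ACGT"

def train_markov(seqs, order=ORDER, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            ctx, base = s[i - order:i], s[i]
            if base in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][base] += 1

    def prob(ctx, base):
        c = counts.get(ctx, {})
        # Pseudocounts keep unseen hexamers from getting probability 0
        return (c.get(base, 0.0) + pseudocount) / (sum(c.values()) + 4 * pseudocount)

    return prob

def log_odds(seq, coding_prob, noncoding_prob, order=ORDER):
    """Sum of log2(P_coding / P_noncoding); a positive score suggests coding."""
    seq = seq.upper()
    score = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        if base in ALPHABET and all(c in ALPHABET for c in ctx):
            score += math.log2(coding_prob(ctx, base) / noncoding_prob(ctx, base))
    return score

# Toy training data (hypothetical): GC-rich "coding" vs AT-rich "non-coding"
coding_model = train_markov(["GCTGCCGCAGCG" * 20])
noncoding_model = train_markov(["ATATATTTAAATTA" * 20])
print(log_odds("GCTGCCGCAGCGGCTGCC", coding_model, noncoding_model))
```

A positive score means the region resembles the coding training data more than the non-coding data; a negative score means the opposite.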

Human genes: short exons, long introns

Human genes: intron lengths have a broad distribution, with a minimum length of ca. 60 bp
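Donor and acceptor sites can be illustrated with the canonical GT-AG rule: most introns start with GT and end with AG. The sketch below, assuming only that dinucleotide rule plus the ca. 60 bp minimum length from the slide, lists candidate intron spans; the 10 kb upper bound and the function name are hypothetical choices for illustration. Real splice-site models use longer position-specific patterns (and branchpoints), because GT and AG alone occur far too often to be informative.

```python
import re

MIN_INTRON = 60  # minimum human intron length quoted on the slide (ca. 60 bp)

def candidate_introns(seq, min_len=MIN_INTRON, max_len=10_000):
    """Yield 0-based half-open (start, end) spans that begin with GT and end with AG."""
    seq = seq.upper()
    # Lookaheads find overlapping dinucleotide occurrences
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptor_ends = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]
    for d in donors:
        for a in acceptor_ends:
            if min_len <= a - d <= max_len:
                yield d, a

demo = "CCC" + "GT" + "A" * 60 + "AG" + "CCC"  # hypothetical test sequence
print(list(candidate_introns(demo)))
```

Even on short real sequences this brute-force scan returns many spans, which is why gene finders score splice sites probabilistically instead of just matching dinucleotides.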

Intron Prevalence

Gene Prediction – Performance of Genscan

NIX: visualizing gene predictions (http://www.hgmp.mrc.ac.uk/NIX/). No method is always best!

Performance of Genscan – exon length: low performance at short exon lengths
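Gene finder performance is commonly summarized as sensitivity (the fraction of true coding bases or exons that are predicted) and specificity (the fraction of predicted ones that are correct), at nucleotide and at exon level. A minimal sketch, assuming exons are given as 0-based half-open (start, end) coordinates on one strand and that an exon counts as correct only if both boundaries match exactly; the function names and coordinates are invented for illustration.

```python
def nucleotide_sn_sp(true_exons, pred_exons):
    """Nucleotide-level sensitivity and specificity of predicted coding bases."""
    true_nt = {i for s, e in true_exons for i in range(s, e)}
    pred_nt = {i for s, e in pred_exons for i in range(s, e)}
    tp = len(true_nt & pred_nt)
    sn = tp / len(true_nt) if true_nt else 0.0
    sp = tp / len(pred_nt) if pred_nt else 0.0
    return sn, sp

def exon_sn_sp(true_exons, pred_exons):
    """Exon-level sensitivity and specificity (exact boundary matches only)."""
    true_set, pred_set = set(true_exons), set(pred_exons)
    exact = len(true_set & pred_set)
    sn = exact / len(true_set) if true_set else 0.0
    sp = exact / len(pred_set) if pred_set else 0.0
    return sn, sp

annotated = [(100, 300), (500, 540), (800, 1000)]
predicted = [(100, 300), (505, 540), (800, 1000)]  # short middle exon off by 5 bp
print(nucleotide_sn_sp(annotated, predicted))
print(exon_sn_sp(annotated, predicted))
```

In this toy case a 5 bp boundary error on the short middle exon barely changes nucleotide-level sensitivity but costs a whole exon at exon level.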

Future challenges: bootstrapping (prediction improves as more genes become known); 'extreme' genes (long/short) are still difficult; initial and terminal exons are predicted with lower confidence; combining predictions with sequence similarity matches; non-coding RNAs (most gene prediction programs only predict protein-coding genes, so tRNA and rRNA genes are not predicted); predicting alternative splicing, enhancers and silencers; predicting matrix- and scaffold-attachment regions, insulators and boundary elements.

Gene prediction: take-home messages. Genes may be predicted by computer programs, but prediction methods are not perfect! Masking of repetitive sequences may be required for large genomic sequences. 'Unusual' genes (high GC%, short or terminal exons) are difficult to predict. HMM-based gene prediction programs are well suited to modelling "gene grammar".
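The "gene grammar" point can be illustrated with Viterbi decoding of a toy hidden Markov model. The two states, the transition probabilities and the GC-rich-exon/AT-rich-intron emissions below are all hypothetical; HMM gene finders such as Genscan use many more states (e.g. different exon types, introns, splice signals, promoters), and the allowed transitions between them encode which gene structures are grammatical.

```python
import math

STATES = ("E", "I")  # hypothetical toy states: E = exon, I = intron
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # assumed GC-rich exons
    "I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # assumed AT-rich introns
}

def viterbi(seq):
    """Return the most probable state path (log-space Viterbi)."""
    seq = seq.upper()
    v = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for ch in seq[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            col[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][ch])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("GC" * 10 + "AT" * 10))
```

The low switching probability makes the decoder prefer long runs of one state, loosely mimicking how a real gene model discourages implausibly short exons and introns.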

RepeatMasker: repetitive sequences in human/eukaryotic genomes are a problem for gene prediction. Up to 45% of human genomic sequence is derived from transposable/repetitive elements. Run gene predictions on large genomic regions before and after masking of repetitive sequence and compare the results.

RepeatMasker (http://www.repeatmasker.org/) screens DNA sequences for interspersed repeats and low-complexity DNA sequences by matching them against a database of known repeat elements. Repeats in genomic sequence may cause wrong gene predictions.

Select "html" format

>chr19_not_repeatmasked hg16_dna range=chr19:6318243-6334922 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=none AGGTGTGTTGGCACACGCCTGTAATCCCAGCTACTGAGGAGGCTGAGGCATGAGAATCGCTTGAACCTGAGAGGCGGAGGTTGTAGTGAGTCGAGATTGCACCACTGCACTCCAGCCTGGGTGACAAAGTGAGACCCTGTCTCAAAAAAAAAAAAAAAAAAAAAGTGAATGTTCCACAGCATCACAGATGAATTTTGCAAATATGTTGCATGAAAGAAGAATAAACACTCTGTGATTCCATTTATTTAAACTATAAAAACAAGGAGAGCTAATTTATGCTGTTAGAGGAGTGGTTGCTTTGGGGTATGGGGAGGGGGTGGCAAGGATTAGTGACTGTCGTGGGCCCAAGTGGGGTTTCAGGGGTGCTGGCATTATTCCATCTCTTGGTCTGGGTGCTGGTCCTGTAGGGTATGTTCAGTCTGAAAATCCATCCCACCAGACATTTACGAATCATGCCCTTTCCTGGGTGTATATTATACATCAATAACAATTTTTTTTTTTTTTTGAGATGGAGTCTTGCTTTGTTGCCCAGGCTGGAGTGCAGTGGTGCAGTCTCCACCTCCCAGATTTAAGTGATTCTCATACCTCAGCCTCCCTAGTAGCTGGGATTACAGGCGTGTGCCACCACACCTGGCTCATTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCATGGTGAAACTTTGAAGGCCAATGGTGAAACATGAGGCCAAACTCCTGGCCTCAAGTGGTCCACCCACCT >chr19_repeatmasked hg16_dna range=chr19:6318243-6334922 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=N nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGTGAATGTTCnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
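The masked record above replaces repeat-derived bases with n. A minimal sketch of that hard-masking step, assuming the repeat coordinates (0-based, half-open) are already known, e.g. parsed from a repeat annotation; the function names are hypothetical and this does not call RepeatMasker itself. Reporting the masked fraction is a quick way to see how much sequence is left for the gene finder.

```python
def hard_mask(seq, repeat_spans, mask_char="N"):
    """Replace each 0-based half-open (start, end) repeat span with mask_char."""
    chars = list(seq)
    for start, end in repeat_spans:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)

def masked_fraction(seq):
    """Fraction of positions masked as N or n."""
    return sum(1 for c in seq if c in "Nn") / len(seq) if seq else 0.0

masked = hard_mask("ACGTACGTAC", [(2, 5)])  # hypothetical repeat at positions 2-4
print(masked, masked_fraction(masked))
```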

Repetitive elements (figure: ca. 45% of the human genome): LINE = long interspersed elements; SINE = short interspersed elements

The End