"Gene Finding in Eukaryotic Genomes"


1 "Gene Finding in Eukaryotic Genomes"
DTU course #27011, Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark

2 Today's plan
Lecture on gene finding (gene features, Repeatmasker, etc.); get notebooks (building 208, secretary) + pause; work on project. Nikolaj present from; Lars present from

3 Practical Stuff
Webpage, literature, textbooks. Report writing format: contribution from each student specified, e.g. "Lars & Dorte mainly wrote the Introduction and Methods; Lise & Jens wrote the Results and Discussion sections". Repeatmasker.

4 Gene Features
Codon frequency/bias: organism dependent, hexamer statistics; Transcriptional: promoters/enhancers; Exons/introns: length distributions, ORFs; Splicing: donor/acceptor sites, branchpoints; Translational: start codon context
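Codon frequency/bias and hexamer statistics both start from simple counts over the sequence. A minimal Python sketch, assuming a plain DNA string whose reading frame is already known; `codon_counts` and `codon_frequencies` are hypothetical helper names, not part of any gene finder:

```python
from collections import Counter


def codon_counts(seq: str, frame: int = 0) -> Counter:
    """Count codons in one reading frame (0, 1 or 2) of a DNA sequence."""
    seq = seq.upper()
    counts = Counter()
    # Step through the sequence three bases at a time, starting at `frame`.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Skip codons containing ambiguous or masked bases such as N.
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


def codon_frequencies(seq: str, frame: int = 0) -> dict:
    """Relative codon frequencies (a simple codon usage table) for one frame."""
    counts = codon_counts(seq, frame)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


if __name__ == "__main__":
    orf = "ATGGCTGCTAAAGCTTAA"
    print(codon_counts(orf)["GCT"])  # 3
```

Comparing such tables between organisms is one way to see why codon statistics are organism dependent.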

5 Codon Bias
Codon bias is linked to tRNA availability and expression level, so gene finders are often organism specific. Coding regions are often modelled by a 5th-order Markov chain (hexamers/dicodons).
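A 5th-order Markov chain predicts each base from the five bases before it, so it is trained on hexamer (dicodon) counts. The sketch below trains one model on coding and one on non-coding examples and scores a sequence by log-odds. It is a simplified, frame-independent version for illustration; coding models in real gene finders are usually frame-dependent (three-periodic). The function names, pseudocount and uniform fallback are assumptions:

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
ORDER = 5  # each base conditioned on the 5 preceding bases, i.e. hexamer statistics


def train_markov(seqs, order=ORDER, pseudo=1.0):
    """Estimate P(base | preceding `order` bases) with additive pseudocounts."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, pseudo))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            ctx, base = s[i - order:i], s[i]
            # Skip windows containing ambiguous or masked bases such as N.
            if base in ALPHABET and set(ctx) <= set(ALPHABET):
                counts[ctx][base] += 1
    return {ctx: {b: n / sum(c.values()) for b, n in c.items()}
            for ctx, c in counts.items()}


def log_prob(seq, model, order=ORDER):
    """Log-probability of seq given its first `order` bases.
    Contexts never seen in training fall back to a uniform distribution."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        probs = model.get(ctx, {})
        total += math.log(probs.get(base, 1 / len(ALPHABET)))
    return total


def log_odds(seq, coding_model, noncoding_model):
    """Positive scores favour the coding model."""
    return log_prob(seq, coding_model) - log_prob(seq, noncoding_model)


if __name__ == "__main__":
    coding = train_markov(["ATGGCTGCTGCTGCTGCTGCTGCTGCTTAA" * 3])
    noncoding = train_markov(["AT" * 45])
    print(log_odds("GCTGCTGCTGCTGCT", coding, noncoding) > 0)  # True
```

The toy training sets only show the mechanics; a usable model needs thousands of annotated genes from the target organism, which is one reason gene finders are trained per organism.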

6 Human genes: short exons, long introns

7 Human genes: intron lengths have a broad distribution (minimum length ca. 60 bp)
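The donor/acceptor sites and the minimum intron length constrain where introns can lie. Most introns follow the canonical GT-AG rule (they start with GT and end with AG), although rarer variants such as GC-AG and AT-AC introns exist. The sketch below only enumerates GT...AG candidates of at least ca. 60 bp; real splice-site prediction scores the surrounding sequence as well. `candidate_introns` and `MIN_INTRON` are hypothetical names:

```python
import re

MIN_INTRON = 60  # approximate minimum human intron length (ca. 60 bp, per the slide)


def candidate_introns(seq, min_len=MIN_INTRON, max_len=None):
    """Return half-open (start, end) spans that start with GT, end with AG and
    are at least `min_len` bp long. Quadratic in the number of sites: toy use only."""
    seq = seq.upper()
    # Lookaheads find overlapping occurrences of the dinucleotides.
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer(r"(?=AG)", seq)]
    spans = []
    for d in donors:
        for a in acceptors:
            length = a - d
            if length >= min_len and (max_len is None or length <= max_len):
                spans.append((d, a))
    return spans


if __name__ == "__main__":
    seq = "AAA" + "GT" + "C" * 60 + "AG" + "TTT"
    print(candidate_introns(seq))  # [(3, 67)]
```

Even this tiny example shows why splice signals alone are weak evidence: every GT and AG in a long sequence is a potential site.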

8 Intron Prevalence

9

10 Gene Prediction – Performance of Genscan

11 NIX – Visualizing Gene Predictions
NO method is always best!

12 Performance of Genscan – Exon Length
Low performance at short exon lengths
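Gene-prediction performance is commonly reported as sensitivity and specificity, at the nucleotide or exon level; in this field "specificity" conventionally means TP/(TP+FP), the fraction of predicted coding bases that are correct. A minimal nucleotide-level sketch, assuming exons are given as half-open (start, end) intervals on one strand; the helper names are hypothetical:

```python
def covered(intervals):
    """Expand half-open (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the nucleotide level."""
    pred, true = covered(predicted), covered(annotated)
    tp = len(pred & true)  # coding in both prediction and annotation
    fp = len(pred - true)  # predicted coding, annotated non-coding
    fn = len(true - pred)  # annotated coding, missed by the prediction
    sn = tp / (tp + fn) if true else 0.0
    sp = tp / (tp + fp) if pred else 0.0
    return sn, sp


if __name__ == "__main__":
    print(nucleotide_accuracy(predicted=[(50, 150)], annotated=[(0, 100)]))  # (0.5, 0.5)
```

Because nucleotide-level measures weight by length, missing a short exon entirely (e.g. predicted `[(0, 100)]` against annotated `[(0, 100), (200, 230)]`) still gives a sensitivity of about 0.77, which is why exon-level measures are reported alongside them.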

13 Future Challenges
Bootstrapping: prediction improves as more genes become known. 'Extreme' genes (long/short) are still difficult. Initial and terminal exons are predicted with lower confidence. Combine with sequence similarity matches. Non-coding RNAs: most gene prediction programs only predict protein-coding genes, so tRNA and rRNA genes are not predicted. Predict alternative splicing, enhancers and silencers. Predict matrix- and scaffold-attachment regions, insulators and boundary elements.

14 Gene Prediction: Take-home messages
Genes may be predicted by computer programs; masking of repetitive sequences may be required for large genomic sequences; 'unusual' genes are difficult (high GC%, short or terminal exons); HMM-based gene prediction programs are suitable for "gene grammar"; prediction methods are not perfect!
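The "gene grammar" idea is that an HMM's states and allowed transitions encode the order of gene parts, and the Viterbi algorithm finds the most probable parse of a sequence. Real gene finders use many more states (exons in three frames, introns, splice sites, UTRs, intergenic regions, both strands). The two-state model below and all of its probabilities are hypothetical toy values chosen only to show the decoding step, with the "coding" state made slightly GC-rich; input is assumed to contain only A, C, G and T:

```python
import math

# Hypothetical toy parameters, not taken from any real gene finder.
STATES = ("N", "C")  # N = non-coding, C = coding
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.95, "C": 0.05}, "C": {"N": 0.05, "C": 0.95}}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2},
    "C": {"A": 0.2, "T": 0.2, "C": 0.3, "G": 0.3},
}


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for seq, computed in log space."""
    seq = seq.upper()
    if not seq:
        return ""
    v = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}]
    back = []
    for ch in seq[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best previous state leading into state s.
            prev = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            col[s] = v[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][ch])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    seq = "AT" * 20 + "GC" * 20 + "AT" * 20
    print(viterbi(seq) == "N" * 40 + "C" * 40 + "N" * 40)  # True
```

The low switching probability (0.05) is what makes the parse prefer long, contiguous segments over flipping state at every GC-rich base; in a full gene model the same mechanism, together with length distributions, shapes exon and intron calls.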

15 Repeatmasker
Repetitive sequences in human/eukaryotic genomes are a problem: up to 45% of human genomic sequence is derived from transposable/repetitive elements. Run gene predictions on large genomic regions before and after masking of repetitive sequence.

16 Repeatmasker http://www.repeatmasker.org/
Screens DNA sequences for interspersed repeats and low-complexity DNA sequences; matches against a database of known repeat elements. Repeats in genomic sequence may cause wrong gene predictions.
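Repeatmasker itself compares the query against a library of known repeat sequences, which is beyond a short example. The sketch below only illustrates the masking step on the simplest kind of low-complexity DNA, runs of a single base such as the poly-A stretch in the chr19 example; `mask_homopolymers` and its 10 bp threshold are hypothetical choices, not Repeatmasker behaviour:

```python
import re


def mask_homopolymers(seq, min_run=10, mask_char="N"):
    """Hard-mask every run of >= min_run identical bases (case-sensitive) with mask_char."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    # One base followed by at least (min_run - 1) copies of itself.
    run = re.compile(r"([ACGTacgt])\1{%d,}" % (min_run - 1))
    return run.sub(lambda m: mask_char * len(m.group(0)), seq)


if __name__ == "__main__":
    print(mask_homopolymers("CCC" + "A" * 12 + "GG") == "CCC" + "N" * 12 + "GG")  # True
```

Masking with N keeps sequence length and coordinates unchanged, so gene predictions on the masked and unmasked versions can be compared position by position.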

17

18 Select "html" format

19 >chr19_not_repeatmasked hg16_dna range=chr19: 'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=none AGGTGTGTTGGCACACGCCTGTAATCCCAGCTACTGAGGAGGCTGAGGCATGAGAATCGCTTGAACCTGAGAGGCGGAGGTTGTAGTGAGTCGAGATTGCACCACTGCACTCCAGCCTGGGTGACAAAGTGAGACCCTGTCTCAAAAAAAAAAAAAAAAAAAAAGTGAATGTTCCACAGCATCACAGATGAATTTTGCAAATATGTTGCATGAAAGAAGAATAAACACTCTGTGATTCCATTTATTTAAACTATAAAAACAAGGAGAGCTAATTTATGCTGTTAGAGGAGTGGTTGCTTTGGGGTATGGGGAGGGGGTGGCAAGGATTAGTGACTGTCGTGGGCCCAAGTGGGGTTTCAGGGGTGCTGGCATTATTCCATCTCTTGGTCTGGGTGCTGGTCCTGTAGGGTATGTTCAGTCTGAAAATCCATCCCACCAGACATTTACGAATCATGCCCTTTCCTGGGTGTATATTATACATCAATAACAATTTTTTTTTTTTTTTGAGATGGAGTCTTGCTTTGTTGCCCAGGCTGGAGTGCAGTGGTGCAGTCTCCACCTCCCAGATTTAAGTGATTCTCATACCTCAGCCTCCCTAGTAGCTGGGATTACAGGCGTGTGCCACCACACCTGGCTCATTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCATGGTGAAACTTTGAAGGCCAATGGTGAAACATGAGGCCAAACTCCTGGCCTCAAGTGGTCCACCCACCT >chr19_repeatmasked hg16_dna range=chr19: 'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=N nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGTGAATGTTCnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
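Comparing the two chr19 records above (not repeat-masked vs. repeat-masked) largely comes down to counting masked positions. A minimal sketch, assuming standard FASTA with each header on its own line (the records shown here have header and sequence run together and would need to be split first); `parse_fasta` and `masked_fraction` are hypothetical helpers keyed on the first word of each header:

```python
def parse_fasta(text):
    """Parse FASTA text into {first word of header: sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def masked_fraction(seq):
    """Fraction of positions hard-masked as N or n."""
    return sum(ch in "Nn" for ch in seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    demo = ">masked example\nACGTnn\nNN\n>unmasked\nACGT\n"
    recs = parse_fasta(demo)
    print(masked_fraction(recs["masked"]))  # 0.5
```

Applied to a large human region, this fraction gives a direct check against the "up to 45%" repeat content mentioned earlier.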

20 Repetitive Elements
[Figure: repetitive elements, ca. 45%]
LINE = Long interspersed elements; SINE = Short interspersed elements

21

22

23

24 The End

25

