Published by Walter Hicks. Modified over 9 years ago.
1
Module 3 Sequence and Protein Analysis (Using web-based tools) Working with Pathogen Genomes - Uruguay 2008
2
PSU Projects: Organism; Annotated genome; Finished genome; Database entry; Artemis & ACT
3
Annotation using Artemis: mapping domains in proteins
4
Primary DNA sequence
Tools: Dotter, BlastN, BlastX, gene finders, tRNA scan
Features: repeats, pseudogenes, rRNA, CDSs, tRNA
Preannotation, then manual curation
5
Primary DNA sequence
Tools: Dotter, BlastN, BlastX, gene finders, tRNA scan
Features: repeats, pseudogenes, rRNA, CDSs, tRNA
Protein analysis: Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM
Preannotation, then manual curation
Annotated sequence
6
Gene model annotation Protein function
7
Annotation of protein-coding genes: (from gene model to protein function)
- Search programs: local (BLAST) and global (FASTA) alignments, EST hits
- Protein domains and motifs: InterPro (Pfam, Prosite, SMART etc.)
- Transmembrane / signal peptide prediction (TMHMM, SignalP, Phobius)
- Base annotation on characterised proteins where possible (manually curated SWISSPROT entry)
- Read the literature (PUBMED)
Use several lines of evidence!
8
Annotation of non-protein-coding genes: (tRNAs, rRNAs, snRNAs, other ncRNAs)
- Initial searches:
  - BlastN, GC-plots
  - tRNA scan
  - sno scan
  - Others
- Search in specialised databases:
  - Rfam scan
  - microRNAdb etc.
- Comparative ncRNA prediction tools:
  - RNAZ
  - Evofold
  - QRNA etc.
- Structure prediction of ncRNAs:
  - MFOLD
  - Others
Use several lines of evidence! Structural conservation of ncRNAs!
9
Statistical significance of database hits
E-values (Expectation values)
E-value = the number of alignments with an equivalent or higher score that you would expect to find by random chance.
An E-value of 5 means that you would expect 5 alignments with an equivalent or higher score to occur by random chance.
- More reliable than the % identity
- Caution: repeat regions / non-curated protein sequences
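The relationship between score, search space and E-value can be sketched numerically. The snippet below assumes the standard Karlin-Altschul relationship for a normalised bit score S', E = m × n × 2^(-S'), where m is the query length and n the total database length. The function name and the example numbers are illustrative only, and real BLAST output applies further corrections (such as effective lengths) that are ignored here.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value for a hit with the given bit score.

    Assumes E = m * n * 2**(-S'), with m = query length and
    n = database length (no effective-length correction).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical search: 300-residue query against a 1e9-residue database.
    for score in (30, 40, 50, 60):
        print(score, expected_hits(score, 300, 1_000_000_000))
```

Two consequences follow from the formula: each extra bit of score halves the E-value, and the same score becomes less significant as the database grows.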
10
Sequence similarity searching:
BLAST (Basic Local Alignment Search Tool) analysis:
Nucleotide query sequences:
- blastn: nucleotide query vs nucleotide database
- blastx: nucleotide query translated in all 6 frames vs protein database
- tblastx: translated nucleotide query vs translated nucleotide database (all 6 frames each)
Protein query sequences:
- blastp: protein query vs protein database
- tblastn: protein query vs nucleotide database translated in all 6 frames
FastA: Provides sequence similarity and homology searching against nucleotide and protein databases using the Fasta programs. Fasta can be very specific when identifying long regions of low similarity, especially for highly diverged sequences.
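The "all 6 frames" step used by the translated searches can be illustrated with a minimal sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names are made up for this example and it is not how BLAST itself is implemented.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(name, protein)
```

Each of the six translations is then compared with the protein database (blastx), or with six-frame translations of the database (tblastx).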
11
FASTA (global) vs BLAST (local)
12
Orthologues and paralogues
- Orthologues: human hemoglobin and mouse hemoglobin. Originate from speciation; similar functions.
- Paralogues: human hemoglobin and human myoglobin. Originate from gene duplication; diverged functions.
Best tool to look for orthologues: Blast or FastA? FastA!
13
Functional assignment: alignments of modular proteins
[Diagram: modular proteins composed of domains A, B and C]
14
HMMs
A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. The extracted model parameters can then be used to perform further analysis, for example for pattern recognition applications. An HMM can be considered as the simplest dynamic Bayesian network.
WHAAAAT???
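A small worked example may make the definition less abstract. The sketch below uses the forward algorithm, a standard HMM technique, to compute how likely a short residue string is under a toy two-state model. The state names, alphabet and all probabilities are invented for illustration and are not taken from Pfam or any other real model.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations), summed over every hidden state path.

    Expects a non-empty sequence of observed symbols.
    """
    # alpha[s]: probability of the symbols seen so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emit_p[s][symbol]
            * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Toy model: every number below is a made-up illustration value.
STATES = ("motif", "background")
START_P = {"motif": 0.5, "background": 0.5}
TRANS_P = {
    "motif": {"motif": 0.9, "background": 0.1},
    "background": {"motif": 0.1, "background": 0.9},
}
EMIT_P = {
    "motif": {"H": 0.6, "P": 0.3, "A": 0.1},
    "background": {"H": 0.2, "P": 0.2, "A": 0.6},
}

if __name__ == "__main__":
    for seq in ("HHP", "AAA"):
        print(seq, forward_likelihood(seq, STATES, START_P, TRANS_P, EMIT_P))
```

Here the hidden part is which state emitted each residue; only the residues themselves are observed.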
15
Aligned sequences:
..HMPLKHRLHP..
..RMPLKHRPHP..
..GMRLKHRHHP..
..PMGLKHAGHP..
Profile:
..-MPLKHR-HP..
HMM for the aligned motif that can be used to search databases for proteins containing this motif
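The step from aligned sequences to a profile can be sketched as a per-column residue count. The code below is a simplification and not a full profile HMM: it only derives a consensus string, keeping the most common residue in a column when it appears in at least half of the sequences (an assumed threshold) and writing "-" otherwise. Function names are illustrative.

```python
from collections import Counter

# The four aligned motif instances shown on the slide.
ALIGNED = ["HMPLKHRLHP", "RMPLKHRPHP", "GMRLKHRHHP", "PMGLKHAGHP"]


def column_counts(aligned):
    """Count residues in each alignment column (sequences must be equal length)."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [Counter(column) for column in zip(*aligned)]


def consensus(aligned, min_fraction=0.5):
    """Most common residue per column, or '-' if it is below min_fraction."""
    result = []
    for counts in column_counts(aligned):
        residue, count = counts.most_common(1)[0]
        result.append(residue if count / len(aligned) >= min_fraction else "-")
    return "".join(result)


if __name__ == "__main__":
    print(consensus(ALIGNED))
```

A real profile HMM keeps the full per-column probabilities, plus insertion and deletion states, and does not collapse each column to a single letter.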
16
Remote homology detection
Methods: FastA, Blast, Psi-blast, HMM searches, HMM-HMM comparison
HMM searches:
- Create HMM: ..-MPLKHR-HP..
- Search database with HMM
- Hits:
..RMPLKHRFHP..
..PMPLKHRIHP..
..HMPLKHDVHP..
..YMDLKHELHP..
HMM-HMM comparison: HHPred server http://toolkit.tuebingen.mpg.de/hhpred
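Searching a database with a motif model can be approximated with a position-specific scoring matrix. The sketch below is a deliberately simplified stand-in for an HMM search: it has no insert or delete states, assumes a uniform background over the 20 amino acids, and uses an arbitrary pseudocount of 1. The training sequences come from the earlier alignment slide; all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRAINING = ["HMPLKHRLHP", "RMPLKHRPHP", "GMRLKHRHHP", "PMGLKHAGHP"]


def build_log_odds(aligned, pseudocount=1.0):
    """Per-column log2 odds of each residue against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Return (score, start) of the best ungapped window, or None if too short."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20-letter alphabet contribute a neutral 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    profile = build_log_odds(TRAINING)
    for hit in ("RMPLKHRFHP", "PMPLKHRIHP", "HMPLKHDVHP", "YMDLKHELHP"):
        print(hit, best_window(profile, "AAAA" + hit + "AAAA"))
```

Sequences that contain the motif score well above zero even where individual positions differ from the training set, which is the property that makes profile searches more sensitive than a single-sequence search.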
17
HHPred workflow:
- Input protein sequence
- Psi-blast
- Alignment
- HMM building; secondary structure prediction
- HMM-HMM comparison; secondary structure comparison
- Extremely sensitive remote homology detection
- 3D structure modelling
18
Module 3 Exercises:
Section A:
- Sequence retrieval of a P. falciparum protein (cyclophilin) using SRS
- BLAST and Fasta searches by cutting & pasting the sequence
Section B:
- Exercise 1 Part I: Search the PROSITE server by cutting & pasting the cyclophilin sequence
- Exercise 1 Part II: Pfam server
- Exercise 1 Part III: SMART server
- Exercise 1 Part IV: InterPro server
- Exercise 2: Sequence retrieval of the P. falciparum PFC0125w protein using SRS; TMHMM v2.0 server; SignalP v3.0 server
Section C: Other web resources