Module 3 Sequence and Protein Analysis (Using web-based tools) Working with Pathogen Genomes - Uruguay 2008.


PSU Projects (workflow diagram): Organism, Annotated genome, Finished genome, Database entry. Tools: Artemis & ACT

Annotation using Artemis: mapping domains in proteins

Annotation pipeline (flow diagram):
Primary DNA sequence
- Preannotation: Dotter, BlastN, BlastX, gene finders, tRNA scan
- Features: repeats, pseudogenes, rRNA, CDSs, tRNA
- Manual curation
- Analysis of CDSs: Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM
- Manual curation
Annotated sequence

Gene model annotation Protein function

Annotation of protein-coding genes (from gene model to protein function):
- Search programs: local (BLAST) and global (FASTA) alignments, EST hits
- Protein domains and motifs: InterPro (Pfam, Prosite, SMART etc.)
- Transmembrane / signal peptide prediction (TMHMM, SignalP, Phobius)
- Base annotation on characterised proteins where possible (manually curated SWISSPROT entry)
- Read the literature (PUBMED)
Use several lines of evidence!

Annotation of non-protein-coding genes (tRNAs, rRNAs, snRNAs, other ncRNAs):
- Initial searches:
  - BlastN, GC-plots
  - tRNA scan
  - sno scan
  - Others
- Search in specialised databases:
  - Rfam scan
  - microRNAdb etc.
- Comparative ncRNA prediction tools:
  - RNAZ
  - Evofold
  - QRNA etc.
- Structure prediction of ncRNAs:
  - MFOLD
  - Others
Use several lines of evidence! Structural conservation of ncRNAs!

Statistical significance of database hits: E-values (expectation values)
- E-value = number of alignments with an equivalent or higher score that you would expect to find by random chance.
- An E-value of 5 means that you would expect 5 alignments with an equivalent or higher score to occur by random chance.
- The E-value is more reliable than the % ID.
- Caution: repeat regions / non-curated protein sequences
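The E-value definition above can be made concrete with the standard Karlin-Altschul relation, E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. The sketch below is only an illustration: the function names are hypothetical, and the lambda and K values in the usage lines are assumed example parameters, not values taken from any particular BLAST run.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score into a normalised bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits, query_len, db_len):
    """Same quantity expressed through the bit score: E = m * n * 2^(-bits)."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative (assumed) statistical parameters for a scoring system.
LAM, K = 0.267, 0.041

small_db = e_value(100, query_len=200, db_len=1_000_000, lam=LAM, k=K)
large_db = e_value(100, query_len=200, db_len=2_000_000, lam=LAM, k=K)
print(small_db, large_db)
```

Because the E-value scales linearly with database size, the same alignment score becomes less significant when the search is run against a larger database.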

Sequence similarity searching: BLAST (Basic Local Alignment Search Tool) analysis
Nucleotide query sequences:
- blastn: nucleotide query vs nucleotide database
- blastx: nucleotide query translated in all 6 frames vs protein database
- tblastx: nucleotide query translated in all 6 frames vs nucleotide database translated in all 6 frames
Protein query sequences:
- blastp: protein query vs protein database
- tblastn: protein query vs nucleotide database translated in all 6 frames
FastA: provides sequence similarity and homology searching against nucleotide and protein databases using the Fasta programs. Fasta can be very specific when identifying long regions of low similarity, especially for highly diverged sequences.
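The "all 6 frames" translation used by the translated BLAST programs can be sketched as follows. This is a minimal illustration using the standard genetic code; the function names are hypothetical and it is not the code BLAST itself runs.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their protein translations."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

Stop codons are shown as `*`, so an open reading frame appears as a stretch without `*` in one of the six translations.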

(Global) FASTA BLAST (Local)
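The difference between global and local alignment can be seen in a small dynamic-programming sketch. The code below uses textbook Needleman-Wunsch (global) and Smith-Waterman (local) scoring with a linear gap penalty; the scoring values and the function name are assumptions chosen for illustration, and neither BLAST nor FASTA is implemented this way internally.

```python
def alignment_score(a, b, local, match=2, mismatch=-1, gap=-2):
    """Best alignment score of a vs b: local (Smith-Waterman) or global."""
    cols = len(b) + 1
    # First row: free start for local alignment, gap penalties for global.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    # Local: best cell anywhere; global: cell covering both full sequences.
    return best if local else prev[-1]


motif = "MPLKHR"
protein = "AAAAMPLKHRAAAA"
print(alignment_score(motif, protein, local=True))
print(alignment_score(motif, protein, local=False))
```

The local score rewards only the shared motif, while the global score is pulled down by the gaps needed to span the full length of the longer sequence.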

Orthologues and paralogues
- Orthologues (e.g. human hemoglobin and mouse hemoglobin): originate from evolution (speciation); similar functions
- Paralogues (e.g. human hemoglobin and human myoglobin): originate from gene duplication; diverged functions
Best tool to look for orthologues? Blast or FastA? FastA!

Functional assignment: alignments of modular proteins (diagram: proteins composed of domains A, B and C)

HMMs
A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. The extracted model parameters can then be used to perform further analysis, for example for pattern recognition applications. An HMM can be considered as the simplest dynamic Bayesian network.
WHAAAAT???

Aligned sequences:
..HMPLKHRLHP..
..RMPLKHRPHP..
..GMRLKHRHHP..
..PMGLKHAGHP..
Profile:
..-MPLKHR-HP..
HMM for the aligned motif that can be used to search databases for proteins containing this motif
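The step from aligned sequences to a profile can be sketched with per-column residue counts. This is a deliberately simplified, ungapped position-specific profile, not a full profile HMM (which also models insertions and deletions); the function names, the 50% consensus threshold, the pseudocount and the uniform background are all assumptions made for illustration.

```python
import math
from collections import Counter

# The four aligned motif sequences shown on the slide.
ALIGNED = ["HMPLKHRLHP", "RMPLKHRPHP", "GMRLKHRHHP", "PMGLKHAGHP"]


def column_counts(seqs):
    """Residue counts for each alignment column."""
    return [Counter(column) for column in zip(*seqs)]


def consensus(seqs, min_fraction=0.5):
    """Most common residue per column, or '-' if none reaches min_fraction."""
    out = []
    for counts in column_counts(seqs):
        residue, n = counts.most_common(1)[0]
        out.append(residue if n / len(seqs) >= min_fraction else "-")
    return "".join(out)


def log_odds_score(candidate, seqs, pseudocount=1.0, alphabet_size=20):
    """Sum of per-position log-odds against a uniform background."""
    background = 1.0 / alphabet_size
    total = len(seqs) + pseudocount * alphabet_size
    score = 0.0
    for residue, counts in zip(candidate, column_counts(seqs)):
        probability = (counts[residue] + pseudocount) / total
        score += math.log(probability / background)
    return score


print(consensus(ALIGNED))
print(log_odds_score("HMPLKHRLHP", ALIGNED))
print(log_odds_score("AAAAAAAAAA", ALIGNED))
```

A sequence that matches the conserved columns scores well above an unrelated one, which is the idea behind searching a database with a profile rather than with a single sequence.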

Remote homology detection: FastA, Blast, Psi-blast, HMM searches, HMM-HMM comparison (HHPred server)
- Psi-blast
- HMM searches:
  - Create HMM: ..-MPLKHR-HP..
  - Search database with HMM:
    ..RMPLKHRFHP..
    ..PMPLKHRIHP..
    ..HMPLKHDVHP..
    ..YMDLKHELHP..
- HMM-HMM comparison: HHPred server

HHPred server workflow (diagram):
- Input protein sequence
- Psi-blast
- Alignment
- HMM building
- Secondary structure prediction
- HMM-HMM comparison
- Secondary structure comparison
- Extremely sensitive remote homology detection
- 3D structure modelling

Module 3 Exercises:
Section A:
- Sequence retrieval of a P. falciparum protein (cyclophilin) using SRS
- BLAST and Fasta searches by cutting & pasting the sequence
Section B:
- Exercise 1 Part I: Search PROSITE server by cutting & pasting the cyclophilin sequence
- Exercise 1 Part II: Pfam server
- Exercise 1 Part III: SMART server
- Exercise 1 Part IV: InterPro server
- Exercise 2: Sequence retrieval of P. falciparum PFC0125w protein using SRS. TMHMMv2.0 server. SignalPv3.0 server.
Section C:
- Other web resources