Exercises: Pairwise alignment; Homology search (BLAST); Multiple alignment (CLUSTAL W); Iterative Profile Search; Profile Search (Pfam, Prosite, PSI-BLAST).

Presentation transcript:

Exercises
–Pairwise alignment
–Homology search (BLAST)
–Multiple alignment (CLUSTAL W)
–Iterative Profile Search / Profile Search: Pfam, Prosite, PSI-BLAST, SAM

Exercises: Overview
Query sequence: unknown
–Blast the sequence to search for close homologs
–Search Pfam and Prosite for conserved motifs
–You detected homology with an annotated protein family
–Make a multiple sequence alignment
–Generate a profile or HMM
–Search the database for remote homologs
Tools: Blast, ClustalW, Pfam, PROSITE, HMMer, PSSM, Profile Search, PSI-BLAST

Exercises: Terminal Oxidases
[Figure: subunit I in the membrane. Labels: OUT, IN, Cytc, Fe, CuB, Fe, e-, O2, H2O]
–The unknown protein is a heme-copper oxidase
–Enzyme that reduces O2 to H2O in the respiratory chain
–The subunit contains 2 hemes and a Cu prosthetic group
–The residues that are ligands of these groups have been conserved in all types of terminal oxidase complexes

Exercises: Terminal Oxidases
[Figure: respiratory chain. Labels: Nadh.dh, succ.dh, NADH, succinate, quinol, cytbc1, Cytc, cytochrome c oxidase, quinol oxidase, e-, H+, O2, H2O]

Exercises Multiple Alignment

Exercises Multiple Alignment: standard gap cost
[Figure annotations: ligands of the Cu center; ligands of the hemes; Prosite pattern]

Exercises Multiple Alignment: large gap cost
[Figure annotations: ligands of the Cu center; Prosite pattern; ligands of the hemes]
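The effect of the gap cost shown on the two alignment slides can be reproduced on a small scale. The sketch below is only an illustration: it is a pairwise global alignment (Needleman-Wunsch) with a single linear gap cost, whereas ClustalW builds multiple alignments progressively and uses separate gap opening and extension penalties. The function name `global_align`, the scoring values and the DNA-like test sequences are assumptions made for this example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=1):
    """Needleman-Wunsch global alignment with a linear gap cost.

    Returns (aligned_a, aligned_b, score). Illustrative sketch only.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j

    def sub(i, j):
        # substitution score for a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] - gap,            # gap in b
                score[i][j - 1] - gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    for gap_cost in (1, 10):
        top, bottom, total = global_align("ACGTACGT", "CGTACGTA", gap=gap_cost)
        print(f"gap cost {gap_cost}: score {total}")
        print(top)
        print(bottom)
```

With the low gap cost the two sequences are shifted against each other by opening gaps at the ends; with the large gap cost the gaps disappear and mismatches are accepted instead, which is the same trade-off the two slides illustrate for the conserved ligand positions.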

Exercises Phylogenetic Tree Tree based on subselection

Exercises PROSITE


Exercises Prosite domain Prosite

Exercises

Pattern & profile
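A PROSITE pattern is essentially a regular expression written in PROSITE's own notation, so a pattern can be tested against a sequence with a few lines of code. The sketch below translates the basic syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends) into a Python regex. The function name `prosite_to_regex`, the toy pattern and the toy sequence are made up for illustration; the pattern is not the real heme-copper oxidase entry, and less common syntax variants are not handled.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        parsed = _ELEMENT.fullmatch(element)
        if parsed is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = parsed.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            core = residue                      # single residue or [set]
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    toy_pattern = "H-x(2)-[LIV]-{P}-H-H."   # hypothetical pattern
    toy_sequence = "MKHAALGHHW"             # hypothetical sequence
    regex = prosite_to_regex(toy_pattern)
    hit = re.search(regex, toy_sequence)
    print(regex, hit.group(0) if hit else None)
```

A pattern either matches or it does not, which is why short patterns produce many chance hits; a profile, in contrast, assigns a score to every residue at every position.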

Exercises Pfam


Exercises COX family Pfam


Exercises BLOCKS

Exercises Blocks

Exercises: Overview
Query sequence: unknown
–Blast the sequence to search for close homologs
–Search Pfam and Prosite for conserved motifs
–You detected homology with an annotated protein family
–Make a multiple sequence alignment
–Generate a profile or HMM
–Search the database for remote homologs
Tools: Blast, Pfam, PROSITE, HMMer, PSSM, Profile Search, PSI-BLAST

Exercises PSI-BLAST

Exercises PSI-BLAST
–Start from a single sequence
–Blast it against NCBI
–Select high-scoring hits
–Perform a multiple alignment
–Construct a profile
–Iterate to find remote homologs
Usually the sequence is cut into pieces: avoid giving multi-domain proteins as input.
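The "construct profile" step in the list above can be sketched as building a position-specific scoring matrix (PSSM) from the aligned hits and then scanning a sequence with it. This is a simplified illustration, not the PSI-BLAST implementation: it assumes a uniform background frequency for the 20 amino acids and a fixed pseudocount, whereas PSI-BLAST also weights the sequences and derives its pseudocounts from substitution data. The names `build_pssm`, `score_window`, `best_hit` and the toy alignment are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)   # assumption: uniform background
    pssm = []
    for col in range(len(alignment[0])):
        # Collect the residues in this column, ignoring gap characters.
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(residues) + pseudocount
        column = {}
        for aa in AMINO_ACIDS:
            freq = (residues.count(aa) + pseudocount * background) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_window(pssm, segment):
    """Sum the column scores for a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring ungapped window in the sequence."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    toy_alignment = ["HAHH", "HGHH", "HSHH"]   # hypothetical aligned hits
    pssm = build_pssm(toy_alignment)
    print(best_hit(pssm, "MKLHAHHWV"))
```

In an iterative search the sequences found with this matrix would be added to the alignment and the matrix rebuilt, which is how the profile gradually picks up remote homologs. The sketch expects sequences made up of the 20 standard amino acid letters only.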


Exercises SAM


Exercises SAM: Markov model (short.t2k-w0.5.mod)
–Emission probability per AA
–Transition probabilities
–Insertion probability per AA position

Exercises SAM input targets


Exercises SAM Hit with highest score! Hit with a protein family for which the 3D structure has been determined

Exercises Try to view the structure of the family SAM


Exercises Logos of the secondary structure prediction SAM


Exercises HMMer states
–Emission probability per AA
–Null model
–Transition probabilities
–Insertion probability per AA
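How the listed components combine into a score can be shown with a toy profile HMM. The sketch below computes the log-odds score (in bits) of a sequence along one given state path: each emitted residue contributes log2(emission / null model) and each step contributes log2(transition probability). It is a deliberately reduced illustration, not the HMMer or SAM model format: it has only match (M) and insert (I) states, no delete states, and the transition probabilities are the same at every position. All names and numbers (`path_log_odds`, `peaked`, the probabilities) are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peaked(residue, probability):
    """Emission distribution giving `probability` to one residue, the rest shared evenly."""
    rest = (1.0 - probability) / (len(AMINO_ACIDS) - 1)
    return {aa: (probability if aa == residue else rest) for aa in AMINO_ACIDS}


def path_log_odds(sequence, path, match_emissions, insert_emission,
                  transitions, null_model):
    """Log-odds score (bits) of `sequence` emitted along `path` ('M'/'I' per residue)."""
    if len(sequence) != len(path):
        raise ValueError("every residue needs exactly one emitting state")
    score = 0.0
    match_index = -1   # index of the current match state
    previous = None
    for residue, state in zip(sequence, path):
        if state == "M":
            match_index += 1
            emission = match_emissions[match_index][residue]
        elif state == "I":
            emission = insert_emission[residue]
        else:
            raise ValueError(f"unknown state: {state!r}")
        # Emission term: how much better than the null model?
        score += math.log2(emission / null_model[residue])
        # Transition term: cost of moving between states.
        if previous is not None:
            score += math.log2(transitions[(previous, state)])
        previous = state
    return score


if __name__ == "__main__":
    null_model = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    match_emissions = [peaked("H", 0.8), peaked("H", 0.8)]   # two toy match states
    insert_emission = null_model                            # inserts score 0 bits
    transitions = {("M", "M"): 0.75, ("M", "I"): 0.25,
                   ("I", "M"): 0.5, ("I", "I"): 0.5}
    print(path_log_odds("HAH", "MIM", match_emissions, insert_emission,
                        transitions, null_model))
```

A real search does not fix the path in advance: it finds the best path (Viterbi) or sums over all paths (forward algorithm) through position-specific match, insert and delete states.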

Exercises