Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at Georgetown University Medical Center


Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at Georgetown University Medical Center

2 Retrieval, Sequence Search & Classification Methods
- Retrieve protein info by text / UID
- Sequence Similarity Search
  - BLAST, FASTA, Dynamic Programming
- Family Classification
  - Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
- Integrated Search and Classification System

3 Sequence Similarity Search
- Based on Pair-Wise Comparisons
- Dynamic Programming Algorithms
  - Global Similarity: Needleman-Wunsch
  - Local Similarity: Smith-Waterman
- Heuristic Algorithms
  - FASTA: Based on K-Tuples (2-Amino Acid)
  - BLAST: Triples of Conserved Amino Acids
  - Gapped-BLAST: Allow Gaps in Segment Pairs
  - PHI-BLAST: Pattern-Hit Initiated Search
  - PSI-BLAST: Position-Specific Iterated Search
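The two dynamic programming approaches named on this slide can be sketched as follows. This is a minimal illustration, not the code behind any PIR tool: the function names, the simple match/mismatch scores, and the linear gap penalty are all assumptions chosen for brevity (real searches use a substitution matrix such as PAM250 or BLOSUM 62 and usually affine gap penalties).

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score: every residue of a and b must be aligned or gapped."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalized in a global alignment.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
    return H[-1][-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score: best-scoring pair of substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere (local similarity).
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    q, s = "XXACDEYY", "ZZACDEWW"
    print("global:", needleman_wunsch_score(q, s))
    print("local: ", smith_waterman_score(q, s))
```

The only differences between the two functions are the initialization of the first row and column and the floor at zero, which is why the local score ignores the unrelated flanking residues while the global score is penalized for them.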

4 Sequence Similarity Search
- Similarity Search Parameters
  - Scoring Matrices – Based on Conserved Amino Acid Substitution
    - Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity)
    - Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62
  - Gap Penalty
- Search Time Comparisons
  - Smith-Waterman: 10 Min
  - FASTA: 2 Min
  - BLAST: 20 Sec

5 Feature Representation
- Features: Residue Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features
- Alternative Alphabets: Classification of Amino Acids To Capture Different Features of Amino Acid Residues

6 Substitution Matrix
- Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time
- Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7)
- Positive Score: Conservative Substitution (e.g., Lys/Arg, +3)
- High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
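Substitution scores of this kind are commonly expressed as log-odds: the logarithm of how often a residue pair is observed in trusted alignments relative to how often it would pair by chance. The sketch below illustrates that idea only. The function name, the half-bit scaling, and every frequency in it are hypothetical values chosen for illustration; they are not taken from PAM250 or BLOSUM 62.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded log-odds score for an aligned residue pair.

    pair_freq : observed probability of the (ordered) pair in trusted alignments
    freq_a/b  : background probabilities of the two residues
    scale     : 2.0 gives half-bit units (an assumption for this sketch)
    """
    return round(scale * math.log2(pair_freq / (freq_a * freq_b)))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(log_odds_score(0.0400, 0.10, 0.10))  # pair seen more often than chance: positive
    print(log_odds_score(0.0025, 0.10, 0.10))  # pair seen less often than chance: negative
    # Identity for a rare residue versus a common one, each conserved half the time:
    print(log_odds_score(0.005, 0.01, 0.01))   # rare residue
    print(log_odds_score(0.050, 0.10, 0.10))   # common residue
```

Under these assumed frequencies, the rare residue earns a higher identity score than the common one, because a chance match between two rare residues is so unlikely. That is the effect the slide describes for Trp and Cys.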

7 BLAST (Basic Local Alignment Search Tool)
- To search a sequence against the database
- Extremely fast
- Robust
- Most widely used
It finds very short segment pairs between the query and a sequence in the database. These segments are then extended in both directions until the maximum possible score for that segment is reached.
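The seed-and-extend idea described on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the BLAST implementation: seeds here are exact 3-letter word matches (BLAST also accepts similar "neighborhood" words scored with a substitution matrix), scoring is +1/-1, and extension stops once the score falls more than a hypothetical `x_drop` below the best seen. All function and parameter names are made up for this example.

```python
def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where an identical k-letter word occurs."""
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    return [(i, j)
            for j in range(len(subject) - k + 1)
            for i in index.get(subject[j:j + k], [])]


def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk in one direction (step is +1 or -1); return (best gain, residues kept)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        gain += match if query[qi] == subject[sj] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain > x_drop:
            break  # score dropped too far below the best; give up
        qi += step
        sj += step
    return best_gain, best_len


def extend_seed(query, subject, i, j, k=3, match=1, mismatch=-1, x_drop=2):
    """Extend an exact seed at (i, j) without gaps in both directions.

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    right_gain, right_len = _extend(query, subject, i + k, j + k, +1,
                                    match, mismatch, x_drop)
    left_gain, left_len = _extend(query, subject, i - 1, j - 1, -1,
                                  match, mismatch, x_drop)
    score = k * match + left_gain + right_gain
    return score, i - left_len, i + k + right_len


if __name__ == "__main__":
    query, subject = "MKVLAAGIC", "TTKVLAAGWW"
    for i, j in find_seeds(query, subject):
        score, start, end = extend_seed(query, subject, i, j)
        print((i, j), score, query[start:end])
```

Indexing the query words once and scanning the database sequence against that index is what makes the seed step fast. The expensive alignment work is done only around the seeds.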

8 BLAST Search
- From BLAST Search Interface
- Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment

9 BLAST/SSEARCH Results SSEARCH Alignment BLAST Alignment

10 Family Classification Methods
- Based on Family Information
- ClustalW Multiple Sequence Alignment
- ProSite Pattern Search
- Profile Search
- Hidden Markov Models (HMMs)
- Neural Networks
- Integrated Analysis

11 Multiple Sequence Alignment
- ClustalW
- Progressive Pairwise Approach
  - Based on Exhaustive Pairwise Alignments
- Neighbor Joining
  - Joining Order Corresponding to a Tree
- Alignment Varies
  - Dependent on Joining Order

12 How do you build a tree?
- Pick sequences to align
- Align them
- Verify the alignment
- Keep the parts that are aligned correctly
- Build and evaluate a phylogenetic tree

13 Multiple Alignment and Tree
- From Text/Sequence Search Result or ClustalW Alignment Interface

14

15 Motif Patterns (Regular Expressions)
- Signature Patterns for Functional Motifs
ProClass Motif Alignments
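Because signature patterns are essentially regular expressions, a PROSITE-style pattern can be translated into a standard regex and scanned against a sequence. The converter below is a minimal sketch that covers only the common syntax elements (`x`, `[...]`, `{...}`, `(n)` or `(n,m)` repeats, and `<` / `>` terminal anchors). The function name is an assumption, and rarer constructs such as anchors inside brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x      -> any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)


if __name__ == "__main__":
    # N-glycosylation site pattern (PROSITE PS00001).
    regex = prosite_to_regex("N-{P}-[ST]-{P}")
    print(regex)
    for seq in ("AANGSAA", "AANPSAA"):
        hit = re.search(regex, seq)
        print(seq, hit.start() if hit else None)
```

Short patterns like this one match by chance in many unrelated proteins, which is one reason pattern hits are usually interpreted together with profile or HMM evidence.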

16 PIR Pattern Search
- From Text/Sequence Search Result or Pattern Search Interface
- One Query Sequence Against PROSITE Pattern Database
- One Query Pattern (PROSITE or User-Defined) Against Sequence DB

17 Pattern Search Result (I)
- One Query Sequence Against PROSITE Pattern Database

18 Pattern Search Result (II)
- One Query Pattern Against Sequence Database

19 Profile Method
- Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments
  - Num of Rows = Num of Aligned Positions
  - Each row contains a score for the alignment with each possible residue.
- Profile Searching
  - Summation of Scores for Each Amino Acid Residue along Query Sequence
  - Higher Match Values at Conserved Positions
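The profile table and the summation-based search described above can be sketched as follows. This is an illustration under stated assumptions rather than any specific profile tool: the alignment is ungapped, the background is uniform over the 20 amino acids, a simple pseudocount avoids log of zero, and the toy alignment and all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One row per aligned position; each row maps residue -> log-odds score."""
    n_seqs = len(alignment)
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (an assumption)
    profile = []
    for column in zip(*alignment):
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount * background) / (n_seqs + pseudocount)
            row[aa] = math.log2(freq / background)
        profile.append(row)
    return profile


def score_window(profile, segment):
    """Sum the per-position scores of a segment as long as the profile."""
    return sum(row[aa] for row, aa in zip(profile, segment))


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    return max((score_window(profile, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


if __name__ == "__main__":
    toy_alignment = ["GKST", "GKSS", "GKTT", "AKST"]  # hypothetical family
    profile = build_profile(toy_alignment)
    print(best_match(profile, "MMGKSTMM"))
    # The fully conserved K column scores higher than the mostly-G column.
    print(profile[1]["K"] > profile[0]["G"])
```

Unlike a single substitution matrix, the same residue gets a different score at each position, so a match at a strictly conserved column contributes more than a match at a variable one.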

20 PIR HMM Domain/Motif Search
- From Text/Sequence Search Result or HMM Search Interface
- HMMER Model Building & Sequence Search
- Search One Query Protein Against All HMMs
- Search One HMM Against Sequence DB

21 HMM Search Result (I)
- One Query Protein Against All Pfam HMMs

22 HMM Search Result (II)
- Search User-Built HMM Against Protein Sequence DB
- Input Sequences (Optional Residue Ranges) -> Multiple Sequence Alignment -> Model Building -> HMM Search

23 Secondary Structure Features
- α Helix: Patterns of Hydrophobic Residue Conservation Showing an I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic)
- β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6
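The two spacing rules above can be checked mechanically. The sketch below scans a single sequence for positions where all offsets of a pattern hold hydrophobic residues. It is a simplification in two respects, both assumptions of this example: the slide refers to conservation across an alignment rather than one sequence, and the hydrophobic residue set used here is just one common choice.

```python
HYDROPHOBIC = set("AVILMFWC")   # one common choice; an assumption of this sketch

HELIX_OFFSETS = (0, 3, 4, 7)    # I, I+3, I+4, I+7
STRAND_OFFSETS = (0, 2, 4, 6)   # I, I+2, I+4, I+6


def pattern_starts(sequence, offsets, residues=HYDROPHOBIC):
    """Return every start index I where all offset positions are hydrophobic."""
    span = max(offsets)
    return [i for i in range(len(sequence) - span)
            if all(sequence[i + o] in residues for o in offsets)]


if __name__ == "__main__":
    print(pattern_starts("LKELLKKL", HELIX_OFFSETS))   # helix-like spacing
    print(pattern_starts("VKVTVEV", STRAND_OFFSETS))   # strand-like spacing
```

A hit is only suggestive. In practice the signal is much stronger when the same spacing is conserved across the columns of a multiple sequence alignment.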

24 Integrated Bioinformatics System for Function and Pathway Discovery
- Data Integration
- Associative Analysis

25 Analytical Pipeline
Diagram labels: Query Sequence; PIR-NREF; iProClass; Top-Matched Superfamilies/Domains; BLAST Search; HMM Domain Search; Predicted Superfamilies/Domains/Motifs/Sites/Signal Peptides/TMHs; SSEARCH; CLUSTALW; Superfamily/Domain/Motif Alignments; Family Relationships & Functional Features; Family Classification & Functional Analysis; HMM Motif Search; Pattern Search; SignalP/TMHMM

26 Integrated Bioinformatics System
- Global Bioinformatics Analysis of 1000's of Genes and Proteins
- Pathway Discovery, Target Identification

27

28 Lab Section

29 Peptide Search & Results

30 Blast Similarity Search

31 Blast Search Results

32 Pair-Wise Alignment

33 Multiple Sequence Alignment

34 Pattern Search Results

35 HMM Domain Search Result

36 Building HMM Profile

37 Using HMM Profile for Searching

38 Rabbit Alpha Crystallin A Chain: An iProClass View of the Entry