Sequence Based Analysis Tutorial

1 Sequence Based Analysis Tutorial
NIH Proteomics Workshop
Lai-Su L. Yeh, Ph.D., Protein Science Team Lead
Protein Information Resource at Georgetown University Medical Center

2 Retrieval, Sequence Search & Classification Methods
Retrieve protein info by text / UID
Sequence Similarity Search: BLAST, FASTA, Dynamic Programming
Family Classification: Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
Integrated Search and Classification System

3 Sequence Similarity Search
Based on Pair-Wise Comparisons
Dynamic Programming Algorithms
    Global Similarity: Needleman-Wunsch
    Local Similarity: Smith-Waterman
Heuristic Algorithms
    FASTA: Based on K-Tuples (2-Amino Acid)
    BLAST: Triples of Conserved Amino Acids
    Gapped-BLAST: Allow Gaps in Segment Pairs
    PHI-BLAST: Pattern-Hit Initiated Search
    PSI-BLAST: Position-Specific Iterated Search
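
The two dynamic programming approaches named above can be sketched in a few lines. This is a minimal illustration, not the implementation used by any PIR tool: the match, mismatch, and linear gap scores are assumed values chosen for readability (a real search would use a substitution matrix such as PAM250 or BLOSUM 62), and only the alignment score is returned, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score: both sequences are aligned end to end."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score: best-scoring pair of subsequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The only structural difference is the floor at zero and the tracking of the best cell: the global version must pay for every unaligned flank, while the local version ignores them.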

4 Sequence Similarity Search
Similarity Search Parameters
    Scoring Matrices: Based on Conserved Amino Acid Substitution
        Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity)
        Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62
    Gap Penalty
Search Time Comparisons
    Smith-Waterman: 10 Min
    FASTA: 2 Min
    BLAST: 20 Sec

5 Feature Representation
Features: Residue Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features Alternative Alphabets: Classification of Amino Acids To Capture Different Features of Amino Acid Residues
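
As a rough illustration of an alternative alphabet, the sketch below recodes a protein sequence into six physicochemical classes. The grouping and the one-character class symbols are assumptions made for this example; the slide does not say which classification the tutorial used, and other groupings are equally valid.

```python
# Hypothetical six-class grouping of the 20 standard amino acids.
REDUCED_ALPHABET = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # basic
    "-": "DE",      # acidic
    "s": "GP",      # small / conformationally special
}

# Invert the table: one lookup per residue.
AA_TO_CLASS = {
    aa: symbol
    for symbol, members in REDUCED_ALPHABET.items()
    for aa in members
}


def recode(sequence, unknown="?"):
    """Rewrite a sequence in the reduced alphabet; unknown residues map to `unknown`."""
    return "".join(AA_TO_CLASS.get(aa, unknown) for aa in sequence.upper())
```

For example, `recode("MKDEFG")` gives `"h+--as"`, so two sequences that differ only by conservative substitutions within a class become identical after recoding.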

6 Substitution Matrix
Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time
Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7)
Positive Score: Conservative Substitution (e.g., Lys/Arg, +3)
High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
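
A substitution matrix is used as a symmetric lookup table. The sketch below is a toy: it holds only the two off-diagonal scores quoted above (Gly/Trp and Lys/Arg), so any other pair raises `KeyError`. The names `TOY_SCORES`, `pair_score`, and `ungapped_score` are made up for this example; a real analysis would load a complete PAM or BLOSUM matrix.

```python
# Toy table with only the two scores quoted on the slide.
# frozenset keys make the lookup symmetric: (K, R) and (R, K) are the same key.
TOY_SCORES = {
    frozenset("GW"): -7,  # Gly/Trp: unlikely substitution
    frozenset("KR"): 3,   # Lys/Arg: conservative substitution
}


def pair_score(x, y, table=TOY_SCORES):
    """Score for aligning residue x against residue y."""
    return table[frozenset((x, y))]


def ungapped_score(a, b, table=TOY_SCORES):
    """Sum of per-position scores for two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(x, y, table) for x, y in zip(a, b))
```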

7 BLAST
BLAST (Basic Local Alignment Search Tool)
Extremely fast
Robust
Most frequently used
It finds very short segment pairs (“seeds”) between the query and the database sequence.
These seeds are then extended in both directions until the maximum possible score for extensions of this particular seed is reached.
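
The seed-and-extend idea described above can be sketched as follows. This is a simplified illustration, not BLAST itself: seeds here are exact 3-residue word matches (protein BLAST also accepts similar "neighborhood" words scored with a substitution matrix), extension is ungapped, and the +1/-1 scores and the `x_drop` cutoff are assumed values.

```python
from collections import defaultdict


def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) for every shared word of length w."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend an exact seed in both directions without gaps.

    Returns (score, query_start, query_end) of the best extension, where
    query[query_start:query_end] is the aligned query segment.
    """
    def walk(qi, sj, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= x_drop:
                break  # score fell too far below the best seen so far
            qi += step
            sj += step
        return best, best_len

    right, right_len = walk(i + w, j + w, 1)
    left, left_len = walk(i - 1, j - 1, -1)
    return w * match + left + right, i - left_len, i + w + right_len
```

Indexing the query words first means the subject is scanned only once, which is the main reason the heuristic is so much faster than filling a full dynamic programming matrix.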

8 BLAST Search
From BLAST Search Interface
Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment

9 BLAST/SSEARCH Results
SSEARCH Alignment BLAST Alignment

10 Family Classification Methods
Based on Family Information
    ClustalW Multiple Sequence Alignment
    ProSite Pattern Search
    Profile Search
    Hidden Markov Models (HMMs)
    Neural Networks
    Integrated Analysis

11 Multiple Sequence Alignment
ClustalW
    Progressive Pairwise Approach
    Based on Exhaustive Pairwise Alignments
    Neighbor Joining
    Joining Order Corresponding to a Tree
    Alignment Varies Dependent on Joining Order

12 How do you build a tree?
Pick sequences to align
Align them
Verify the alignment
Keep the parts that are aligned correctly
Build and evaluate a phylogenetic tree

13 Multiple Alignment and Tree
From Text/Sequence Search Result or ClustalW Alignment Interface

14 Here is an example of two different functions easily separated on a phylogenetic tree. Each functional group is used to build an HMM.

15 Motif Patterns (Regular Expressions)
Signature Patterns for Functional Motifs
ProClass Motif Alignments

16 PIR Pattern Search
From Text/Sequence Search Result or Pattern Search Interface
One Query Sequence Against PROSITE Pattern Database
One Query Pattern (PROSITE or User-Defined) Against Sequence DB
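
Because PROSITE patterns are regular expressions in a different notation, a pattern search can be sketched by translating the pattern and handing it to a regex engine. The converter below is a minimal sketch covering the common syntax elements (`x`, `[..]`, `{..}`, `(n)` and `(n,m)` repeats, `<` and `>` anchors); the function names are made up for this example and this is not how the PIR server is implemented. The test pattern is the PROSITE N-glycosylation site, `N-{P}-[ST]-{P}`.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ABC} means "any residue except A, B, C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count; done after the step above so the
        # braces introduced here are not rewritten.
        element = element.replace("(", "{").replace(")", "}")
        # Lower-case x is the wildcard; residues are upper-case.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motifs(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

Running one pattern over many sequences, or many patterns over one sequence, corresponds to the two search modes listed above.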

17 Pattern Search Result (I)
One Query Sequence Against PROSITE Pattern Database

18 Pattern Search Result (II)
One Query Pattern Against Sequence Database

19 Profile Method
Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments
    Number of Rows = Number of Aligned Positions
    Each row contains a score for the alignment with each possible residue.
Profile Searching
    Summation of Scores for Each Amino Acid Residue along Query Sequence
    Higher Match Values at Conserved Positions
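
The profile table and the summation step can be sketched as below. The scoring scheme is an assumption made for this example (log-odds against a uniform background with a pseudocount of 1); the slide does not state how the scores were derived. The alignment is assumed to be ungapped and the profile is slid along the query without gaps.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One row per aligned position; each row maps residue -> log-odds score."""
    background = 1.0 / len(ALPHABET)  # assumed uniform background frequency
    profile = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        total = len(column) + pseudocount * len(ALPHABET)
        row = {
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in ALPHABET
        }
        profile.append(row)
    return profile


def score_window(profile, segment):
    """Sum the row scores for the residues of one query segment."""
    return sum(row[aa] for row, aa in zip(profile, segment))


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start)."""
    width = len(profile)
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

A fully conserved column gives its residue the highest score in the table, which is what makes matches at conserved positions dominate the sum.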

20 PIR HMM Domain/Motif Search
From Text/Sequence Search Result or HMM Search Interface
HMMER Model Building & Sequence Search
    Search One Query Protein Against All HMMs
    Search One HMM Against Sequence DB

21 HMM Search Result (I)
One Query Protein Against All Pfam HMMs

22 HMM Search Result (II)
Search User-Built HMM Against Protein Sequence DB
Input Sequences (Optional Residue Ranges) -> Multiple Sequence Alignment -> Model Building -> HMM Search

23 Secondary Structure Features
α Helix: Patterns of Hydrophobic Residue Conservation Showing the I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic)
β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6
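
The two periodicity rules above can be checked mechanically. The sketch below is a simplification: it scans a single sequence, whereas the slide refers to conservation across an alignment, and the set of residues counted as hydrophobic is an assumption made for this example.

```python
# Assumed hydrophobic set; other definitions include or exclude a few residues.
HYDROPHOBIC = set("AVLIMFWC")

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6


def pattern_starts(sequence, offsets, hydrophobic=HYDROPHOBIC):
    """Positions i where every residue at i + offset is hydrophobic."""
    span = max(offsets)
    return [
        i
        for i in range(len(sequence) - span)
        if all(sequence[i + k] in hydrophobic for k in offsets)
    ]
```

For instance, `pattern_starts("LKKLLKKL", HELIX_OFFSETS)` reports position 0, while the same sequence has no hit for `STRAND_OFFSETS`. A hit is only a hint of secondary structure, not a prediction.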

24 Integrated Bioinformatics System for Function and Pathway Discovery
Data Integration Associative Analysis

25 Analytical Pipeline Family Classification & Functional Analysis
[Pipeline diagram. Labels: Query Sequence; PIR-NREF; iProClass; BLAST Search; HMM Domain Search; Top-Matched Superfamilies/Domains; HMM Motif Search; Pattern Search; SignalP/TMHMM; Predicted Superfamilies/Domains/Motifs/Sites/Signal Peptides/TMHs; SSEARCH; CLUSTALW; Superfamily/Domain/Motif Alignments; Family Relationships & Functional Features]

26 Integrated Bioinformatics System
Global Bioinformatics Analysis of 1000’s of Genes and Proteins Pathway Discovery, Target Identification

27 Lab Section

28 Peptide Search & Results

29 Blast Similarity Search

30 Blast Search Results

31 Pair-Wise Alignment

32 Multiple Sequence Alignment

33 Pattern Search Results

34 HMM Domain Search Result

35 Building HMM Profile

36 Using HMM Profile for Searching

37 Rabbit Alpha Crystallin A Chain
An iProClass View of the Entry

38 alpha-Crystallin and Related Proteins

