Published by Christina McBride. Modified over 9 years ago.
1 Sequence-Based Analysis Tutorial
March 26, 2004, NIH Proteomics Workshop
Lai-Su L. Yeh, Ph.D., Protein Science Team Lead
Protein Information Resource at Georgetown University Medical Center
2 Retrieval, Sequence Search & Classification Methods
- Retrieve Protein Info by Text / UID
- Sequence Similarity Search
  - BLAST, FASTA, Dynamic Programming
- Family Classification
  - Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
- Integrated Search and Classification System
3 Sequence Similarity Search Based on Pair-Wise Comparisons
- Dynamic Programming Algorithms
  - Global Similarity: Needleman-Wunsch
  - Local Similarity: Smith-Waterman
- Heuristic Algorithms
  - FASTA: Based on K-Tuples (2 Amino Acids)
  - BLAST: Triples of Conserved Amino Acids
  - Gapped BLAST: Allows Gaps in Segment Pairs
  - PHI-BLAST: Pattern-Hit Initiated Search
  - PSI-BLAST: Position-Specific Iterated Search
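The two dynamic programming algorithms differ mainly in how the score table is initialized and clipped. The sketch below is a minimal illustration, not PIR's implementation: it returns only the optimal score, and the match/mismatch/gap values are hypothetical placeholders standing in for a real substitution matrix and gap penalty.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch (global) or Smith-Waterman (local) score with a linear gap penalty.

    The scoring values are hypothetical placeholders, not a real substitution matrix.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score for the prefixes a[:i] and b[:j]
    # (for local alignment: best score of an alignment ending exactly there)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: leading gaps are penalized
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local: a negative running score is reset to zero (start a new alignment)
                cell = max(cell, 0)
                best = max(best, cell)
            H[i][j] = cell
    # Global score sits in the corner; local score is the best cell anywhere
    return best if local else H[n][m]
```

For example, "AAAGGG" against "TTTGGG" scores 3 globally (three mismatches plus three matches) but 6 locally, because Smith-Waterman keeps only the shared "GGG" segment.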
4 Sequence Similarity Search
- Similarity Search Parameters
  - Scoring Matrices, Based on Conserved Amino Acid Substitutions
    - Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity)
    - Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM62
  - Gap Penalty
- Search Time Comparisons
  - Smith-Waterman: 10 min
  - FASTA: 2 min
  - BLAST: 20 sec
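The slide lists the gap penalty as a search parameter without giving its form. A common choice is an affine penalty (an opening charge plus a per-residue charge); the sketch below assumes that convention, uses opening 11 / extension 1 as illustrative defaults, and substitutes placeholder match/mismatch scores for a real substitution matrix. All names here are hypothetical.

```python
import re

def gap_cost(length: int, open_: int = 11, extend: int = 1) -> int:
    # Assumed affine convention: opening charge plus a per-residue charge.
    return open_ + extend * length if length > 0 else 0

def score_gapped(top: str, bottom: str, match: int = 5, mismatch: int = -4,
                 open_: int = 11, extend: int = 1) -> int:
    """Score two aligned rows of equal length, with '-' marking gap positions."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    # Residue/residue columns: placeholder scores instead of a substitution matrix
    for x, y in zip(top, bottom):
        if x != "-" and y != "-":
            score += match if x == y else mismatch
    # Each maximal run of '-' in either row counts as one gap
    for row in (top, bottom):
        for run in re.findall(r"-+", row):
            score -= gap_cost(len(run), open_, extend)
    return score
```

Under these assumed values, one gap of length 2 costs 13 while two separate single-residue gaps cost 24, which is why affine penalties tend to favor fewer, longer gaps.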
5 Feature Representation
- Features: Residue Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features
- Alternative Alphabets: Classification of Amino Acids to Capture Different Features of Amino Acid Residues
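An alternative alphabet can be as simple as a lookup that maps each residue to a class label. The five groups below (hydrophobic, polar, positive, negative, special) are one hypothetical classification chosen for illustration, not the grouping used in the tutorial.

```python
# Hypothetical five-class reduced alphabet; other groupings are equally valid.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQY",     # polar, uncharged
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
    "s": "GP",        # special backbone behavior
}
TO_GROUP = {aa: label for label, members in GROUPS.items() for aa in members}

def reduce_alphabet(seq: str) -> str:
    """Recode a protein sequence into class labels; unknown residues become 'x'."""
    return "".join(TO_GROUP.get(aa, "x") for aa in seq.upper())
```

In the reduced alphabet, a conservative Lys/Arg substitution disappears: "AKV" and "ARV" both become "h+h".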
6 Substitution Matrix (example values from PAM250)
- Likelihood of One Amino Acid Mutating into Another over Evolutionary Time
- Negative Score: Unlikely Substitution (e.g., Gly/Trp, -7)
- Positive Score: Conservative Substitution (e.g., Lys/Arg, +3)
- Highest Scores: Identical Matches of Rare Amino Acids (e.g., Trp/Trp, Cys/Cys)
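Scoring an ungapped alignment is a lookup and a sum. The sketch below holds only a small excerpt of PAM250: the two pairs quoted on the slide plus the Trp/Trp (17) and Cys/Cys (12) diagonal entries. A full matrix would have 210 distinct entries; the dictionary and function names are hypothetical.

```python
# Excerpt of PAM250 only; frozenset keys make lookups order-independent,
# and an identical pair such as ("W", "W") collapses to frozenset({"W"}).
PAM250_EXCERPT = {
    frozenset(("G", "W")): -7,  # unlikely substitution
    frozenset(("K", "R")): 3,   # conservative substitution
    frozenset(("W",)): 17,      # identical Trp match (rare residue, high score)
    frozenset(("C",)): 12,      # identical Cys match (rare residue, high score)
}

def pam250(a: str, b: str) -> int:
    # Raises KeyError for any pair outside this small excerpt.
    return PAM250_EXCERPT[frozenset((a, b))]

def ungapped_score(s1: str, s2: str) -> int:
    """Sum of substitution scores over aligned positions (no gaps)."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(pam250(x, y) for x, y in zip(s1, s2))
```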
7 BLAST
- BLAST (Basic Local Alignment Search Tool) searches a query sequence against a sequence database
- Extremely fast, robust, and the most widely used similarity search tool
- It first finds very short segment pairs (words) shared by the query and a database sequence
- Each segment pair is then extended in both directions while the alignment score keeps improving; extension stops once the score falls too far below the best score reached, giving a high-scoring segment pair (HSP)
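The seed-and-extend idea can be sketched in a few lines. This is a simplification, not NCBI's implementation: seeds here are exact shared 3-residue words (BLAST also accepts similar "neighborhood" words above a score threshold), scoring is a placeholder +1/-1 instead of a substitution matrix, and the names and the `x_drop` value are hypothetical.

```python
def match_score(a: str, b: str) -> int:
    # Placeholder identity scoring; BLAST itself uses a matrix such as BLOSUM62.
    return 1 if a == b else -1

def find_seeds(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Exact shared words of length w, as (query_pos, subject_pos) pairs."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]

def extend_seed(query: str, subject: str, i: int, j: int,
                w: int = 3, x_drop: int = 3) -> tuple[int, tuple[int, int]]:
    """Ungapped extension of one seed; returns (score, (query_start, query_end))."""
    seed = sum(match_score(query[i + k], subject[j + k]) for k in range(w))

    def one_side(q: int, s: int, step: int) -> tuple[int, int]:
        best = cur = best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            cur += match_score(query[q], subject[s])
            length += 1
            if cur > best:
                best, best_len = cur, length
            elif best - cur > x_drop:  # stop once the score drops too far below the best
                break
            q, s = q + step, s + step
        return best, best_len  # keep only the best-scoring extension

    right, r_len = one_side(i + w, j + w, 1)
    left, l_len = one_side(i - 1, j - 1, -1)
    return seed + left + right, (i - l_len, i + w + r_len)
```

For "AAAKVLQQQ" against "CCCKVLRRR", the only seed is the shared word "KVL" at (3, 3); neither extension improves the score, so the HSP is just that word with score 3.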
8 BLAST Search
- From the BLAST Search Interface
- Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment
9 BLAST/SSEARCH Results
- SSEARCH Alignment
- BLAST Alignment
10 Family Classification Methods Based on Family Information
- ClustalW Multiple Sequence Alignment
- PROSITE Pattern Search
- Profile Search
- Hidden Markov Models (HMMs)
- Neural Networks
- Integrated Analysis
11 Multiple Sequence Alignment
- ClustalW: Progressive Pairwise Approach
  - Based on Exhaustive Pairwise Alignments
  - Neighbor Joining: Joining Order Corresponds to a Tree
  - Alignment Varies Depending on Joining Order
12 How do you build a tree?
1. Pick sequences to align
2. Align them
3. Verify the alignment
4. Keep the parts that are aligned correctly
5. Build and evaluate a phylogenetic tree
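Step 4 usually involves judgment about which regions are reliable. One simple, commonly used proxy is to drop alignment columns dominated by gaps before tree building; the sketch below assumes that heuristic and a hypothetical 50% threshold, and does not check whether the remaining columns are actually correct.

```python
def trim_gappy_columns(alignment: list[str], max_gap_fraction: float = 0.5) -> list[str]:
    """Keep only columns whose fraction of '-' characters is at most max_gap_fraction."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    keep = [c for c in range(width)
            if sum(row[c] == "-" for row in alignment) / len(alignment) <= max_gap_fraction]
    return ["".join(row[c] for c in keep) for row in alignment]
```

Here the middle column of ["AC-GT", "A--GT", "ACTG-"] is two-thirds gaps and is removed, leaving ["ACGT", "A-GT", "ACG-"].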
13 Multiple Alignment and Tree
- From Text/Sequence Search Result or ClustalW Alignment Interface
15 Motif Patterns (Regular Expressions)
- Signature Patterns for Functional Motifs
- ProClass Motif Alignments
16 PIR Pattern Search
- From Text/Sequence Search Result or Pattern Search Interface
- One Query Sequence Against PROSITE Pattern Database
- One Query Pattern (PROSITE or User-Defined) Against Sequence DB
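Because PROSITE patterns are regular expressions in a compact notation, a pattern search can be sketched by translating the notation into Python's `re` syntax. This minimal translator covers the common elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, and the `<` / `>` terminal anchors) but not every rule, e.g. `>` inside brackets; the function names are hypothetical. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":                   # x = any residue
            core = "."
        elif core.startswith("{"):        # {..} = any residue except those listed
            core = "[^" + core[1:-1] + "]"
        # [..] (allowed residues) and single letters are already valid regex syntax
        if lo is not None:                # (n) or (n,m) repetition
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)

def find_sites(pattern: str, sequence: str) -> list[int]:
    """0-based start positions of every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence)]
```

For instance, `find_sites("N-{P}-[ST]-{P}.", "MNGSANPTA")` reports a site at position 1 but not at position 5, where the residue after Asn is Pro.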
17 Pattern Search Result (I)
- One Query Sequence Against PROSITE Pattern Database
18 Pattern Search Result (II)
- One Query Pattern Against Sequence Database
19 Profile Method
- Profile: A Table of Scores Expressing the Family Consensus, Derived from Multiple Sequence Alignments
  - Number of Rows = Number of Aligned Positions
  - Each row contains a score for aligning each possible residue at that position
- Profile Searching
  - Sum the Scores for Each Amino Acid Residue along the Query Sequence
  - Higher Match Values at Conserved Positions
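The profile described above can be sketched directly: one row per aligned position, a score for each of the 20 residues in every row, and a search that sums row scores along each window of the query. The log-odds formula against a uniform background and the pseudocount of 1 are assumptions for illustration; production profile methods use more careful weighting and background models.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """One row per aligned position; each row scores all 20 residues.

    Scores are log2 odds of the (pseudocounted) column frequency versus a uniform
    background of 1/20, an assumed simplification.
    """
    n = len(alignment)
    background = 1 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
            row[aa] = math.log2(freq / background)
        profile.append(row)
    return profile

def scan(profile: list[dict[str, float]], seq: str) -> list[float]:
    """Profile score for every window of the query (sum of per-position scores)."""
    width = len(profile)
    return [sum(profile[k].get(seq[i + k], 0.0) for k in range(width))
            for i in range(len(seq) - width + 1)]
```

With the toy alignment ["CKW", "CRW", "CKW"], the fully conserved Cys column scores higher than the variable Lys/Arg column, and scanning "AACKWAA" peaks at the window starting at position 2.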
20 PIR HMM Domain/Motif Search
- From Text/Sequence Search Result or HMM Search Interface
- HMMER Model Building & Sequence Search
  - Search One Query Protein Against All HMMs
  - Search One HMM Against Sequence DB
21 HMM Search Result (I)
- One Query Protein Against All Pfam HMMs
22 HMM Search Result (II)
- Search User-Built HMM Against Protein Sequence DB
- Input Sequences (Optional Residue Ranges) -> Multiple Sequence Alignment -> Model Building -> HMM Search
23 Secondary Structure Features
- Helix: Conservation of Hydrophobic Residues at Positions i, i+3, i+4, i+7 Is Highly Indicative of an Amphipathic Helix
- Strand: Strands That Are Half Buried in the Protein Core Tend to Have Hydrophobic Residues at Positions i, i+2, i+4, i+6
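The two periodicities can be checked by counting hydrophobic residues at the stated offsets from each start position. This sketch is applied to a single sequence (or a consensus) rather than to conservation across an alignment, and the hydrophobic residue set is an assumed choice; it only flags candidate segments and is not a secondary structure predictor.

```python
HYDROPHOBIC = set("AVLIMFWC")   # assumed hydrophobic set

HELIX_OFFSETS = (0, 3, 4, 7)    # i, i+3, i+4, i+7: one face of an amphipathic helix
STRAND_OFFSETS = (0, 2, 4, 6)   # i, i+2, i+4, i+6: buried face of a half-buried strand

def periodic_hits(seq: str, offsets: tuple[int, ...]) -> list[int]:
    """For each start i, count hydrophobic residues at positions i + offset."""
    span = max(offsets)
    return [sum(seq[i + o] in HYDROPHOBIC for o in offsets)
            for i in range(len(seq) - span)]
```

In "LKKLLKKL" all four helix positions from i = 0 are hydrophobic, whereas the alternating "VKVKVKV" fills all four strand positions.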
24 Integrated Bioinformatics System for Function and Pathway Discovery
- Data Integration
- Associative Analysis
25 Analytical Pipeline (flowchart components)
- Input: Query Sequence
- Databases: PIR-NREF, iProClass
- Searches: BLAST Search, HMM Domain Search, HMM Motif Search, Pattern Search, SignalP/TMHMM
- Alignment Tools: SSEARCH, CLUSTALW
- Intermediate Results: Top-Matched Superfamilies/Domains; Predicted Superfamilies/Domains/Motifs/Sites/Signal Peptides/TMHs; Superfamily/Domain/Motif Alignments
- Outputs: Family Relationships & Functional Features; Family Classification & Functional Analysis
26 Integrated Bioinformatics System
- Global Bioinformatics Analysis of Thousands of Genes and Proteins
- Pathway Discovery, Target Identification
28 Lab Section
29 Peptide Search & Results
30 BLAST Similarity Search
31 BLAST Search Results
32 Pair-Wise Alignment
33 Multiple Sequence Alignment
34 Pattern Search Results
35 HMM Domain Search Result
36 Building HMM Profile
37 Using HMM Profile for Searching
38 Rabbit Alpha Crystallin A Chain: An iProClass View of the Entry