Structure-based Evidence for Function (TIGRfam, Pfam and PDB)


TIGRfams are protein families categorized by functional role

Concept: HMMs

HMM: A Hidden Markov Model is a probabilistic model developed from observed sequences of proteins of a known function. The profile HMM is used to score the alignment of the query amino acid sequence to other proteins based on amino acid identity and position.

A concrete example of an HMM: Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather where Bob lives, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.

Alice believes that the weather operates as a discrete Markov chain (a system with various states that can change randomly). There are two states, "Rainy" and "Sunny", but she cannot observe them directly; that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is that of a hidden Markov model (HMM). Alice knows the general weather trends in the area, and what Bob likes to do on average. In other words, the parameters of the HMM are known.
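To make the Alice and Bob example concrete, here is a minimal Python sketch of how Alice could guess the most likely weather sequence from Bob's reported activities. It uses the Viterbi algorithm, a standard way to decode an HMM; the slide does not name an algorithm or give any numbers, so every probability below is an assumed, illustrative value.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, probability of that path)."""
    if not observations:
        raise ValueError("need at least one observation")

    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the state at step t-1 on that best path
    back = [{}]

    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state to recover the whole path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# All numbers below are assumptions chosen for illustration only.
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

path, prob = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path, prob)
```

With these assumed parameters, three days of "walk", "shop", "clean" are best explained by Sunny, Rainy, Rainy. A profile HMM applies the same idea to protein sequences: the hidden states are positions in the family model and the observations are amino acids.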

Follow this link from the lab notebook.
TIGRfams: Haft et al. (2001) Nucleic Acids Research 29:

Search TIGRFAM database
1. Change Database to “TIGRFAMS”.
2. Change Scope to GLOBAL.
3. Change E-value cutoff to “0.01”.
4. Enter the protein sequence in FASTA format in the box.
5. Click on “Start HMM search”.
6. Then wait…
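Before pasting a sequence into the search box, it can help to confirm that it really is in FASTA format. The following is a minimal sketch, not part of the course tools: the function names and the set of accepted residue letters are assumptions made for illustration.

```python
# Assumed alphabet: the 20 standard amino acids plus a few common extra codes.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" + "BZXUO*")


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted list of characters that are not accepted residue codes."""
    return sorted(set(sequence) - ALLOWED_RESIDUES)


example = """>my_protein hypothetical example
MKTAYIAKQR
QISFVKSHFS
"""
for name, seq in parse_fasta(example):
    print(name, len(seq), invalid_residues(seq))
```

An empty list from `invalid_residues` suggests the sequence is safe to paste; anything else (digits, stray punctuation) is worth fixing first.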

RESULTS: Score and E-value
1. Enter the TIGRfam number (format: TIGRXXXXX) from the ‘Model’ column into the imgACT lab notebook, in the box for the significant TIGRfam hit.
2. Enter the TIGRfam name from the ‘Description’ column into the notebook. NOTE: The full name may be cut off in the ‘Description’ column; if so, obtain the full name separately.
3. Enter the Score and E-value into the notebook as well.
Only hits with a positive Score and an E-value within the cutoff should be recorded.
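The recording rule on this slide can be sketched as a simple filter. This is an illustrative sketch only: the dictionary keys, the placeholder hit values, and the choice to treat an E-value equal to the cutoff as acceptable are all assumptions, and the 0.01 default mirrors the cutoff set in the search form.

```python
def hits_to_record(hits, evalue_cutoff=0.01):
    """Keep only hits with a positive score and an E-value within the cutoff."""
    return [
        hit for hit in hits
        if hit["score"] > 0 and hit["evalue"] <= evalue_cutoff
    ]


# Placeholder rows, invented for illustration; they are not real search results.
example_hits = [
    {"model": "TIGR_EXAMPLE_1", "description": "example family A", "score": 312.4, "evalue": 2e-95},
    {"model": "TIGR_EXAMPLE_2", "description": "example family B", "score": 12.1, "evalue": 0.5},
    {"model": "TIGR_EXAMPLE_3", "description": "example family C", "score": -40.2, "evalue": 0.008},
]

for hit in hits_to_record(example_hits):
    print(hit["model"], hit["description"], hit["score"], hit["evalue"])
```

Only the first placeholder row passes: the second fails the E-value test and the third has a negative score.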

To obtain full TIGRfam name:

Then what? Full name Complete description

TIGRfam Results in imgACT Notebook

Terms to Know for Pfam
o Domain: A structural unit which can be found in multiple protein contexts, e.g., zinc finger, leucine zipper.
o Family: A collection of related proteins containing the same domain, e.g., immunoglobulins, CD4, MHC, TCR, etc.
o Clan: A collection of multiple protein families. The relationship may be defined by similarity of sequence, structure, or profile-HMM, e.g., ATPase functioning in ETC vs. ATPase functioning in DNA replication.

Click on the link provided in your notebook.

You know the drill! Enter your FASTA-format amino acid sequence, change the E-value cutoff, and click to start the search.

WAIT… this can sometimes take a while.

RESULTS! (Graphic view of domain organization)
Notice there may be two types of results based on your designated E-value: significant and insignificant matches. Only investigate significant matches.
NOTE: An insignificant match may have a valid E-value, but Pfam considers the result insignificant because the alignment is very short; Pfam has detected and flagged this.
If you do not have any significant matches, make a note of this in your notebook by creating a COMMENTS section and entering “No significant hits”. Be sure your search criteria were accurate (e.g., E-value of 0.001).

Investigate SIGNIFICANT matches
Click on [Show] to view the “pairwise alignment” for the Pfam match. Copy/paste this pairwise alignment into the designated box in your notebook.

How do I interpret the alignment?
o Top row (#HMM): all capital letters indicate conserved residues in the HMM consensus sequence.
o Middle row (#MATCH): identical or functionally conserved (similar) amino acids.
o Bottom row (#SEQ): query sequence aligned to the HMM representing the domain/family.
Legend for #MATCH:
o Upper case = identical match (conserved and high frequency)
o Lower case = identical match (conserved but low frequency)
o + symbol = functionally similar (e.g., aspartic vs. glutamic acid)
o Space = no match
What is an HMM consensus sequence?
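The #MATCH legend can be applied mechanically. The sketch below tallies each symbol type in a #MATCH line; the function name, the category labels, and the short example string are assumptions made for illustration, not Pfam output.

```python
def summarize_match_line(match_line):
    """Count each kind of symbol in a #MATCH line, following the legend above."""
    counts = {
        "identical_high_frequency": 0,  # upper-case letter
        "identical_low_frequency": 0,   # lower-case letter
        "functionally_similar": 0,      # '+' symbol
        "no_match": 0,                  # space
        "other": 0,                     # anything the legend does not cover
    }
    for ch in match_line:
        if ch == " ":
            counts["no_match"] += 1
        elif ch == "+":
            counts["functionally_similar"] += 1
        elif ch.isalpha() and ch.isupper():
            counts["identical_high_frequency"] += 1
        elif ch.isalpha() and ch.islower():
            counts["identical_low_frequency"] += 1
        else:
            counts["other"] += 1
    return counts


# Invented example line, for illustration only.
print(summarize_match_line("Cv+ G"))
```

Keep leading and trailing spaces when copying the #MATCH line, since each space stands for a column with no match.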

The HMM consensus sequence
Right-click the Pfam link and open it in a new tab. On the Pfam family summary page, click on “Alignments”.

The HMM consensus sequence: What does this mean?
o Full: Total number of sequences in the database that have been categorized into this Pfam family
o Seed: Number of sequences within the multiple sequence alignment representing architectural variations within a single Pfam family

Architecture diversity: domain organization within the context of the full protein

The HMM consensus sequence
Leave the default settings and press the [View] button.

The HMM consensus sequence
Click on the [Start Jalview] button to view the multiple sequence alignment. A new window will pop up as shown:

The HMM consensus sequence
Another new window will pop up as shown. TOO MANY COLORS! How do we read this?!?

The HMM consensus sequence
Let’s make the view more manageable by simplifying the colors: select “Percentage Identity” from the menu.
NOTE: Take the time to browse other color schemes to learn more about your protein.

The HMM consensus sequence
Pay special attention to the BOTTOM graph: the consensus sequence for the protein family. This view reveals the amount of conservation in your amino acid sequence.
o Dark = highest frequency
o Light = lower frequency
Letters show which amino acids occur most frequently at that position. This consensus sequence is used to construct the HMM.
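The idea behind the consensus graph can be sketched in a few lines: for each column of a multiple sequence alignment, find the most frequent amino acid and how often it occurs. This is a simplified illustration under assumed conventions (gaps written as "-" or ".", frequency taken over all sequences); Jalview's own calculation may differ in detail, and the four short sequences are invented.

```python
from collections import Counter


def column_consensus(aligned_seqs, gap_chars="-."):
    """Return a (most frequent residue, frequency) pair for every alignment column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for col in range(length):
        residues = [seq[col].upper() for seq in aligned_seqs if seq[col] not in gap_chars]
        if not residues:
            consensus.append(("-", 0.0))  # column contains only gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Frequency is measured against all sequences, so gaps lower it
        consensus.append((residue, count / len(aligned_seqs)))
    return consensus


# Invented toy alignment, for illustration only.
toy_alignment = ["CAGH", "CAGK", "CTGH", "C-GH"]
for position, (residue, freq) in enumerate(column_consensus(toy_alignment), start=1):
    print(position, residue, freq)
```

Columns with a frequency near 1.0 correspond to the dark, tall bars in the consensus graph; those are the positions worth checking as candidate key residues.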

What else do I need for my notebook? Pfam name and Pfam number
Return to the Summary page for the Pfam family. Copy/paste the full and abbreviated Pfam name as well as the Pfam number into your lab notebook.

Note: Pay attention to a possible 3D image
You may see a 3D image when you view your summary. If you see this image, then this is your first clue that you should expect to have significant hits in the PDB search (next section of this module). If you don’t see an image, then this suggests no structure has yet been solved for proteins containing the domain identified by Pfam.

What else do I need for my notebook? HMM Logo
On the Summary page, click on “HMM logo”.

What else do I need for my notebook? HMM Logo
SAVE this image in .png format and insert it into your notebook.

How do we interpret the HMM Logo?
o Highly conserved amino acids are represented by wide letters.
o Amino acids with a high frequency of occurrence in the alignment used to generate the HMM consensus sequence are represented by tall letters.

What else do I need for my notebook? Clan name and number
Return to the Summary page. Click BROWSE to search for clan information, using key words from the Pfam family name for the clan search.

What else do I need for my notebook? Clan name and number
Investigate possible clans based on a key word search from the Pfam family description. To learn more about a clan, click on its hyperlink for more clan information.

What else do I need for my notebook? Clan name and number
Record the abbreviated clan name, the full clan name, and the clan number. The clan page tells you which Pfam families belong to this clan. If the Pfam family to which your protein belongs is not in this list, then your protein is NOT a member of this clan.
NOTE: Not all Pfam families belong to a clan. If no clan is found, enter “None found” in your lab notebook.

What else do I need for my notebook? Key functional residues
You have THREE key tools to assist you in identifying the KEY FUNCTIONAL RESIDUES of your protein.
o Tool #1: Pairwise Alignment
o Tool #2: HMM Logo
o Tool #3: Jalview consensus

How do we identify key functional residues?
o Capital letter in the #MATCH line
o Tall, wide letter in the HMM logo
o Tall bar in the graphical depiction of the consensus sequence

How do we report key functional residues in the notebook?
Formula: AA(start + HMM# - 1)
Example: C(47 + 8 - 1) = C54
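The reporting formula is simple arithmetic, sketched below. The function and parameter names are assumptions for illustration; `start` is the position in your protein where the alignment begins and `hmm_position` is the HMM# value counted along the alignment. The sketch also assumes there are no gaps in your query sequence before the residue of interest, since a gap would shift the numbering.

```python
def residue_label(amino_acid, start, hmm_position):
    """Apply AA(start + HMM# - 1) to build a label such as 'C54'."""
    if start < 1 or hmm_position < 1:
        raise ValueError("start and hmm_position are 1-based and must be >= 1")
    return f"{amino_acid}{start + hmm_position - 1}"


# The worked example from the slide: C(47 + 8 - 1) = C54
print(residue_label("C", 47, 8))
```

The "- 1" is there because both numbers are 1-based: the first aligned residue (HMM# 1) sits at position `start` itself, not at `start + 1`.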

SUMMARY: Identifying key functional residues
1. Use the HMM pairwise alignment to identify possible key functional residues.
2. Use the HMM Logo and Jalview alignment tools to verify key functional residues.
3. Scan the entire amino acid sequence and record all key functional residues using proper notation.

Recording results in your Lab Notebook Scroll down

Recording results in your Lab Notebook

REPEAT the procedure for all significant Pfam hits: 3 hits = 3 notebook entries.

PDB: Protein Data Bank
o Worldwide depository for three-dimensional structures of large biological molecules, including proteins and nucleic acids
o Contains information about structure such as sequence details, atomic coordinates, crystallization conditions, 3-D structure neighbors, derived geometric data, structure factors, and 3-D images
Berman et al. (2003) Nature Structural Biology 10: 980.

Click on the link provided in your notebook.

Select “Advanced Search”

1. Select the “Sequence (Blast/Fasta)” option.
2. Copy/paste your FASTA format protein sequence into the query box.
3. Change the E-value cutoff to 0.001.
4. Click when ready to initiate the search.

Results of PDB Search
Search hits are listed by ascending E-value. Scroll down to view them.

Evaluating PDB Results
Each hit shows the PDB name, the PDB code, the citation, a thumbnail of the 3D structure, and the alignment and statistics. Click on the thumbnail to get a high-resolution image for your notebook.
Assess the quality of the alignment:
o Is the E-value less than the cutoff?
o Is a significant proportion of the protein aligned? (Hint: compare alignment length to total length)
If so, good hit.
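The two quality questions can be sketched as a small check. This is an illustration under stated assumptions: the function names are invented, the 0.001 default mirrors the cutoff set in the search form, and the 0.5 minimum coverage is an arbitrary placeholder, because the slides do not say what counts as a "significant proportion".

```python
def alignment_coverage(alignment_length, total_length):
    """Fraction of the protein covered by the alignment (the slide's hint)."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return alignment_length / total_length


def is_good_hit(evalue, alignment_length, total_length,
                evalue_cutoff=0.001, min_coverage=0.5):
    """True when the E-value is below the cutoff and enough of the protein aligns.

    min_coverage=0.5 is an assumed placeholder, not a value from the course.
    """
    coverage = alignment_coverage(alignment_length, total_length)
    return evalue < evalue_cutoff and coverage >= min_coverage


# Invented numbers, for illustration only.
print(alignment_coverage(240, 300))
print(is_good_hit(1e-20, 240, 300))
print(is_good_hit(1e-20, 30, 300))
```

A hit with an excellent E-value can still fail the coverage test when only a short stretch of the protein aligns, which is why both questions are asked.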

Recording results in your Lab Notebook
Scroll down and add the results to your notebook.
NOTE: Revise or add headings and boxes as needed.

You cannot simply copy/paste the entire alignment with correct formatting into your lab notebook…. DELETE THIS SECTION. X