

Protein Sequence, Structure, and Function Lab
Gustavo Caetano-Anolles
PowerPoint by Casey Hanson
Protein Sequence, Structure, and Function | Gustavo Caetano-Anolles | 2015

Exercise
In this exercise we will do the following:
1. Visualize the structure of various proteins in the Protein Data Bank.
2. Use the SUPERFAMILY HMM tool to uncover common protein domains in aligned sequences.
3. Reconstruct phylogenies of structurally related proteins using MrBayes.

Step 0: Local Files
To view and manipulate the files needed for this laboratory exercise, insert your flash drive. We denote the path to the flash drive as:
[course_directory]
We will use the files found in:
[course_directory]/08_Protein_Structure/data/

Protein Visualization in PDB
In this exercise, we will become familiar with the Protein Data Bank (PDB), a database that provides information on the structure and function of proteins. We will concentrate on acylphosphatase (2ACY). We will primarily use PDB to visualize the 3D structure of a protein in the browser and then predict its secondary structure from this view. We will validate our predictions using a PDBsum HERA diagram. Additionally, we will use CATH (a resource that imposes a hierarchical classification on PDB structures) to look at the fold hierarchy for 2ACY.

Step 1A: Accessing PDB
Open a browser and go to the Protein Data Bank website. In the search box, type the PDB ID of acylphosphatase, 2ACY, and press Enter.

Step 1B: Accessing PDB
On the right side of the screen, under Biological Assembly, next to 3D View, click JSmol. On the next page you may get warnings regarding Java. If so, follow these directions:
1. If Java™ needs your permission to run, click Run This Time.
2. If a Security Warning pops up, select the checkbox and click Run.
3. If a Block window pops up, select Don’t Block.

Step 2: Visualization of 2ACY
The visualization window should look something like this:
Holding the left mouse button down and moving the mouse rotates the protein in 3D space. Look at the protein. Can you tell what its secondary structure is from this 3D view? Write down your prediction in Notepad; we will test it next.

Step 3A: Verification of 2ACY Secondary Structure
Though we could do this in PDB, we will consult a second resource, PDBsum, to verify our prediction. Go to the PDBsum website. In the PDB code box, type 2ACY and click Find.

Step 3B: Verification of 2ACY Secondary Structure
Under the Protein Chain header, click the item listed there. The Protein Chain page should look like the following:

Step 3C: Verification of 2ACY Secondary Structure
How does your prediction compare with this domain? Click on the domain icon on the right side of the screen for a diagram of the domain.
(Figure labels: N terminus, C terminus.)

Step 4A: Analysis of Folds for 2ACY
Now we will look at 2ACY’s fold in the CATH hierarchy. CATH (Class, Architecture, Topology, and Homologous Superfamily) is a hierarchical classification of proteins according to these four attributes. To view the CATH hierarchy from our 2ACY domain page, click on the CATH button.
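CATH classifications are usually written as four dot-separated numbers, one per level (Class, Architecture, Topology, Homologous superfamily). As a minimal sketch of how the hierarchy can be handled programmatically, the Python below splits such a code into its levels and counts how many leading levels two codes share. The function names and the code strings are illustrative assumptions; the strings are not claimed to be the classification of 2ACY.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    klass: int         # C: Class
    architecture: int  # A: Architecture
    topology: int      # T: Topology (fold)
    superfamily: int   # H: Homologous superfamily


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted CATH code such as '3.30.70.100' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shared_levels(a: CathCode, b: CathCode) -> int:
    """Count how many leading levels (C, then A, then T, then H) two codes share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Illustrative codes only (hypothetical inputs, not looked up for 2ACY).
first = parse_cath_code("3.30.70.100")
second = parse_cath_code("3.30.70.20")
print(first)
print(shared_levels(first, second))  # same class, architecture, and topology
```

Two domains that share the first three levels have the same fold (topology) but are placed in different homologous superfamilies.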

Step 4B: Analysis of Folds for 2ACY

Using blastp to Find Sequence Matches to GATA-1
In this exercise, we will use a different BLAST tool, blastp, to find all protein sequence matches to GATA-1 (the erythroid transcription factor from an earlier lab). Using SUPERFAMILY HMM, we will then analyze which protein domains these homologous sequences have.

Step 5A: BLASTing GATA-1
Go to the BLAST website. The program we want to run is protein blast.

Step 5B: BLASTing GATA-1
The protein FASTA sequence is available in our data directory:
[course_directory]/08_Protein_Structure/data/gata1.fasta
1. Click the Choose File button and upload our gata1.fasta file.
2. Under Database, choose Protein Data Bank (pdb).
3. Ensure that blastp is selected for Algorithm.
4. Click BLAST.

Step 5C: BLASTing GATA-1
The screenshot below details the correct configuration.

Step 5D: BLASTing GATA-1
The distribution of hits should look similar to the one below:

Step 5E: BLASTing GATA-1
In this step, we will download all of the significant alignments in this plot. Scroll down the window to the Sequences producing significant alignments box:
1. Click Select All.
2. Click Download.
3. Select FASTA (complete sequence).
4. Click Continue.
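Before uploading the downloaded FASTA file anywhere, it can be worth checking how many sequences it contains and how long they are. The following is a minimal sketch of a FASTA reader using only the Python standard library; the function name and the demo records are hypothetical (the demo sequences are made up, not BLAST output).

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted lines.

    A header is the text after '>' on a header line. Sequence lines that
    follow are concatenated. If a header repeats, the later record wins.
    """
    chunks: Dict[str, List[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


# Made-up demo records; with a real file one would pass an open file handle,
# e.g. `with open("gata1_homologs.fasta") as fh: records = read_fasta(fh)`.
demo = [">seqA made-up", "MKT", "AAL", "", ">seqB made-up", "GG"]
records = read_fasta(demo)
for name, seq in records.items():
    print(name, len(seq))
```

Very short sequences flagged this way are the ones most likely to trigger short-sequence warnings in downstream tools.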

Step 6A: Running SUPERFAMILY HMM
The file from the previous step is available in our data directory as:
[course_directory]/08_Protein_Structure/data/gata1_homologs.fasta
To run SUPERFAMILY HMM, go to the SUPERFAMILY website.

Step 6B: Running SUPERFAMILY HMM
1. On the next screen, next to Multiple Sequence FASTA File, click Choose File.
2. Select the homolog file we just downloaded, or the file in the data directory: gata1_homologs.fasta
3. Ensure that Amino Acid sequence is selected in the dropdown menu at the top.
4. Ensure Notification is set to Browser.
5. Click Submit.

Step 6C: Running SUPERFAMILY HMM
Ignore the short sequence warnings and click the View the domain assignment results link at the bottom of the page. The results are shown in pictorial and tabular form (scroll down on the page) and are sorted by the E-value of each sequence's assignment to a given superfamily. The picture to the right shows a diverse set of domains.

Step 6D: Running SUPERFAMILY HMM
Many homologs, though, show the same domain family as 1GAT. In the tabular view, you can see the E-values of the superfamily and family assignments for each one of these homologs. In general, a superfamily assignment must not exceed an E-value cutoff to be considered significant, and a family assignment has its own cutoff. Sequences that violate these constraints have their E-values grayed out in the tabular view.

Finding Structural Neighbors and Rebuilding Phylogenetic Trees
In this section, we will search a database for proteins with a structure similar to that of a protein of interest, 3GE4, a DNA starvation protein. In particular, we will look at chain A. Then, using MrBayes, we will reconstruct a phylogenetic tree from the alignment data we get from DALI, our structural alignment program.

Step 7A: DALI
DALI has a web interface. To run our query against the database we need to specify just two things:
1. In PDB identifier, type 3GE4.
2. In Chain, type A.
NOTE: DO NOT CLICK SUBMIT. WE HAVE PRECOMPUTED THE RESULTS.

Step 7B: DALI
The DALI results for this protein chain have already been computed and are available in an HTML file in our data directory:
[course_directory]/08_Protein_Structure/data/Dali_mol1A.html
In the browser, it should look similar to the following: a list of hits ranked by decreasing similarity to the query (3GE4).

Step 7C: DALI
Select the following hits (use Ctrl-F to search for them in the web page):
3ge4-A
1tjo-C
1ji4-L
1bcf-H
3uoi-J
1eum-A
Then click Structural Alignment.
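Each hit label above combines a four-character PDB identifier with a chain letter, separated by a hyphen. As a small sketch (the function name is our own, not part of DALI), the Python below splits the six labels into identifier and chain, which is handy when looking the entries up elsewhere.

```python
from typing import List, Tuple


def split_hit(hit: str) -> Tuple[str, str]:
    """Split a label such as '3ge4-A' into (PDB identifier, chain)."""
    pdb_id, sep, chain = hit.strip().partition("-")
    if not sep or len(pdb_id) != 4 or not chain:
        raise ValueError(f"unexpected hit label: {hit!r}")
    return pdb_id.lower(), chain


# The six hits selected in Step 7C.
hits: List[str] = ["3ge4-A", "1tjo-C", "1ji4-L", "1bcf-H", "3uoi-J", "1eum-A"]
parsed = [split_hit(h) for h in hits]
for pdb_id, chain in parsed:
    print(pdb_id, chain)
```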

Step 7D: DALI
The structural alignment is shown below. The TOP figure shows the alignment of the residues, while the BOTTOM figure shows the secondary structure state of each residue (L = coil, H = helix, E = strand).
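The secondary structure line can be summarized numerically. The sketch below counts the three states named above (L = coil, H = helix, E = strand) in a string and reports each as a fraction; any other character, such as an alignment gap, is ignored. The function name and the demo string are hypothetical, and the string is made up rather than copied from the DALI output.

```python
from collections import Counter
from typing import Dict

# State letters as described in Step 7D.
LABELS = {"L": "coil", "H": "helix", "E": "strand"}


def ss_composition(ss: str) -> Dict[str, float]:
    """Return the fraction of coil, helix, and strand states in `ss`."""
    counts = Counter(ch for ch in ss.upper() if ch in LABELS)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in LABELS.values()}
    return {LABELS[letter]: counts[letter] / total for letter in LABELS}


# Made-up example string: 6 coil, 6 helix, and 4 strand positions.
demo = "LLHHHHHHLLEEEELL"
print(ss_composition(demo))
```

Comparing these fractions across the selected hits gives a quick impression of how similar their secondary structure content is, alongside the residue-level alignment.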

Step 8A: Reconstructing Phylogenies Using MrBayes
A NEXUS file for the hits we selected in the previous stage is provided in the data directory:
[course_directory]/08_Protein_Structure/data/alignment.nex
We will run a program called MrBayes that will reconstruct the phylogeny from these structural alignments. Its icon is located on the desktop.

Step 8B: Reconstructing Phylogenies Using MrBayes
Unfortunately, MrBayes does not handle paths well. To use our alignment.nex file, we have to copy it into the directory where MrBayes is installed. To navigate to this directory:
1. Right-click the MrBayes icon on the desktop.
2. Click Find Target…

Step 8C: Reconstructing Phylogenies Using MrBayes
Open our data directory in a window side by side with our MrBayes directory. Drag our alignment.nex file to the MrBayes directory.
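The same copy can be done with a short script instead of drag and drop. This is a minimal sketch using the Python standard library; the function name is our own, and both directory arguments are placeholders that must be replaced with the actual data directory and the MrBayes installation directory on your machine.

```python
import shutil
from pathlib import Path


def copy_alignment(data_dir: str, mrbayes_dir: str, name: str = "alignment.nex") -> Path:
    """Copy `name` from the data directory into the MrBayes directory.

    The original file is left in place; the path of the copy is returned.
    """
    source = Path(data_dir) / name
    if not source.is_file():
        raise FileNotFoundError(f"cannot find {source}")
    target = Path(mrbayes_dir) / name
    shutil.copy2(source, target)
    return target


# Hypothetical usage (both paths are placeholders):
# copy_alignment("[course_directory]/08_Protein_Structure/data", "<MrBayes directory>")
```

Copying rather than moving keeps the original alignment.nex in the data directory in case the exercise needs to be repeated.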

Step 8D: Reconstructing Phylogenies Using MrBayes
Run the following commands in MrBayes to reconstruct the phylogeny. The leading $ stands for the prompt and the text after # is an explanatory comment; type neither.
$ execute alignment.nex
$ showmodel
$ set autoclose=yes;   # finish the run without prompting for more generations, then go to the next statement
$ mcmcp ngen=10000 printfreq=100 samplefreq=100 nchain=4 savebrlens=yes filename=alignment;   # define parameters of the run
$ mcmc;   # run Markov chain Monte Carlo
$ sump   # summarize your mcmc results
$ sumt   # output trees
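The mcmcp settings determine how many samples the later summary commands have to work with. As a rough sketch, the Python below computes the number of samples recorded per run from ngen and samplefreq. It assumes that generation 0 is written as the first sample and that one sample follows every samplefreq generations; the function name is our own and is not part of MrBayes.

```python
def samples_per_run(ngen: int, samplefreq: int) -> int:
    """Samples recorded in one run, assuming generation 0 is also sampled."""
    if ngen < 0 or samplefreq <= 0:
        raise ValueError("ngen must be >= 0 and samplefreq must be > 0")
    return ngen // samplefreq + 1


# Settings taken from the mcmcp command in Step 8D.
settings = {"ngen": 10000, "printfreq": 100, "samplefreq": 100, "nchain": 4}
n_samples = samples_per_run(settings["ngen"], settings["samplefreq"])
print(n_samples)  # 101 under the stated assumption
```

With only about a hundred samples per run, this configuration is suited to a quick classroom demonstration; longer runs give the summaries from sump and sumt more to work with.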

Step 9: Analyzing the Phylogenies
The phylogeny is shown in the output of MrBayes. A screenshot is shown below.