Protein Structure, Databases and Structural Alignment

Basics of protein structure

Why Protein Structure? Proteins are fundamental components of all living cells, performing a variety of biological tasks. Each protein has a particular 3D structure that determines its function. Protein structure is more conserved than protein sequence, and more closely related to function.

Protein Structure Protein core - usually conserved. Protein loops - variable regions Surface loops Hydrophobic core

Supersecondary structures: assemblies of secondary structure elements that are shared by many structures. Examples: beta-alpha-beta unit, beta hairpin, helix hairpin.

Fold: general structure composed of sets of supersecondary structures. Example: hemoglobin (1bab).

How Many Folds Are There ? http://scop.berkeley.edu/count.html

Structure-Sequence Relationships: Two conserved sequences imply similar structures. Do two similar structures imply conserved sequences? Not necessarily: there are cases of proteins with the same structure but no clear sequence similarity.

Principles of Protein Structure Today's proteins reflect millions of years of evolution. 3D structure is better conserved than sequence during evolution. Similarities among sequences or among structures may reveal information about shared biological functions of a protein family.

The Levinthal paradox: Assume a protein is comprised of 100 AAs and that each AA can take up 10 different conformations. Altogether we get $10^{100}$ (i.e., a googol) conformations. If each conformation were sampled in the shortest possible time (the time of a molecular vibration, ~$10^{-13}$ s), it would take an astronomical amount of time (~$10^{79}$ years) to sample all possible conformations in order to find the native state.

The Levinthal paradox: Luckily, nature does not have to work through numbers of this sort, and the correct conformation of a protein is reached within seconds.
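The estimate can be checked with a few lines of arithmetic. The sketch below uses only the assumptions stated on the slide (100 residues, 10 conformations each, one conformation sampled per 10^-13 s); the variable names are illustrative. Under these assumptions the result comes out at a few times 10^79 years.

```python
# Back-of-the-envelope check of the Levinthal estimate,
# using only the assumptions stated on the slide.
residues = 100
conformations_per_residue = 10
seconds_per_sample = 1e-13                 # roughly one molecular vibration
seconds_per_year = 365.25 * 24 * 3600

# Exact integer: 10**100 possible conformations.
total_conformations = conformations_per_residue ** residues

# Time needed to visit every conformation once.
total_seconds = total_conformations * seconds_per_sample
total_years = total_seconds / seconds_per_year

print(f"conformations: 10^{len(str(total_conformations)) - 1}")
print(f"time to sample all of them: about {total_years:.1e} years")
```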

How is the 3D Structure Determined? Experimental methods (best approach): X-ray crystallography. NMR. Others (e.g., neutron diffraction).

How is the 3D Structure Determined ? In-silico methods Ab-initio structure prediction given only the sequence as input - not always successful.

A note on ab-initio predictions: The current state is that “failure can no longer be guaranteed”…

A note on ab-initio secondary structure prediction: Success ~70%.

How is the 3D Structure Determined ? In-silico methods Threading = Sequence-structure alignment. The idea is to search for a structure and sequence in existing databases of 3D structure, and use similarity of sequences + information on the structures to find best predicted structures.

Comments: X-ray crystallography is the most widely used method. The quaternary structure of large assemblies (ribosomes, virus particles, etc.) can be determined by cryo-electron microscopy (cryo-EM).

Protein Databases

PDB: Protein Data Bank. Holds 3D models of biological macromolecules (protein, RNA, DNA). All data are available to the public. Structures are obtained by X-ray crystallography (84%) or NMR spectroscopy (16%) and are submitted by biologists and biochemists from around the world.

PDB: Protein Data Bank. Founded in 1971 by Brookhaven National Laboratory, New York. Transferred to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998. Currently it holds more than 49,426 released structures (updated count: 61,695).

PDB - model A model defines the 3D positions of atoms in one or more molecules. There are models of proteins, protein complexes, proteins and DNA, protein segments, etc … The models also include the positions of ligand molecules, solvent molecules, metal ions, etc.

PDB – Protein Data Bank http://www.pdb.org/pdb/home/home.do

The PDB file – text format

The PDB file – text format. Each ATOM or HETATM line gives, in fixed columns: atom number, atom identity, residue identity, chain, residue number, and the X, Y, Z coordinates of the atom. ATOM: usually protein or DNA. HETATM: usually ligand, ion, or water.
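To make the fixed-column layout concrete, here is a minimal sketch of reading one ATOM/HETATM line by column position. The column ranges follow the standard PDB flat-file layout; `AtomRecord`, `parse_atom_line` and `format_atom_line` are hypothetical names chosen for this example, and the sample atom values are made up rather than taken from a real entry.

```python
from typing import NamedTuple

class AtomRecord(NamedTuple):
    record: str      # "ATOM" or "HETATM"
    serial: int      # atom number
    atom_name: str   # atom identity, e.g. "CA"
    res_name: str    # residue identity, e.g. "MET"
    chain: str       # chain identifier
    res_seq: int     # residue number
    x: float
    y: float
    z: float

def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-width ATOM/HETATM line (comments give 1-based columns)."""
    return AtomRecord(
        record=line[0:6].strip(),        # columns 1-6
        serial=int(line[6:11]),          # columns 7-11
        atom_name=line[12:16].strip(),   # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain=line[21],                  # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
    )

def format_atom_line(serial, atom_name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build a minimal ATOM line with the same column layout."""
    return (f"{'ATOM':<6}{serial:>5} {atom_name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")

# Made-up example atom: backbone nitrogen of residue 1 (MET) in chain A.
line = format_atom_line(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom_line(line)
print(line)
print(atom)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in a PDB file may touch when values are wide.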

Structural Alignment

Why structural alignment? Structural similarity can point to a remote evolutionary relationship. Shared structural motifs among proteins suggest similar biological function. Structural alignment also gives insight into sequence-structure mapping (e.g., which parts of the protein structure are conserved among related organisms).

As in any alignment problem, we can search for GLOBAL ALIGNMENT or for LOCAL ALIGNMENT

Human Myoglobin pdb:2mm1 Human Hemoglobin alpha-chain pdb:1jebA Sequence id: 27% Structural id: 90%

What is the best transformation that superimposes the unicorn on the lion?

Solution: Regard the shapes as sets of points and try to “match” these sets using a transformation

This is not a good result….

Good result:

Kinds of transformations: rotation, translation, scaling, and more.

[Figures: translation, rotation, and scaling, each illustrated on X-Y axes.]
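The three transformations can be sketched on a small 2D point set, matching the X-Y figures. This is only an illustration: the function names and the unit-square example are assumptions made here, not part of the slides.

```python
import math

def translate(points, dx, dy):
    """Shift every point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]

def rotate(points, angle_rad):
    """Rotate every point counter-clockwise about the origin."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def scale(points, factor):
    """Scale every point uniformly about the origin."""
    return [(factor * x, factor * y) for x, y in points]

# Unit square as a toy "shape".
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

moved = translate(square, 2.0, 3.0)
turned = rotate(square, math.pi / 2)
doubled = scale(square, 2.0)

print(moved)
print(turned)
print(doubled)
```

Rotation and translation preserve distances between points, while scaling does not; this is why structural alignment of proteins works with rotations and translations only.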

We represent a protein as a geometric object in space. The object consists of points (in the figure, one per residue: Lys, Met, Gly, Thr, Glu, Ala), each represented by coordinates (x, y, z).

The aim: given two proteins, find the transformation that produces the best superimposition of one protein onto the other.

Correspondence is unknown. Given two configurations of points in three-dimensional space:

Find those rotations and translations of one of the point sets which produce "large" superimpositions of corresponding 3-D points.

The best transformation:

Simple case – two closely related proteins with the same number of amino acids. Question: how do we assess the quality of the transformation?

Scoring the Alignment. Two point sets: $A = \{a_i\},\ i = 1, \dots, n$ and $B = \{b_j\},\ j = 1, \dots, m$. Pairwise correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$. (1) Bottleneck: $\max_i \lVert a_{k_i} - b_{t_i} \rVert$. (2) RMSD (Root Mean Square Distance): $\sqrt{\sum_{i=1}^{N} \lVert a_{k_i} - b_{t_i} \rVert^2 / N}$.

RMSD – Root Mean Square Deviation. Given two sets of 3-D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$: $\mathrm{rmsd}(P, Q) = \sqrt{\sum_i |p_i - q_i|^2 / n}$. Find a 3-D transformation $T^*$ such that $\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_i |T(p_i) - q_i|^2 / n}$. Goal: find the highest number of atoms aligned with the lowest RMSD.
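The two scores above can be computed directly once a correspondence is given. The sketch below is a minimal illustration with made-up coordinates. It assumes the correspondence is supplied as index pairs (k_i, t_i) into A and B, and the function names are chosen here for the example.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def bottleneck(pairs):
    """Largest distance over all matched pairs."""
    return max(distance(a, b) for a, b in pairs)

def rmsd(pairs):
    """Root mean square distance over all matched pairs."""
    return math.sqrt(sum(distance(a, b) ** 2 for a, b in pairs) / len(pairs))

# Two toy point sets of different sizes (n = 4, m = 3).
A = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
B = [(0, 0, 1), (1, 0, 0), (2, 2, 0)]

# Correspondence of N = 3 pairs, given as (index into A, index into B).
correspondence = [(0, 0), (1, 1), (2, 2)]
pairs = [(A[k], B[t]) for k, t in correspondence]

print("bottleneck:", bottleneck(pairs))
print("RMSD:", rmsd(pairs))
```

The bottleneck score is driven entirely by the worst-matched pair, whereas RMSD averages over all pairs, so a single outlier affects the two scores very differently.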
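As a hedged illustration of finding the RMSD-minimizing transformation, the sketch below solves a simplified version of the problem: points in 2D, with the correspondence already known (point i of P matches point i of Q). In that setting the best rotation angle has a closed form after both sets are centered on their centroids. The 3D case is usually handled with an SVD-based method such as the Kabsch algorithm, which is not shown here. The function names and test points are assumptions made for this example.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def rmsd(P, Q):
    """RMSD between two equally long 2-D point lists, paired by index."""
    total = sum((px - qx) ** 2 + (py - qy) ** 2
                for (px, py), (qx, qy) in zip(P, Q))
    return math.sqrt(total / len(P))

def superimpose_2d(P, Q):
    """Return P rotated and translated so that its RMSD to Q is minimal."""
    (cpx, cpy), (cqx, cqy) = centroid(P), centroid(Q)
    # Center both sets: the optimal translation maps centroid to centroid.
    P0 = [(x - cpx, y - cpy) for x, y in P]
    Q0 = [(x - cqx, y - cqy) for x, y in Q]
    # The optimal angle maximizes sum(q . R(theta) p) over theta.
    dot = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(P0, Q0))
    cross = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(P0, Q0))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + cqx, s * x + c * y + cqy) for x, y in P0]

P = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
# Q is P rotated by 90 degrees and shifted by (5, 5).
Q = [(5.0, 5.0), (5.0, 7.0), (4.0, 7.0)]

print("RMSD before superposition:", rmsd(P, Q))
print("RMSD after superposition: ", rmsd(superimpose_2d(P, Q), Q))
```

When the correspondence is unknown, as in real structural alignment, this step is only the inner part of the problem: a method must also search over which points to pair.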

Pitfalls of RMSD: All atoms are treated equally (residues on the surface have a higher degree of freedom than those in the core). The best alignment does not always mean minimal RMSD. RMSD does not take into account the attributes of the amino acids.

Flexible alignment vs. Rigid alignment

Some more issues

Does the fact that so many proteins contain alpha helices indicate that they are all evolutionarily related? No. Alpha helices reflect physical constraints, as do beta sheets. For structures, it is sometimes difficult to separate convergent evolution from evolutionary relatedness.

Structural genomics: solve or predict the 3D structure of all proteins of a given organism (X-ray, NMR, and homology modelling). Unlike traditional structural biology, the 3D structure is often solved before anything is known about the protein in question. A new challenge emerged: predict a protein's function from its 3D structure.

CASP: a competition for predicting 3D structures. Instead of rushing to publish a new 3D structure, the AA sequence is published and each group is invited to give their predictions.

CAPRI: same as CASP, but for docking.

Homology modeling: predicting the structure from a closely related known structure. This can be important, for example, to predict how a mutation influences the structure.