Basics of protein structure and modeling Rui Alves.



Proteins are the primary functional manifestation of genomes
Protein sequence: MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA
DNA sequence: atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaactggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaagcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacggaactaaacgcggtaaagccgcttaa
RNA sequence: augcaaacucuuucugaacgccucaagaagaggcgaauugcguuaaaaaugacgcaaaccgaacuggcaaccaaagccgguguuaaacagcaaucaauucaacugauugaagcuggaguaaccaagcgaccgcgcuucuuguuugagauugcuauggcgcuuaacugugauccgguuugguuacaguacggaacuaaacgcgguaaagccgcuuaa
DNA sequence → (transcription) → RNA sequence → (translation) → protein sequence → protein structure → protein function
Being able to predict the protein sequence from the gene sequence allows us to predict structure, which in turn helps us understand how the protein does what it does.

Outline: DNA sequence to protein sequence; from protein sequence to secondary structure; protein tertiary structure; predicting protein structure.

Predicting protein sequence from DNA sequence Protein sequence can be predicted by translating the cDNA and using the genetic code.
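As a minimal sketch of what "translating the cDNA using the genetic code" means in practice, the snippet below builds the standard (universal) codon table and reads a sequence three bases at a time. The function name `translate` and the one-letter output with `*` for a stop codon are choices made for this illustration only; frame selection, ambiguous bases and start-codon detection are deliberately left out.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate a cDNA (or mRNA) string in frame 1; '*' marks a stop codon."""
    seq = cdna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGTCTCTTATATGA"))           # MSLI*
print(translate("atgcaaactctttctgaacgcctc"))  # MQTLSERL
```

The second call uses the first eight codons of the gene shown at the start of the presentation and reproduces the start of its protein sequence.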

Translating cDNA to protein: ATGTCTCTTATATGA… reads Met-Ser-Leu-Ile-Ter. No gene!

Translating cDNA to Protein

Translating yeast mitochondrial cDNA into protein sequence: ATGTCTCTTATATGA……… SECIS sequence. Universal genetic code: Met-Ser-Leu-Ile-Ter. Yeast mitochondrial reading: Met-Ser-Thr-Met-sCys. There is a gene, with a protein sequence considerably different from the one we would predict from the universal genetic code!
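The effect of a non-universal code can be sketched by overriding a few entries of the standard codon table. The overrides below are the yeast mitochondrial reassignments as listed in NCBI translation table 3 (CTN to Thr, ATA to Met, TGA to Trp). Note the assumption: this sketch does not model SECIS-directed recoding, so the final codon comes out as Trp (W) here rather than the last residue shown on the slide. The names `make_table` and `YEAST_MITO` are hypothetical, introduced only for this example.

```python
from itertools import product

BASES = "TCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Codon reassignments in yeast mitochondria relative to the standard code.
YEAST_MITO = {"CTT": "T", "CTC": "T", "CTA": "T", "CTG": "T",
              "ATA": "M", "TGA": "W"}


def make_table(overrides=None):
    """Return the standard codon table with optional codon reassignments."""
    table = {"".join(c): aa
             for c, aa in zip(product(BASES, repeat=3), STANDARD)}
    table.update(overrides or {})
    return table


def translate(cdna, table):
    """Translate in frame 1 with the given codon table; '*' is a stop."""
    seq = cdna.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))


cdna = "ATGTCTCTTATATGA"
print(translate(cdna, make_table()))            # MSLI*
print(translate(cdna, make_table(YEAST_MITO)))  # MSTMW
```

The same 15 bases give an immediate stop under the universal code and an open reading frame under the mitochondrial one, which is the point of the slide.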

Outline: DNA sequence to protein sequence; from protein sequence to secondary structure; protein tertiary structure; predicting protein structure.

Amino acids are the primary building blocks of proteins. The sequence of amino acids is the primary structure of a protein, and sequence determines structure. Amino acids don’t fall neatly into classes, and how we casually speak of them can affect the way we think about their behavior. For example, if you think of Cys as a polar residue, you might be surprised to find it in the hydrophobic core of a protein, unpaired to any other polar group. But this does happen. The properties of a residue type can also vary with conditions and environment.

Grouping the amino acids by properties (Livingstone & Barton, CABIOS 9, 1993).

Proteins are made by controlled polymerization of amino acids. Two amino acids condense to form a dipeptide: H2N-CHR1-COOH (residue 1) + H2N-CHR2-COOH (residue 2) → H2N-CHR1-CO-NH-CHR2-COOH + HOH. A peptide bond is formed and water is eliminated. Residue 1 sits at the N (amino) terminus and residue 2 at the C (carboxy) terminus. If there are more residues it becomes a polypeptide. Short polypeptide chains are usually called peptides, while longer ones are called proteins.

Repeating φ/ψ torsion angles characterize the secondary structure
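A torsion (dihedral) angle is defined by four consecutive atoms. For the backbone, φ of residue i uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The following is a minimal pure-Python sketch of the geometry; the function name `dihedral` and the test coordinates are made up for illustration and do not come from a real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Made-up coordinates: cis (0 degrees), trans (180 degrees) and a quarter turn.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```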

Secondary structure elements in proteins: alpha-helix (local interactions) and beta-strand (nonlocal interactions). A secondary structure element is a contiguous region of a protein sequence characterized by a repeating pattern of main-chain hydrogen bonds and backbone phi/psi angles. These elements reflect the tendency of the backbone to hydrogen bond with itself in a semi-ordered fashion when compacted.

Principal types of secondary structure found in proteins, with repeating (φ, ψ) values:
  α-helix (1-5, right-handed):  φ = -63°,  ψ = -42°
  3₁₀ helix (1-4):              φ = -57°,  ψ = -30°
  Parallel β-sheet:             φ = -119°, ψ = +113°
  Antiparallel β-sheet:         φ = -139°, ψ = +135°
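To make the four (φ, ψ) pairs from this slide concrete, here is a small sketch that labels a residue by whichever canonical pair is nearest in angle space. This is only an illustration of "regions of phi-psi space": it is not how DSSP assigns secondary structure (DSSP works from hydrogen-bond patterns), and the function names are invented for this example.

```python
import math

# Canonical (phi, psi) values in degrees, taken from the slide.
CANONICAL = {
    "alpha-helix": (-63.0, -42.0),
    "3-10 helix": (-57.0, -30.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_type(phi, psi):
    """Name of the canonical conformation closest to (phi, psi)."""
    def distance(name):
        ref_phi, ref_psi = CANONICAL[name]
        return math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
    return min(CANONICAL, key=distance)


print(nearest_type(-60.0, -45.0))   # alpha-helix
print(nearest_type(-120.0, 115.0))  # parallel beta-sheet
```

Because the α-helix and 3₁₀ helix values lie close together, a nearest-point rule like this cannot separate them reliably; that is one reason hydrogen-bond patterns are used in practice.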

The alpha-helix: repeating i,i+4 hydrogen bonds. α-helix (1-5, right-handed): φ = -63°, ψ = -42°, in the right-handed helical region of phi-psi space. By DSSP definitions, which of residues 1-12 are in the helix? Does this coincide with the residues in the helical region of phi-psi space?

β strands/sheets. Parallel β-sheet: φ = -119°, ψ = +113°, in the beta-strand region of phi-psi space. Is this a parallel or anti-parallel sheet? By DSSP definitions, which residues are in the sheet? Does this coincide with the residues in the beta-strand region of phi-psi space?

Contact maps of protein structures. Example: 1avg, the structure of triabin, shown as a rainbow ribbon diagram (blue to red: N to C) and as a map of Cα-Cα distances < 6 Å. Both axes of the map are the sequence of the protein. Near the diagonal: local contacts in the sequence. Off the diagonal: long-range (nonlocal) contacts.
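A contact map of this kind can be sketched in a few lines: take one Cα coordinate per residue and mark every pair closer than the 6 Å cutoff mentioned on the slide. The coordinates below are a made-up seven-residue hairpin, not taken from 1avg, and the `min_separation` threshold used to call a contact "nonlocal" is an assumption of this example.

```python
import math


def contact_map(ca_coords, cutoff=6.0):
    """Boolean matrix: True where two C-alpha atoms are closer than cutoff (Å)."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) < cutoff for j in range(n)]
            for i in range(n)]


def nonlocal_contacts(cmap, min_separation=4):
    """Pairs (i, j) in contact that are at least min_separation apart in sequence."""
    n = len(cmap)
    return [(i, j) for i in range(n) for j in range(i + min_separation, n)
            if cmap[i][j]]


# Hypothetical hairpin: three residues out, one turn residue, three back.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (9.5, 2.5, 0.0),
          (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]

cmap = contact_map(coords)
print(nonlocal_contacts(cmap))  # [(0, 6), (1, 5)]
```

Neighbouring residues always show up next to the diagonal, while the pairing of the two strands appears as off-diagonal contacts, matching the local/nonlocal distinction above.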

What does secondary structure teach? If secondary structure can be predicted from the primary structure, this may help in predicting protein function, via evolutionary relationships with known folds.

Outline: DNA sequence to protein sequence; from protein sequence to secondary structure; protein tertiary structure; predicting protein structure.

Tertiary structure in proteins: the structure of a single polypeptide chain. The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology. The pattern of contacts between side chains and backbone is also an aspect of tertiary structure, as is the distinction between outer surface and interior.

Obvious interactions in native protein structures: disulfide crosslinks; polar interactions (hydrogen bond/salt bridge); hydrophobic interactions.

The Protein Data Bank: the Protein Data Bank (PDB) is a central repository of protein structures.

Major structure classification systems: SCOP (Structural Classification of Proteins); CATH (Class-Architecture-Topology-Homology); DALI/FSSP (Fold classification based on Structure-Structure Alignment). SCOP and CATH are quite similar and generally combine automated and manual aspects. They are both “curated” by human experts.

Outline: DNA sequence to protein sequence; from protein sequence to secondary structure; protein tertiary structure; predicting protein structure.

Secondary structure prediction: the nuts and bolts behind fold prediction. A database of known structures and a database of the corresponding sequences (ACDEFGTYAEE…) are split into a training set and a test set, each with its structures (α-helix, coil, β-strand) and sequences. From the training set, build a table with one row per amino acid (A, C, …) and one column per state, holding p(α-helix), p(coil) and p(β-strand); for example, the row for A runs from 0.1 … to 0.21. Use the table to predict the secondary structure of the test-set sequences and compare with their known structures. Bad predictions: reshuffle the training set and test set and repeat until predictions are correct. Good predictions: the method is ready for a new sequence.
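The train/predict/compare loop on this slide can be sketched with a toy per-residue frequency table. Everything here is a simplifying assumption: the three training pairs are invented, states are written as H (helix), E (strand) and C (coil), and each residue is predicted independently of its neighbours, which real predictors do not do.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Estimate p(state | amino acid) from (sequence, states) string pairs."""
    counts = defaultdict(Counter)
    for seq, states in pairs:
        for aa, state in zip(seq, states):
            counts[aa][state] += 1
    return {aa: {s: n / sum(c.values()) for s, n in c.items()}
            for aa, c in counts.items()}


def predict(seq, table, default="C"):
    """Pick the most probable state per residue; unseen residues get default."""
    out = []
    for aa in seq:
        probs = table.get(aa)
        out.append(max(probs, key=probs.get) if probs else default)
    return "".join(out)


def accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Invented training and test data, for illustration only.
training_set = [("AAEEL", "HHHHH"), ("VVIIG", "EEEEC"), ("GPGSA", "CCCCH")]
table = train(training_set)

test_seq, test_states = "AVGLE", "HECCH"
predicted = predict(test_seq, table)
print(predicted, accuracy(predicted, test_states))  # HECHH 0.8
```

If the accuracy on the test set is poor, the slide's recipe is to reshuffle which structures go into training and testing and repeat.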

How does a fold prediction server work? The server holds a database of known structures, a database of the corresponding sequences, and a database of probabilities of amino acids in secondary structure, from which it builds a homology-based helix-coil-strand profile database of folds. YOUR SEQUENCE is submitted to the server. Strong homology: fold prediction follows directly. Weak or no homology: a helix-coil-strand profile is predicted first, then used for fold prediction.

Predicting protein folding

Predicting protein structure. Homology modeling: 3D-JIGSAW, SWISSMODEL. Ab initio modeling: ROBETTA.

Predicting protein structure by homology

How does a homology modeling server work? The server or program holds a database of known structures and a database of the corresponding sequences. (1) The query sequence (…YDVRSEQVENCE…) is used to find strong homologues. (2) The best possible alignment (sequence + structure) is built, for example …YDVR-SEQVENCE… against …YDVRMSD-VDNCD…. (3) The sequence to predict is threaded over the known structure according to the alignment. (4) The model is optimized via energy minimization, etc.

Predicting protein structure. Homology modeling: 3D-JIGSAW, SWISSMODEL. Ab initio modeling: ROSETTA.

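Predicting protein structure by ab initio methods. The server or program searches the database of corresponding sequences with the query (…YDVRSEQVENCE…) and finds NO homologues. It then turns to a database of structures for smaller amino acid runs and matches fragments of the query against it (for example …YDVR-SEQ against …YDVRMSD-… and …YPVRMSD-…, and …VENCE… against …YDNCD… and …VEQCE…). The fragment structures are assembled, followed by energy minimization and optimization.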

Accuracy of modelling. Accuracy varies widely. The quality of the model is VERY dependent on the quality of the alignment. Globular proteins are predicted more accurately; membrane proteins are still a big problem. Homology modelling is “bad” if homology (sequence identity) is below 30%. CASP is a biennial meeting where the accuracy of the different methods is assessed; the Baker group is usually and consistently more accurate than others.
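The 30% rule of thumb needs a number to compare against. A minimal sketch, assuming that "homology" here means percent sequence identity over the aligned (gap-free) columns of a pairwise alignment; other denominators are in use, so treat this definition as one choice among several. The two aligned strings are the example alignment from the homology modeling slide.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identical residues over alignment columns without gaps."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


identity = percent_identity("YDVR-SEQVENCE", "YDVRMSD-VDNCD")
print(round(identity, 1))  # 72.7
print("usable template" if identity >= 30.0 else "below the 30% threshold")
```

Here 8 of the 11 gap-free columns match, so this toy pair sits comfortably above the threshold.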

Summary: DNA sequence to protein sequence; from protein sequence to secondary structure; protein tertiary structure; predicting protein structure.

“Accessible Surface” (Lee & Richards, 1971; Shrake & Rupley, 1973): represent atoms as spheres with appropriate radii and eliminate the overlapping parts; mathematically roll a sphere all around that surface; the sphere’s center traces out a surface as it rolls.
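One common numerical way to approximate this surface is a point-sampling scheme in the spirit of Shrake & Rupley: place test points on each atom's expanded sphere (atomic radius plus probe radius) and count the points that do not fall inside any neighbouring expanded sphere. The sketch below makes several assumptions that are not in the slides: a probe radius of 1.4 Å, 200 points per atom placed by a golden-spiral construction, and made-up atom radii and positions.

```python
import math


def sphere_points(n):
    """Roughly evenly spread points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return points


def accessible_area(atoms, probe=1.4, n_points=200):
    """Per-atom accessible area in Å^2; atoms is a list of ((x, y, z), radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (center, radius) in enumerate(atoms):
        big_r = radius + probe  # surface traced by the probe center
        exposed = 0
        for ux, uy, uz in unit:
            p = (center[0] + big_r * ux,
                 center[1] + big_r * uy,
                 center[2] + big_r * uz)
            buried = any(math.dist(p, other) < other_r + probe
                         for j, (other, other_r) in enumerate(atoms) if j != i)
            exposed += not buried
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


# Hypothetical atoms with a 1.8 Å radius: one isolated, then two overlapping.
alone = accessible_area([((0.0, 0.0, 0.0), 1.8)])
pair = accessible_area([((0.0, 0.0, 0.0), 1.8), ((3.0, 0.0, 0.0), 1.8)])
print(alone, pair)
```

An isolated atom keeps its full expanded-sphere area, while each atom of the overlapping pair loses the part of its surface that the probe cannot reach.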

The outer surface: water in protein structures. Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally. Water is not just surrounding the protein: it is interacting with it.

Water interacts with protein surfaces. First shell waters: in contact with or hydrogen bonded to the protein. Second shell waters: only contact other waters. Most waters visible in structures make hydrogen bonds to each other and/or to the protein, as donor, acceptor, or both.

Side chain conformation: side chains differ in their number of degrees of conformational freedom (some, such as Ala and Gly, don’t have any), but side chains of very different size can have the same number of χ angles.

Supersecondary structures/structural motifs: just as there are certain secondary structure elements that are common, there are also particular arrangements of multiple secondary structure elements that are common. Supersecondary structures emphasize the issue of topology in protein structure. Examples shown: … motif, Greek key motif.

Topology: differences in connectivity. Example: a four-stranded antiparallel β sheet can have many different topologies based on the order in which the four β strands are connected, such as “greek key” and “up-and-down”.

Topology: differences in handedness. Example: an extremely common supersecondary structure in proteins is the beta-alpha-beta motif, in which two adjacent beta-strands are arranged in parallel and are separated in the sequence by a helix which packs against them. If the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands. There is a huge preference for the right-handed arrangement in proteins.

DIY: The sequence

DIY: The server

DIY: The reply

DIY: fine tuning

DIY: That is it!

The CATH Hierarchy
1. Divide PDB structure entries into domains using domain recognition algorithms. The domain is the fundamental unit of structure classification.
2. Classify each domain according to a five-level hierarchy: Class, Architecture, Topology, Homologous Superfamily, Sequence Family.
The top three levels of the hierarchy are purely phenetic: based on characteristics of the structure, not on evolutionary relationships. The bottom two levels include some phyletic classification as well: groupings according to putative common ancestry based on structural similarity, functional similarity, and sequence similarity. There is no purely phyletic system of protein classification! (It is also unlikely that there is any common ancestor to all proteins.)

SCOP: A different (but similar) taxonomy system. Correspondences between SCOP and CATH hierarchies:
  SCOP          CATH
  class         class
                architecture
  fold          topology
  superfamily   homologous superfamily
  family        sequence family
  domain        domain
CATH is more directed toward structural classification, whereas SCOP pays more attention to evolutionary relationships. Both have manual aspects and are curated by experts.

Internal interactions in a protein

Amino acids: the building blocks of proteins. Uncharged form: H2N-CH(R)-COOH. Zwitterionic form: H3N(+)-CH(R)-COO(-). The zwitterionic form is the predominant form at neutral pH. Labeled parts: amino group, carboxylic acid group, side chain (R), alpha carbon.