1
BMI 731 Protein Structures and Related Database Searches
2
Biology … Protein… DNA (Genotype) Protein
3
A single amino acid substitution in a protein causes sickle-cell disease…
4
What the.....!?
5
Why do we care about structure?
In the factory of living cells, proteins are the workers, performing a variety of biological tasks.
Each protein has a particular 3-D structure that determines its function.
Protein structure is more conserved than protein sequence, and more closely related to function.
Sequence -> Structure -> Function
6
Structural Information
Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)
–http://www.rcsb.org/pdb/
–More than 15,000 protein structures
–Also contains structures of protein/nucleic acid complexes, nucleic acids, and carbohydrates
Most structures are determined by X-ray crystallography. Other methods are NMR and electron microscopy (EM). Some structures are also theoretically predicted.
7
PDB Content Growth
8
Protein?
Proteins are linear heteropolymers: one or more polypeptide chains.
Building blocks: 20(?) amino acid residues.
Chains range in length from a few tens to thousands of residues.
The three-dimensional shapes (“folds”) adopted vary enormously.
9
Structure…
10
Structure cont…
11
Basic measurements on structures…
Bond lengths
Bond angles
Dihedral (torsion) angles
12
Bond Length
The distance between two bonded atoms is essentially constant.
It depends on the “type” of the bond.
Varies from 1.0 Å (C-H) to 1.5 Å (C-C).
BOND LENGTH IS A FUNCTION OF THE POSITIONS OF TWO ATOMS.
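The statement that bond length depends only on two atomic positions can be illustrated with a minimal Python sketch. The function name and the coordinates below are hypothetical, chosen only to reproduce the 1.5 Å C-C value from the slide.

```python
import math

def bond_length(a, b):
    """Euclidean distance between two atoms given as (x, y, z) in angstroms."""
    return math.dist(a, b)

# Hypothetical coordinates: two carbon atoms placed 1.5 Å apart on the x axis.
c1 = (0.0, 0.0, 0.0)
c2 = (1.5, 0.0, 0.0)
print(round(bond_length(c1, c2), 2))
```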
13
Bond Angle…
All bond angles are determined by the chemical makeup of the atoms involved, and are essentially constant.
They depend on the type of atom and the number of electrons available for bonding.
Range from 100° to 180°.
A BOND ANGLE IS A FUNCTION OF THE POSITIONS OF THREE ATOMS.
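As a rough illustration of a bond angle as a function of three atomic positions, the sketch below computes the angle at the middle atom from the two bond vectors. The function name and test coordinates are assumptions made for this example.

```python
import math

def bond_angle(a, b, c):
    """Angle in degrees at atom b, formed by the bonds b-a and b-c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

# Hypothetical atoms: a right angle at the origin, then a linear arrangement.
print(round(bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)), 1))
print(round(bond_angle((1, 0, 0), (0, 0, 0), (-1, 0, 0)), 1))
```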
14
Dihedral Angles
These are usually variable.
Range from 0° to 360° in molecules.
Most famous are φ, ψ, and ω.
DIHEDRAL ANGLES ARE A FUNCTION OF THE POSITIONS OF FOUR ATOMS.
http://www.colby.edu/chemistry/OChem/DEMOS/dihedral.html
15
Dihedral Angles
A torsion angle is defined by four atoms, A, B, C and D.
When A, B, C and D are main-chain atoms (i.e. the carboxylic carbon, C1; the alpha carbon, C2 or C-alpha; and the amide group nitrogen, N), there are three repeating torsion angles along the backbone chain, called phi, psi and omega.
http://bmbiris.bmb.uga.edu/wampler/tutorial/prot2.html
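A torsion angle defined by four atoms A, B, C and D can be sketched as follows. This is an illustrative implementation of the standard atan2 formulation, not code from the course. The helper names are hypothetical, and the result is reported in the range -180° to 180° rather than 0° to 360°.

```python
import math

def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def dihedral(a, b, c, d):
    """Signed torsion angle in degrees about the B-C bond (cis = 0, trans = 180)."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal to the plane through A, B, C
    n2 = _cross(b2, b3)  # normal to the plane through B, C, D
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical atoms with the B-C bond along the z axis.
A, B, C = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(round(dihedral(A, B, C, (1.0, 0.0, 1.0)), 1))  # cis arrangement
print(round(dihedral(A, B, C, (0.0, 1.0, 1.0)), 1))  # quarter turn
```

Applied to consecutive backbone atoms, the same function would give phi (C, N, C-alpha, C), psi (N, C-alpha, C, N) and omega (C-alpha, C, N, C-alpha).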
16
Ramachandran / phi-psi plot http://www.biochem.ucl.ac.uk/~roman/procheck/manual/examples/plot_01.html
17
Levels of Structure…
1 - Primary structure
2 - Secondary structure
3 - Tertiary structure
4 - Quaternary structure
18
Primary structure…
This is simply the amino acid sequence of each polypeptide chain.
19
Secondary structure
Local organization of the protein backbone: α-helix, β-strand (strands assemble into β-sheets), turn and interconnecting loop.
20
The α-helix
One of the most closely packed arrangements of residues.
Turn: 3.6 residues per turn
Pitch: 5.4 Å/turn
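The two helix parameters on this slide imply a rise of about 1.5 Å and a rotation of about 100° per residue. The short sketch below works through that arithmetic. The constant and function names are hypothetical, and the length estimate assumes an ideal, straight helix.

```python
RESIDUES_PER_TURN = 3.6  # from the slide
PITCH = 5.4              # Å per turn, from the slide

# Axial rise and rotation contributed by each residue.
rise_per_residue = PITCH / RESIDUES_PER_TURN      # approximately 1.5 Å
rotation_per_residue = 360.0 / RESIDUES_PER_TURN  # approximately 100 degrees

def helix_length(n_residues):
    """Approximate axial length in Å of an ideal α-helix with n_residues residues."""
    return n_residues * rise_per_residue

print(round(rise_per_residue, 2), round(rotation_per_residue, 1))
print(round(helix_length(10), 1))
```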
21
The β-sheet
Backbone almost fully extended; loosely packed arrangement of residues.
22
Ramachandran/phi-psi plot
23
Tertiary structure…
Packing the secondary structure elements into a compact spatial unit.
“Fold” or domain: this is the level to which structure prediction is currently possible.
24
Quaternary structure…
Assembly of homo- or heteromeric protein chains.
Usually the functional unit of a protein, especially for enzymes.
25
Classification…
Class
Fold/Architecture
Superfamily
26
Databases of structural classification
SCOP
–Murzin AG, Brenner SE, Hubbard T, Chothia C
–Structural classification of protein structures
–Manual assembly by inspection
–All nodes are annotated (e.g., all-α, α/β)
–Structural similarity search using 3dSearch (Singh and Brutlag)
CATH
–Dr. C.A. Orengo, Dr. A.D. Michie, etc.
–Class-Architecture-Topology-Homologous superfamily
–Manual classification at Architecture level
–Automated topology classification using the SSAP algorithm
–No structural similarity search
27
Databases of structural classification
FSSP
–L.L. Holm and C. Sander
–Fully automated using the DALI algorithm (Holm and Sander)
–No internal node annotations
–Structural similarity search using DALI
Pclass
–A. Singh, X. Liu, J. Chang, D. Brutlag
–Fully automated using the LOCK and 3dSearch algorithms
–All internal nodes automatically annotated with common terms
–Java-based classification browser
–Structural similarity search using 3dSearch
28
Why Structure Alignment?
For homologous proteins (similar ancestry), structure alignment provides the “gold standard” for sequence alignment and elucidates the common ancestry of the proteins.
For nonhomologous proteins, it allows us to identify common substructures of interest.
It allows us to classify proteins into clusters, based on structural similarity.
29
How do we recognize structural similarities?
By eye (Alexei Murzin)
–SCOP: gold standard for structure classification!
Algorithmically
–Growth of PDB demands automated techniques for classification and fold detection
30
Algorithms for Structure Alignment
Distance based methods
–DALI (Holm and Sander): Aligning scalar distance plots
–STRUCTAL (Gerstein and Levitt): Dynamic programming using pairwise inter-molecular distances
–SSAP (Orengo and Taylor): Dynamic programming using intra-molecular vector distances
Vector based methods
–VAST (Bryant): Graph theory based secondary structure alignment
–3dSearch (Singh and Brutlag): Fast secondary structure index lookup
Both vector and distance based
–LOCK (Singh and Brutlag): Hierarchically uses both secondary structure vectors and atomic distances
31
DALI
Based on aligning 2-D intra-molecular distance matrices.
Computes the best subset of corresponding residues from the two proteins such that the similarity between the 2-D distance matrices is maximized.
Searches the space of possible residue alignments using a Monte Carlo algorithm.
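To make the idea of an intra-molecular distance matrix concrete, here is a minimal sketch. It is not DALI's actual scoring function or search procedure. It only shows that the matrix of pairwise Cα distances is unchanged when a structure is rotated and translated, which is why such matrices can be compared without superposing the structures first. The coordinates and function names are hypothetical, and the comparison assumes the residues are already paired one to one.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between points given as (x, y, z) tuples."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def matrix_difference(m1, m2):
    """Mean absolute difference between two equal-sized distance matrices."""
    n = len(m1)
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)

# Hypothetical four-residue Cα trace.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
# The same trace rotated 90 degrees about z and then translated.
moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in trace]
# A genuinely different shape: the trace stretched along x.
stretched = [(2.0 * x, y, z) for x, y, z in trace]

print(matrix_difference(distance_matrix(trace), distance_matrix(moved)) < 1e-9)
print(matrix_difference(distance_matrix(trace), distance_matrix(stretched)) > 0.1)
```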
32
VAST: Vector Alignment Search Tool
Aligns only secondary structure elements (SSEs).
Represents each SSE as a vector.
Finds all possible pairs of vectors from the two structures that are similar.
Uses a graph-theory algorithm to find the maximal subset of similar vectors.
The overall alignment score is based on the number of similar pairs of vectors between the two structures.
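The first steps of this approach, representing each SSE as a vector and listing candidate pairs of similar vectors, might be sketched as below. This is not the VAST implementation. The `SSE` type, the "H"/"E" labels, the length tolerance and the coordinates are all assumptions, similarity is judged by element type and vector length only, and the graph-theory step is not shown.

```python
import math
from typing import NamedTuple

class SSE(NamedTuple):
    kind: str     # "H" for helix, "E" for strand (hypothetical labels)
    start: tuple  # (x, y, z) of the first Cα atom
    end: tuple    # (x, y, z) of the last Cα atom

    def vector(self):
        """Vector from the start of the element to its end."""
        return tuple(e - s for s, e in zip(self.start, self.end))

    def length(self):
        return math.hypot(*self.vector())

def similar_pairs(query, target, max_length_diff=2.0):
    """Index pairs (i, j) where query[i] and target[j] have the same type and similar length."""
    pairs = []
    for i, q in enumerate(query):
        for j, t in enumerate(target):
            if q.kind == t.kind and abs(q.length() - t.length()) <= max_length_diff:
                pairs.append((i, j))
    return pairs

query = [SSE("H", (0, 0, 0), (0, 0, 15)), SSE("E", (5, 0, 0), (5, 10, 0))]
target = [SSE("E", (0, 0, 0), (9, 0, 0)),
          SSE("H", (0, 5, 0), (14, 5, 0)),
          SSE("H", (0, 0, 0), (30, 0, 0))]
print(similar_pairs(query, target))
```

A fuller treatment would also compare the relative orientation and separation of vector pairs within each structure before searching for the largest consistent subset.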
33
LOCK
Define local secondary structures.
Find an initial superposition by using dynamic programming (DP) to align secondary structure vectors.
Use a greedy algorithm to find nearest neighbors and minimize the RMSD between the Cα atoms of query and target.
Find the core of aligned Cα atoms and minimize the RMSD between them.
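The greedy nearest-neighbor step and the RMSD it tries to reduce can be illustrated roughly as follows. This is a sketch under stated assumptions, not the LOCK code. The two Cα traces are assumed to be already superposed, the 3.0 Å cutoff and all names are hypothetical, and the re-superposition that would actually minimize the RMSD is omitted.

```python
import math

def greedy_pairs(query, target, cutoff=3.0):
    """Pair each query atom with its nearest unused target atom within cutoff Å."""
    used = set()
    pairs = []
    for i, q in enumerate(query):
        best_j, best_d = None, cutoff
        for j, t in enumerate(target):
            d = math.dist(q, t)
            if j not in used and d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs

def rmsd(query, target, pairs):
    """Root-mean-square deviation over the paired atoms."""
    if not pairs:
        raise ValueError("no atom pairs to compare")
    squared = sum(math.dist(query[i], target[j]) ** 2 for i, j in pairs)
    return math.sqrt(squared / len(pairs))

# Hypothetical, already superposed Cα positions.
query = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
target = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (20.0, 0.0, 0.0)]

pairs = greedy_pairs(query, target)
print(pairs, round(rmsd(query, target, pairs), 2))
```

In this example the third query atom stays unpaired because no target atom lies within the cutoff, which mirrors the idea of restricting the final RMSD to a core of well-aligned atoms.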
34
Where is the data? GenBank DB are equivalent
35
NCBI Reference Sequences (RefSeq): http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html
GenPept Database: http://inn.weizmann.ac.il/databanks/genpept.html
http://www.expasy.org/sprot/ (STATS: http://www.expasy.org/sprot/relnotes/relstat.html)
PIR International Protein Sequence Database: http://pir.georgetown.edu/pirwww/search/textpsd.shtml
http://www.rcsb.org/pdb/
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
36
MMDB by NCBI…
37
A flow chart for structure prediction (box labels in reading order; decision branches are labeled yes/no in the figure)
Protein sequence
Database similarity search
Does sequence align with protein of known 3D structure?
Protein family, domain, cluster analysis
Relationship to known structure?
Structural analysis
3D comparative modeling
Predicted three-dimensional structure
Is there a predicted structure?
3D analysis in laboratory
39
Images…
3-dimensional model showing the electron density in a molecule of buckminsterfullerene, an allotrope of carbon (C60).
40
Images…
Computer-generated image showing the 3-D structure of uteroglobin, a protein secreted in the uterus of mammals.
41
Images… (NMR… EPR…)
A computer image of the charge density over the molecule chymosin, an important enzyme in cheese making. Overall negative charge is depicted as red; overall positive charge is shown in blue.
42
X-ray crystallography.
43
Thanks Thanks to Selnur Erdal for preparing initial versions of these slides.