EECS 800 Research Seminar Mining Biological Data


1 EECS 800 Research Seminar Mining Biological Data
Instructor: Luke Huan, Fall 2006

2 Introduction
[Figure: one protein shown as cartoon, space filling, surface, and ribbon models; atoms colored by element (oxygen, nitrogen, carbon, sulfur); example residues labeled Lys, Gly, Leu, Val, Ala, His.]
Protein: a sequence built from 20 amino acids that adopts a stable 3D structure that can be measured experimentally.
Notes: In the talk, mention that amino acids are small molecules and that there are 20 of them. Structures are measured by X-ray crystallography, which yields an electron density map. The left and right panels show the same protein in the same configuration. Emphasize the secondary structure. Example: 1MBA, N-terminal residues 1-120, drawn as new cartoon, space filling (showing the van der Waals surface), and cartoon.
We formalize the LSC problem as the frequent subgraph mining problem (Huan et al., RECOMB'04). A protein structure may be represented by a graph (contact map): a node is a ball, an edge is a stick, and nodes and edges are colored. An alternative procedure formalizes the LSC problem as the largest common point set (LCP) problem among point sets (Nussinov & Wolfson, PNAS, 1991).
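The contact-map graph described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of Huan et al.: it assumes each amino acid is reduced to a single point (for example its Cα atom) listed in chain order, labels each node with its residue type, and joins two residues when they are closer than a distance cutoff. The 8.0 Å default cutoff, the "bond"/"contact" edge labels, and the names `Residue` and `contact_graph` are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """One amino acid reduced to a single labeled point (a "ball")."""
    name: str                         # node label, e.g. "LYS"
    xyz: tuple[float, float, float]   # coordinates in angstroms


def contact_graph(residues, cutoff=8.0):
    """Build a labeled contact-map graph (hypothetical sketch).

    Nodes are residue indices labeled by residue type; residues are assumed
    to be listed in chain order. An edge (a "stick") joins two residues
    closer than `cutoff`; its label says whether they are also sequence
    neighbors ("bond") or only close in space ("contact").
    """
    labels = [r.name for r in residues]
    edges = []
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if math.dist(residues[i].xyz, residues[j].xyz) < cutoff:
                kind = "bond" if j == i + 1 else "contact"
                edges.append((i, j, kind))
    return labels, edges


residues = [
    Residue("GLY", (0.0, 0.0, 0.0)),
    Residue("ALA", (3.8, 0.0, 0.0)),
    Residue("LYS", (20.0, 0.0, 0.0)),
]
labels, edges = contact_graph(residues)
print(labels, edges)  # ['GLY', 'ALA', 'LYS'] [(0, 1, 'bond')]
```

A graph in this form (labeled nodes, labeled edges) is the kind of input a frequent subgraph miner expects; the cutoff controls how dense the graph becomes.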

3 Exponential Growth of Protein Structures
[Figure: Growth of Known Structures in Protein Data Bank, 1988 to 2005. x-axis: year; y-axis: number of structures (up to 35,000). Series: the total number of known protein structures; newly characterized proteins in that year.]

4 Protein Structure Space
Each individual protein structure is a complicated object; the space of protein structures is even more complicated. The outer circle shows 17 protein samples from the protein structure space, i.e., all possible protein structures. The purpose of this figure is to show the top-level organization of the protein structure space.
There are two prevalent secondary structure components: alpha helices and beta sheets. There are four classes of proteins: alpha, beta, a+b (antiparallel beta sheets), and a/b (parallel beta sheets). Even proteins within the same class may be quite different. The organization of secondary structure elements is generally referred to as the fold of the protein, though the precise definition is always arguable. Global structural similarity between a pair of protein structures may indicate functional similarity and evolutionary relationships. Tools for global structure comparison include DALI (Distance mAtrix aLIgnment) and CE (Combinatorial Extension).
Alpha and beta proteins (a/b) [51349] (130): mainly parallel beta sheets (beta-alpha-beta units).
Alpha and beta proteins (a+b) [53931] (260): mainly antiparallel beta sheets (segregated alpha and beta regions).
The Berkeley Structural Genomics Center has developed a method to visualize the vast universe of protein structures, in which proteins of similar structure are located close together and those of different structures far apart. This map, constructed using about 500 of the most common protein folds, reveals a highly non-uniform distribution and shows segregation between four elongated regions corresponding to four different protein classes (shown in four different colors). Such a representation reveals a high level of organization in the protein structure universe.
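DALI compares structures through their intramolecular distance matrices. As a much simpler illustration of why distance matrices are convenient for global comparison (this is neither DALI nor CE), the sketch below computes the distance RMSD (dRMS) between two structures. It assumes each residue is reduced to a single point and that the two structures are already aligned one-to-one; the function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_rmsd(coords_a, coords_b):
    """Root-mean-square difference of intramolecular distances (dRMS).

    Assumes residue i of one structure corresponds to residue i of the other.
    Distance matrices do not change under rotation or translation, so no
    superposition step is needed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    n = len(coords_a)
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    diffs = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else 0.0


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
b = [(x + 5.0, y - 1.0, z + 2.0) for x, y, z in a]  # same shape, translated
print(distance_rmsd(a, b))  # 0.0
```

A value near zero means the two point sets have the same internal geometry; larger values indicate a greater global shape difference.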

5 Structure Space is Described Hierarchically
From SCOP (Structural Classification of Proteins): Class → Fold → Superfamily → Family → Protein domains
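Because each SCOP level nests inside the one above it, a domain's place in the hierarchy can be written as a path from class down to protein, and two domains can be compared by the deepest level they share. A minimal sketch; the function name and the fold, superfamily, family, and protein labels in the example are hypothetical placeholders, not real SCOP entries.

```python
# SCOP levels from most general to most specific.
SCOP_LEVELS = ("class", "fold", "superfamily", "family", "protein")


def deepest_shared_level(domain_a, domain_b):
    """Return the most specific SCOP level two domains share, or None.

    Each domain is a tuple of labels ordered like SCOP_LEVELS. Since the
    levels are nested, comparison stops at the first level that differs.
    """
    shared = None
    for level, a, b in zip(SCOP_LEVELS, domain_a, domain_b):
        if a != b:
            break
        shared = level
    return shared


# Hypothetical labels for illustration only.
d1 = ("All alpha proteins", "fold_X", "superfamily_1", "family_1", "protein_A")
d2 = ("All alpha proteins", "fold_X", "superfamily_1", "family_2", "protein_B")
d3 = ("All beta proteins", "fold_Y", "superfamily_2", "family_3", "protein_C")
print(deepest_shared_level(d1, d2))  # superfamily
print(deepest_shared_level(d1, d3))  # None
```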

6 SCOP Statistics
Class                                Folds  Superfamilies  Families
All alpha proteins                     218            376       608
All beta proteins                      144            290       560
Alpha and beta proteins (a/b)          136            222       629
Alpha and beta proteins (a+b)          279            409       717
Multi-domain proteins                   46             46        61
Membrane and cell surface proteins      47             88        99
Small proteins                          75            108       171
Total                                  945           1539      2845
25,973 PDB entries (July 2005)
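The Total row of the SCOP statistics can be checked by summing each column. A minimal Python sketch with the per-class counts copied from the slide; the 46 superfamilies for multi-domain proteins is the value implied by the reported superfamily total, and the names `SCOP_COUNTS` and `column_totals` are hypothetical.

```python
# Per-class SCOP counts (July 2005): (folds, superfamilies, families).
SCOP_COUNTS = {
    "All alpha proteins": (218, 376, 608),
    "All beta proteins": (144, 290, 560),
    "Alpha and beta proteins (a/b)": (136, 222, 629),
    "Alpha and beta proteins (a+b)": (279, 409, 717),
    "Multi-domain proteins": (46, 46, 61),  # superfamily count implied by the total
    "Membrane and cell surface proteins": (47, 88, 99),
    "Small proteins": (75, 108, 171),
}
REPORTED_TOTAL = (945, 1539, 2845)


def column_totals(counts):
    """Sum folds, superfamilies, and families over all classes."""
    return tuple(sum(column) for column in zip(*counts.values()))


totals = column_totals(SCOP_COUNTS)
print(totals)  # (945, 1539, 2845)
assert totals == REPORTED_TOTAL
```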

7 Amino Acids: Building Blocks of Proteins

8 20 Naturally-occurring Amino Acids
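To a computer scientist, a protein sequence is a string over a 20-letter alphabet. The sketch below maps the standard three-letter residue names to their one-letter codes and checks whether a string uses only those 20 letters. The names `THREE_TO_ONE`, `to_one_letter`, and `is_protein_string` are hypothetical, and nonstandard residues (for example selenocysteine) are deliberately left out.

```python
# Standard three-letter residue names mapped to one-letter codes (20 amino acids).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ALPHABET = frozenset(THREE_TO_ONE.values())


def to_one_letter(names):
    """Convert three-letter residue names (any case) to a one-letter string.

    Raises KeyError for names outside the 20 standard amino acids.
    """
    return "".join(THREE_TO_ONE[n.upper()] for n in names)


def is_protein_string(seq):
    """True if seq is non-empty and uses only the 20 uppercase one-letter codes."""
    return bool(seq) and set(seq) <= ALPHABET


print(to_one_letter(["Lys", "Gly", "Leu", "Val", "Ala", "His"]))  # KGLVAH
```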

9 Protein Secondary Structure
α Helix

10 Protein Secondary Structure
β strands

11 Top Level of Structure Space: Structure Classes
There are four major classes: α proteins, β proteins, α + β (antiparallel β strands), and α / β (parallel β strands).
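The α + β versus α / β distinction rests largely on whether neighboring β strands run in the same or opposite directions. A minimal geometric sketch, assuming each strand is reduced to the coordinates of its first and last residue and that the two strands are already known to be paired; both are simplifications for illustration, since assignment programs such as DSSP determine strand pairing from backbone hydrogen bonds. The function names are hypothetical.

```python
def direction(start, end):
    """Vector from a strand's first residue to its last (N- to C-terminus)."""
    return tuple(e - s for s, e in zip(start, end))


def pairing_orientation(strand_a, strand_b):
    """Label two paired beta strands 'parallel' or 'antiparallel'.

    Each strand is (start_xyz, end_xyz). A positive dot product of the two
    direction vectors means they run the same way; zero or negative is
    reported as antiparallel in this simplified sketch.
    """
    da = direction(*strand_a)
    db = direction(*strand_b)
    dot = sum(x * y for x, y in zip(da, db))
    return "parallel" if dot > 0 else "antiparallel"


s1 = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
s2 = ((0.0, 4.8, 0.0), (10.0, 4.8, 0.0))  # runs the same way as s1
s3 = ((10.0, 4.8, 0.0), (0.0, 4.8, 0.0))  # runs the opposite way
print(pairing_orientation(s1, s2))  # parallel
print(pairing_orientation(s1, s3))  # antiparallel
```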

12 Protein Folds
A protein fold is the way secondary structures are organized in a 3D structure.

13 Popular Folds
The eight most frequent SCOP folds

14 Superfamily and Family
Proteins within the same superfamily or family tend to have similar sequences and similar functions.
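Sequence similarity within a family or superfamily is often summarized as percent identity over an alignment. A minimal sketch, assuming the two sequences are already aligned with `-` as the gap character; dividing by the number of ungapped aligned columns is one of several conventions in use, and `percent_identity` is a hypothetical name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Counts identical residues among the aligned columns where neither
    sequence has a gap, and divides by the number of such columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


print(round(percent_identity("KGLV-AH", "KGIVSAH"), 1))  # 83.3
```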

15 The Nature of Protein Structure Data
The ball-stick model is an element-based structure representation:
A structure is decomposed into a set of amino acids.
Protein geometry, topology, and attributes are defined with respect to the amino acid set.
Geometry is the coordinates of the amino acids.
Topology is the physico-chemical interactions of the residues.
Attributes are the physico-chemical properties of the residues.
3D protein structures are obtained using experimental techniques (X-ray crystallography or NMR) or modeling/simulation techniques.
What are proteins? From biochemistry, proteins are chains of amino acid residues (residues). There are a total of 20 commonly observed amino acids. To a computer scientist, proteins are strings composed of 20 different characters. These strings can fold into stable 3D structures under the right conditions. Here we show the 3D structure of myoglobin. The alpha helix (hi-liks, he-licis) is the spring-like component found in protein structures. Connecting two helices is a floppy region called a "loop". Helices and loops are called secondary structure elements.
Proteins carry out biological functions in the cell. Enzymes catalyze chemical reactions; binding proteins bind to chemicals or other proteins to carry out their functions. Proteins belonging to these two classes take an active role in fulfilling their function. Structural proteins take a passive role: they form the backbone of the cell. Yet another class of proteins is embedded in the cell membrane and handles communication between the inside and outside of the cell.
[Notes: Hemoglobin 1MBA, N-terminal 120, total length 147 (~150). Cellular nitrogen metabolism: fixing molecular nitrogen to ammonia in bacteria living with plants. The slide (which I got from Tammy at Duke) does not offer much of a clue: it says "cellular nitrogen metabolism", which can be (1) degrading amino acids (in the liver, for mammals), (2) amino acid synthesis, or (3) transforming nitrogen to ammonia (bacteria living in the roots of plants, in symbiosis). Hair protein: keratin. 1JGW.
Proteins are important for the cell. They carry out biochemical functions, such as binding oxygen molecules, or cellular functions, such as immune reactions. The first crystal structure of a macromolecule was solved in 1958 (Kendrew, J.C. et al. (1958) A three-dimensional model of the myoglobin molecule obtained by X-ray analysis. Nature 181). 1971: Protein Data Bank established at Brookhaven National Laboratory; the PDB contains 2 structures.]
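In the element-based model, the geometry of a structure is the set of amino acid coordinates. A common simplification (an assumption here, not something the slides prescribe) is to represent each residue by its Cα atom. The sketch below reads those coordinates from ATOM records using the fixed column layout of the PDB file format; `parse_ca_atoms` is a hypothetical helper, and alternate locations, insertion codes, and multiple models are ignored.

```python
def parse_ca_atoms(pdb_lines):
    """Extract (chain, residue number, residue name, (x, y, z)) for each
    C-alpha ATOM record, using the fixed columns of the PDB format.

    HETATM records are skipped, so calcium ions (also named "CA") are not
    picked up. Alternate locations and multiple models are not handled.
    """
    residues = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":         # atom name, columns 13-16
            continue
        res_name = line[17:20].strip()          # residue name, columns 18-20
        chain = line[21]                        # chain identifier, column 22
        res_seq = int(line[22:26])              # residue number, columns 23-26
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))  # columns 31-54
        residues.append((chain, res_seq, res_name, (x, y, z)))
    return residues
```

Because the function accepts any iterable of lines, an open file object can be passed directly, e.g. `parse_ca_atoms(open(path))` for a hypothetical local PDB file; the resulting coordinate list is the "ball" layer on which contact graphs or distance matrices can be built.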

16 Grand Challenges: Proteomics
[Figure: part of the biological system in a cell at the molecular level (Megan W. T. Talkington, Gary Siuzdak and James R. Williamson, Nature 438).] Data are produced at different levels: cells, organs, organisms, populations.
