Protein structures in the PDB


1 Protein structures in the PDB

2 Domains
Proteins can be modular: a single chain may be divisible into smaller independent units of tertiary structure called domains.
Domains are the basic unit of structure classification.
Different domains in a protein are also often associated with different functions carried out by the protein.

3 Definition of domain
“A polypeptide or part of a polypeptide chain that can independently fold into a stable tertiary structure...” (Introduction to Protein Structure, Branden & Tooze)
“Compact units within the folding pattern of a single chain that look as if they should have independent stability.” (Introduction to Protein Architecture, Lesk)
Note that Lesk’s definition is more careful: domains within multidomain proteins can sometimes evolve to have dependent stabilities, even when the same types of units occur independently in other proteins.
[Figure placeholder: MBP]

4 Motif (Supersecondary Structure)
Certain favored arrangements of multiple secondary structure elements recur again and again in proteins; these are known as motifs or supersecondary structures.
Motif is a fairly broad concept. A motif is usually smaller than a domain, forms part of one, and does not fold on its own: for instance, some domains are composed of repeating beta-alpha-beta units. Sometimes, though, a motif is big enough to fold on its own and encompasses a whole domain; a jellyroll barrel might be an example.
The word motif sometimes has a functional rather than a structural meaning, as in a phosphate-binding motif. Here we use it in a purely structural sense.
[Figure labels: Greek key; beta-alpha-beta]

5 Protein Taxonomy: The CATH Hierarchy
1. Divide PDB structure entries into domains using domain recognition algorithms. The domain is the fundamental unit of structure classification.
2. Classify each domain according to a five-level hierarchy: Class, Architecture, Topology, Homologous Superfamily, Sequence Family.
The top three levels of the hierarchy are purely phenetic: they are based on characteristics of the structure, not on evolutionary relationships.
The bottom two levels include some phyletic classification as well: groupings according to putative common ancestry, based on structural similarity, functional similarity, and sequence similarity.
Phenetic essentially means: what does it look, feel, taste like? For instance, a phenetic classification of organisms could group birds and bats because both have wings, although in terms of common ancestry bats are more closely related to humans, which do not have wings. Phyletic means related by common ancestry.
For the most part, we do not understand the evolutionary relationships between proteins that have different folds; e.g. we do not know that beta-barrels evolved from beta-sandwiches, though they might have. Even proteins that have very similar structures cannot always be inferred to be related, even though they might be. Some folds, such as TIM barrels, may have arisen independently many times in evolution (ref?).
Protein evolution is not well understood, and to date there is no purely phyletic classification system. Note that this lack of knowledge is not necessarily permanent; a phyletic system may exist someday.
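The five-level hierarchy can be pictured as a record with one label per level, where two domains are compared level by level from the top. The following is a minimal sketch under stated assumptions: the class name `DomainClassification`, the helper `deepest_shared_level`, and every label in the example (such as "fold-1") are hypothetical and do not come from CATH itself.

```python
from dataclasses import dataclass, astuple
from typing import Optional

# Level names as listed on the slide, from most general to most specific.
LEVELS = ("Class", "Architecture", "Topology",
          "Homologous Superfamily", "Sequence Family")


@dataclass(frozen=True)
class DomainClassification:
    """One domain's label at each of the five levels (labels here are made up)."""
    cath_class: str
    architecture: str
    topology: str
    homologous_superfamily: str
    sequence_family: str


def deepest_shared_level(a: DomainClassification,
                         b: DomainClassification) -> Optional[str]:
    """Return the most specific level at which two domains still agree.

    Comparison stops at the first mismatch, because a lower level only has
    meaning inside the group defined by the levels above it.
    """
    shared = None
    for level, label_a, label_b in zip(LEVELS, astuple(a), astuple(b)):
        if label_a != label_b:
            break
        shared = level
    return shared


# Hypothetical domains: x and y share a fold but not a superfamily.
domain_x = DomainClassification("mainly beta", "sandwich", "fold-1",
                                "superfamily-1", "family-1")
domain_y = DomainClassification("mainly beta", "sandwich", "fold-1",
                                "superfamily-2", "family-7")
domain_z = DomainClassification("mainly alpha", "bundle", "fold-9",
                                "superfamily-4", "family-2")

print(deepest_shared_level(domain_x, domain_y))
print(deepest_shared_level(domain_x, domain_z))
```

Two domains that agree down to Topology but no further share a fold without any claim of common ancestry, which is the boundary between the phenetic and phyletic levels described above.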

6 Class
In the CATH hierarchy, Class simply describes what type of secondary structure is present. There are only four classes: mainly α, mainly β, α & β, and few secondary structures.
90% of structures are trivial to assign at this level.
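To make the idea of class assignment concrete, here is a toy sketch that picks one of the four classes from the fraction of residues in helices and in strands. The function name `assign_class` and both thresholds (`min_content`, `dominance`) are illustrative assumptions; they are not the criteria CATH actually uses.

```python
def assign_class(helix_fraction: float, strand_fraction: float,
                 min_content: float = 0.15, dominance: float = 3.0) -> str:
    """Toy class assignment from secondary structure content.

    helix_fraction / strand_fraction: fraction of residues in each state.
    min_content and dominance are made-up thresholds for illustration only.
    """
    if not (0.0 <= helix_fraction <= 1.0 and 0.0 <= strand_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    if helix_fraction + strand_fraction > 1.0:
        raise ValueError("fractions cannot sum to more than 1")

    # Too little regular secondary structure of either kind.
    if helix_fraction + strand_fraction < min_content:
        return "few secondary structures"
    # One type clearly outweighs the other.
    if helix_fraction >= dominance * strand_fraction:
        return "mainly alpha"
    if strand_fraction >= dominance * helix_fraction:
        return "mainly beta"
    # Substantial amounts of both.
    return "alpha & beta"


print(assign_class(0.70, 0.02))
print(assign_class(0.05, 0.50))
print(assign_class(0.30, 0.30))
print(assign_class(0.05, 0.05))
```

Most inputs fall far from any threshold, which matches the remark that the large majority of structures are trivial to assign at this level; the hard cases are the ones near a boundary.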

7 Architecture
Architecture is hard to define precisely. In CATH it is defined broadly as describing “general features of protein shape”, such as arrangements of secondary structure in 3D space.
It does not define connectivities between secondary structural elements; that is what the topology level does. It does not even explicitly define directionality of secondary structure, e.g. parallel or antiparallel beta-sheets.
A narrower definition of architecture could include some directionality of secondary structure, for instance “antiparallel beta-sandwich” versus “parallel beta-sandwich”.
In CATH, architectures are presently assigned manually, by visual inspection.
Let’s look at some architectures!

8 Some mostly beta architectures

9 Some mixed alpha-beta architectures

10 Topology (Fold)
If two proteins have the same topology, they have the same number and arrangement of secondary structures, and the connectivities between these elements are the same. This is also sometimes called the fold of a protein.
In CATH, automated structure alignment is used to group proteins according to topology. We will discuss this later.
We will now look at some examples that illustrate differences in topology.

11 Topology: differences in connectivity
Example: a four-stranded antiparallel beta-sheet can have many different topologies, depending on the order in which the four beta-strands are connected.
[Figure labels: “up-and-down”; “Greek key”]
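One way to see how many connectivities are possible is to count the distinct spatial orders of the strands. The sketch below numbers the strands 1 to 4 by their position in the sequence, lists every left-to-right order in the sheet, and treats an order and its mirror image as the same sheet. The function name `strand_orders` is hypothetical, and the count ignores strand direction and loop geometry, so it is a lower bound on the number of distinct topologies rather than a full enumeration.

```python
from itertools import permutations


def strand_orders(n_strands: int = 4) -> list:
    """List distinct spatial orders of sequence-numbered strands in one sheet.

    An order and its reverse describe the same sheet viewed from the other
    side, so only one representative of each pair is kept.
    """
    orders = set()
    for perm in permutations(range(1, n_strands + 1)):
        # Keep the lexicographically smaller of an order and its mirror image.
        orders.add(min(perm, perm[::-1]))
    return sorted(orders)


orders = strand_orders(4)
print(len(orders))
for order in orders:
    print("-".join(str(strand) for strand in order))
```

The order 1-2-3-4, where neighbors in sequence are also neighbors in the sheet, corresponds to the “up-and-down” arrangement. The Greek key is commonly drawn with strand 4 next to strand 1 (4-1-2-3), which this listing reports in its mirrored form 3-2-1-4.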

12 Topology: differences in handedness
example: in a beta-alpha-beta motif, if the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands.
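The above/below distinction can be expressed as the sign of a scalar triple product. The sketch below is a rough illustration under stated assumptions: each strand and the helix are reduced to a single representative point, both strands share one direction vector, and the function name `helix_side` and all coordinates are hypothetical. It reports only which side of the sheet the helix lies on; it does not say which sign corresponds to which named handedness.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def helix_side(strand_direction, strand1_pos, strand2_pos, helix_pos) -> int:
    """Return +1 or -1 for the side of the sheet the helix lies on, 0 if in-plane.

    The sheet normal is built from the shared strand direction and the vector
    from the first strand to the second, so mirror-image arrangements give
    opposite signs.
    """
    across = _sub(strand2_pos, strand1_pos)
    normal = _cross(strand_direction, across)
    height = _dot(normal, _sub(helix_pos, strand1_pos))
    return (height > 0) - (height < 0)


# Made-up coordinates: strands run along y and are separated along x.
direction = (0.0, 1.0, 0.0)
first_strand = (0.0, 0.0, 0.0)
second_strand = (1.0, 0.0, 0.0)

side_a = helix_side(direction, first_strand, second_strand, (0.5, 0.0, 1.0))
side_b = helix_side(direction, first_strand, second_strand, (0.5, 0.0, -1.0))
print(side_a, side_b)
```

Because the two arrangements differ only in this sign, they have the same secondary structure elements in the same order and are still different topologies.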

13 Visualizing protein topology--TOPS cartoons
Up triangles = up-facing beta strands.
Down triangles = down-facing beta strands.
Horizontal rows of triangles = beta sheets (a beta barrel would be a ring of triangles).
Circles = helices.
Lines = loops. If a loop enters from the top, the line is drawn to the center of the symbol; if it enters from the bottom, the line is drawn to the boundary.
The fold shown in the figure is clearly an antiparallel beta-sandwich.

14 Visual summary of top three levels of CATH hierarchy
CLASS ARCHITECTURE TOPOLOGY

15 Discovery of New Folds
One consequence of structural taxonomy efforts is that when a new protein structure is solved, we can ask: does it have a new fold, or is it similar to preexisting ones?
Structural taxonomy reveals that although structures are being solved more rapidly than ever, fewer and fewer of them have new folds! Will we get them all soon?

16 Homologous superfamily / Sequence family
The lowest two levels in the CATH hierarchy relate to common ancestry.
Some, but not all, proteins with the same fold show evidence of common ancestry.
The surest indication of common ancestry is that two proteins have sequences that are roughly >30% identical (sequence family level).
If protein sequences are not that similar, common ancestry may still be inferred from a combination of structural and functional similarity, and possibly weak sequence similarity (homologous superfamily level).
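The identity criterion can be illustrated with a small calculation over two sequences that have already been aligned. This is a minimal sketch, not how CATH computes it: the function name `percent_identity`, the toy sequences, and the choice of denominator (columns where neither sequence has a gap) are all assumptions, and other denominator conventions give different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use `gap` for insertions and deletions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are excluded in this convention
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned sequences (made up): 6 comparable columns, 5 identical.
identity = percent_identity("ACDE-FG", "ACDQWFG")
print(round(identity, 1))
print(identity > 30.0)  # rough threshold quoted on the slide
```

Real comparisons need an alignment step first and much longer sequences; short toy alignments like this one reach high identity by chance far more easily than full-length proteins.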

17 Multifunctional “Superfolds”
How is structure related to function? Do all proteins that have the same structure have the same function?
Some folds are associated with only one type of function, but others have many homologous superfamilies, which means they are used for a variety of functions. These are called “superfolds”.
Similarly, some architectures have many folds: a “superarchitecture”.

18 “Common core”
Structures need not share exactly the same number, type and connectivity of secondary structural elements to be grouped into a single fold type. In fact, evolutionarily related proteins often share a common core of structurally related elements but may differ in the presence or absence of a secondary structure element or two.
The common core concept is important because protein structures in the same homologous superfamily, or even the same sequence family, can have superficial differences. If these superficial differences led to their being classed as having different folds, that would violate the hierarchical structure.

19 Problems in Fold Classification
“Structure space” has a continuous aspect, especially in certain types of folds, which makes clustering structures into fold families difficult. This is an inherent problem for any classification method based on hierarchical clustering. It seems reasonable to assign the same fold to proteins that share some common core but differ by the addition or subtraction of a few secondary structure elements. But this can lead to unnaturally large and diverse fold families via the Russian doll effect and motif overlap.

20 Russian Doll Effect
A continuous range of slight size differences will lead to clustering proteins of very different size: small --> medium --> large.

21 Motif Overlap
Sometimes two proteins will share a common core, but one of them will share a slightly different (but not necessarily larger) common core with a third protein. A continuous range of overlapping common cores, AB --> BC --> CD, will lead to grouping proteins that have no common core.
The motif overlap problem also illustrates why the domain, rather than the full protein structure, is the fundamental unit of classification: if a two-domain protein contained fold A and fold B, another contained fold B and fold C, and a third contained fold C and fold D... well, you see the point.
One could in fact base a structure classification on the motif, rather than the domain, as the fundamental unit. That isn’t done because motifs aren’t as conveniently separable as domains.
Message: there’s no problem-free hierarchical means of structural classification.
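Both the Russian doll effect and motif overlap are chaining effects, and a few lines of single-linkage grouping reproduce them. The sketch below is an illustration under stated assumptions, not the clustering procedure used by CATH or SCOP: the function `single_linkage`, the protein names, the core labels A to D, the sizes, and the 1.25 size-ratio cutoff are all made up.

```python
def single_linkage(items, linked):
    """Group items so that any chain of pairwise links joins a cluster.

    `linked(a, b)` returns True when two items are judged similar.
    """
    clusters = []
    for item in items:
        merged = [item]
        untouched = []
        for cluster in clusters:
            if any(linked(item, other) for other in cluster):
                merged.extend(cluster)  # the new item bridges this cluster
            else:
                untouched.append(cluster)
        clusters = untouched + [merged]
    return clusters


# Motif overlap: common cores AB --> BC --> CD (hypothetical proteins).
cores = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"C", "D"}}


def share_core(p, q):
    return bool(cores[p] & cores[q])


core_clusters = single_linkage(list(cores), share_core)
print(core_clusters)
print(cores["P1"] & cores["P3"])  # empty: P1 and P3 share nothing directly

# Russian doll effect: each size is close to the next, the extremes are not.
sizes = [100, 120, 145, 170, 200, 240]


def similar_size(a, b):
    return max(a, b) / min(a, b) <= 1.25


size_clusters = single_linkage(sizes, similar_size)
print(size_clusters)
```

In both runs everything ends up in one cluster, even though P1 and P3 have no element in common and the largest size is 2.4 times the smallest. Stricter linkage rules avoid the chaining but then split families that are genuinely related, which is the trade-off behind the closing message of the slide.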

22 Comparison of SCOP and CATH Hierarchies
SCOP | CATH
class | class
(no equivalent level) | architecture
fold | topology
superfamily | homologous superfamily
family | sequence family
domain | domain
CATH is more directed toward structural classification; SCOP pays more attention to evolutionary relationships.

23 Another SCOP/CATH difference
In CATH, there is one class to represent mixed alpha-beta. In SCOP there are two:
α/β: beta structure is largely parallel, made of β-α-β motifs.
α+β: alpha and beta structure are segregated to different parts of the structure.

24 SCOP and CATH
Both are hierarchical and based on abstractions.
Both include some manual aspects and are curated by experts in the field of protein structure.
Are there automated methods for structure classification/comparison?

