Target selection strategies for the mouse genome

1 Target selection strategies for the mouse genome
Adam Godzik

2 Q1: How far are we from complete coverage of a sample genome?
~200

3 The same, split into finer categories
~200 ~60, essential (predicted)

4 Why mouse?
Everybody’s favorite model organism
Diseases (cancer, diabetes)
Several large experimental collaborations target mouse
Genome sequenced in 2002 (shotgun, but also cDNA)
Small (but significant) differences from the human genome

5 Basic statistics
~2.5 billion bases (14% shorter than human)
~30,000 protein-coding genes (about the same as human, with the same uncertainties)
20 chromosomes, with significant rearrangements of conserved sequence elements compared to human
Largest expansions/contractions seen in genes involved in apoptosis, immune response and olfactory functions
Fold repertoire is likely to be essentially identical

6 Differences are quantitative, rather than qualitative

7 We are facing many challenges
There are several challenges we usually don’t see for bacterial proteins
Eukaryotic proteins do not express very well
  HT E. coli expression: ~5% success rate; yes
  Baculovirus: ~15%; test
  Cell-free expression: high (claimed); no
They tend to have many domains
  Many approaches to domain recognition, fewer to defining precise domain boundaries
Many form complexes: problems with solubility and crystallization
Rational approach to mutation of surface residues

8 How bad can it get (NAC example)
Three domains can be reliably modeled (one of them was solved), the fold of two additional ones can be predicted, and one is unknown
Together with two other labs at TBI we spent three years studying the second domain
  Where exactly does it end? Is the mystery domain part of it?
  No structure so far after several hundred constructs for five human paralogs and dozens of homologs from several species
Four “linkers” of about 300 aa: domains? Unstructured linkers?
18 paralogs in mouse and 23 in human. What do they do?
Different paralogs are involved in diseases as different as cancer, autoimmune diseases and innate immunity disorders, despite identical domain structures!
[Domain diagram, residues 1 to 1480; labels: PAAD, AAA*, ?, LR, ?, (NB), CARD]

9 Target selection principles
Eliminate what is already known (or not interesting, or should be solved by other techniques)
  Homologs of proteins with known fold
  Transmembrane domains
  Disordered regions
Choose representatives of the unknown
  Clustering and sampling strategies
[Diagram labels: T1, T2, T3, known, other pipeline]
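
The two principles on this slide (eliminate, then pick representatives) can be sketched as a small filter-and-sample routine. This is a minimal illustration only, not the pipeline used by the author: the `Domain` record, its fields, the pre-assigned `cluster` labels and the "shortest member first" sampling rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical record for one candidate domain (all fields are assumptions)."""
    name: str
    cluster: str          # label from some earlier clustering step
    has_known_fold: bool  # homolog of a protein with a known fold
    transmembrane: bool
    disordered: bool
    length: int


def select_targets(domains, per_cluster=1):
    """Drop excluded domains, then take representatives from each cluster."""
    # Step 1: eliminate what is already known or belongs to another technique.
    remaining = [
        d for d in domains
        if not (d.has_known_fold or d.transmembrane or d.disordered)
    ]
    # Step 2: group the unknown domains by cluster label.
    clusters = {}
    for d in remaining:
        clusters.setdefault(d.cluster, []).append(d)
    # Step 3: sample representatives; shortest first is an arbitrary choice here.
    targets = []
    for label in sorted(clusters):
        members = sorted(clusters[label], key=lambda d: (d.length, d.name))
        targets.extend(members[:per_cluster])
    return targets


example_domains = [
    Domain("A1", "c1", False, False, False, 120),
    Domain("A2", "c1", False, False, False, 95),
    Domain("B1", "c2", True, False, False, 200),   # known fold: excluded
    Domain("C1", "c3", False, True, False, 300),   # transmembrane: excluded
    Domain("D1", "c4", False, False, True, 80),    # disordered: excluded
    Domain("E1", "c5", False, False, False, 150),
]

print([d.name for d in select_targets(example_domains)])
```

Raising `per_cluster` gives several targets per cluster, which may be what the T1, T2, T3 labels on the slide refer to; that reading is a guess.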

10 Basic PSI target estimates
Structures for ~1/2 (2/5 of the length) can be reliably predicted using SCOP-type domain families with multiple representatives and a profile-profile algorithm
Proteins with transmembrane domains account for ~¼ of the proteome
25, ,000 domains are needed to cover the remaining ~¼ at 30% sequence identity
  Reliability of the excluded domain prediction
  Uncertainty of disordered/low-complexity regions
  Surprising number of ORFan regions (10-20,000)
Clustering with profile-profile algorithms can lower this number to 5-10,000 + ORFans

11 What could be a model for mouse?
Mouse is a favorite model for human processes and diseases
For fold survey and modeling purposes we can use proteins from lower organisms

12 Bacterialization of the mouse proteome
Bacterial homologs can be found for most mouse proteins (54% with PSI-BLAST, 65% with profile-profile (FFAS), and it could still be extended)
Distribution of bacterial homologs is an interesting problem in itself

13 JCSG eukaryotic protein pilot project
Mouse: ~400 targets
Homologs of mouse proteins in bacteria: ~1000 targets*
*Together about 20% of the effort; the rest was spent on Thermotoga and a few other pilot projects

14 Experiences so far
Mouse: 400 selected, 222 expressed, 32 crystals, 4 structures
Effort per structure: about 10 times that of a bacterial protein
Bacterialized mouse: 1025 selected, 380 expressed, 63 crystals, 12 structures
Effort per structure: about 1/10 that of a mouse protein
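
The counts on this slide form a pipeline funnel, and the stage-by-stage yields are easy to work out. The sketch below uses only the numbers from the slide; the function name `funnel_rates` and the dictionary layout are assumptions made for illustration.

```python
def funnel_rates(counts):
    """Return (stage, count, fraction of selected, fraction of previous stage)."""
    stages = list(counts.items())   # dicts keep insertion order
    first = stages[0][1]
    rows = []
    prev = first
    for stage, n in stages:
        rows.append((stage, n, n / first, n / prev))
        prev = n
    return rows


# Counts as given on the slide.
mouse = {"selected": 400, "expressed": 222, "crystals": 32, "structures": 4}
bacterialized = {"selected": 1025, "expressed": 380, "crystals": 63, "structures": 12}

for label, counts in (("mouse", mouse), ("bacterialized mouse", bacterialized)):
    for stage, n, overall, step in funnel_rates(counts):
        print(f"{label:<20}{stage:<12}{n:>6}{overall:>8.1%}{step:>8.1%}")
```

These are per-target yields only (4/400 for mouse, 12/1025 for the bacterial homologs). The effort-per-structure comparison on the slide cannot be derived from the counts alone, since the slide does not break down the cost per target.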

15 The plan
Some mouse proteins can be solved in a high-throughput mode
Bacterial homologs are used as a “salvage” pathway for proteins that failed in the direct approach
Exact domain boundaries
Modeling mutations
MR on bacterial templates

16 Some observations on mouse “bacterialization”
Prokaryotic genomes: common part ~300 proteins, ~5-10% of a typical genome
Eukaryotic genomes: common part ~5000 proteins, ~20-30% of a typical genome

17 Conservation patterns between genomes
[Figure panels: prokaryotic genomes, eukaryotic genomes]
Groups of functionally related proteins are often found in specific organisms

18 Is mouse just a somewhat bigger bacterium?
For some functional groups, bacterialization is a very natural approach
  Mitochondrial proteins: mitochondria evolved from prokaryotic symbionts
  Basic metabolic enzymes: the basic biochemistry and fundamental processes are the same in prokaryotes and eukaryotes
Many homologies are completely puzzling

19 Not all homologies are trivial
Periplasmic binding proteins
  In Gram-negative bacteria, used to scavenge for food in the periplasm
  Wide specificity
  Closed conformation recognized by specialized transporters
Human gated ion channels
  Transport ions through membranes
  Regulated by glutamate, glycine and zinc (and other things)
  Closed conformation opens the channel
Distant homology, RMSD ~3 Å; models of human proteins built on bacterial templates were successfully used in planning experiments

20 Bacteria as a catalog of spare parts
Focus on fundamental processes in the eukaryotic cell
  Energy production: mitochondria
  Core machinery of life: set of fundamental pathways and processes

21 Mitochondrial proteome
618 proteins, including 32 from the mitochondrial genome and 586 imported from the nucleus
392 (predicted) soluble proteins
192 with less than 30% sequence identity to any known structure
65 with unknown folds

22 Central core machinery of life

23 Enzymes from the “central core” are mostly shared between eukaryotes (mouse) and bacteria

24 Structural coverage of the core set of metabolic pathways
~10,000 enzymes
Most have homologs in multiple organisms, including bacteria
Could be covered by ~1000 targets at the superfamily level

25 Missing genes?

26 Conclusions
We are within striking distance of achieving complete fold coverage of selected bacterial genomes (~200 structures for T. maritima)
Prokaryotic genomes look like a catalog of spare parts that eukaryotic genomes were built from
Thousands of structures are still needed to cover the remaining ~25% of the mouse (or almost any eukaryotic) genome, and most of them have bacterial homologs
Functionally related groups of proteins could be attractive targets

