1
Genome Biology and Biotechnology
9. The localizome
Prof. M. Zabeau
Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), University of Gent
International course 2005
Academic year
2
Summary
DNA localizome or DNA interactome
  Genome-wide mapping of DNA-binding proteins
  Transcription factor binding sites
  Localization of replication origins
Protein localizome
  High-throughput localization of proteins in cellular compartments
3
Functional Maps or “-omes”
Each functional map relates genes or proteins to n "conditions":
  ORFeome: genes
  Phenome: mutational phenotypes
  Transcriptome: expression profiles
  DNA interactome: protein-DNA interactions
  Localizome: cellular, tissue location
  Interactome: protein interactions
  Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)
4
Genome-wide Analysis of Regulatory Sequences
Gene expression is regulated by transcription factors that bind selectively to regulatory regions
  Protein–DNA interactions involve sequence-specific recognition
  Other factors, such as chromatin structure, may be involved
Sequence-specific DNA-binding proteins from eukaryotes generally recognize degenerate motifs of 5–10 base pairs
  Consequently, potential recognition sequences for transcription factors occur frequently throughout the genome
Genome-wide surveys of in vivo protein–DNA binding provide a platform for determining which of these potential sites are actually occupied
5
Genome-wide Analysis of Regulatory Sequences
Methods combine
  Large-scale analysis of in vivo protein–DNA crosslinking
  Microarray technology
ChIP-on-chip: Chromatin Immuno-Precipitation on DNA chips
Reprinted from: Biggin M., Nature Genet. 28, 303 (2001)
6
Genome-Wide Location and Function of DNA Binding Proteins
Ren et al., Science, 290, 2306 (2000)
Paper presents proof of principle for microarray-based approaches to determine the genome-wide location of DNA-bound proteins
  Study of the binding sites of two well-known gene-specific transcription activators in yeast: Gal4 and Ste12
  Combines data from in vivo DNA binding analysis with expression analysis to identify genes whose expression is directly controlled by these transcription factors
7
Chromatin Immuno Precipitation (ChIP) Procedure
Cells are fixed with formaldehyde, harvested, and sonicated
DNA fragments cross-linked to a protein of interest are enriched by immunoprecipitation with a specific antibody
Immunoprecipitated DNA is amplified and labeled with the fluorescent dye Cy5
Control DNA not enriched by immunoprecipitation is amplified and labeled with a different fluorophore, Cy3
The two DNAs are mixed and hybridized to a microarray of intergenic sequences
The relative binding of the protein of interest to each sequence is calculated from the IP-enriched/unenriched ratio of fluorescence from 3 experiments
Reprinted from: Ren et al., Science, 290, 2306 (2000)
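The last step of the procedure, a per-spot binding ratio from three experiments, can be sketched as follows. This is only an illustration: the paper combines replicates with a weighted average under an error model, whereas this sketch assumes a plain average in log2 space, and the function names are hypothetical.

```python
import math

def mean_log2_ratio(cy5, cy3):
    """Average log2(IP-enriched / unenriched) over replicates for one spot.

    cy5: Cy5 intensities (IP-enriched DNA), one value per experiment.
    cy3: Cy3 intensities (unenriched control), same order.
    Assumption: an unweighted mean, not the paper's weighted error model.
    """
    if len(cy5) != len(cy3) or not cy5:
        raise ValueError("need matching, non-empty replicate lists")
    logs = [math.log2(ip / ctrl) for ip, ctrl in zip(cy5, cy3)]
    return sum(logs) / len(logs)

def binding_ratio(cy5, cy3):
    """Convert the averaged log2 ratio back to a fold enrichment."""
    return 2 ** mean_log2_ratio(cy5, cy3)
```

Averaging in log space keeps a 2-fold enrichment and a 2-fold depletion symmetric around zero, which is why array ratios are usually handled as log2 values.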
8
Modified Chromatin Immuno Precipitation (ChIP) Procedure
Figure label: ChIP-enriched DNA fragment
Fig. 1. The genome-wide location profiling method. (A) Close-up of a scanned image of a microarray containing DNA fragments representing 6361 intergenic regions of the yeast genome. The arrow points to a spot where the red intensity is over-represented, identifying a region bound in vivo by the protein under investigation. (B) Analysis of Cy3- and Cy5-labeled DNA amplified from 1 ng of yeast genomic DNA using a single-array error model (8). The error model cutoffs for P values equal to 10^-3 and 10^-5 are displayed. (C) Experimental design. For each factor, three independent experiments were performed and each of the three samples was analyzed individually using a single-array error model. The average binding ratio and associated P value from the triplicate experiments were calculated using a weighted average analysis method.
Reprinted from: Ren et al., Science, 290, 2306 (2000)
9
Proof of concept: Gal4 transcription factor
Identification of sites bound by the transcriptional activator Gal4 in the yeast genome and of genes induced by galactose
  Gal4 activates genes necessary for galactose metabolism
  The best-characterized transcription factor in yeast
10 genes were bound by Gal4 and induced in galactose
  7 genes in the Gal pathway, previously reported to be regulated by Gal4
  3 novel genes: MTH1, PCL10, and FUR4
Reprinted from: Ren et al., Science, 290, 2306 (2000)
10
Genome-wide location of Gal4 protein
Genes whose promoter regions are bound by Gal4 and whose expression levels were induced at least twofold by galactose
Reprinted from: Ren et al., Science, 290, 2306 (2000)
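The selection rule on this slide (promoter bound and expression induced at least twofold) amounts to intersecting two gene lists. A minimal sketch, assuming binding is called from a P value with a cutoff of 0.001 (an assumption, not stated on the slide) and that both inputs are plain dictionaries keyed by gene name:

```python
def direct_targets(binding_p, fold_change, p_cutoff=0.001, min_fold=2.0):
    """Return genes that are both bound and induced.

    binding_p: gene -> P value from location analysis.
    fold_change: gene -> expression ratio (galactose / reference).
    p_cutoff is an assumed threshold; min_fold=2.0 follows the slide.
    """
    return sorted(
        gene
        for gene, p in binding_p.items()
        if p <= p_cutoff and fold_change.get(gene, 0.0) >= min_fold
    )
```

A gene that is bound but not induced, or induced but not bound, drops out. That is the point of combining the two data sets: binding alone does not show regulation, and induction alone does not show that the effect is direct.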
11
Role of Gal4 in Galactose-dependent Cellular Regulation
The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes explains how regulation of several different metabolic pathways can be coordinated
Figure labels: Fur4 (increases intracellular pools of uracil); Pcl10; Mth1 (reduces levels of glucose transporter)
Reprinted from: Ren et al., Science, 290, 2306 (2000)
12
Conclusions
The genes whose expression is controlled directly by transcriptional activators in vivo
  are identified by a combination of genome-wide location and expression analysis
Genome-wide location analysis provides information
  on the binding sites at which proteins reside in the genome under in vivo conditions
13
Genomic Binding Sites of the Yeast Cell-cycle Transcription Factors SBF and MBF
Iyer et al., Nature 409: 533 (2001)
Paper presents
  The use of ChIP and DNA microarrays to define the genomic binding sites of the SBF and MBF transcription factors in vivo
The SBF and MBF transcription factors are active in the initiation of the cell division cycle (G1/S) in yeast
  A few target genes of SBF and MBF are known, but the precise roles of these two transcription factors are unknown
The two transcription factors are heterodimers containing the same Swi6 subunit and a distinct DNA-binding subunit
  MBF is a heterodimer of Mbp1 and Swi6
  SBF is a heterodimer of Swi4 and Swi6
14
Genomic targets of SBF and MBF
Figure 3. Genomic targets of SBF and MBF. Percentile ranks of intergenic fragments that meet selection thresholds are indicated (blue–yellow colour scale). Loci with 70% overall nucleotide sequence identity to another yeast locus (potentially crosshybridizing) are indicated (closed circles). The combination of Cy3 and Cy5 labelled probes, the antibody used for IP (if used) and the culture conditions for each experiment are summarized (left panel). Experiments 9, 10, 17 and 18 involved independent crosslinking and IPs. DNA microarrays that included all yeast ORFs and other features, in addition to the intergenic fragments, were used for experiments 3, 8, 13 and 14.
Reprinted from: Iyer et al., Nature 409: 533 (2001)
15
In Vivo Targets of SBF and MBF
The ChIP experiments identified
  163 possible targets of SBF
  87 possible targets of MBF
  43 possible targets of both factors
Support for the possible in vivo targets
  Most of the genes downstream of the putative binding sites peak in G1/S
  Target genes are highly enriched for functions related to DNA replication, budding and the cell cycle
  In vivo binding sites are highly enriched for sequences matching the defined consensus binding sites
Reprinted from: Iyer et al., Nature 409: 533 (2001)
16
Expression Profiles of SBF and MBF Targets
Transcriptome data for synchronized cell cultures
Figure 4. Expression profiles of SBF and MBF targets. a, Expression patterns of SBF and MBF targets are indicated (red–green colour scale). Cell-cycle data are from ref. 12 and sporulation data are from ref. 18. The stages of the cell cycle are: M/G1, yellow; G1, green; S, blue; S/G2, red; and G2/M, orange. Yellow boxes indicate the presence of consensus binding sites in the intergenic sequences upstream of each ORF (right), and the median percentile rank in IPs of the upstream sequences is also indicated (blue–yellow colour scale), as in Fig. 3. For each set of targets, the top panel contains cell-cycle regulated genes, the bottom panel contains genes that are members of divergently transcribed pairs in which the other member was cell-cycle regulated, and the middle panel contains the remainder of the non-cell-cycle regulated genes. b, Average expression profiles of the cell-cycle regulated targets of SBF and MBF, computed by averaging the log2(Cy5/Cy3) ratios. Note the specific induction of MBF targets during sporulation.
Reprinted from: Iyer et al., Nature 409: 533 (2001)
17
Expression Profiles of SBF and MBF Targets
Why are two different transcription factors used to mediate identical transcriptional programmes during the cell-division cycle in yeast?
A possible answer is suggested by differences in the functions of the genes that they regulate
  Many of the targets of SBF have roles in cell-wall biogenesis and budding
  25% of the MBF target genes have known roles in DNA replication, recombination and repair
The results support a model in which
  SBF is the principal controller of membrane and cell-wall formation
  MBF primarily controls DNA replication
The need for DNA replication and membrane / cell-wall biogenesis may be different in the mitotic and meiotic cell cycle
Reprinted from: Iyer et al., Nature 409: 533 (2001)
18
A high-resolution map of active promoters in the human genome
Kim et al., Nature 436: (2005)
Paper presents
  a genome-wide map of active promoters in human fibroblast cells
  determined by experimentally locating the sites of RNA polymerase II preinitiation complex (PIC) binding
The map defines 10,567 active promoters corresponding to
  6,763 known genes
  >1,196 un-annotated transcriptional units
Global view of functional relationships in human cells between
  transcriptional machinery
  chromatin structure
  gene expression
19
Identification of active promoters in the human genome
Microarrays cover
  All non-repeat DNA at 100 bp resolution
Pol II preinitiation complex (PIC)
  RNA polymerase II
  transcription factor IID
  general transcription factors
ChIP of PIC-bound DNA
  monoclonal antibody against the TAF1 subunit of the complex (TBP-associated factor 1)
FIGURE 1. Identification and characterization of active promoters in the human genome. a, Outline of the strategy used to map TFIID-binding sites in the genome.
Reprinted from: Kim et al., Nature 436: (2005)
20
Results from TFIID ChIP-on-chip analysis
FIGURE 1. Identification and characterization of active promoters in the human genome. b, A representative view of the results from TFIID ChIP-on-chip analysis. Top panel, the logarithmic ratio (log2R) of hybridization intensities between TFIID ChIP DNA and a control DNA. Middle panel, RefSeq gene annotation. Bottom panel, a close-up view of two replicate sets of TFIID ChIP-on-chip hybridization signals around the 5' end of the TCFL1 gene. Arrows indicate the position of the TFIID-binding site determined by a peak-finding algorithm.
Reprinted from: Kim et al., Nature 436: (2005)
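The caption mentions a peak-finding algorithm without describing it. As a rough illustration of the general idea only, and not the method used in the paper, the sketch below reports the strongest probe in each run of consecutive probes whose log2R exceeds a threshold. The threshold and minimum run length are arbitrary assumptions.

```python
def find_peaks(log2r, threshold=1.0, min_run=3):
    """Return probe indices of local peaks in tiled ChIP-on-chip data.

    log2r: log2 ratios ordered by genomic position (one per probe).
    A peak is the maximum of a run of at least min_run consecutive
    probes with log2R >= threshold. Both parameters are hypothetical.
    """
    values = list(log2r)
    peaks, start = [], None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, v in enumerate(values + [float("-inf")]):
        if v >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                run = values[start:i]
                peaks.append(start + run.index(max(run)))
            start = None
    return peaks
```

Requiring several adjacent probes to agree is a common guard against single-probe noise on tiling arrays, where sheared ChIP fragments are expected to cover more than one 100 bp probe.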
21
Characterization of active promoters
Matched the 12,150 TFIID-binding sites to the 5' ends of known transcripts in transcript databases
  87% of the PIC-binding sites were within 2.5 kb of annotated 5' ends of known messenger RNAs
  8,960 promoters were mapped within annotated boundaries of 6,763 known EnsEMBL genes
FIGURE 1. Identification and characterization of active promoters in the human genome. d, e, Venn diagrams showing the number of identified promoters that matched EnsEMBL genes (d) or promoters annotated in DBTSS (e).
Reprinted from: Kim et al., Nature 436: (2005)
22
The chromatin-modification features of the active promoters
Validation of active promoters
  ChIP-on-chip using an anti-RNAP antibody
ChIP-on-chip analysis of known epigenetic markers of active genes, using
  anti-acetylated histone H3 (AcH3) antibodies
  anti-dimethylated lysine 4 on histone H3 (MeH3K4) antibodies
FIGURE 2. The chromatin-modification features of the active promoters. a, Logarithmic ratios of the ChIP-on-chip hybridization intensities (log2R) of probes from 0.5 kb upstream to 0.5 kb downstream of the identified TFIID-binding sites for TFIID, RNAP, AcH3 and MeH3K4 are plotted in a yellow-blue colour scale for 9,328 transcript-matched promoters. The bottom panel shows the colour scale with corresponding log2R values.
Reprinted from: Kim et al., Nature 436: (2005)
23
TFIID, RNAP, AcH3 and MeH3K4 profiles on the promoter of RPS24 gene
FIGURE 2. The chromatin-modification features of the active promoters. b, A detailed view of TFIID, RNAP, AcH3 and MeH3K4 profiles on the promoter of RPS24 gene.
Reprinted from: Kim et al., Nature 436: (2005)
24
Additional findings Promoters of non-coding transcripts
Promoters of non-coding transcripts
  Are very similar to promoters of protein-coding genes
Promoters of novel genes
  An estimated 13% of human genes remain to be annotated in the genome
Clustering of active promoters
  Co-regulated genes tend to be organized into coordinately regulated domains
Genes using multiple promoters
Reprinted from: Kim et al., Nature 436: (2005)
25
Multiple promoters in human genes
WEE1 gene locus
  Two different transcripts with alternative 5' ends
  Encoding different proteins
  Two different TFIID-binding sites: two promoters
  Differential transcription during the cell cycle
FIGURE 3. Use of multiple promoters by human genes. a, Annotation of the WEE1 gene locus and the corresponding TFIID-binding profile. Black bars over the first and second exons in transcripts indicate the positions of the primers used for analysis of each transcript, using real-time quantitative PCR with reverse transcription (RT-PCR). b, RT-PCR analysis of NM_ and AK transcripts in an asynchronous population of IMR90 cells. c, Real-time quantitative RT-PCR analysis of NM_ and AK transcripts in cell-cycle synchronized populations of IMR90 cells. Transcript levels observed for each cell-cycle phase were normalized to the level observed in the asynchronous population. Error bars represent standard deviation.
Reprinted from: Kim et al., Nature 436: (2005)
26
The transcriptome of a cell line
Functional relationship between transcription machinery and gene expression
  Correlated genome-wide expression profiles with PIC promoter occupancy
Four general classes of promoters
  Actively transcribed genes
  Weakly expressed genes
  Weakly PIC bound genes
  Inactive genes
FIGURE 4. Four distinct classes of promoters define the transcriptome of IMR90 cells. a, A matrix describes the distribution of genes defined by expression and PIC occupancy on the promoter. b, c, Matrices showing the percentages of genes associated with AcH3 (b) or MeH3K4 (c) modification for each of the four classes of genes. Italicized numbers in some boxes represent extrapolation from the 29 ENCODE regions.
Reprinted from: Kim et al., Nature 436: (2005)
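The 2 x 2 matrix of expression against PIC occupancy can be sketched as a small classifier. The assignment of the slide's four class names to the four cells is an assumption here (the slide lists the names but not the mapping), and detection is reduced to two booleans for illustration.

```python
def promoter_class(pic_bound, expressed):
    """Place a gene in one cell of the expression x PIC-occupancy matrix.

    The label-to-cell mapping is an assumed reading of the slide:
    bound + expressed      -> actively transcribed
    bound, not expressed   -> weakly expressed
    expressed, not bound   -> weakly PIC bound
    neither                -> inactive
    """
    if pic_bound and expressed:
        return "actively transcribed"
    if pic_bound:
        return "weakly expressed"
    if expressed:
        return "weakly PIC bound"
    return "inactive"

def class_counts(genes):
    """Tally classes for a dict of gene -> (pic_bound, expressed)."""
    counts = {}
    for pic_bound, expressed in genes.values():
        label = promoter_class(pic_bound, expressed)
        counts[label] = counts.get(label, 0) + 1
    return counts
```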
27
Genome-Wide Distribution of ORC and MCM Proteins in Yeast: High-Resolution Mapping of Replication Origins
Wyrick et al., Science, 294, 2357 (2001)
Paper presents
  Genome-wide location analysis to map the DNA replication origins in the 16 yeast chromosomes by determining the binding sites of prereplicative complex proteins
28
Chromosome Replication In Eukaryotic Cells
Replication initiates from origins of replication distributed along chromosomes
Origins of replication comprise autonomously replicating sequences (ARS)
ARS contain an 11-bp ARS consensus sequence (ACS)
  Essential for replication initiation
  Recognized by the Origin Recognition Complex (ORC)
  The majority of sequence matches to the ACS in the genome do not have ARS activity
Prereplicative complexes at replication origins comprise
  Origin Recognition Complex (ORC) proteins
  Minichromosome Maintenance (MCM) proteins
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
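The remark that most ACS matches lack ARS activity is easier to appreciate by scanning a sequence for the motif. The slides do not give the 11-bp consensus. The sketch below assumes the commonly quoted degenerate form WTTTAYRTTTW (W = A/T, Y = C/T, R = A/G) and searches both strands. Function and pattern names are hypothetical.

```python
import re

# Assumed consensus WTTTAYRTTTW and its reverse complement WAAAYRTAAAW.
# Lookaheads allow overlapping matches to be reported.
ACS_FWD = re.compile(r"(?=([AT]TTTA[CT][AG]TTT[AT]))")
ACS_REV = re.compile(r"(?=([AT]AAA[CT][AG]TAAA[AT]))")

def acs_matches(seq):
    """Return sorted (position, strand) pairs for 11-bp ACS matches.

    Positions are 0-based starts on the given (top) strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in ACS_FWD.finditer(seq)]
    hits += [(m.start(), "-") for m in ACS_REV.finditer(seq)]
    return sorted(hits)
```

Because the motif is short and AT-rich, a scan like this returns far more hits than there are functional origins. This is why direct mapping of ORC and MCM binding is informative where sequence matching is not.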
29
Prereplicative Complexes At Origins Of Replication
The complexities of duplication. The proteins that form the pre-replication complex (pre-RC) required for the initiation of DNA replication. The ORC binds to origins of replication (oris) in the chromosomes and establishes docking sites for the other protein components, such as MCM proteins, of the pre-RC. In metazoan species, geminin, which is degraded during mitosis, inhibits the activity of Cdt1, which is necessary for binding of MCM proteins to the origins of replication.
Reprinted from: Stillman, Science, 294, 2301 (2001)
30
ORC- and MCM-binding sites compared with known ARSs
High degree of correlation between MCM and ORC binding sites and known ARSs
  Correct identification of 88% of known ARSs
  The method can accurately identify the position of ARSs to a resolution of 1 kb or less
Figure 1. ORC and MCM binding to previously identified replication origins. Average binding ratios (blue/white) of ORC and MCM proteins to the known ARS-containing loci on chromosomes III and VI (ARS308 and ARS604 were not present on the arrays) and some randomly selected loci are shown. Random selection was accomplished with the "randbetween" function in Excel. The "i" preceding the locus name indicates the intergenic region to the right of the gene. Asterisks indicate randomly selected loci adjacent to or within 1 kb of a predicted origin. Data for other known origins are available in Web table 1 (18).
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
31
Genome-wide Location Of Potential Replication Origins
Identification of 429 potential origins on the entire genome
Figure 2. Genome-wide location of potential replication origins. The genomic position of each probe present on the arrays is plotted to scale as a green bar (Web table 3) (18). The predicted origin-containing loci (pro-ARS) are plotted to scale as a red bar and named systematically (Web table 2) (18). Variations in width and apparent intensities of green or red color reflect different probe lengths, not hybridization ratios. Probes to Watson and Crick ORFs are plotted on the top and bottom rows; intergenic sequences are plotted on the center rows. Asterisks indicate known ARSs that were not identified.
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
32
Conclusions
The ChIP-based method
  identified the majority of origins found in the analysis of genome-wide replication timing in yeast
  provides direct, high-resolution mapping of potential origins
Similar approaches identified origins in other organisms
  For example: Coordination of replication and transcription along a Drosophila chromosome, MacAlpine et al., Genes & Dev. 18: (2004)
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
33
Functional Maps or “-omes”
Each functional map relates genes or proteins to n "conditions":
  ORFeome: genes
  Phenome: mutational phenotypes
  Transcriptome: expression profiles
  DNA interactome: protein-DNA interactions
  Localizome: cellular, tissue location
  Interactome: protein interactions
  Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)
34
Global analysis of protein localization in budding yeast
Huh et al., Nature 425, (2004)
Paper presents
  An approach to define the organization of proteins in the context of cellular compartments
  involving the construction and analysis of a collection of yeast strains expressing full-length, chromosomally tagged green fluorescent protein fusion proteins
35
Experimental Strategy
Systematic tagging of yeast ORFs with green fluorescent protein (GFP)
  GFP is fused to the carboxy terminus of each ORF
  Full-length fusion proteins are expressed from their native promoters and chromosomal location
The collection of yeast strains expressing GFP fusions was analyzed by
  fluorescence microscopy to determine the primary subcellular localization of the fusion proteins
    Defines 12 categories
  co-localization with red fluorescent protein (RFP) markers to refine the subcellular localization
    Defines 11 additional categories
Reprinted from: Huh et al., Nature 425, (2004)
36
Construction of GFP fusion proteins
For each ORF a pair of PCR primers was designed
  Homologous to the chromosomal insertion site
  Matching a GFP-selectable marker construct
Yeast was transformed with the PCR products to generate
  Strains expressing chromosomally tagged ORFs
Reprinted from: Huh et al., Nature 425, (2004)
37
Representative GFP Images
Nucleus Nuclear periphery ER Bud neck mitochondrion Lipid particle Reprinted from: Huh et. al., Nature 425, (2004)
38
GFP and RFP Co-localization Images
Nucleolar marker Reprinted from: Huh et. al., Nature 425, (2004)
39
Global results
Constructed ~6,000 ORF-GFP fusions
  4,156 had localizable GFP signals (~75% of the yeast proteome)
Good concordance with data from earlier studies
  GFP does not affect the location
Localized 70% of the new proteins
22 categories
  Major compartments: cytoplasm (30%) and the nucleus (25%)
  20 other compartments: 44% of the proteins
Most of the proteins can be located in discrete cellular compartments
Reprinted from: Huh et al., Nature 425, (2004)
40
The proteome of the nucleolus
Detected 164 proteins in the nucleolus
  Plus 45 identified in other studies
Data are consistent with MS analysis of human nucleolar proteins
  Allows identification of yeast-human orthologs
Reprinted from: Huh et al., Nature 425, (2004)
41
Transcriptional co-regulation and subcellular localization are correlated
33 transcription modules Co-regulated genes Reprinted from: Huh et. al., Nature 425, (2004)
42
Conclusion
The high-resolution, high-coverage localization data set
  represents 75% of the yeast proteome, classified into 22 distinct subcellular localization categories
Analysis of these proteins in the context of transcriptional, genetic, and protein–protein interaction data
  provides a comprehensive view of interactions within and between organelles in eukaryotic cells
  helps reveal the logic of transcriptional co-regulation
Reprinted from: Huh et al., Nature 425, (2004)
43
Genome Biology and Biotechnology
10. The proteome
International course 2005
44
Summary
Protein interactome
  Yeast two-hybrid protein interaction mapping
Proteome
  Isolation of protein complexes
Multilevel functional genomics
  Combination of phenome analysis and protein interaction mapping
45
Functional Maps or “-omes”
Each functional map relates genes or proteins to n "conditions":
  ORFeome: genes
  Phenome: mutational phenotypes
  Transcriptome: expression profiles
  DNA interactome: protein-DNA interactions
  Localizome: cellular, tissue location
  Interactome: protein interactions
  Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)
46
Basic Concept of the Yeast Two-hybrid System
Eukaryotic transcription factors activate RNA polymerase II at promoters by binding to upstream activating DNA sequences (UAS)
Basic structure of eukaryotic transcription factors
  The DNA-binding and the activating functions are located in physically separable domains
    The DNA-binding domain (DB)
    The activation domain (AD)
  The connection between DB and AD is structurally flexible
Protein-protein interactions can reconstitute a functional transcription factor by bringing the DB domain and the AD domain into close physical proximity
Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)
47
Yeast two-hybrid system
'Architectural blueprint' for a functional transcription factor: DB-X/AD-Y, where X and Y could be essentially any proteins from any organism
  Bait: X fused to the Gal4 DNA-binding domain (DB)
  Prey: Y fused to the Gal4 transcription-activation domain (AD)
  UAS: Upstream Activating Sequence, located upstream of a selectable marker gene
48
Yeast two-hybrid system
The yeast two-hybrid system allows
  Genetic selection of genes encoding potential interacting proteins without the need for protein purification
The system is used to isolate genes encoding proteins that potentially interact with DB-X (referred to as the 'bait') from complex AD-Y libraries (referred to as the 'prey')
Limitations of the system include
  False positives: clones with no biological relevance
  False negatives: failure to identify known interactions
Stringent criteria must be used to evaluate both the specificity and the sensitivity of the assay
Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)
49
Protein Interaction Mapping in C. elegans Using Proteins Involved in Vulval Development
Walhout et al., Science 287: 116 (2000)
Landmark paper presents
  First demonstration of large-scale two-hybrid analysis for protein interaction mapping in C. elegans, starting with 27 proteins involved in vulval development
Protein interaction mapping using large-scale two-hybrid analysis has been proposed as a way to functionally annotate large numbers of uncharacterized proteins predicted by complete genome sequences. This approach was examined in Caenorhabditis elegans, starting with 27 proteins involved in vulval development. The resulting map reveals both known and new potential interactions and provides a functional annotation for approximately 100 uncharacterized gene products. A protein interaction mapping project is now feasible for C. elegans on a genome-wide scale and should contribute to the understanding of molecular mechanisms in this organism and in human diseases.
50
Experimental Approach
Start from known genes in vulval development
  Used recombinational cloning to introduce ORFs of 29 known genes involved in vulval development into two-hybrid vectors
Matrix two-hybrid experiment with 29 ORFs
  Each DB-vORF/AD-vORF pairwise combination was tested for protein-protein interactions by scoring two-hybrid phenotypes
Exhaustive two-hybrid screen
  Used 27 vORF-DB fusion proteins as baits to select interactors from an AD-Y cDNA library
  Sequenced the selected clones: interaction sequence tag (IST)
Reprinted from: Walhout et al., Science 287: 116 (2000)
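The size of the matrix experiment follows directly from testing every DB fusion against every AD fusion. A minimal sketch, assuming self-pairs and both orientations are included (the slide says "each pairwise combination" without spelling this out). The ORF names are placeholders.

```python
from itertools import product

def matrix_pairs(orfs):
    """List every (DB-fusion, AD-fusion) combination to be mated.

    Ordered pairs are used because DB-X/AD-Y and DB-Y/AD-X are distinct
    tests; including self-pairs is an assumption of this sketch.
    """
    return [(db, ad) for db, ad in product(orfs, repeat=2)]

# Placeholder names standing in for the 29 vulval-development ORFs.
vorfs = [f"vORF{i:02d}" for i in range(1, 30)]
pairs = matrix_pairs(vorfs)  # 29 x 29 = 841 combinations
```

Testing both orientations matters in practice: some interactions are detected in only one configuration, for example when a DB fusion self-activates the reporter.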
51
Construction of DB and AD Fusions by Recombinational Cloning
Figure labels: DNA binding domain; Activation domain; Phage lambda excision: Integrase, IHF and Excisionase; DB-ORF fusions; AD-ORF fusions
Recombinational cloning (RC) (14). RC is based on the recombination reactions that mediate the integration and excision of phage lambda into and from the E. coli genome, respectively. The integration involves recombination of the attP site of the phage DNA with the attB site located in the bacterial genome (BP reaction) and generates an integrated phage genome flanked by attL and attR sites. The excision recombines attL and attR sites back to attP and attB sites (LR reaction). The integration reaction requires two enzymes [the phage protein Integrase (Int) and the bacterial protein integration host factor (IHF)] (BP clonase). The excision reaction requires Int, IHF, and an additional phage enzyme, Excisionase (Xis) (LR clonase).
Artificial derivatives of the 25-bp bacterial attB recombination site, referred to as B1 and B2, were added to the 5' end of the primers used in PCR reactions to amplify the vORFs (Fig. 1B). The resulting products were BP cloned into a "Donor vector" containing complementary derivatives of the phage attP recombination site (P1 and P2) using BP clonase. The resulting "Entry clones" contain vORFs flanked by derivatives of the attL site (L1 and L2) and were subcloned into two-hybrid "Destination vectors", which contain derivatives of the attL-compatible attR sites (R1 and R2), using LR clonase. This resulted in "expression clones" in which vORFs are flanked by B1 and B2 and fused in frame to the DNA-binding domain (DB) or the activation domain (AD) of Gal4p. To ensure that both NH2- and COOH-terminal fusion proteins can be generated, the B1 and B2 sequences were designed to be in frame with the vORF sequences.
Note that different RC vectors harbor different selectable markers. In addition, both Entry and Destination vectors contain a toxic gene which prevents growth of most commonly used E. coli strains. This allows a genetic selection for the desired end products of each reaction. In addition to R1 and R2 RC sites, DB-dest and AD-dest vectors contain yeast ARS and CEN sequences and a LEU2 or TRP1 selectable marker, respectively. Because protein immunoblotting techniques are not compatible with high-throughput experiments, full-length vORF expression was tested using COOH-terminal fusions to GFP. However, no pDB-GFP destination vector is available at this point. Thus, vORFs were shuffled by PCR-Gap repair (17).
Reprinted from: Walhout et al., Science 287: 116 (2000)
52
Matrix of Two-hybrid Interactions Between the vORFs
Fig. 2. Protein interaction mapping. (A) Matrix of two-hybrid interactions between vORF-encoded proteins. The 29 vORFs cloned into pDB-dest and pAD-dest (Fig. 1) were transformed into yeast cells of opposite mating types (MaV103 and MaV203, respectively) (17). Diploids for every pairwise combination were generated by mating and tested for two-hybrid phenotypes. Color coding is as follows.
  Dark gray squares: self-activation (SA) levels that are too high for two-hybrid screens (SA occurs from the ability of a DB-bait protein to up-regulate two-hybrid reporter gene expression in the absence of any AD interactor)
  Light gray squares: intermediate SA levels which are compatible with two-hybrid screening using higher concentrations of 3-aminotriazole (3AT) (17)
  Blue squares: interactions previously reported either in C. elegans or in other model organisms (potential interologs, Fig. 3A) and undetected in either the DB-X/AD-Y or the AD-X/DB-Y orientation (false negatives)
  Red squares: interactions previously reported and detected in the Matrix assay
  Pink squares: interactions previously reported and detected in the Matrix assay in the opposite configuration only
  Orange square: interaction not found in the Matrix but uncovered in the screens described in Fig. 2B
  Yellow squares: novel potential interactions between the products of vORFs
Reprinted from: Walhout et al., Science 287: 116 (2000)
53
Interaction Sequence Tag (IST) screening
Reprinted from:Walhout et al, Science 287: 116 (2000)
54
Results
Matrix two-hybrid experiment with 29 ORFs
  ~50% (6 of 11) of the previously reported interactions were detected
  Two novel potential interactions were identified
  Typically the yeast two-hybrid system will detect ~50% of the naturally occurring interactions
Two-hybrid screen
  Identified 992 AD-Y encoding sequences (ISTs)
  ISTs corresponded to a total of 124 different interacting proteins
    15 previously known
    Provides a functional annotation for 109 predicted genes
Reprinted from: Walhout et al., Science 287: 116 (2000)
55
Validation of Potential Interactions
Conservation of interactions in other organisms If X' and Y' are orthologs of X and Y, respectively X/Y conserved interactions are referred to as "interologs" Reprinted from:Walhout et al, Science 287: 116 (2000)
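The interolog idea can be illustrated with a minimal sketch, assuming a list of known interactions in a reference organism and a simple one-to-one ortholog table. The function name `predict_interologs` and all protein names are hypothetical and are not taken from the paper.

```python
def predict_interologs(known_interactions, orthologs):
    """Map each known X/Y interaction onto X'/Y' when both partners have an ortholog."""
    predicted = set()
    for x, y in known_interactions:
        if x in orthologs and y in orthologs:
            # frozenset makes the pair unordered: X'/Y' is the same as Y'/X'
            predicted.add(frozenset((orthologs[x], orthologs[y])))
    return predicted


# Hypothetical example data (illustration only)
yeast_pairs = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]
yeast_to_worm = {"yA": "wA", "yB": "wB", "yC": "wC"}  # yD has no ortholog
interologs = predict_interologs(yeast_pairs, yeast_to_worm)
```

Pairs in which one partner lacks an ortholog are simply dropped, so the prediction is only as complete as the ortholog table.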
56
Validation of Potential Interactions
Systematic clustering analysis closed loop connections between vORF- encoded proteins X interacts with Y, Y interacts with Z, Z interacts with W, and so on (X/Y/Z/W/...) Mutations with Similar phenotypes Reprinted from:Walhout et al, Science 287: 116 (2000)
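A closed loop of the kind described above can be searched for in a list of pairwise interactions. The sketch below is a simplification that only looks for the shortest closed loops (three proteins that all interact with each other); the function name `closed_triplets` and the example pairs are assumptions for illustration.

```python
from itertools import combinations


def closed_triplets(pairs):
    """Return every set of three proteins that form a closed loop X/Y/Z."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    loops = set()
    for x, y, z in combinations(sorted(neighbours), 3):
        # all three edges must be present for the loop to close
        if y in neighbours[x] and z in neighbours[y] and x in neighbours[z]:
            loops.add((x, y, z))
    return loops


# Hypothetical interactions: X, Y and Z close a loop, W hangs off Z
pairs = [("X", "Y"), ("Y", "Z"), ("Z", "X"), ("Z", "W")]
loops = closed_triplets(pairs)
```

Longer loops (X/Y/Z/W/...) would need a general cycle search, which is left out here to keep the example short.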
57
Conclusions Demonstrated the feasibility of generating genome-wide protein interaction maps. Two-hybrid screens are simple, sensitive, amenable to high-throughput, and feasible using the C. elegans ORFeome. Y2H detects approximately 50% of the interactions, which provides a useful coverage of biologically important interactions. Reprinted from: Walhout et al., Science 287: 116 (2000)
58
A Comprehensive Analysis of Protein–protein Interactions in Saccharomyces Cerevisiae
Uetz et al., Nature 403: 623 (2000) Landmark paper presents the first large-scale, high-throughput mapping of protein-protein interactions between ORFs predicted in S. cerevisiae, using two complementary yeast two-hybrid screening strategies: a two-hybrid array of hybrid proteins and a high-throughput library screen
59
The two-hybrid array screening
Two-hybrid array of hybrid proteins comprises Haploid yeast colonies derived from ~6,000 yeast ORFs fused to the Gal4 activation domain (AD) The two-hybrid array contained on 16 plates of 384 colonies Matrix screen for interactions 192 different Gal4 DB ORF hybrids were mated to the two-hybrid array 192 two-hybrid array screens were performed in duplicate Each yielded 1–30 positives But only ~ 20% were reproduced in the duplicate screen Putative interacting partners identified 87/192 DB hybrids yielded putative protein–protein interactions Identified 281 interacting protein pairs Reprinted from: Uetz et al., Nature 403: 623 (2000)
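Keeping only the positives that recur in both duplicate screens amounts to a set intersection. The sketch below assumes each screen is recorded as a set of prey identifiers and measures reproducibility as shared hits over all hits; that definition, the function names and the identifiers are assumptions, since the slide does not say how the ~20% figure was computed.

```python
def reproducible_hits(screen_a, screen_b):
    """Positives found in both duplicate screens."""
    return set(screen_a) & set(screen_b)


def reproducibility(screen_a, screen_b):
    """Shared positives as a fraction of all positives seen in either screen."""
    all_hits = set(screen_a) | set(screen_b)
    if not all_hits:
        return 0.0
    return len(reproducible_hits(screen_a, screen_b)) / len(all_hits)


# Hypothetical duplicate screens for one DB hybrid
screen_1 = {"P1", "P2", "P3", "P4"}
screen_2 = {"P2", "P5"}
shared = reproducible_hits(screen_1, screen_2)
rate = reproducibility(screen_1, screen_2)

# Fraction of DB hybrids that yielded putative interactions (numbers from the slide)
yield_fraction = 87 / 192
```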
60
The two-hybrid array screening
Positive control: 6,000 haploid yeast Gal4 activation domain - ORF fusions Two-hybrid positives from a mating with a Gal4 DNA-binding domain - ORF fusion 16 microassay plates Figure 1. a, The array of 6,000 haploid yeast transformants plated on medium lacking leucine, which allows growth of all transformants. Each transformant expresses one of the yeast ORFs expressed as a fusion to the Gal4 activation domain. b, Two-hybrid positives from a screen of the array with a Gal4 DNA-binding domain fusion of the Pcf11 protein, a component of the pre-mRNA cleavage and polyadenylation factor IA, which also consists of four other polypeptides (ref. 36). Diploid colonies are shown after two weeks of growth on medium lacking tryptophan, leucine and histidine and supplemented with 3 mM 3-amino-1,2,4-triazole, thus allowing growth only of cells that express the HIS3 two-hybrid reporter gene. Three other components of factor IA, Rna14, Rna15 and Clp1, were identified as Pcf11 interactors. Positives that do not appear in Table 2 were either not reproducible or are false positives that occurred in many screens. Reprinted from: Uetz et al., Nature 403: 623 (2000)
61
High-Throughput Library Screen
Used a library made by pooling ORF-AD fusions Each ORF was fused separately to a Gal4 activation domain ORF-AD fusions were pooled to form an activation-domain library Advantage over traditional cDNA libraries is the uniform presentation of each ORF Protein interactions were screened by mating the DNA-binding domain hybrids in duplicate to the activation domain library 817 yeast ORFs (15%) yielded protein–protein interactions Identified 692 interacting protein pairs 68% of the interactions were identified multiple times Reprinted from: Uetz et al., Nature 403: 623 (2000)
62
Results of the Systematic Two-Hybrid Screens
The matrix array screens gave more interactors: 45% of the 192 proteins in the array screens yielded interactions. However, array screens are much more labour- and material-intensive, which limits the number of screens that can be performed. A full matrix would require testing every DB hybrid against every AD hybrid. The library screens gave fewer interactors: 8% of the proteins tested in the library screens yielded interactions, but at a much higher throughput. Reprinted from: Uetz et al., Nature 403: 623 (2000)
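The scale of a full matrix can be estimated with simple arithmetic. The sketch below assumes the ~6,000 ORFs quoted for the two-hybrid array and treats every DB-ORF/AD-ORF combination as one test; it is an order-of-magnitude illustration, not a figure taken from the slide.

```python
n_orfs = 6000        # approximate number of yeast ORFs in the array (from the array slide)
array_screens = 192  # DB hybrids actually screened against the array

# Every DB hybrid against every AD hybrid (both orientations counted)
ordered_pairs = n_orfs * n_orfs

# If X/Y and Y/X are counted once, including self-interactions
unordered_pairs = n_orfs * (n_orfs + 1) // 2

# Pairwise tests covered by the 192 array screens, and their share of the full matrix
tests_done = array_screens * n_orfs
fraction_tested = tests_done / ordered_pairs

print(f"{ordered_pairs:,} ordered pairs, {unordered_pairs:,} unordered pairs")
print(f"{tests_done:,} tests done = {fraction_tested:.1%} of the full matrix")
```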
63
Analysis of the protein-protein interactions
The analysis reveals Interactions that place unknown proteins into a biological context Novel interactions between proteins involved in the same biological function Novel interactions that connect biological functions into larger cellular processes
64
Interactions involving unknown proteins
Figure 3 Expanded pathways shown using the software as described in the text. a, Autophagy pathway illustrating potential novel interactions that place functionally unclassified proteins in a biological context. Reprinted from: Uetz et al., Nature 403: 623 (2000)
65
Interactions Between Proteins in the RNA Splicing Complex
Figure 3 Expanded pathways shown using the software as described in the text. a, Autophagy pathway illustrating potential novel interactions that place functionally unclassified proteins in a biological context. b, Potential interactions identified by screens of the Sm motif-containing proteins Lsm2, Lsm4 and Lsm8. c, The Clb/Cdc28/Cks1 complex shows novel interactions between proteins involved in the same biological function. d, The Msh5 pathway illustrates novel interactions that link biological functions together into greater cellular processes. Interactions are consistent with the crystallographic data. Reprinted from: Uetz et al., Nature 403: 623 (2000)
66
Interaction Connecting two different Complexes
spindle checkpoint complex microtubule checkpoint complex Figure 2 Data analysis software. a, The putative interaction identified between Mad3 and Bub3 which connects the spindle checkpoint complex (ref. 37) and the microtubule checkpoint complex (ref. 38). Yeast proteins are shown as yellow spheres with the name of each gene indicated. Interactions in Figs 2 and 3 are shown as black lines (from literature), solid green lines (from library screens in independent matings), dashed green lines (from library screens in one mating), purple (from array screens) and blue lines (from literature and screens). Arrows point away from the protein used as the binding-domain clone when the interaction was identified. Grey nubs indicate other proteins that interact with that protein but have not been expanded. b, Same pathway shown using the homologue viewer; the pathway can be rotated and homologous proteins in human, mouse, rat, Drosophila, Caenorhabditis elegans and Escherichia coli can be displayed. Known interactions between proteins in other species can be viewed: for this pathway interactions between the human proteins hMad3/hBub3, hBub3/hBub1, hBub1/hMad1, hBub3/hMad1 and hCDC20/hMad2, shown in black, are reported in the literature. The distance of each species protein icon (shown in key) from the yeast proteins (shown in yellow) represents the amount of overall similarity between the species. The size of the protein icon in each corresponding species indicates the amount of homology with the specific yeast protein. As the human homologues are highlighted in this example, their gene names are shown. Reprinted from: Uetz et al., Nature 403: 623 (2000)
67
Analysis of Interologs
Yeast Human Figure 2 Data analysis software. b, Same pathway shown using the homologue viewer; the pathway can be rotated and homologous proteins in human, mouse, rat, Drosophila, Caenorhabditis elegans and Escherichia coli can be displayed. Known interactions between the human proteins are shown in black. Reprinted from: Uetz et al., Nature 403: 623 (2000)
68
Conclusions The two-hybrid array approach is feasible
for systematic genome-wide analysis of protein interactions. The large-scale mapping of protein-protein interactions reveals many new interactions between proteins. These interactions should be viewed as potential interactions that must be confirmed independently. This conclusion is supported by the fact that the results of different screens only partially overlap. Reprinted from: Uetz et al., Nature 403: 623 (2000)
69
A Map of the Interactome Network of the Metazoan C. elegans
Li et. al., Science, 303, (2004) Paper presents Large scale mapping of protein-protein interaction in C. elegans using yeast two-hybrid screens with a subset of metazoan-specific proteins identified > 4000 interactions Together with already described Y2H interactions and interologs predicted in silico, the current version of the Worm Interactome map contains interactions
70
Worm Interactome map Phylogenetic classes Eukaryotic Multi cellular
Reprinted from: Li et. al., Science, 303, (2004)
71
A Protein Interaction Map of Drosophila melanogaster
Giot et al., Science, 302, (2003) Paper presents a two-hybrid-based protein-interaction map of the fly proteome, made by screening 10,623 ORFs against cDNA libraries to produce a draft map of 7048 proteins and 20,405 interactions. Computational rating of interaction confidence produced a high-confidence interaction network of 4679 proteins and 4780 interactions showing two levels of organization: a short-range organization, presumably corresponding to multiprotein complexes, and a more global organization, presumably corresponding to intercomplex connections.
72
The fly protein-interaction map: Protein family/human disease orthologs
Reprinted from: Giot et. al., Science, 302, (2003)
73
The fly protein-interaction map: Subcellular localization
Reprinted from: Giot et. al., Science, 302, (2003)
74
Towards a proteome-scale map of the human protein–protein interaction network
Rual et al., Nature 437: (2005) Paper presents a first step towards a systematic and comprehensive analysis of the human interactome, using a stringent, high-throughput yeast two-hybrid system to test pairwise interactions among the products of 8,100 currently available Gateway-cloned open reading frames
75
High-throughput yeast two-hybrid pipeline
Stringent test Second test using GAL1::HIS3 and GAL1::lacZ Reduces the number of false positives Detected 2,800 interactions FIGURE 1. Towards the generation of a proteome-scale human yeast two-hybrid map. a, Schema of the high-throughput yeast two-hybrid pipeline. Individual steps (middle column) and representative examples (flanking left and right columns) are indicated. The top panel of the left column represents the matrix of all protein pairs. All available ORFs from human ORFeome v1.1 were transferred into both DB and AD vectors by recombinational cloning (middle panel of left column). The top panel of the right column shows the mating process, with each bait mated to individual pools of 188 AD-ORFs. Initial phenotypic testing evaluated growth of diploid cells on selective medium in response to enhanced levels of the GAL1::HIS3 selective marker (bottom panel of left column). All positive diploids from phenotyping no. 1 (red circles) were subsequently tested for activation of both GAL1::HIS3 and GAL1::lacZ reporter genes. Auto-activators were identified by growth on medium containing cycloheximide (bottom panels of left and right columns). Positive colonies from phenotyping no. 2 (outlined in red) were isolated and used to PCR-amplify both DB-ORF and AD-ORF fragments for sequencing. b, Verification of yeast two-hybrid interactions by co-affinity purification assays. Fifteen representative examples of co-affinity purification-positive assays are shown. The middle and bottom panels show expression controls of Myc–prey and GST–bait fusion proteins, respectively. Each lane pair in the top panels shows presence or absence of Myc–prey fusions after affinity purification, demonstrating binding to GST–bait fusion proteins (+) or to GST alone (-). The Table summarizes the data obtained for four different classes of protein pairs. 'Y2H and LCI' describes interactions reported in both the yeast two-hybrid and LCI data sets. 'Y2H/LCI-negative' describes pairs of proteins that were not reported to interact either in the yeast two-hybrid or in the LCI data sets. Rows indicate the total number of interactions tested and considered for scoring (Total), the number of interactions not verified by co-affinity purification (co-AP-), the number of interactions verified by co-affinity purification (co-AP+), the proportion of co-affinity purification-positive interactions (success rate), and the adjusted success rate (which accounts for the observation that one-third of all co-affinity purification experiments yield an apparently positive result without regard to whether or not the protein pair truly interacts; see Supplementary Data IX). Identities, lane positions and scoring of all protein pairs tested by co-affinity purification are provided in Supplementary Tables S2 and S3. Reprinted from: Rual et al., Nature 437: (2005)
76
Overlap of CCSB-HI1 with literature data
Compared the overlap between observed interactions and interactions reported in the literature. Conclude that the CCSB-HI1 data set contains 1% of the human interactome. Human interactome is estimated at to interactions. FIGURE 2. Overlap of CCSB-HI1 with existing literature-curated (LC) data. a, Overlap between CCSB-HI1 and LC interactions in Space-I (LCI). The top, middle and bottom panels represent the overlap between CCSB-HI1 and LCI, LCI-core and LCI-hypercore, respectively. b, Network graph of the union of all CCSB-HI1 and LCI interactions. Proteins are shown as yellow nodes and CCSB-HI1 and LCI interactions are shown as red and blue edges, respectively. Blue edges with increasing thickness indicate LCI-non-core, LCI-core and LCI-hypercore, respectively. The apparent banding pattern of the yellow nodes is an artefact of the graph layout algorithm (Supplementary Data). Importantly, the layout algorithm was not informed by type of supporting evidence and therefore does not explain the evident separation of blue and red edges. c, Bias in 2-hop network neighbourhood for either CCSB-HI1 or LCI interactions. The frequency of nodes with a given proportion of CCSB-HI1 interactions in their 2-hop neighbourhood is depicted for the interactome network graph in b (solid curve) and for a network in which the types of supporting evidence (CCSB-HI1 or LCI) are randomly permuted among edges (dashed curve). The solid curve indicates that most of the proteins in the network of b have either only CCSB-HI1 or only LCI interactions in their 2-hop neighbourhood. In contrast, neighbourhoods are well mixed when evidence labels are randomly permuted among edges. Reprinted from: Rual et al., Nature 437: (2005)
77
Interaction network of disease-associated CCSB-HI1 proteins
The human interactome will further the understanding of human health and disease. Illustrated by the network of disease-associated proteins (green nodes) EWS protein FIGURE 3. Interaction network of disease-associated CCSB-HI1 proteins. The network has 121 OMIM disease-associated proteins (green nodes) and 424 CCSB-HI1 interactions involving them (red edges), along with known LC interactions (solid blue edges represent binary LCI interactions and dashed blue edges represent non-binary interactions). Proteins without an OMIM disease association are depicted as yellow nodes, and blue edges with increasing thickness indicate LCI-non-core, LCI-core and LCI-hypercore interactions, respectively. We note that 94 out of the 424 CCSB-HI1 interactions involve the Ewing sarcoma related protein (EWSR1; also known as EWS). Reprinted from: Rual et al., Nature 437: (2005)
78
Functional Maps or “-omes”
Genes or proteins n “Conditions” ORFeome Genes Phenome Mutational phenotypes Transcriptome Expression profiles DNA Interactome Protein-DNA interactions Localizome Cellular, tissue location Interactome Protein interactions Proteome proteins After: Vidal M., Cell, 104, 333 (2001)
79
Proteome Analysis Large scale and comprehensive analysis of the proteome has so far not been feasible Lack of suitable and sensitive protein fractionation methods 2-D gels are limited to a few 1000 proteins only – the most abundant Protein characterization is slow and laborious Despite enormous improvements in mass spectrometry, the characterization of individual proteins remains the bottleneck Level of proteome characterization to date is in the order of a few 1000 proteins at best Represents 5% to 25% of the proteome Tandem affinity purification (TAP) technology constitutes an important breakthrough Fast and reliable method of protein purification
80
Rigaut et. al., Nat. Biotechnol. 17, 1030 (1999)
A generic protein purification method for protein complex characterization Rigaut et. al., Nat. Biotechnol. 17, 1030 (1999) Paper presents a generic procedure to purify protein complexes under native conditions using tandem affinity purification (TAP) tag procedure Using a combination of high-affinity tags for purification
81
Tag-based Characterization of protein complexes
Figure 1 Analysing protein interactions. In the 'co-precipitation/mass spectrometry' approach used by Gavin et al. (ref. 1) and Ho et al. (ref. 2), an 'affinity tag' is first attached to a target protein (the 'bait'; a). b, Bait proteins are systematically precipitated, along with any associated proteins, on an 'affinity column'. c, Purified protein complexes are resolved by one-dimensional SDS–PAGE, a technique that involves running an electric charge through the complexes on a gel, so that proteins become separated according to mass. d, Proteins are excised from the gel, digested with the enzyme trypsin, and analysed by mass spectrometry. Database-search algorithms (bioinformatics) are then used to identify specific proteins from their mass spectra. Reprinted from: Kumar A. and Snyder M., Nature 415, 123 (2002)
82
High-affinity Tags High-affinity protein tags
Must allow efficient recovery of proteins present at low concentrations ProtA tag: two IgG-binding units of protein A of S. aureus released from matrix-bound IgG under denaturing conditions CBP tag: calmodulin-binding peptide released from the affinity column under mild conditions Tandem affinity purification (TAP) tag A fusion cassette encoding both the ProtA tag and the CBP tag Separated by a specific TEV protease recognition sequence which allows proteolytic release of the bound material under native conditions Reprinted from: Rigaut et. al., Nat. Biotechnol. 17, 1030 (1999)
83
Tandem affinity purification (TAP) tag
CBP ProtA Figure 1: The TAP strategy: rationale and testing. (A) Sequence and structure of the TAP tag. The various domains constituting the TAP tag are indicated. Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
84
The TAP Purification Procedure
ProtA affinity purification step TEV protease cleavage step CBP affinity purification step Figure 1: The TAP strategy: rationale and testing. (B) Overview of the TAP procedure. Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
85
Advantage of the Two-step Procedure
Purification of U1 snRNP Single-step affinity purification yields a high level of contaminating proteins Two-step affinity purification yields highly specific purification with very low background Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
86
Functional organization of the yeast proteome by systematic analysis of protein complexes
Gavin et. al., Nature 415, 141 (2002) Landmark paper presents Large-scale application of the TAP technology for a systematic analysis of multiprotein complexes from yeast Generated gene-specific TAP tag cassettes by PCR Insert TAP cassettes by homologous recombination at the 3' end of the genes to generate fusion proteins in their native location Purified protein assemblies from cellular lysates by TAP Separate purified assemblies by denaturing gel electrophoresis Digest individual bands by trypsin Analyze peptides by MALDI–TOF MS to identify the proteins using database search algorithms
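The database-search step relies on predicting which peptides trypsin would release from each candidate protein. The sketch below shows only that in-silico digestion, assuming the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); the rule, the function name `tryptic_digest` and the toy sequence are assumptions and are not taken from the paper.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # trailing peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides


# Toy sequence (hypothetical): the R before P is not cleaved
peptides = tryptic_digest("MAKRPLLKGGR")
```

In a real search, the masses of these predicted peptides would be compared with the measured MALDI-TOF peaks; that matching step is omitted here.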
87
The Gene Targeting Procedure
TAP tag gene-specific cassette Figure 1 Synopsis of the screen. a, Schematic representation of the gene targeting procedure. The TAP cassette is inserted at the C terminus of a given yeast ORF by homologous recombination, generating the TAP-tagged fusion protein. Reprinted from: Gavin et al., Nature 415, 141 (2002)
88
Large-scale Analysis of Protein Complexes
Experimental outline Started with a selection of 1,739 genes 1,143 genes representing eukaryotic orthologues 596 genes nonorthologous set Generated 1,167 strains expressing tagged proteins to detectable levels Analyzed 589 protein complexes Comprising 418 different orthologues Generated 20,946 samples for mass spectrometry Identified 16,830 proteins Characterized a total of 232 protein complexes Comprising 1,440 distinct proteins ~ 25% of the ORFs in the genome Reprinted from: Gavin et. al., Nature 415, 141 (2002)
89
Purification and Identification of TAP Complexes
Figure 1 Synopsis of the screen. c, Schematic representation of the sequential steps used for the purification and identification of TAP complexes (left), and the number of experiments and success rate at each step of the procedure (right). Reprinted from: Gavin et al., Nature 415, 141 (2002)
90
Sensitivity and Specificity of the Approach
Very efficient large-scale purification and identification of protein complexes 78% of the 589 purified complexes have associated proteins The remaining 22% showing no interacting proteins May not form stable or soluble complexes The TAP tag may interfere with complex assembly or function Complexes are stable and show the same composition when purified with different entry points Example: the polyadenylation machinery, responsible for eukaryotic messenger RNA cleavage and polyadenylation Identified 12 of the 13 known components Identified 7 new components Reprinted from: Gavin et. al., Nature 415, 141 (2002)
91
The Polyadenylation Protein Complex
new components of the polyadenylation complex Figure 3 Primary validation of complex composition by 'reverse' purification: the polyadenylation machinery. b, Proposed model of the polyadenylation machinery. Reprinted from: Gavin et al., Nature 415, 141 (2002)
92
Composition of the Polyadenylation Complex
protein tagged for affinity purification Figure 3 Primary validation of complex composition by 'reverse' purification: the polyadenylation machinery. a, A similar band pattern is observed when different components of the polyadenylation machinery complex are used as entry points for affinity purification. Underlined are new components of the polyadenylation machinery complex for which a physical association has not yet been described. The bands of the tagged proteins are indicated by arrowheads. Reprinted from: Gavin et al., Nature 415, 141 (2002)
93
Reliability of the TAP Method
High sensitivity identify proteins present at 15 copies per cell High reproducibility 70% of the proteins are detected in independent purifications Low background The background comprises highly expressed proteins Identified 17 contaminant proteins (heat-shock and ribosomal proteins) Limitations 18% of the tagged essential genes gave no viable strains The carboxy-terminal tagging can impair protein function Reprinted from: Gavin et. al., Nature 415, 141 (2002)
94
Organization of the purified assemblies into complexes
589 purified complexes characterized 245 complexes corresponded to 98 known multiprotein complexes in yeast 242 complexes correspond to 134 new complexes In total 232 annotated TAP complexes are identified 102 proteins showed no detectable association with other proteins Reprinted from: Gavin et. al., Nature 415, 141 (2002)
95
Number Of Proteins Per Complex
Average of 12 proteins per complex Reprinted from: Gavin et. al., Nature 415, 141 (2002)
96
Functional Classification Of The Complexes
wide functional distribution of complexes Reprinted from: Gavin et. al., Nature 415, 141 (2002)
97
Protein Complexes are Dynamic
Complexes are not necessarily of invariable composition Using distinct tagged proteins as entry points to purify a complex Core components can be identified as invariably present Regulatory components may be present differentially Dynamic complexes: e.g. signaling complexes The interactions of a signalling enzyme may be sufficiently strong to allow the detection of distinct cellular complexes They may be diagnostic for the role of these enzymes in different cellular activities Reprinted from: Gavin et. al., Nature 415, 141 (2002)
98
Higher-order Organization of The Proteome Map
Most complexes are linked together Complexes belonging to the same functional class often share components mRNA metabolism, cell cycle, protein synthesis and turnover, intermediate and energy metabolism Shared components linking complexes into a network The network connections reflect physical interaction of complexes common architecture, localization or regulation Relationships between complexes suggests integration and coordination of cellular functions The more connected a complex, the more central its position in the network Reprinted from: Gavin et. al., Nature 415, 141 (2002)
99
The Yeast Protein Complex Network
Panel labels: membrane biogenesis and traffic; cell polarity and structure; protein synthesis and turnover; intermediate and energy metabolism; signalling; cell cycle; transcription, DNA maintenance, chromatin structure; RNA metabolism; protein and RNA transport. Figure 4 The protein complex network, and grouping of connected complexes. Links were established between complexes sharing at least one protein. For clarity, proteins found in more than nine complexes were omitted. The graphs were generated automatically by a relaxation algorithm that finds a local minimum in the distribution of nodes by minimizing the distance of connected nodes and maximizing distance of unconnected nodes. In the upper panel, cellular roles of the individual complexes (ascribed in Supplementary Information Table S3) are colour coded: red, cell cycle; dark green, signalling; dark blue, transcription, DNA maintenance, chromatin structure; pink, protein and RNA transport; orange, RNA metabolism; light green, protein synthesis and turnover; brown, cell polarity and structure; violet, intermediate and energy metabolism; light blue, membrane biogenesis and traffic. The lower panel is an example of a complex (yeast TAP-C212) linked to two other complexes (yeast TAP-C77 and TAP-C110) by shared components. It illustrates the connection between the protein and complex levels of organization. Red lines indicate physical interactions as listed in YPD (ref. 22). Reprinted from: Gavin et al., Nature 415, 141 (2002)
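The linking rule in the Figure 4 caption (two complexes are connected when they share at least one protein, ignoring proteins found in more than nine complexes) can be sketched as follows. The function name `complex_network`, the `max_membership` parameter and the toy complexes are hypothetical; the layout (relaxation) step is not reproduced.

```python
from itertools import combinations


def complex_network(complexes, max_membership=9):
    """Link complexes that share at least one protein.

    Proteins present in more than `max_membership` complexes are ignored.
    """
    membership = {}
    for proteins in complexes.values():
        for protein in proteins:
            membership[protein] = membership.get(protein, 0) + 1
    links = {}
    for a, b in combinations(sorted(complexes), 2):
        shared = {p for p in complexes[a] & complexes[b]
                  if membership[p] <= max_membership}
        if shared:
            links[(a, b)] = shared
    return links


# Toy data (hypothetical complex and protein names)
complexes = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P3", "P4"},
    "C3": {"P5"},
}
links = complex_network(complexes)
```

Counting how many links each complex has then gives a simple measure of how central it is in the network.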
100
Protein Complexes Have a Similar Composition in Yeast and Human
Figure 5 Protein complexes have a similar composition in yeast and human. Comparison of three TAP protein complexes isolated from human and yeast cells. All orthologous pairs are indicated by arrows, demonstrating that the complex composition between yeast and human is largely conserved. Coomassie-stained gels are shown only for the human purifications. a, Arp2/3 complex; b, Ccr4–Not2 complex; c, Trapp complex. Hyp. protein, hypothetical protein. Reprinted from: Gavin et al., Nature 415, 141 (2002)
101
Conclusions The paper clearly demonstrates the merits of the TAP technology for characterizing protein complexes from different compartments, including low-abundance and large complexes TAP data and yeast two-hybrid assay data show only a very small overlap The two methodologies address different aspects of protein interaction and are complementary The TAP analysis provides an outline of the eukaryotic proteome as a network of protein complexes The human–yeast orthologous proteome represents core functions for the eukaryotic cell Orthologous proteins are often responsible for essential functions Reprinted from: Gavin et. al., Nature 415, 141 (2002)
102
Genome Biology and Biotechnology
The next frontier: Systems biology International course 2005
103
Genomics
104
Functional Genomics . .
105
Systems Biology
106
From genes to networks gene pathway network Molecular Biology
60s to mid 80s gene Molecular Genetics since mid 80s pathway Systems Biology since mid 90s network
107
The large-scale organisation of metabolic networks
Jeong et al (2000) Nature 407: 651 Study of the design principles underlying the structure of biological systems Dissection of integrated “pathway-genome” databases providing complex connectivity maps
108
Reprinted from: Jeong et al (2000) Nature 407: 651
Case study Analyses of core cellular metabolisms as described in the `Intermediate metabolism and bioenergetics' portions of the WIT database Prediction of metabolic pathways in organisms on the basis of its annotated genome (presence of presumed open reading frame for enzymes that catalyse a given metabolic reaction) in combination with firmly established data from the biochemical literature. 6 archaea, 32 bacteria and 5 eukaryotes Reprinted from: Jeong et al (2000) Nature 407: 651
109
Graph theoretic representation
Nodes are substrates Links are metabolic reactions (with EC enzyme numbers) Reprinted from: Jeong et al (2000) Nature 407: 651
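A minimal sketch of this representation, assuming each reaction is given as a list of substrates, a list of products and an EC number, and that every substrate is linked to every product of the same reaction. That all-to-all linking is a simplification chosen for illustration, and the function name `build_substrate_graph` is hypothetical. The two example reactions are the first steps of glycolysis.

```python
def build_substrate_graph(reactions):
    """Return (outgoing, incoming) link sets for every metabolite node."""
    out_links, in_links = {}, {}
    for substrates, products, _ec_number in reactions:
        for s in substrates:
            for p in products:
                out_links.setdefault(s, set()).add(p)
                in_links.setdefault(p, set()).add(s)
                # make sure every node appears in both dictionaries
                out_links.setdefault(p, set())
                in_links.setdefault(s, set())
    return out_links, in_links


reactions = [
    (["glucose", "ATP"], ["G6P", "ADP"], "2.7.1.1"),  # hexokinase
    (["G6P"], ["F6P"], "5.3.1.9"),                    # glucose-6-phosphate isomerase
]
out_links, in_links = build_substrate_graph(reactions)
```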
110
Theoretical Network Architectures
The World Wide Web and social networks have a scale-free structure Probability that a node has k links random uniform scale-free heterogeneous Reprinted from: Jeong et al (2000) Nature 407: 651
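For reference, the two connectivity distributions contrasted on this slide are usually written as follows. The symbols are assumptions of this note: $\langle k \rangle$ is the mean number of links per node and $\gamma$ is the degree exponent.

```latex
% Random (uniform) network: links are placed at random, so P(k) is Poisson
% and peaks around the mean connectivity <k>
P_{\text{random}}(k) = \frac{e^{-\langle k \rangle}\,\langle k \rangle^{k}}{k!}

% Scale-free (heterogeneous) network: P(k) follows a power law,
% so a few highly connected hubs coexist with many poorly connected nodes
P_{\text{scale-free}}(k) \propto k^{-\gamma}
```

On a log-log plot the power law appears as a straight line with slope $-\gamma$, whereas the Poisson distribution falls off sharply beyond $\langle k \rangle$.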
111
Connectivity distribution
Metabolic networks are scale-free as shown by the distribution of incoming and outgoing links for each substrate. This is a general rule applying to all organisms studied. Archaeoglobus fulgidus E. coli C. elegans All 43 Reprinted from: Jeong et al (2000) Nature 407: 651
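The connectivity distribution P(k) is simply the fraction of nodes that have k links. A minimal sketch, assuming the network is stored as a dictionary mapping each substrate to the set of substrates it links to (the function name and the toy network are hypothetical):

```python
from collections import Counter


def degree_distribution(links):
    """Return {k: fraction of nodes with exactly k links}."""
    counts = Counter(len(targets) for targets in links.values())
    n_nodes = len(links)
    return {k: count / n_nodes for k, count in sorted(counts.items())}


# Toy network: one hub (A) and three poorly connected nodes
links = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}
p_k = degree_distribution(links)
```

Applying the same function separately to incoming and outgoing link dictionaries gives the two distributions mentioned on the slide.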
112
Network diameter Biochemical pathway length in E. coli Average path length (43) Definition: the shortest “pathway” averaged over all pairs of substrates Archaea Bacteria Eukarya Unexpectedly, network diameter does not increase with complexity. Therefore interconnectivity grows with the addition of substrates. incoming links outgoing links Reprinted from: Jeong et al (2000) Nature 407: 651
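The definition of diameter used here (shortest path length averaged over all pairs of substrates) can be computed with a breadth-first search from every node. The sketch below assumes an unweighted graph stored as an adjacency dictionary and averages only over pairs that are actually connected; the function names and the four-node chain are illustrative.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of links from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def network_diameter(graph):
    """Average shortest path length over all connected ordered pairs of nodes."""
    total = pairs = 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy chain A - B - C - D
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
diameter = network_diameter(chain)
```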
113
Reprinted from: Jeong et al (2000) Nature 407: 651
Hub properties A few hubs dominate the overall connectivity The sequential (“mutations”) removal of the most connected hubs dramatically increases the network diameter until disintegration the metabolic networks seem highly robust in computer simulations (cf. lethal mutation rate observed in vivo) Reprinted from: Jeong et al (2000) Nature 407: 651
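The effect of removing a hub can be illustrated on a toy network. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets, removes the single most connected node, and compares the average path length before and after; the network (a ring of six substrates all linked to one hub) is invented for illustration.

```python
from collections import deque


def average_path_length(graph):
    """Average shortest path length over all connected ordered pairs."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def remove_most_connected(graph):
    """Delete the node with the most links and return (hub, pruned graph)."""
    hub = max(graph, key=lambda node: len(graph[node]))
    pruned = {n: nbrs - {hub} for n, nbrs in graph.items() if n != hub}
    return hub, pruned


# Toy network: ring S0..S5, every ring node also linked to HUB
ring = [f"S{i}" for i in range(6)]
graph = {n: {ring[(i - 1) % 6], ring[(i + 1) % 6], "HUB"} for i, n in enumerate(ring)}
graph["HUB"] = set(ring)

before = average_path_length(graph)
hub, pruned = remove_most_connected(graph)
after = average_path_length(pruned)
print(f"removed {hub}: average path length {before:.2f} -> {after:.2f}")
```

Repeating the removal step on the pruned graph would mimic the sequential removal described on the slide.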
114
Reprinted from: Jeong et al (2000) Nature 407: 651
Conclusions The structure of biological networks is far from random Their contemporary topology reflects a long evolutionary process They show a robust response towards internal defects Contrary to other scale-free networks, metabolic ones do not grow in diameter with increasing complexity, which may represent an additional (necessary?) survival and growth advantage Reprinted from: Jeong et al (2000) Nature 407: 651
115
Extension of the concept
Protein-protein interaction networks are also scale-free (yeast Y2H data) The probability for a gene to be essential increases with the connectedness of the encoded protein 93% of proteins have 5 links or less 21% of their genes are essential 0.7% have more than 15 links 62% of their genes are essential Jeong et al (2001) Nature 411: 41
116
Reprinted from: Jeong et al (2001) Nature 411: 41
117
A long way to go… List of biological components
List of biological components: cells, genes, proteins, metabolites Description of local relationships: expression cluster, protein-protein interaction, molecule trafficking, cell-cell crosstalk Whole system architecture Dynamic regulatory mechanisms System behaviour prediction System manipulation, de novo design need more data! System can be organism, can also be defined as specific mechanisms such as “cell cycle” or “root development”
118
Thank you!