Unknown functions, putative functions, known functions: what are all these proteins for?
What is Proteomics? Proteomics: a newly emerging field of life science research that uses high-throughput (HT) technologies to display, identify and/or characterize all the proteins in a given cell, tissue or organism (i.e., the proteome).
Dinner discussion: Integrative Bioinformatics & Genomics VU metabolome proteome genome transcriptome physiome Genomic Data Sources Vertical Genomics
Genome/proteome analysis. Until recently, researchers studied the expression of a single gene. Now it is possible to study the expression of all the genes of an organism simultaneously (this can help to better understand the function of individual genes in the cellular context).
Structural Proteomics: elucidating all 3D structures of proteins in the cell. This is also called Structural Genomics. Finding out what these proteins do: this is also called Functional Genomics.
The term proteome, coined in A linguistic equivalent to the concept of genome Proteome - complete set of proteins that is expressed, and modified by the entire genome in the lifetime of a cell. Practical: the complement of proteins expressed by a cell at any one time. Proteome – by the dictionary
Proteomics (practical): the study of the proteome using technologies of large-scale protein separation and identification. Large-scale separation: 2DE, liquid chromatography. Identification: MALDI MS, tandem MS/MS, FT-MS, ….. Proteomics – by the dictionary
Proteomics by Medline maturation of a field 1730 From 220 publications in the previous millennium (‘94-’99) To 21,350 (!!!) publications in this millennium (‘00-’05)
Proteomics –by Google the ultimate truth.. Proteomics 886,000 hits (2004) 4,700,000 hits (2005) Genomics 2,070,000 hits (2004) 16,000,000 hits (2005)
3 Kinds of Proteomics
Expressional Proteomics
–Electrophoresis, Protein Chips, DNA Chips
–Mass Spectrometry, Microsequencing
Functional Proteomics
–HT Functional Assays, Ligand Chips
–Yeast 2-hybrid, Deletion Analysis, Motif Analysis
Structural Proteomics
–High throughput X-ray Crystallography/Modelling
–High throughput NMR Spectroscopy/Modelling
Proteomics-wide scale structural determination
Expressional Proteomics 2-D Gel QTOF Mass Spectrometry
Expressional Proteomics Prostate tumor Normal
Structural Proteomics High Throughput protein structure determination
Structural Proteomics: The Goal
What is Structural Biology? Sequence: MESDAMESETMESSRSMYN AMEISWALTERYALLKINCAL LMEWALLYIPREFERDREVIL MYSELFIMACENTERDIRATV ANDYINTENNESSEEILIKENM RANDDYNAMICSRPADNAPRI MASERADCALCYCLINNDRKI NASEMRPCALTRACTINKAR KICIPCDPKIQDENVSDETAVS WILLWINITALL → 3D structure. [Figure: Structural Scales. Labels: Organism, Cell, System Dynamics, Cell Structures, Assemblies, Complexes, Sequence; example proteins: SSBs, polymerase, helicase, primase.]
The Protein Fold Universe How big Is It??? 500? 2000? 10000? ? ∞
Why Structural Proteomics? Structure → Function; Structure → Mechanism; Structure-based Drug Design; Solving the Protein Folding Problem; Keeps Structural Biologists Employed
Structural Genomics “These studies should lead to an understanding of structure/function relationships and the ability to obtain structural models of all proteins identified by genomics. This project will require the determination of a large number of protein structures in a high-throughput mode.” From the NIH Request for Proposals for Structure Genomics Centers: “The next step beyond the human genome project”
Structural Genomics/Proteomics Subfields
Protein Production
–Cloning, expression (e.g., cell-based and cell-free methodologies), purification and labeling of proteins
Biophysical Characterization/Structure Determination
–NMR, X-ray crystallography
Bioinformatics
–Algorithms and databases for biophysical data comparison, prediction methods, homology/molecular modeling, structure refinement, in silico screening
Rational Drug Design
–Target identification and optimization
Protein Chip (data array)
Proteomics – in view of other fields: Proteome/Proteomics, Genome/Genomics, Structural proteome, Database Application, Molecular evolution, Data Mining, Chemistry, Cell Biology, Imaging, Biotechnology, Nanotechnology, Protein Science, Biochemistry
Tools
Identification
Mass spectrometry
Assignments
Production and 3D- structure determination
Protein-protein interaction
Systems Biology & Cell Simulation
By the end... Gene Chip Sequence 2D-Gel Bioinformatics
Atomic Resolution Structural Biology. A cell is an organization of millions of molecules. Proper communication between these molecules is essential to the normal functioning of the cell. To understand communication: *determine the arrangement of atoms*. Organ → Tissue → Cell → Molecule → Atoms
Determine atomic structure to analyze why molecules interact Atomic Resolution Structural Biology
Anti-tumor activity Duocarmycin The Reward: Understanding Control Shape Atomic interactions
Atomic Structure in Context: Molecule (Structural Genomics), Pathway (Structural Proteomics), Activity (Systems Biology). RPA NER BER RR
The Strategy of Atomic Resolution Structural Biology: break down complexity so that the system can be understood at a fundamental level. Build up a picture of the whole from the reconstruction of the high-resolution pieces. Understanding basic governing principles enables prediction, design, control.
High-throughput Biological Data
Enormous amounts of biological data are being generated by high-throughput capabilities; even more are coming
–genomic sequences
–gene expression data
–mass spec. data
–protein-protein interaction
–protein structures
–......
Structural Genomics Pipeline: Genomic Based Target Selection → Isolation, Expression, Purification, Crystallization → Data Collection → Structure Determination → PDB Deposition & Release → Functional Annotation → Publication
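The pipeline stages on this slide can be sketched as an ordered enumeration, with each target recording the furthest stage it reached. This is only an illustrative sketch: the stage order is the usual reading of the slide, and the names `Stage`, `stage_counts` and the target IDs and outcomes in `toy_targets` are hypothetical toy data, not figures from any structural genomics centre.

```python
from collections import Counter
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages, in the order assumed from the slide."""
    TARGET_SELECTION = 1
    PROTEIN_PRODUCTION = 2   # isolation, expression, purification, crystallization
    DATA_COLLECTION = 3
    STRUCTURE_DETERMINATION = 4
    PDB_DEPOSITION = 5
    FUNCTIONAL_ANNOTATION = 6
    PUBLICATION = 7


def stage_counts(furthest):
    """Count how many targets reached at least each stage.

    furthest: dict mapping a target ID to the last Stage it completed.
    """
    reached = Counter()
    for last in furthest.values():
        for stage in Stage:
            if stage <= last:
                reached[stage] += 1
    return {stage: reached[stage] for stage in Stage}


# Hypothetical toy data: four made-up targets and where they stalled.
toy_targets = {
    "T001": Stage.PUBLICATION,
    "T002": Stage.PROTEIN_PRODUCTION,
    "T003": Stage.PDB_DEPOSITION,
    "T004": Stage.TARGET_SELECTION,
}

if __name__ == "__main__":
    for stage, n in stage_counts(toy_targets).items():
        print(f"{stage.name:<24} {n}")
```

Counting "reached at least this stage" rather than "currently at this stage" gives the kind of attrition table that status slides for such projects report (selected, cloned, expressed, soluble, solved, deposited).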
[Figure: number of released entries by year]
History of the PDB
1970s
– Community discussions about how to establish an archive of protein structures
– Cold Spring Harbor meeting in protein crystallography
– PDB established at Brookhaven (October 1971; 7 structures)
1980s
– Number of structures increases as technology improves
– Community discussions about requiring depositions
– IUCr guidelines established
– Number of structures deposited increases
1990s
– Structural genomics begins
– PDB moves to RCSB
2000s
– wwPDB formed
Protein structural data explosion Protein Data Bank (PDB): Structures (6 March 2001) x-ray crystallography, 1810 NMR, 278 theoretical models, others...
Policies and Practices for 3D Coordinate Data
Structural biology
–Release of coordinates upon publication required by most journals worldwide
–Deposition and release required by many US funding agencies
–Some depositions from pharmaceutical companies
Structural genomics
–Deposition of coordinates upon completion of refinement
–Release: US 6 weeks, International 6 months
Sequence versus structural data. Despite structural genomics efforts, growth of the PDB slowed down somewhat in (i.e., did not keep up with Dickerson’s formula). Structural genomics initiatives are now in full swing and growth is up again. More than 300 completely sequenced genomes. Increasing gap between structural and sequence data.
Structural Proteomics: The Motivation Sequences Structures
Protein Structure Initiative: organize and recruit interested structural biologists and structural biology centres from around the world. Coordinate target selection. Develop new kinds of high throughput techniques. Solve, solve, solve, solve….
Structural Proteomics - Status 20 registered centres (~30 organisms) targets have been selected targets have been cloned targets have been expressed targets are soluble 1493 X-ray structures determined 502 NMR structures determined 1743 Structures deposited in PDB
Structural Genomics Basics
–Target strategy: systematic sampling of protein sequence families to search for unique protein structures
–Experimental determination of unique protein structures in high throughput operation
–Computational modeling of structures of sequence family homologs
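The target strategy on this slide amounts to choosing one representative per sequence family that has no known structure, solving it experimentally, and leaving the remaining family members to homology modelling. A minimal sketch follows. The function `select_targets`, the family names and the protein IDs are hypothetical, and the "take the first member" rule is a placeholder for the sequence clustering and feasibility filters a real centre would apply.

```python
def select_targets(families, solved):
    """Pick one experimental target per family that lacks a solved structure.

    families: dict mapping family name -> list of protein IDs
    solved:   set of protein IDs that already have an experimental structure
    Returns (targets, to_model):
      targets  maps family -> protein chosen for structure determination
      to_model maps family -> members left for homology modelling
    """
    targets, to_model = {}, {}
    for name, members in sorted(families.items()):
        if not members:
            continue  # nothing to do for an empty family
        if any(p in solved for p in members):
            # A template already exists: model every unsolved member.
            to_model[name] = [p for p in members if p not in solved]
        else:
            chosen = members[0]  # placeholder rule, see lead-in
            targets[name] = chosen
            to_model[name] = [p for p in members if p != chosen]
    return targets, to_model


# Hypothetical toy input.
toy_families = {
    "famA": ["a1", "a2", "a3"],
    "famB": ["b1", "b2"],
    "famC": ["c1"],
}
toy_solved = {"b2"}

if __name__ == "__main__":
    targets, to_model = select_targets(toy_families, toy_solved)
    print("experimental targets:", targets)
    print("left for modelling:  ", to_model)
```

Here famB already has a solved member, so it contributes no new experimental target, which is the point of sampling families rather than individual proteins.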
Protein Structure Initiative (PSI) Long-Range Goal: to make the three-dimensional atomic-level structures of most proteins easily available from knowledge of their corresponding DNA sequences
Expected PSI Benefits
Structure provides information on function and will aid in the design of experiments
Development of better therapeutic targets from comparisons of protein structures from:
–Pathogens vs. hosts
–Diseased vs. normal tissues
PSI Benefit
Collection of structures will address key biochemical and biophysical problems
–Protein folding, prediction, folds, evolution, etc.
Benefits to biologists
–Technology developments
–Structural biology facilities
–Availability of reagents and materials
–Experimental outcome data on protein production and crystallization
PSI Pilot Phase Lessons Learned
1. Structural genomics pipelines can be constructed and scaled up
2. High throughput operation works for many proteins
3. Genomic approach works for structures
4. Bottlenecks remain for some proteins
5. A coordinated target selection policy must be developed
6. Homology modeling methods need improvement
Table 1 Lessons from structural genomics
1. It is possible to construct large-scale facilities that can determine the structures of a hundred or more proteins per year.
2. The difficulties at each step of determining a structure of a particular protein can be quantified.
3. Structures from structural genomics can have an important impact on scientific research.
4. Rapid deposition of data in public databases increases the impact and usefulness of the data.
5. Technology development has played a critical role in structural genomics.
6. Validation of technologies is nearly as important as the technologies themselves.
7. Structures from structural genomics are of high quality.
8. International cooperation advances the field and improves data sharing.
Annual Reviews. As of August 2008, the number of structures deposited by structural genomics consortia is 6048, which corresponds to about 11.5% of the structures in the PDB database.
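The two August 2008 figures on this slide (6048 structures, about 11.5%) imply a rough size for the whole PDB at that date. A small worked calculation, with the hypothetical helper `implied_total`, makes the arithmetic explicit. Because the percentage is rounded, the result is an estimate and not an official PDB count.

```python
def implied_total(part_count, share):
    """Estimate the size of the whole from a part and its fractional share."""
    if not 0 < share <= 1:
        raise ValueError("share must be a fraction in (0, 1]")
    return part_count / share


sg_structures = 6048   # structural genomics depositions, from the slide
sg_share = 0.115       # "about 11.5%", from the slide

if __name__ == "__main__":
    total = implied_total(sg_structures, sg_share)
    print(f"Implied PDB size: about {total:,.0f} structures")
```

The estimate comes out at roughly 52,600 structures. Allowing for rounding of the share between 11.45% and 11.55% moves it by a few hundred either way.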
Structural proteomics is the large scale study of the structural description of proteins and their higher order complexes present in a given cell. It holds special significance since cellular behavior and disease are functions of the interactions between macromolecular complexes involved in cellular biological transactions. Important questions in structural proteomics involve elucidating the structure of these multicomponent assemblies, including their subcomponents and their assembly, and relating their structure to function.