PSI Data Management and Reporting: Expectations, Standards and Utility J. Michael Sauder Director, Bioinformatics NYSGXRC Project Leader.


1 PSI Data Management and Reporting: Expectations, Standards and Utility J. Michael Sauder Director, Bioinformatics NYSGXRC Project Leader

2 NIGMS Expectations
http://grants.nih.gov/grants/guide/rfa-files/RFA-GM-05-001.html
“… a database for deposition of information on experimental outcome data (both successful and unsuccessful).”
“These data include … cDNA cloning, expression vector construction, protein production and purification, protein biochemical characterizations, crystallization screening, synchrotron and NMR data collection, etc.”
“The PSI Research Network centers will be required to provide plans for the collection, maintenance, and transfer of experimental results into this central data repository.”
PepcDB… will contain information on these important results and provide a platform for cross-center data mining to capitalize on the PSI investment

3 Protocols vs Results
General protocols are reported by each PSI Center in PepcDB
General protocols have been published in the literature by several Centers
However, one of the real values of PepcDB lies in the detailed experimental trial results for each target
–Which clones were made? (PSI-MR)
–Which constructs yield soluble protein? (which don’t?)
–What are the fermentation conditions? Purification?
–What was the protein yield? The final concentration? The experimental molecular weight?
–What conditions gave crystals? How many crystal forms? What was the cryoprotectant? Which conditions led to diffraction data? To the structure?

4 TargetDB/PepcDB Data Mining
TargetDB status is informative, but far more useful would be data about
–Small scale expression/solubility testing
–Large scale purification yield, concentration, oligomeric state
–Conditions that yielded diffracting crystals
Publications
–Overton et al (2008) Bioinformatics 24:901-907. “ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction” (PDB, TargetDB, PepcDB)
–Martin-Galiano et al (2008) Proteins 70:1243-1256. “Predicting experimental properties of integral membrane proteins by a naive Bayes approach” (TargetDB)
–Bannen et al (2007) J Struct Funct Genomics 8:217-226. “Effect of low-complexity regions on protein structure determination” (TargetDB/PepcDB)
–Smialowski et al (2007) Bioinformatics 23:2536-2542. “Protein solubility: sequence based prediction and experimental verification” (TargetDB)
–Slabinski et al (2007) Bioinformatics 23:3403-3405. “XtalPred: a web server for prediction of protein crystallizability” (TargetDB)
–Nair & Rost (2004) Nucl Acids Res 32:W517-W521. “LOCnet and LOCtarget: sub-cellular localization for structural genomics targets” (TargetDB)

5 Process vs Reporting
[Figure: flowchart relating internal process status codes to public reporting milestones]
Reporting milestones: Selected, Cloned, Expressed, Soluble, Purified, Crystallized, Diffr data, In PDB
Status codes with legible labels:
–210 Failed transform
–270 Clone completed to ferm
–310 Fermentation voided
–320 Fermentation waiting
–370 Purification waiting
–390 Purification in progress
–430 Purification technical error
–440 Purification failed
–460 Purification research marginal
–470 Purification research successful
–645 Cryst in optimization
Other status labels in the figure (code pairing not recoverable from the transcript): Selected, Mol biol in progress, Fail PCR, Cloning failed, Failed expresn, Failed solubility, Fermentation on hold, Active, Purification on hold, Purified; completed to collaborator, Purification research unsuccessful, Cryst in screening, Crystallization admitted, Screening grainy ppt, Optimization grainy ppt, Optimization microcrystals, Optimization crystals, Crystal abandoned, Crystal examined, Crystal waiting collection, Dataset collected, Structure deposited, Structure
Other codes in the figure: 0, 10, 110, 140, 170, 220, 230, 315, 365, 450, 482, 620, 640, 650, 655, 665, 685, 710, 720, 730, 810, 947, 950

6 Need to Consider the Future… Now
How much data are we capturing in our databases compared to how much we are reporting?
What will happen to Center data after PSI-2?
We should ensure that as much as possible of our Center data is publicly accessible in PepcDB

7 Trial Data Reporting by Center
Center: Experimental trial details reported to PepcDB
JCSG: Protein sequence, cloning vector, fermentation media, purification method, crystallization conditions
MCSG: Protein sequence, cloning vector, expression host, temperature, media
NESGC: Protein sequence
NYSGXRC: DNA and protein sequence, construct boundaries, cloning vector, small scale expression/solubility scores, media, MW, large scale media, volume, induction time/temp, pellet weight, harvest date, SeMet Y/N, purification yield, concentration, purity, MW, oligomeric state, start/end dates, mass spec pass/fail, analysis comments, MW, crystallization conditions, protein concentration, temperature, cryo, harvest/collection dates, anomalous scatterer, diffraction resolution

8 PepcDB Trial Schema

9 NYSGXRC
SGX_MOLBIO_PCR
### Molecular Biology - PCR ####
PCR start date: 03/20/2007
PCR last updated: 04/16/2007
Notebook #: 1358
Page: 13
SGX_MOLBIO_TOPO_TRANSFORM
### Molecular Biology - cloning ####
SGX clonename: 10001b2BSt5p1
Vector: pSGX4 (BS)
SGX_MOLBIO_EXPR_SOL
### Small scale expression/solubility ###
Expression score: HIGH
Solubility rating: HIGH
Predicted molecular weight (kDa): 44.95
Growth Media (small scale): ZYP-5052
Observed molecular Weight (kDa): 46
Sonication buffer: PLB1
SGX_FERM_ECOLI_ZYP
### Fermentation ###
SGX PID: 11732
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Pellet weight (g): 19
Harvest date: 05/17/2006
Selenomet: N
SGX_PURIF_ECOLI_BACT
### Purification ###
SGX PID: 11732
SGX pool: 1
Selenomet: N
Start date: 06/21/2006
Yield (mg): 52.3
Final concentration (mg/ml): 52.3
Observed molecular weight (kDa): 33
Notebook #: 1136
Page: 115
End date: 06/23/2006
Purity (%): 98
Oligomeric state: monomer (1 subunit)
Not yet reported: DNA source? Primers? Host cells? Antibiotic resistance? Purification steps? Buffers?

10 NYSGXRC
SGX_MALDI
### Mass Spec - MALDI ###
Mass Spec Status: Passed
SGX_ESI-MS
### Mass Spec - ESI-MS ###
Mass Spec Status: Passed
Observed MW: 32528
SGX_XTAL
### Crystallization ###
SGX XID: 27611
Tray barcode: N0081969
Temperature: 21
Protein concentration (mg/ml): 26
Well location: G 12
Well conditions: [100mM] 1M Hepes pH 7.5 + [25%] 50% PEG 3350 + [200mM] 1M Magnesium Chloride hexahydrate
Cryoprotectant comment: [20%] 80% Glycerol
Harvest date: 09/05/2006
Collection date: 09/05/2006
APS resolution: 2.3
Crystal status: D-DATASET COLLECTED
Not yet reported: Crystal morphology? Space group?
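Trial records of this kind become much easier to mine once they are split into sections and fields. The following is a minimal sketch, assuming the record text is laid out one "Label: value" pair per line under a section identifier that starts with SGX_; the function name and the sample excerpt are illustrative, not part of the NYSGXRC LIMS or of PepcDB.

```python
import re

# Excerpt of the trial record shown on the slides, one field per line (assumed layout).
SAMPLE = """\
SGX_PURIF_ECOLI_BACT
### Purification ###
Yield (mg): 52.3
Purity (%): 98
Oligomeric state: monomer (1 subunit)
SGX_XTAL
### Crystallization ###
Temperature: 21
Protein concentration (mg/ml): 26
APS resolution: 2.3
"""

def parse_trial_record(text):
    """Group 'Label: value' lines under the preceding SGX_* section identifier."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '### ... ###' display headers
        if re.fullmatch(r"SGX_[A-Z0-9_-]+", line):
            current = sections.setdefault(line, {})
            continue
        if current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            label, _, value = line.partition(":")
            current[label.strip()] = value.strip()
    return sections

records = parse_trial_record(SAMPLE)
print(records["SGX_XTAL"]["APS resolution"])
```

Values are kept as strings here; converting them to numbers or dates would be a separate, field-specific step.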

11 Proposed Data Reporting
Molecular biology
–DNA source, primers, vector, PSI-MR clone ID, Host, antibiotic resistance
–Expression and solubility rating (small scale), media, predicted and observed molecular weight
Fermentation
–Media, volume, induction time, temp, selenoMet?
Purification
–Purification steps, final buffer, yield, concentration, molecular weight, purity, oligomeric state
–Accurate MW if mass spec done
Crystallization
–Temperature, protein concentration, well conditions, cryoprotectant and resolution, if applicable
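A proposed field list like this one can be turned into a simple completeness check on each trial record. The sketch below is only an illustration: the stage and field keys are hypothetical normalizations of the items on the slide, and the example values are taken from the NYSGXRC record shown earlier in the deck.

```python
# Hypothetical field keys derived from the proposed reporting list (not a PepcDB schema).
PROPOSED_FIELDS = {
    "molecular_biology": ["dna_source", "primers", "vector", "psi_mr_clone_id",
                          "host", "antibiotic_resistance"],
    "fermentation": ["media", "volume_l", "induction_time_hr",
                     "induction_temp_c", "selenomet"],
    "purification": ["purification_steps", "final_buffer", "yield_mg",
                     "concentration_mg_ml", "molecular_weight_kda",
                     "purity_percent", "oligomeric_state"],
}

def missing_fields(record, proposed=PROPOSED_FIELDS):
    """Return {stage: [fields not reported]} for stages that have gaps."""
    gaps = {}
    for stage, fields in proposed.items():
        reported = record.get(stage, {})
        absent = [f for f in fields if reported.get(f) in (None, "")]
        if absent:
            gaps[stage] = absent
    return gaps

example = {
    "molecular_biology": {"vector": "pSGX4 (BS)"},
    "fermentation": {"media": "ZYP-5052", "volume_l": 1, "induction_time_hr": 21,
                     "induction_temp_c": 22, "selenomet": "N"},
    "purification": {"yield_mg": 52.3, "concentration_mg_ml": 52.3,
                     "molecular_weight_kda": 33, "purity_percent": 98,
                     "oligomeric_state": "monomer (1 subunit)"},
}

gaps = missing_fields(example)
for stage, fields in gaps.items():
    print(stage, "is missing:", ", ".join(fields))
```

Run on this example, the check flags the same kinds of gaps the slides ask about (DNA source, primers, host, antibiotic resistance, purification steps, buffer).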

12 Alternative mechanism to report experimental data
– molecular weight
– 32475
– Da
Examples
–Molecular weight
–Isoelectric point
–Phosphorylation
–Methylation
–Element analysis / stoichiometry
–etc.
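The example on this slide reads as a generic name / value / units triple (molecular weight, 32475, Da). The element names used on the original slide did not survive in this transcript, so the sketch below uses hypothetical names (measurement, name, value, units) purely to show the shape of such a record; it should not be read as the actual PepcDB XML schema.

```python
import xml.etree.ElementTree as ET

def measurement_element(name, value, units=None):
    """Build one generic measurement as XML (all element names are hypothetical)."""
    elem = ET.Element("measurement")
    ET.SubElement(elem, "name").text = name
    ET.SubElement(elem, "value").text = str(value)
    if units is not None:
        # Units are optional: a property such as phosphorylation may not have any.
        ET.SubElement(elem, "units").text = units
    return elem

xml_text = ET.tostring(
    measurement_element("molecular weight", 32475, "Da"), encoding="unicode"
)
print(xml_text)
```

The appeal of a generic triple is that new measurement types (isoelectric point, methylation, and so on) need no schema change, at the cost of weaker validation than dedicated tags.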

13 Optional tags
http://mmcif.pdb.org/sg-data/protprod.html
PDB-proposed mmCIF-like tags to describe cloning, expression, purification, crystallization, etc.
Examples
–_entity_src_gen_pure.protein_concentration
–_entity_src_gen_pure.protein_yield
–_entity_src_gen_pure.protein_oligomeric_state
–_pdbx_buffer_components.name
–_pdbx_buffer_components.conc
–_exptl_crystal_grow.temp
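To illustrate how such tags pair with trial values, here is a minimal sketch that writes tag/value lines in an mmCIF-like layout. The tag names come from the slide; the values are the purification figures from the NYSGXRC example record, and their units have not been checked against the tag definitions. The quoting rule (single quotes around values containing whitespace) is a simplification and does not handle values that themselves contain quotes.

```python
def format_tag_values(items):
    """Render {tag: value} pairs as aligned mmCIF-like 'tag value' lines."""
    width = max(len(tag) for tag in items)
    lines = []
    for tag, value in items.items():
        text = str(value)
        if any(ch.isspace() for ch in text):
            text = f"'{text}'"  # simplified quoting for values with spaces
        lines.append(f"{tag.ljust(width)}  {text}")
    return "\n".join(lines)

purification = {
    "_entity_src_gen_pure.protein_concentration": 52.3,
    "_entity_src_gen_pure.protein_yield": 52.3,
    "_entity_src_gen_pure.protein_oligomeric_state": "monomer (1 subunit)",
}

cif_text = format_tag_values(purification)
print(cif_text)
```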

14 Recommendation
NYSGXRC plans to further improve our reporting of trial results in 2008
We encourage all PSI Centers to utilize the PepcDB or tags to report as many experimental trial results as possible in their PepcDB XML updates
See associated poster

15 Acknowledgements
SGX LIMS development team
–Ryan Allis
–Chris Hansen
–Peter Hillier
–Ken Schwinn
AECOM - Veena Venkatagiriyappa (Fiser lab)
Andrei Kouranov (PDB)
LIMS improvements suggested by SGX protein production, crystallization, and beamline staff
This work was supported by SGX Pharmaceuticals, Inc., and NIH Grant U54 GM074945

