Ten Years and Change: the MX data archive at ALS 8.3.1

Acknowledgements
ALS creator: Tom Alber
PRT head: Jamie Cate
Center for Structure of Membrane Proteins; Membrane Protein Expression Center II; Center for HIV Accessory and Regulatory Complexes; W. M. Keck Foundation; Plexxikon, Inc.; M D Anderson CRC; University of California Berkeley; University of California San Francisco; National Science Foundation; University of California Campus-Laboratory Collaboration Grant; Henry Wheeler
The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the US Department of Energy under contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory.

ALS data collection history: terabytes (uncompressed)

ALS data collection history: images (× 10^6)

DVD data archive: 68 TB

DVD data archive

50 TB

Primary failure mode of DVDs

3000 files remain unrecoverable (~0.1%)

Which data go with which PDB?
260,000 images are called “test”
cell: is within 5 Å and 5° of 16,000 PDBs
focusing on PDBs that credit ALS with data:
44 of these didn’t actually collect data
64 collected data, but gave no credit
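The “within 5 Å and 5°” test on this slide can be read as a component-wise comparison of cell edges and angles. The sketch below assumes both cells are already reduced to the same setting (real matching has to handle alternative settings and axis permutations, which this does not); the `Cell` type, `cells_match`, `candidate_entries`, and the example entry IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Unit cell: edge lengths in Å, angles in degrees."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float


def cells_match(x: Cell, y: Cell, len_tol: float = 5.0, ang_tol: float = 5.0) -> bool:
    """True if every edge differs by <= len_tol Å and every angle by <= ang_tol degrees.

    Assumes both cells are reduced and in the same setting; no axis
    permutation is attempted.
    """
    lengths = zip((x.a, x.b, x.c), (y.a, y.b, y.c))
    angles = zip((x.alpha, x.beta, x.gamma), (y.alpha, y.beta, y.gamma))
    return (all(abs(p - q) <= len_tol for p, q in lengths)
            and all(abs(p - q) <= ang_tol for p, q in angles))


def candidate_entries(cell: Cell, entries: dict[str, Cell]) -> list[str]:
    """Sorted IDs of reference entries whose cells match `cell` within tolerance."""
    return sorted(pid for pid, ref in entries.items() if cells_match(cell, ref))


# Hypothetical reference cells, for illustration only.
refs = {
    "XXX1": Cell(50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
    "XXX2": Cell(80.0, 80.0, 40.0, 90.0, 90.0, 120.0),
}
print(candidate_entries(Cell(52.0, 58.5, 73.0, 90.0, 91.0, 90.0), refs))
```

A tolerance this loose returns many candidates per dataset (the slide's 16,000), which is why a cell match alone is only a first filter.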

Which data go with which PDB?
1. images from collected “near” edges
3. find “runs” of >10 images
4. unify multi-wedge sets
5. run labelit & XDS
6. >70% complete?
7. I/σ > 10
8. reduced cell vs PDB
1,604, , to 200+
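Steps 3, 6 and 7 of this funnel are simple enough to sketch. The code below assumes image files are named `<template>_<frame number>.img` (an assumption; the source does not give its parsing rules), groups them by template, reports runs of consecutive frames longer than 10 images, and applies the completeness and I/σ cut-offs to already-processed statistics. Step 4 (unifying multi-wedge sets) and step 5 (running labelit and XDS) are not attempted. `find_runs`, `passes_quality`, and the file names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming convention: <template>_<frame number>.img, e.g. xtal1_1_001.img
FRAME_RE = re.compile(r"^(?P<prefix>.+)_(?P<frame>\d+)\.img$")


def find_runs(filenames, min_len=11):
    """Return (prefix, first, last) for each run of consecutive frame numbers
    with at least min_len images (">10 images" on the slide)."""
    frames = defaultdict(set)
    for name in filenames:
        m = FRAME_RE.match(name)
        if m:
            frames[m.group("prefix")].add(int(m.group("frame")))
    runs = []
    for prefix, nums in sorted(frames.items()):
        ordered = sorted(nums)
        start = prev = ordered[0]
        for n in ordered[1:]:
            if n != prev + 1:  # gap in frame numbers: close the current run
                if prev - start + 1 >= min_len:
                    runs.append((prefix, start, prev))
                start = n
            prev = n
        if prev - start + 1 >= min_len:  # close the final run
            runs.append((prefix, start, prev))
    return runs


def passes_quality(completeness, i_over_sigma):
    """Steps 6-7: keep datasets more than 70% complete with I/sigma above 10."""
    return completeness > 0.70 and i_over_sigma > 10


# Hypothetical directory listing: one 180-image run and a 3-image "test" shot.
names = [f"xtal1_1_{i:03d}.img" for i in range(1, 181)]
names += [f"test_1_{i:03d}.img" for i in range(1, 4)]
print(find_runs(names))
```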

Unit cell: best R cryst after rigid-body refinement vs. RMS unit cell length deviation (Å). Labeled points: 1hh7 M. TB CSOR; 1rb5 myoglobin.
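The x-axis quantity on this slide is not defined in the source. A plausible reading, assumed here, is the root-mean-square of the differences between the a, b and c edge lengths of two cells given in the same setting; `rms_cell_length_deviation` is a hypothetical name.

```python
import math


def rms_cell_length_deviation(cell1, cell2):
    """Root-mean-square difference (Å) between the a, b, c edges of two cells.

    cell1, cell2: (a, b, c) tuples in Å, assumed to be in the same setting.
    """
    diffs = [p - q for p, q in zip(cell1, cell2)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


print(round(rms_cell_length_deviation((50.0, 60.0, 70.0), (51.0, 60.0, 71.0)), 3))  # 0.816
```

The slide's point is that a small value of this number does not guarantee a low R cryst, so it is a weak score on its own.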

MAD/SAD datasets: R iso vs PDB deposit and best R cryst after rigid-body refinement. Labeled groups: Published; non-isomorphous; Unsolved?
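R iso compares one set of measured amplitudes with another (here, an archived dataset against the deposited data), and R cryst has the same form with calculated amplitudes in place of the second set. The sketch below assumes amplitudes are already matched by Miller index and uses a single overall least-squares scale factor; real scaling would usually also include a resolution-dependent term. `scaled_r` is a hypothetical name.

```python
def scaled_r(f_a, f_b):
    """R = sum |F_a - s*F_b| / sum F_a over reflections present in both sets,
    where s is the least-squares scale that best maps F_b onto F_a.

    f_a, f_b: dicts mapping a Miller index (h, k, l) to an amplitude.
    """
    common = f_a.keys() & f_b.keys()
    if not common:
        raise ValueError("no reflections in common")
    scale = sum(f_a[h] * f_b[h] for h in common) / sum(f_b[h] ** 2 for h in common)
    num = sum(abs(f_a[h] - scale * f_b[h]) for h in common)
    den = sum(f_a[h] for h in common)
    return num / den


# Hypothetical amplitudes: the "deposited" set is the new set on a different scale.
f_new = {(1, 0, 0): 10.0, (0, 1, 0): 20.0, (0, 0, 1): 30.0}
f_dep = {hkl: 0.5 * f for hkl, f in f_new.items()}
print(round(scaled_r(f_new, f_dep), 6))
```

Because of the overall scale factor, two datasets that differ only in scale give R = 0; a large value after scaling points to non-isomorphism or a different crystal.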

Responses to inquiries “I have to find my old note book as I have no idea what that is.” “I have changed jobs a few times since and am really far away from crystallography now.” “Will see what I can find.” “We solved it but never published it. Sorry!”

EGDA
Dec 01 19:45: egda46_*1_E#_###.img (1112 images, Se MAD)
Dec 02 15:10: egda27_*1_###.img (180, 1A, native?)
Dec 02 19:21: egdau1_*1_###.img (427, 8000 eV (U?) SAD)
Dec 02 20:58: egdau1_*2_###.img (360, 8000 eV (U?) SAD)
Jun 01 14:07: egda60_*1_###.img (360, Lutetium SAD)
“I think that these EGDA data sets are very likely some of xxx’s data sets; he was working on E. coli guanine deaminase, something he brought from yyy. No structure was ever published. James, xxx was unable to solve the structure from these data.”
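The filename templates on this slide appear to use `#` for one frame-number digit and `*` for a variable field. Assuming that reading (the source does not spell it out), a template can be turned into a regular expression and used to count the images that belong to it. `template_to_regex`, `count_matches`, and the listed file names are hypothetical.

```python
import re


def template_to_regex(template):
    """Translate a template such as 'egda27_*1_###.img' into a compiled regex.

    Assumed convention: '#' = one digit, '*' = any run of characters.
    """
    parts = []
    for ch in template:
        if ch == "#":
            parts.append(r"\d")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def count_matches(template, filenames):
    """Number of file names that match the whole template."""
    rx = template_to_regex(template)
    return sum(1 for name in filenames if rx.fullmatch(name))


names = ["egda27_1_001.img", "egda27_1_002.img", "egda27_1_abc.img", "egda46_1_E1_001.img"]
print(count_matches("egda27_*1_###.img", names))  # 2
```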

E. coli guanine deaminase
~2.9 Å P, R = 0.32, R free = 0.39
PDB ID: ????

Responses to inquiries
“Thank you for your effort, it looks like the ‘elves’ have brought an early Christmas present.... I think it is worth depositing, of course; it would be a shame to do the difficult part and avoid the easiest.”

Metadata: can we rely on it?
Duquerroy et al. (1994). "Lobster enolase crystallized by serendipity." Proteins: Struct., Funct., Bioinf. 18.
The authors were after lobster arginine kinase but got enolase instead; the arginine kinase structure was still unknown.

Raw image: compresses 4.2x
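A lossless ratio like the one on this slide is the raw byte count divided by the compressed byte count. This sketch uses `zlib` as a stand-in general-purpose lossless codec (the source does not say which compressor produced 4.2x) on a synthetic 16-bit image, so its printed ratio will not reproduce the slide's number; the image model and `compression_ratio` are hypothetical.

```python
import array
import random
import zlib


def compression_ratio(pixels, level=9):
    """Raw size / zlib-compressed size for a list of 16-bit unsigned pixel values."""
    raw = array.array("H", pixels).tobytes()
    packed = zlib.compress(raw, level)
    assert zlib.decompress(packed) == raw  # lossless: the round trip is exact
    return len(raw) / len(packed)


rng = random.Random(0)
# Hypothetical 256x256 image: low background counts plus a few bright "spots".
pixels = [min(65535, int(rng.expovariate(1 / 3))) for _ in range(256 * 256)]
for _ in range(50):
    pixels[rng.randrange(len(pixels))] = rng.randrange(1000, 60000)
print(f"{compression_ratio(pixels):.1f}x")
```

Noisy background pixels are what limit lossless compression, which is why the following slides separate the spots from the rest of the image.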

Just the spots: compresses 337x

Pixel-wise median across the dataset: compresses 5x, but only one per dataset!

Deviation from median in “non-spot” areas: compresses 3.5x

Non-spot areas after H.264 (lossy video) compression: ~50x

Difference between raw and compressed: compresses 5.2x
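The slides from "Raw image" through "Difference between raw and compressed" describe splitting each image into a per-dataset median background, the spots, and the non-spot deviations. Only the last of these is compressed lossily, and storing the difference between raw and reconstructed images makes the whole scheme lossless again. The sketch below follows that decomposition but, as an assumption, replaces H.264 with simple rounding to a multiple of `step`. The spot threshold, step, and all function names are hypothetical.

```python
import random
import statistics


def decompose(frames, spot_threshold=50):
    """Split equal-length integer frames into
    (pixel-wise median, per-frame spots, per-frame non-spot deviations)."""
    n_pix = len(frames[0])
    # One background image per dataset: the median of each pixel across all frames.
    median = [int(statistics.median(f[i] for f in frames)) for i in range(n_pix)]
    spots, rests = [], []
    for f in frames:
        dev = [p - m for p, m in zip(f, median)]
        spot = [d if d > spot_threshold else 0 for d in dev]  # bright pixels, kept exactly
        spots.append(spot)
        rests.append([d - s for d, s in zip(dev, spot)])      # "non-spot" deviations
    return median, spots, rests


def lossy(values, step=4):
    """Stand-in for a lossy codec such as H.264: round to the nearest multiple of step."""
    return [step * round(v / step) for v in values]


def reconstruct(median, spot, rest, step=4):
    """Approximate frame = median background + exact spots + lossy non-spot part."""
    return [m + s + q for m, s, q in zip(median, spot, lossy(rest, step))]


rng = random.Random(1)
frames = [[rng.randint(0, 10) for _ in range(100)] for _ in range(5)]
frames[2][7] = 500  # one hypothetical Bragg spot
median, spots, rests = decompose(frames)
approx = reconstruct(median, spots[2], rests[2])
correction = [p - a for p, a in zip(frames[2], approx)]  # stored to make it lossless
print(max(abs(c) for c in correction))  # never more than step // 2
```

The correction term is small-valued, so it should compress well losslessly; adding it back to the lossy reconstruction recovers the raw frame exactly.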

Lossy compression vs. R/R free: R factor against compression ratio

backblaze.com “pod” server
backblaze.com offers “unlimited storage” data backup for $5/month.

Backblaze does not sell these “pods”, but “protocase.com” does.

Summary
saving data could double productivity
unit cell is not a good score
lossy compression: rallying cry?
backup vs. archive
metadata: what do we really know?

Brief Summary
This is a lot of work. Who is going to pay for it?