Google and Large Scientific Datasets (or: How To Move 100 TB)
Jon Trowbridge, Google
Space Telescope Science Institute, March 15, 2007
Organize the world’s information and make it universally accessible and useful.
Motivating Problem What if a piece of information is too large to efficiently transmit across the Internet as it exists today?
“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” - Andrew Tanenbaum (?)
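To make the motivating problem concrete, here is a back-of-envelope sketch of how long 100 TB takes to move over a network. The link speeds are illustrative assumptions, not figures from the talk:

```python
# Back-of-envelope: days to move a dataset over a network link
# at 100% sustained utilization (an optimistic assumption).
TB = 10**12  # one terabyte, in bytes

def transfer_days(size_bytes, link_bits_per_sec):
    """Days needed to push size_bytes through the given link."""
    seconds = size_bytes * 8 / link_bits_per_sec
    return seconds / 86400

dataset = 100 * TB
for name, bps in [("T1 (1.5 Mbps)", 1.5e6),
                  ("100 Mbps", 100e6),
                  ("1 Gbps", 1e9)]:
    print(f"{name}: {transfer_days(dataset, bps):,.1f} days")
```

Even a dedicated gigabit link needs over a week of perfect throughput, which is why a box of drives in a truck starts to look attractive.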
Large Dataset Archive Move data by shipping hard drives Centralized repository stored on Google’s infrastructure Accepting data from all disciplines, but it must be open and free Ultimate goal: Promiscuous distribution
Nice Properties of Physical HD Shipment Uses commodity technologies: Linux, SATA, ext2 High throughput Trivially scalable Cheap and easy: $2,400 for 3 TB Rapidly getting cheaper
Real-World Throughputs
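The chart from this slide is not reproduced here, but the effective throughput of shipped drives can be sketched with assumed numbers: a 3 TB box of drives (the configuration priced on the previous slide) with an assumed 24-hour door-to-door transit:

```python
# "Sneakernet" throughput: average bit rate of a box of drives in transit.
# The 24-hour transit time is an assumption for illustration.
def sneakernet_mbps(capacity_bytes, transit_seconds):
    """Average throughput in megabits/s while the box is in transit."""
    return capacity_bytes * 8 / transit_seconds / 1e6

box = 3 * 10**12        # 3 TB of SATA drives
overnight = 24 * 3600   # one day door to door
print(f"{sneakernet_mbps(box, overnight):.0f} Mbit/s")
```

Under these assumptions a single overnight box sustains a few hundred megabits per second, and the scheme scales trivially: ship two boxes and the aggregate rate doubles.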
The Cost of 1GB of Storage: $100,000 in 1986, falling roughly tenfold per step to $10,000, $1,000, $100, $10, $1 — Today: About 40¢ Creative Computing - February, 1980
Not-So-Nice Properties of Physical HD Shipment Physical objects break, get stolen, occasionally explode HD copying bottleneck Customs/duties make international shipments more complicated
The Big Question What happens when every astronomer has the complete Hubble Legacy Archive on the computer in their office?
The Big Question What happens when every high-school student has the complete Hubble Legacy Archive on the computer in their bedroom?