Google and Large Scientific Datasets or How To Move 100TB Jon Trowbridge Google Space Telescope Science Institute March 15, 2007.

Presentation transcript:

Google and Large Scientific Datasets or How To Move 100TB Jon Trowbridge Google Space Telescope Science Institute March 15, 2007

Organize the world’s information and make it universally accessible and useful.

Motivating Problem
What if a piece of information is too large to efficiently transmit across the Internet as it exists today?

“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” - Andrew Tanenbaum (?)
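To put rough numbers on the quote, the sketch below compares pushing the talk's 100 TB through a network link with shipping the same data on disks. The link rates and the three-day transit time are illustrative assumptions, not figures from the talk, and the function names are hypothetical.

```python
# Illustrative only: the link rates and the shipping time are assumed values,
# not figures from the talk.
TB = 10**12          # bytes in a decimal terabyte
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_sec: float) -> float:
    """Days needed to send size_bytes over a link running at its full rate."""
    return size_bytes * 8 / link_bits_per_sec / SECONDS_PER_DAY


def shipment_bits_per_sec(size_bytes: float, transit_days: float) -> float:
    """Effective bandwidth of a shipment that takes transit_days door to door."""
    return size_bytes * 8 / (transit_days * SECONDS_PER_DAY)


dataset = 100 * TB

# Assumed sustained link rates; real links rarely run at their full rate.
for label, rate in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"{label}: {transfer_days(dataset, rate):.1f} days")

# Hypothetical three-day courier delivery, ignoring time spent copying disks.
print(f"shipment: {shipment_bits_per_sec(dataset, 3) / 1e9:.2f} Gbit/s")
```

Under these assumptions, even a fully saturated 1 Gbit/s link needs more than nine days for 100 TB, while the three-day shipment is equivalent to a link of roughly 3 Gbit/s.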

Large Dataset Archive
Move data by shipping hard drives
Centralized repository stored on Google’s infrastructure
Accepting data from all disciplines, but it must be open and free
Ultimate goal: Promiscuous distribution

Nice Properties of Physical HD Shipment
Uses commodity technologies: Linux, SATA, ext2
High throughput
Trivially scalable
Cheap and easy: $2400 for 3 TB
Rapidly getting cheaper
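As a rough illustration of the "cheap and easy" point, the sketch below scales the slide's $2400 price up to the 100 TB in the talk's title. It assumes the slide's figure means 3 TB of capacity per $2400 unit and that capacity is bought in whole units; the function names are hypothetical.

```python
import math

UNIT_PRICE_USD = 2400    # price quoted on the slide
UNIT_CAPACITY_TB = 3     # capacity quoted on the slide, read here as 3 TB


def units_needed(dataset_tb: float) -> int:
    """Whole shipping units required to hold dataset_tb terabytes."""
    return math.ceil(dataset_tb / UNIT_CAPACITY_TB)


def hardware_cost_usd(dataset_tb: float) -> int:
    """Hardware cost if capacity can only be bought in whole units."""
    return units_needed(dataset_tb) * UNIT_PRICE_USD


# Price per decimal gigabyte implied by the slide's figures.
cost_per_gb = UNIT_PRICE_USD / (UNIT_CAPACITY_TB * 1000)

print(units_needed(100), hardware_cost_usd(100), f"{cost_per_gb:.2f}")
```

With these assumptions, 100 TB needs 34 units at a hardware cost of $81,600, or about $0.80 per GB. The per-unit price presumably covers more than bare disks, which this estimate does not try to separate out.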

Real-World Throughputs

The Cost of 1GB of Storage 1986: $100, : $10, : $1, : $ : $ : $1 Today: About 40¢ Creative Computing - February, 1980

Not-So-Nice Properties of Physical HD Shipment
Physical objects break, get stolen, occasionally explode
HD copying bottleneck
Customs/duties make international shipments more complicated
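The copying bottleneck can be sized with a back-of-the-envelope estimate. The sketch below assumes a sustained write rate of 60 MB/s per drive and perfect scaling across drives written in parallel. Both are assumptions chosen for illustration, not measurements from the talk.

```python
TB = 10**12   # bytes in a decimal terabyte
MB = 10**6    # bytes in a decimal megabyte


def copy_hours(size_bytes: float, write_bytes_per_sec: float,
               parallel_drives: int = 1) -> float:
    """Hours to write size_bytes when parallel_drives drives are filled at once.

    Assumes every drive sustains write_bytes_per_sec and that nothing else
    (source disks, bus, CPU) limits the copy.
    """
    if parallel_drives < 1:
        raise ValueError("parallel_drives must be at least 1")
    return size_bytes / (write_bytes_per_sec * parallel_drives) / 3600


ASSUMED_WRITE_RATE = 60 * MB  # hypothetical sustained rate per drive

print(f"one drive at a time: {copy_hours(100 * TB, ASSUMED_WRITE_RATE):.0f} hours")
print(f"eight in parallel:   {copy_hours(100 * TB, ASSUMED_WRITE_RATE, 8):.0f} hours")
```

At the assumed rate, writing 100 TB one drive at a time takes about 463 hours, or roughly 19 days, which can exceed the transit time itself. Filling eight drives at once brings that down to about 58 hours, provided the source can keep up.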

The Big Question
What happens when every astronomer has the complete Hubble Legacy Archive on the computer in their office?

The Big Question
What happens when every high-school student has the complete Hubble Legacy Archive on the computer in their bedroom?