INDIANAUNIVERSITYINDIANAUNIVERSITY The Data Capacitor Digital Library Brown Bag February 22, 2006 Stephen Simms Data Capacitor Project Manager Research.

Slides:



Advertisements
Similar presentations
Common Instrument Middleware Architecture and Federation of Instrument Resources for X-ray Crystallography Rick McMullen Indiana University.
Advertisements

The Australian Virtual Observatory e-Science Meeting School of Physics, March 2003 David Barnes.
Experiment Workflow Pipelines at APS: Message Queuing and HDF5 Claude Saunders, Nicholas Schwarz, John Hammonds Software Services Group Advanced Photon.
The Big Picture Scientific disciplines have developed a computational branch Models without closed form solutions solved numerically This has lead to.
Copyright © 2014 Pearson Education, Inc. Publishing as Prentice Hall
File Management Chapter 3
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
High Performance Computing Course Notes Grid Computing.
Introduction The Open Science Grid (OSG) is a consortium of more than 100 institutions including universities, national laboratories, and computing centers.
* Making Timely and Accurate Patient Care Decisions through Vista Imaging.
Riding the Wave: a Perspective for Today and the Future APA Conference, November 2011 Monica Marinucci EMEA Director for Research, Oracle.
The Data Management Requirements at SNS Shelly Ren & Steve Miller Scientific Computing Group, SNS-ORNL December 11, 2006.
The Role of DANSE at SNS Steve Miller Scientific Computing Group Leader January 22, 2007.
NWfs A ubiquitous, scalable content management system with grid enabled cross site data replication and active storage. R. Scott Studham.
Introduction to client/server architecture
Mint-user MINT Technical Overview October 8 th, 2010.
Multimedia Application Design. Introduction Electronically stored information, is useful only if it is well managed and easily accessed. Not only it is.
Data Acquisition at the NSLS II Leo Dalesio, (NSLS II control group) Oct 22, 2014 (not 2010)
CceHUB A Knowledge Discovery Environment for Cancer Care Engineering Research Ann Christine Catlin HUBzero Workshop November 7, 2008.
Empowering Bioinformatics Workflows Using the Lustre Wide Area File System across a 100 Gigabit Network Stephen Simms Manager, High Performance File Systems.
Open Science Grid For CI-Days Internet2: Fall Member Meeting, 2007 John McGee – OSG Engagement Manager Renaissance Computing Institute.
Only Connect: Better Use of Library, Publisher and End-User Metadata in a Networked World 31 st International Supply Chain Seminar Tuesday 13 th October,
Presentation Topics CS 4763 Multimedia Systems. Multimedia When different people mention the term multimedia, they often have quite different, or even.
CI Days: Planning Your Campus Cyberinfrastructure Strategy Russ Hobby, Internet2 Internet2 Member Meeting 9 October 2007.
, Implementing GIS for Expanded Data Accessibility and Discoverability ASDC Introduction The Atmospheric Science Data Center (ASDC) at NASA Langley Research.
Software Architecture
Logistics and Systems Rabby Q. Lavilles. Supply chain is a system of organizations, people, technology, activities, information and resources involved.
Ceph Storage in OpenStack Part 2 openstack-ch,
Miguel Branco CERN/University of Southampton Enabling provenance on large-scale e-Science applications.
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
Chapter 9 Section 2 : Storage Networking Technologies and Virtualization.
Semantics-based Distributed I/O with the ParaMEDIC Framework P. Balaji, W. Feng, H. Lin Math. and Computer Science, Argonne National Laboratory Computer.
2004/12/02Slide Number 1 of 15 Exposure Time Calculator (ETC) as a Web Service Donald McLean 2004 Technology Open House.
Igor Gaponenko ( On behalf of LCLS / PCDS ).  An integral part of the LCLS Computing System  Provides:  Mid-term (1 year) storage for experimental.
Moving Large Amounts of Data Rob Schuler University of Southern California.
RNA-Seq 2013, Boston MA, 6/20/2013 Optimizing the National Cyberinfrastructure for Lower Bioinformatic Costs: Making the Most of Resources for Publicly.
A Portal-based interface for compositing multiple streams of experimental data to video Gilead Kutnick Donald McMullen Kianosh Huffman Yu Ma Pervasive.
“Indiana University should continue to execute and accelerate an incremental and extensible strategy that enhances its overall storage infrastructure from.
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
Research and Educational Networking and Cyberinfrastructure Russ Hobby, Internet2 Dan Updegrove, NLR University of Kentucky CI Days 22 February 2010.
Crystal25 Hunter Valley, Australia, 11 April 2007 Crystal25 Hunter Valley, Australia, 11 April 2007 JAINIS (JCU and Indiana Instrument Services): A Grid.
The Data Capacitor and Digital Libraries at IU Jon Dunn Associate Director for Technology IU Digital Library Program February 22, 2006.
OSG Storage Architectures Tuesday Afternoon Brian Bockelman, OSG Staff University of Nebraska-Lincoln.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store.
CLASS Information Management Presented at NOAATECH Conference 2006 Presented by Pat Schafer (CLASS-WV Development Lead)
Crystal Grid Reciprocal Net XPort Crystal Grid Framework Chemical Informatics and Cyberinfrastructure Collaboratory The Crystal Grid A joint project.
ESFRI & e-Infrastructure Collaborations, EGEE’09 Krzysztof Wrona September 21 st, 2009 European XFEL.
GCRC Meeting 2004 BIRN Coordinating Center Software Development Vicky Rowley.
Ethernet. Ethernet  Ethernet is the standard communications protocol embedded in software and hardware devices, intended for building a local area network.
1 Title: Introduction to Computer Instructor: I LTAF M EHDI.
Distributed Data for Science Workflows Data Architecture Progress Report December 2008.
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
Cyberinfrastructure: Many Things to Many People Russ Hobby Program Manager Internet2.
Data Evolution: 101. Parallel Filesystem vs Object Stores Amazon S3 CIFS NFS.
A computer contains two major sets of tools, software and hardware. Software is generally divided into Systems software and Applications software. Systems.
XMC CAT Among the Stars Yiming Sun Beth Plale Scott Jensen SC11.
1 This Changes Everything: Accelerating Scientific Discovery through High Performance Digital Infrastructure CANARIE’s Research Software.
Data Infrastructure in the TeraGrid Chris Jordan Campus Champions Presentation May 6, 2009.
Creating Grid Resources for Undergraduate Coursework John N. Huffman Brown University Richard Repasky Indiana University Joseph Rinkovsky Indiana University.
High Performance Storage System (HPSS) Jason Hick Mass Storage Group HEPiX October 26-30, 2009.
Virtual Laboratory Amsterdam L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam.
WP18, High-speed data recording Krzysztof Wrona, European XFEL
U.S. ATLAS Grid Production Experience
Introduction to client/server architecture
Storing and Accessing G-OnRamp’s Assembly Hubs outside of Galaxy
The New Internet2 Network: Expected Uses and Application Communities
Gordon Erlebacher Florida State University
Data Management Components for a Research Data Archive
L. Glimcher, R. Jin, G. Agrawal Presented by: Leo Glimcher
Presentation transcript:

INDIANAUNIVERSITYINDIANAUNIVERSITY The Data Capacitor Digital Library Brown Bag February 22, 2006 Stephen Simms Data Capacitor Project Manager Research and Academic Computing University Information Technology Services

INDIANAUNIVERSITYINDIANAUNIVERSITY Imagine You have been given hours of rare video –Jackson Pollack singing Jackson Browne –Jack Sprat eating fat –Ethel Merman singing Black Sabbath You feel compelled to: –Digitize –Examine –Edit –Catalog –Archive

INDIANAUNIVERSITYINDIANAUNIVERSITY Wouldn’t It Be Nice... To have a temporary filespace fast enough to accept multiple streams of uncompressed video. To have a temporary filespace large enough to let you edit several 250 GB files at once. To have a temporary filespace which has a fast path to archival storage To have a system that could help you catalog and keep track of your extraordinary gift.

INDIANAUNIVERSITYINDIANAUNIVERSITY Data In the 21 st Century everything is data –Nutritional data –Musical data –Patient data Raw material for –Research in the Arts and Humanities –Scientific advancement –Technological development

INDIANAUNIVERSITYINDIANAUNIVERSITY Better Technology = More Data

INDIANAUNIVERSITYINDIANAUNIVERSITY Dr. Caty Pilachowski, Astronomer at IU –Investigating the evolution of stars –Project scientist for design of WIYN telescope –ODI promises 1 billion pixels/image by 2009 LSST – Large Synoptic Survey Telescope –In the planning stages –Promise of 3 billion pixels/image by 2012 Better Telescopes One Degree Imager 32k x 32k CCD

INDIANAUNIVERSITYINDIANAUNIVERSITY Better Televisions Ultra High Definition Television (UHDTV) –Japanese demonstration November 2005 –16 times more pixels than HDTV –Puts HD-DVD vs Blu-Ray debate in perspective

INDIANAUNIVERSITYINDIANAUNIVERSITY Data Challenges Where do I put it? –Storing the “data firehose” Instrument Simulation Workflow How can I move it? –Pushing and pulling data archive storage computational and visualization resources some other location across the network Can I find it again? –Insure data’s utility through clear organization documented provenance provide a means for data discovery and make the data available

INDIANAUNIVERSITYINDIANAUNIVERSITY The Data Capacitor Project will provide: –Hundreds of Terabytes of fast short term storage –Servers for rapid data transfer –Servers to provide web services The Data Capacitor Last October, IU received $1.7 Million from the NSF to address data challenges in the form of a Major Research Instrumentation grant.

INDIANAUNIVERSITYINDIANAUNIVERSITY Why Capacitor? Data Capacitor –Provides transient storage of data –Absorbs and evens out peaks in data flow –Provides fast discharge of data Capacitor –Provides transient storage of electrons –Absorbs and evens out peaks in electron flow –Provides fast discharge of electrons

INDIANAUNIVERSITYINDIANAUNIVERSITY Data in the network as uncompressible fluid –Network saturated? –Workflow clogged? Lots of fast storage as temporary reservoir –Quickly capture instrument or simulation data –Stage workflow data waiting for resources –Compress data before transfer to archive storage The Data Firehose

INDIANAUNIVERSITYINDIANAUNIVERSITY Where you put it: Lustre Scalable Object Storage ClientOSS MDS metadata server object storage server

INDIANAUNIVERSITYINDIANAUNIVERSITY How you move it: Data Transfer Servers Not everyone will mount the Data Capacitor natively. Dedicated data servers will be available to local and remote users and will help ease the data transfer burden usually shouldered by: –Cluster head nodes –compute nodes –Gatekeepers We intend to support –NFS –CIFS –GridFTP –Striped GridFTP –Possibly pCIFS

INDIANAUNIVERSITYINDIANAUNIVERSITY Now? How about now? Writes into capacitor –12.5 GB/sec aggregate write 8-way Striped GridFTP –960 MB/sec theoretical Dedicated high speed feeds –20 GigE from Indianapolis –10 GigE from Chemistry –10 GigE from Computer Science –10 GigE from Astronomy

INDIANAUNIVERSITYINDIANAUNIVERSITY More Than Storage I *know* that file is in here somewhere...

INDIANAUNIVERSITYINDIANAUNIVERSITY Redundant servers running web servicesRedundant servers running web services -Data pre-processing -Metadata acquisition -Data discovery Data and Metadata Management

INDIANAUNIVERSITYINDIANAUNIVERSITY Current Status Data Capacitor Prototype installed and deployed Request for vendor proposals issued. Vendor responses received and evaluated. Vendor interviews start February 24 th. System to be installed and operational this Spring Work has already begun on local research projects: –Dr. David Clemmer - Proteomics –Dr. Chuck Horowitz - Plasma Pasta –Dr. John Huffman - X-ray Crystallography Work has already begun on the following national projects –Open Science Grid - Storage Element –TeraGrid - Lustre over WAN project

INDIANAUNIVERSITYINDIANAUNIVERSITY Maximize Usefulness Through One Stop Shopping Provide researchers with workflow opportunities that were previously difficult or impossible. Fast path to archive storage Shared filesystem for –Computation –Visualization –Instruments –Possibly desktops

INDIANAUNIVERSITYINDIANAUNIVERSITY

INDIANAUNIVERSITYINDIANAUNIVERSITY Dr. Chuck Horowitz IU Physicist –Interested in the behavior of neutron stars –Studying the behavior of nucleons under extreme pressure –Collaborating with UITS programmer Don Berry –Utilizing unique MDGRAPE-2 hardware

INDIANAUNIVERSITYINDIANAUNIVERSITY Plasma Pasta Workflow Particle interaction is simulated –Using specialized MDGRAPE-2 hardware Post processing –Creates VTK frames Visualization system –Ingests frames –Displays as movie display device compute resource data capacitor viz resource

INDIANAUNIVERSITYINDIANAUNIVERSITY Visualization of Plasma Pasta

INDIANAUNIVERSITYINDIANAUNIVERSITY Dr. David Clemmer Analytical Chemist at IU –Seeking to sequence biomolocules –Builds his own instruments –In 10 months “one run” has more than doubled in the amount of data produced Ion mobility/time-of-flight instrument

INDIANAUNIVERSITYINDIANAUNIVERSITY Proteomics Workflow Data comes off of Ion Mobility / TOF instrument Data undergoes processing Database search using mascot compute resource data capacitor mascot database instrument

INDIANAUNIVERSITYINDIANAUNIVERSITY Dr. John Huffman –IU Chemist –Interested in creating large shared data repository –Interested in keeping metadata to “replay” an experiment Using Obsidian Architecture and Data Manager –Developed by Dr. Randy Bramley and Yu “Marie” Ma –Obsidian data architecture also in use: astronomy clinical radiation therapy bioinformatics CIMA and X-ray Crystallography

INDIANAUNIVERSITYINDIANAUNIVERSITY

INDIANAUNIVERSITYINDIANAUNIVERSITY Demonstration

INDIANAUNIVERSITYINDIANAUNIVERSITY Acknowledgements Many Thanks to Rick McMullen, John C. Huffman, John N. Huffman, Kia Huffman, Manny Plasencia Research and Technical Services Distributed Storage Support Group

INDIANAUNIVERSITYINDIANAUNIVERSITY Photo Acknowledgements Many thanks to the following Creative Commons licensors at Avlxyz davesag GFR homemade Jeffrey Gelens Joseph Robertson PPDIGITAL Toni V

INDIANAUNIVERSITYINDIANAUNIVERSITY Thank you Questions?