Challenges and Solutions
Will Schroeder, co-Founder, President
VAC Big Data Consortium Meeting, July 31, 2012

Thanks

Big Data Architecture Platform Collaboration

Kitware, Inc.
– Open Source Scientific Computing Software
– Software Services

Kitware CMake CDash ParaView

Other Kitware Big Data Projects
– HPC Simulation: turbulent flow with Kitware ParaView; 160,000 computing cores on Argonne Intrepid
– BioMedical: electron scanning microscopy (connectome), resolution towards 100,000² x 10,000; whole slide imaging / digital pathology, resolution at 100,000² x hundreds
– Point Clouds: LIDAR acquisition rates > 200,000 pts/sec; Kitware VTK / PCL / VES
– Text & Documents: Web, > 8 billion indexed pages; Kitware / VTK / Titan
Image sources: 3deling.com, nimh.nih.gov

Columbus Large Image Format (CLIF) 2007 & k x 8k tiled image (64 MP)
– Six cameras with 4k x 2.6k images
– 8-bit grayscale raw format
– Frame rate ~ 1.6 Hz
– 15-30 cm GSD
– Duration ~ 2.8 hrs (16117 frames) in 2007; ~1 hr in 2006
– Metadata
– Camera configuration
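The CLIF figures above imply a data volume that can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes 1 MP = 10**6 pixels, one byte per pixel for the 8-bit raw format, and no compression or metadata overhead. The variable names are illustrative and do not come from the slides.

```python
# Rough data volume for the 2007 CLIF collection, using the slide's figures.
# Assumptions (not from the slides): 1 MP = 10**6 pixels, 1 byte per pixel,
# no compression, metadata size ignored.
PIXELS_PER_FRAME = 64 * 10**6   # 64 MP tiled image
BYTES_PER_PIXEL = 1             # 8-bit grayscale raw format
FRAME_RATE_HZ = 1.6             # approximate frame rate
FRAMES_2007 = 16117             # frames in the 2007 collection

bytes_per_frame = PIXELS_PER_FRAME * BYTES_PER_PIXEL
total_bytes = bytes_per_frame * FRAMES_2007
duration_hours = FRAMES_2007 / FRAME_RATE_HZ / 3600
rate_mb_per_s = bytes_per_frame * FRAME_RATE_HZ / 10**6

print(f"per frame: {bytes_per_frame / 10**6:.0f} MB")
print(f"total:     {total_bytes / 10**12:.2f} TB")
print(f"duration:  {duration_hours:.1f} h")
print(f"rate:      {rate_mb_per_s:.1f} MB/s")
```

Under these assumptions the frame count and frame rate reproduce the stated duration of about 2.8 hours, and the collection comes to roughly 1 TB of raw pixels.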

SCALABLE ARCHITECTURES
– Data-Centric Computing
– Client-Server
– Co-Processing
– Mobile to Supercomputer

The Traditional Visualization Workflow is Breaking Down
[Diagram labels: Solver, Full Mesh, Disk Storage, Visualization]
Image from Rob Ross, Argonne National Laboratory

Small Example
Simulation
– 40 million finite elements
– File size: 3.2 GB per time step
– 1000 time steps
– 100 time steps written to disk
Visualization
– ParaView
– Quad-core Mac Pro with 12 GB memory
– IO: 240 secs
– Contour: 25 secs
– Slice: 7 secs
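A short calculation makes the imbalance in this example concrete. The sketch below assumes that the three timings apply to a single 3.2 GB time step and that 1 GB = 1000 MB; the slide does not state either, so treat the derived throughput as indicative only. All names are illustrative.

```python
# Worked numbers for the small example. Assumptions (not stated on the
# slide): the timings are per time step, and 1 GB = 1000 MB.
GB_PER_STEP = 3.2
STEPS_SIMULATED = 1000
STEPS_WRITTEN = 100
IO_SECS, CONTOUR_SECS, SLICE_SECS = 240, 25, 7

written_gb = GB_PER_STEP * STEPS_WRITTEN         # data actually on disk
full_gb = GB_PER_STEP * STEPS_SIMULATED          # data if every step were kept
read_mb_per_s = GB_PER_STEP * 1000 / IO_SECS     # effective read throughput
total_secs = IO_SECS + CONTOUR_SECS + SLICE_SECS
io_fraction = IO_SECS / total_secs               # share of time spent on IO

print(f"written to disk: {written_gb:.0f} GB of {full_gb:.0f} GB")
print(f"read throughput: {read_mb_per_s:.1f} MB/s")
print(f"IO share of load + contour + slice: {io_fraction:.0%}")
```

Under these assumptions, reading the data takes almost nine tenths of the time, and only one time step in ten is available for analysis at all.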

Issues
– IO vs. analysis time
– Reduced time accuracy in post-processing
– Data movement
ORNL Jaguar: 2.33 petaflops, 224,526 compute cores

Data-Centric Computing

ParaViewWeb

Co-Processing
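The general idea of co-processing is to run analysis alongside the solver, in memory, instead of writing full meshes to disk and analyzing them afterwards. The following is a minimal, self-contained sketch of that contrast. It is not ParaView's co-processing API: `solver_step`, `extract`, and both run functions are hypothetical names, and the "simulation" is a toy sine field.

```python
import math

def solver_step(step, n=1000):
    """Toy stand-in for one simulation step: returns a full 'mesh' of n values."""
    return [math.sin(0.01 * i + 0.1 * step) for i in range(n)]

def extract(field):
    """In situ analysis: reduce the full field to a small summary."""
    return {"min": min(field), "max": max(field)}

def run_post_processing(steps, write_every):
    """Traditional workflow: store the full field every few steps."""
    disk = {}
    for step in range(steps):
        field = solver_step(step)
        if step % write_every == 0:
            disk[step] = field  # full mesh kept for later analysis
    return disk

def run_co_processing(steps):
    """Co-processing: analyze every step in memory, keep only the extract."""
    extracts = {}
    for step in range(steps):
        field = solver_step(step)
        extracts[step] = extract(field)  # full mesh is discarded afterwards
    return extracts

disk = run_post_processing(steps=100, write_every=10)
extracts = run_co_processing(steps=100)
stored_full = sum(len(f) for f in disk.values())
stored_extract = sum(len(e) for e in extracts.values())
print(len(disk), "steps on disk,", stored_full, "values stored")
print(len(extracts), "steps analyzed,", stored_extract, "values stored")
```

In this toy setup the in situ path covers every time step while storing far fewer values, which is the trade the Issues slide points toward: less data movement and no loss of time resolution, at the cost of deciding in advance which extracts to compute.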

Mobile to Supercomputer ParaView Kiwi / VES

PLATFORM
– Toolkits & Modularization
– Integration
– Software Licenses

Toolkits & Modularization

Integration
[Diagram labels: Module 1, Module 2, Module 3, Module 2 (Python), Integration Glue]

Software Licenses
Early: Reciprocal Licenses
– Requires release of software combined with OS software
– Generally discourages commercial collaboration
– E.g., GPL
Now: Permissive Licenses
– Few strings attached
– Suitable for commercial collaboration
– E.g., BSD, Apache, MIT

COLLABORATION
– Multi-view, Multi-control
– Test-Driven Development / Software processes

Multi-View, Multi-Control Collaboration ParaViewWeb

Software Repository Build, Test & Package Community Review Developers & Users

Summary
– Scalable Architectures
– Agile, open platforms
– Robust, test-driven collaboration

Scientists Publisher Journals Evolution Papers Peer-Review

If it’s not reproducible, it’s not Science
Nullius in Verba: “take nobody's word for it”
Royal Society, 1660

Failure of Reproducibility
Nature (March 2012)
– Glenn Begley, former head of cancer research at pharma giant Amgen
– Lee M. Ellis, cancer researcher at the University of Texas
Found that 47 of 53 (about 89%) of the published papers they examined describing "landmark" breakthroughs in preclinical cancer research could not be reproduced, and are thus just plain wrong.