Published by Esmond Chambers. Modified over 8 years ago.
1
Challenges and Solutions
Will Schroeder, co-Founder, President
VAC Big Data Consortium Meeting
July 31, 2012
2
Thanks
3
Big Data: Architecture, Platform, Collaboration
4
Kitware, Inc. Open Source Scientific Computing Software Software Services
5
Kitware CMake CDash ParaView
7
Other Kitware Big Data Projects
HPC - Simulation
– Turbulent Flow
– ParaView
– 160,000 Computing Cores, Argonne Intrepid
BioMedical
– Electron Scanning Microscopy, Connectome: resolution towards 100,000² x 10,000
– Whole Slide Imaging / Digital Pathology: resolution at 100,000² x hundreds
Point Clouds
– LIDAR acquisition rates: > 200,000 pts/sec
– Kitware VTK / PCL / VES
Text & Documents
– Web: > 8 billion indexed pages
– Kitware / VTK / Titan
Image sources: 3deling.com, nimh.nih.gov, /kitware
8
Columbus Large Image Format (CLIF) 2007 & 2006
315 204
– 8k x 8k tiled image (64 MP)
– Six cameras with 4k x 2.6k images
– 8-bit grayscale raw format
– Frame rate ~1.6 Hz
– 15-30 cm GSD
– Duration ~2.8 hrs (16117 frames) in 2007; ~1 hr in 2006
– Metadata
– Camera configuration
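The CLIF figures above imply a data volume that can be estimated with simple arithmetic. The sketch below is a back-of-envelope calculation only. It assumes that "64 MP" means 64 x 10^6 pixels per tiled frame and that 8-bit grayscale raw storage takes one byte per pixel with no header or compression overhead. The variable names are illustrative and do not come from the slides.

```python
# Back-of-envelope data volume for the 2007 CLIF collection.
# Assumptions (not stated on the slide): 64 MP = 64 * 10**6 pixels,
# 1 byte per pixel for 8-bit grayscale, no header or compression overhead.
PIXELS_PER_FRAME = 64 * 10**6
BYTES_PER_PIXEL = 1
FRAMES_2007 = 16117      # from the slide
FRAME_RATE_HZ = 1.6      # approximate, from the slide

bytes_per_frame = PIXELS_PER_FRAME * BYTES_PER_PIXEL
total_bytes = bytes_per_frame * FRAMES_2007

# Duration implied by frame count and frame rate, in hours.
duration_hours = FRAMES_2007 / FRAME_RATE_HZ / 3600

# Sustained acquisition rate in megabytes (10**6 bytes) per second.
rate_mb_per_s = bytes_per_frame * FRAME_RATE_HZ / 1e6

print(f"Per frame:  {bytes_per_frame / 1e6:.0f} MB")
print(f"Total 2007: {total_bytes / 1e12:.2f} TB")
print(f"Duration:   {duration_hours:.1f} hours")
print(f"Rate:       {rate_mb_per_s:.1f} MB/s")
```

Under these assumptions the frame count and frame rate are consistent with the stated duration of roughly 2.8 hours, and the 2007 collection alone is on the order of a terabyte of raw imagery.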
9
SCALABLE ARCHITECTURES
– Data-Centric Computing
– Client-Server
– Co-Processing
– Mobile to Supercomputer
11
The Traditional Visualization Workflow is Breaking Down
Diagram labels: Solver, Disk Storage, Disk Storage, Visualization, Full Mesh
Image from Rob Ross, Argonne National Laboratory
12
Small Example
Simulation
– 40 million element finite element simulation
– File size: 3.2 GB per time step
– 1000 time steps
– 100 time steps written to disk
Visualization
– ParaView
– Quad-core Mac Pro with 12 GB memory
– IO: 240 secs
– Contour: 25 secs
– Slice: 7 secs
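The numbers in this example can be combined to show how IO dominates the post-processing cost. The sketch below is only an illustration. It assumes that 1 GB = 10^9 bytes and that the three timings refer to reading, contouring, and slicing the same single time step. Neither assumption is stated on the slide, and the variable names are illustrative.

```python
# Rough arithmetic on the small example, using only figures from the slide.
# Assumptions (not on the slide): 1 GB = 10**9 bytes, and the IO, contour,
# and slice timings all refer to the same single time step.
ELEMENTS = 40e6
GB_PER_STEP = 3.2
STEPS_TOTAL = 1000
STEPS_WRITTEN = 100
IO_SECS, CONTOUR_SECS, SLICE_SECS = 240, 25, 7

# Storage actually written versus what full time resolution would need.
disk_gb_written = GB_PER_STEP * STEPS_WRITTEN
disk_gb_if_all = GB_PER_STEP * STEPS_TOTAL

# Fraction of time steps that survive to post-processing.
temporal_sampling = STEPS_WRITTEN / STEPS_TOTAL

# Average storage per finite element in one time step.
bytes_per_element = GB_PER_STEP * 1e9 / ELEMENTS

# Share of the read-contour-slice pipeline spent in IO.
io_fraction = IO_SECS / (IO_SECS + CONTOUR_SECS + SLICE_SECS)

print(f"Written to disk:     {disk_gb_written:.0f} GB")
print(f"If every step saved: {disk_gb_if_all:.0f} GB")
print(f"Time steps kept:     {temporal_sampling:.0%}")
print(f"Bytes per element:   {bytes_per_element:.0f}")
print(f"IO share of time:    {io_fraction:.0%}")
```

Under these assumptions, reading the data accounts for most of the pipeline time, and only one time step in ten is available for analysis.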
13
Issues
– IO vs. analysis time
– Reduced time accuracy in post-processing
– Data movement
ORNL Jaguar: 2.33 petaflops, 224,526 compute cores
14
Data-Centric Computing
15
ParaViewWeb
16
Co-Processing
17
Mobile to Supercomputer ParaView Kiwi / VES
18
PLATFORM
– Toolkits & Modularization
– Integration
– Software Licenses
19
Toolkits & Modularization
20
Integration
Diagram labels: Module 1, Module 2, Module 3, Module 2 (Python), Integration Glue
21
Software Licenses
Early: Reciprocal Licenses
– Require release of software combined with open source (OS) software
– Generally discourage commercial collaboration
– E.g., GPL
Now: Permissive Licenses
– Few strings attached
– Suitable for commercial collaboration
– E.g., BSD, Apache, MIT
22
COLLABORATION
– Multi-view, Multi-control
– Test-Driven Development / Software processes
23
Multi-View, Multi-Control Collaboration ParaViewWeb
24
Software Repository Build, Test & Package Community Review Developers & Users
26
Summary
– Scalable Architectures
– Agile, open platforms
– Robust, test-driven collaboration
28
Scientists Publisher Journals Evolution Papers Peer-Review
29
If it’s not reproducible, it’s not Science
Nullius in Verba: “take nobody's word for it”
Royal Society, 1660
30
Failure of Reproducibility
Nature (March 2012)
– Glenn Begley, former head of cancer research at pharma giant Amgen
– Lee M. Ellis, cancer researcher at the University of Texas
Found that 47 of 53 papers (about 89%) published in science journals describing "landmark" breakthroughs in preclinical cancer research could not be reproduced, and are thus just plain wrong.