All the World’s a Stage Stream Andy Wilson Senior Member of Technical Staff Sandia National Laboratories Albuquerque, New Mexico February.

Presentation transcript:

1 All the World’s a Stage Stream
Andy Wilson, atwilso@sandia.gov
Senior Member of Technical Staff, Sandia National Laboratories, Albuquerque, New Mexico
February 4, 2008
SAND Report 2008-0639P
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

2 Who are we?

3

4 Simulation-Based Design
You’re building some artifact. What if you can’t test in the real world?
–Cost, time, conditions, treaties, Congress…
We use computers for design. Use them to simulate tests, too.
–Generate lots of data
  No constraints on where we place sensors
–“All models are wrong. Some are useful.”
–Whenever possible, resort to ground truth
–Sometimes the simulations are all you have

5 How does it work?
[Flow diagram: Design Object; Design Tests; Physics Simulations; Analyze Results; Real-World Testing; Refine Object Design]

6 How does it work?
[Flow diagram: Design Object; Design Tests; Physics Simulations; Analyze Results; Real-World Testing; Refine Object Design]
What questions do the scientists, designers and engineers want to answer?
How do we get those answers out of petabyte-sized data sets?

7 Our Example Problem
We’re going to build a box that will stand up to anything: fire, wind, snow, rain, vacuum, the Mariana Trench, thermonuclear explosions… and we want to test all of those conditions.

8 The Simulation Results
1000 time-varying finite element simulations
–100GB apiece
–100TB total data size (nice and manageable)
Question #1: What are the differences between two (ten, a hundred, all) of these results?
Question #2: What are the interesting parts of the data?

9 Example #1: What’s different? Before we can compute any differences or statistics we have to map the two sets of results into a common data space.

10 How Netezza and the NDN Help
foreach element E in CommonReferenceFrame
    find corresponding element F in SimulationResult
    interpolate F.data into coordinates of E
end
Or…
SELECT Interpolate(SimResult.data, RefFrame.coords)
FROM CommonReferenceFrame RefFrame
INNER JOIN SimulationResult SimResult
    ON ElementsCorrespond(RefFrame.coords, SimResult.coords) = true;
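The mapping step can be sketched in plain Python for the simplest possible case. This is only an illustration under stated assumptions, not the Netezza implementation: the data is one-dimensional, each result is a pair of sorted coordinates and values, and the names `interpolate_onto` and `difference` are hypothetical. Real finite element results would need a 3D element search in place of `bisect_right`.

```python
from bisect import bisect_right


def interpolate_onto(ref_coords, sim_coords, sim_values):
    """Linearly interpolate sim_values (sampled at sorted sim_coords) onto ref_coords."""
    resampled = []
    for x in ref_coords:
        # Clamp to the ends of the simulated range instead of extrapolating.
        if x <= sim_coords[0]:
            resampled.append(sim_values[0])
            continue
        if x >= sim_coords[-1]:
            resampled.append(sim_values[-1])
            continue
        # Find the simulation interval [left, right] that contains x.
        right = bisect_right(sim_coords, x)
        left = right - 1
        t = (x - sim_coords[left]) / (sim_coords[right] - sim_coords[left])
        resampled.append((1 - t) * sim_values[left] + t * sim_values[right])
    return resampled


def difference(ref_coords, result_a, result_b):
    """Map two (coords, values) results onto ref_coords and subtract pointwise."""
    a = interpolate_onto(ref_coords, *result_a)
    b = interpolate_onto(ref_coords, *result_b)
    return [va - vb for va, vb in zip(a, b)]
```

Once both results live on the same reference coordinates, differences and statistics reduce to elementwise operations, which is what makes the join-then-interpolate formulation attractive.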

11 Example #2: What’s interesting?
There’s too much data for any human to sort through. Can we get the computer to show us the interesting bits?
How do you define “interesting”?
–Human: “Show me where the welds are broken.”
–Computer: “temperature > 50, strain != 0”
–Machine learning to the rescue! “Here’s an example of something interesting. Find me other things like it.”

12 Automated Data Discovery
Sandia’s AVATAR Tools
–User provides several example regions of interest
–Tools analyze the regions and build a decision tree for classifying data
–Decision trees are a fast way of locating similar regions in the same or different datasets
–Provides the right tradeoffs for today’s HPC systems
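To make the decision-tree idea concrete, here is a minimal sketch of the smallest possible tree, a one-level "decision stump", learned from labelled examples. It is not AVATAR's algorithm or API; the function names, the (temperature, strain) feature layout, and the rule that a sample is "interesting" when a feature exceeds a threshold are all assumptions made for illustration.

```python
def train_stump(samples, labels):
    """Pick the (feature, threshold) split that misclassifies the fewest samples.

    samples: list of equal-length feature tuples.
    labels: list of bools (True means "interesting").
    The stump predicts True when sample[feature] > threshold.
    """
    best = None  # (errors, feature_index, threshold)
    n_features = len(samples[0])
    for f in range(n_features):
        values = sorted({s[f] for s in samples})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            errors = sum((s[f] > threshold) != y for s, y in zip(samples, labels))
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1], best[2]


def classify(stump, sample):
    """Apply a trained stump: a single comparison per sample."""
    feature, threshold = stump
    return sample[feature] > threshold
```

Classification costs one comparison per sample here, and a full tree costs one comparison per level, which is why a trained tree can sweep through very large datasets quickly.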

13 WARNING MATHEMATICS AND STATISTICS AHEAD

14 In a perfect world…
What classification algorithms would we use if we had unlimited time?
–Ideally, compare each input vector to all training vectors and classify it based on the training vectors that are most similar.
k-Nearest Neighbors (kNN)
–Find the k training vectors with the shortest Euclidean distance to the input vector
–Select the classification that is most popular among those k training vectors
In practice, kNN is never used
–Computationally expensive
–Must have all training vectors available when classifying
–However, high-quality results and mathematically provable characteristics
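The two kNN steps above can be sketched directly. This is a minimal brute-force version under stated assumptions: training data is a list of (vector, label) pairs held in memory, and the names `euclidean_distance` and `knn_classify` are chosen here for illustration.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, query, k):
    """Classify query by majority vote among its k nearest training vectors.

    training: list of (vector, label) pairs.
    """
    # Step 1: find the k training vectors closest to the query.
    nearest = sorted(training, key=lambda pair: euclidean_distance(pair[0], query))[:k]
    # Step 2: pick the most popular label among them.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Every query touches every training vector, which is the cost the slide refers to: the work grows with the product of the number of inputs and the number of training vectors.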

15 How Netezza and the NDN help
This is just another case of streaming a lot of data past a few evaluation functions that select the most important parts.
Hardware parallelism and the streaming paradigm make it practical.
SELECT MostPopularClassification(TV.id, EuclideanDistance(TV.data, SD.data))
FROM TrainingVectors TV
INNER JOIN SimulationData SD
    ON PossibleMatch(TV.data, SD.data) = true;
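The streaming view of kNN can be illustrated in a few lines of Python. This is a sketch of the general idea only, not of how the NPS executes the query: training vectors arrive one at a time from any iterable, and only the k best candidates seen so far are kept. The function names are hypothetical, and squared distance is used because it ranks neighbors the same way as Euclidean distance.

```python
import heapq
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance; preserves the ordering of true distances."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def streaming_knn(training_stream, query, k):
    """Classify query in one pass over an iterable of (vector, label) pairs.

    Memory use is O(k): only the k closest candidates so far are retained.
    """
    # heapq is a min-heap, so negated distances put the farthest kept
    # candidate at heap[0]. seq breaks ties without comparing labels.
    heap = []
    for seq, (vector, label) in enumerate(training_stream):
        item = (-squared_distance(vector, query), seq, label)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # Closer than the farthest kept candidate: swap it in.
            heapq.heapreplace(heap, item)
    if not heap:
        raise ValueError("training stream was empty")
    votes = Counter(label for _, _, label in heap)
    return votes.most_common(1)[0][0]
```

Because the function never needs the whole training set at once, the same loop works whether the vectors come from a list, a file, or a database cursor.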

16 The Take-Home Message
Many analysis problems fit neatly into a stream model.
Supercomputers designed for simulation are often (relatively) bad at I/O.
–We cannot use the algorithms we really want.
The NPS has unmatched capabilities. It doesn’t just let us do our work faster: through the NDN it lets us do things and ask questions that were previously out of reach.

17 DOE Institute for Advanced Architectures
–Foster the integrated co-design of architectures and algorithms
–Commit national laboratory staff and funding to the non-recurring engineering costs of promising technology development
–Deploy prototypes to prove technologies and allow application/algorithm developers early exploration of new architectures
IAA is the DOE vehicle for industry, university and national laboratory collaboration in the co-design of architectures and applications, in order to close critical gaps between theoretical peak performance and real application performance on future leadership-class supercomputers.
The joint IAA, newly launched at Sandia and Oak Ridge national laboratories, is charged with laying the groundwork for an exascale computer. Supported by DOE’s NNSA and Office of Science, the institute, a DOE Center of Excellence, is funded in FY08 by congressional mandate at $7.4M.

18 What does this mean for you?
New Audience, New Market
–Simulation-based design and analysis
  National labs, universities, engineering firms, auto/aircraft manufacturers, petroleum industry…
  This is just one possible application!
–You’ve built a different flavor of supercomputer.
Platform Design: small changes with large impact
–Programmability and flexibility
–Add SQL constructs as well as user functions?
–Hardware floating point
–More flexible compiler, C/C++ library
–Faster path for loading data

19 Thank You

