Data Intensive Computing at Sandia September 15, 2010 Andy Wilson Senior Member of Technical Staff Data Analysis and Visualization Sandia National Laboratories.

Presentation transcript:

Data Intensive Computing at Sandia September 15, 2010 Andy Wilson Senior Member of Technical Staff Data Analysis and Visualization Sandia National Laboratories Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

The Question: What is Data-Intensive Computing?

My Answer: What is Data-Intensive Computing? Parallel computing where you design your algorithms and your software around efficient access and traversal of a data set, and where hardware requirements are dictated by data size as much as by desired run times. Usually this means distilling compact results from massive data.

Outline: What is Data-Intensive Computing? Data-Intensive Computing at Sandia (Physics, Informatics, Architectures). Into the Future.

Spaghetti Plot (2)

Traditional Visualization Workflow [Diagram: Solver, Disk Storage, Visualization; the full mesh passes through disk storage]

Traditional In-Situ Visualization [Diagram: two workflows built from Solver, Disk Storage, and Visualization; one stores images, the other stores the full mesh]

Coprocessing [Diagram: three workflows built from Solver, Disk Storage, and Visualization; one stores images, one stores the full mesh, and one stores features and statistics (salient data)]
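The coprocessing idea, reducing each time step to features and statistics before anything reaches disk, can be sketched in a few lines. This is a minimal illustration, not the workflow used at Sandia: `solver_step`, `summarize`, `run`, the threshold, and the random "physics" are all hypothetical stand-ins, and the `written` list stands in for disk storage.

```python
import math
import random


def solver_step(mesh, rng):
    """Hypothetical stand-in for one solver time step: perturb every cell."""
    return [value + rng.uniform(-0.1, 0.1) for value in mesh]


def summarize(mesh, threshold):
    """Reduce a full mesh to a few statistics plus the indices of salient cells."""
    n = len(mesh)
    mean = sum(mesh) / n
    variance = sum((value - mean) ** 2 for value in mesh) / n
    salient = [i for i, value in enumerate(mesh) if value > threshold]
    return {
        "min": min(mesh),
        "max": max(mesh),
        "mean": mean,
        "std": math.sqrt(variance),
        "salient_cells": salient,
    }


def run(num_cells=10_000, num_steps=5, threshold=0.25, seed=0):
    """Run the toy solver, keeping only the per-step summaries."""
    rng = random.Random(seed)
    mesh = [0.0] * num_cells
    written = []  # stands in for disk storage (assumption for this sketch)
    for _ in range(num_steps):
        mesh = solver_step(mesh, rng)
        # The full mesh is never stored; only the compact summary is.
        written.append(summarize(mesh, threshold))
    return written


if __name__ == "__main__":
    for step, summary in enumerate(run()):
        print(step, round(summary["std"], 4), len(summary["salient_cells"]))
```

In this sketch the stored record per step is four numbers plus a list of salient cell indices, which is usually far smaller than the mesh itself. The trade-off is that the reduction must be chosen before the run.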

Collision Movie

Outline: What is Data-Intensive Computing? Data-Intensive Computing at Sandia (Physics, Informatics, Architectures). Into the Future.

Community Detection in Networks: Find many small groups of vertices and/or edges (O(n) communities; overlaps may be allowed). Hundreds of papers in physics and computer science. [Lancichinetti, Fortunato, Radicchi 2008]

Analysis of Massive Graphs: Finding communities is a kernel of social network analysis. “Dunbar’s number” from sociology: there is a size limit (~150) on stable social group size (from Neolithic farming village to academic sub-discipline). Twitter social network (|V|≈200M) [Akshay Java, 2007]

Collapsed Dendrograms and Statistical Confidence: wCNM. The wCNM partitioning is much deeper, resolving smaller communities. The (much better) wCNM solution also has a statistically significant variation. The statistically significant variation is visually close, but does not reproduce ground truth as well. Image credit: Titan
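Community detection needs a score for how good a partition is. One widely used score is modularity, the quantity that the Clauset-Newman-Moore (CNM) greedy algorithm maximizes; the wCNM on these slides is assumed here to be a weighted variant of CNM. The sketch below only evaluates modularity for a given partition of a small unweighted graph. It does not implement CNM or wCNM, and the function name and example graph are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping each vertex to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints inside community c
    degree = defaultdict(int)    # summed vertex degree of community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (fraction of edges inside c)
    #     minus (expected fraction under random rewiring with the same degrees).
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO_GROUPS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ONE_GROUP = {v: "all" for v in range(6)}

if __name__ == "__main__":
    print(modularity(EDGES, TWO_GROUPS))  # 5/14, about 0.357
    print(modularity(EDGES, ONE_GROUP))   # 0.0
```

Splitting the graph at the bridge scores higher than lumping everything together, which matches the intuition that the two triangles are separate communities.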

LSA and LDA from 5 miles up. Image credit: Dave Robinson (LDA)

LSA/LDA: Increasing Data Size, Single Processor. Straight Line = Linear Scaling, Lower = Faster.

LSA/LDA: Weak Scaling (Bigger Problem, Same Time). Flat Lines = Perfect Scaling.
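The two readings of these plots ("straight line = linear scaling" and "flat lines = perfect scaling") reduce to simple ratios. The sketch below shows one way to compute them. The function names and all timings are hypothetical and are not taken from the LSA/LDA runs.

```python
def cost_per_unit(timings):
    """Single-processor runs: seconds per unit of data at each data size.

    timings: dict {data_size: wall_time_seconds}.
    A constant value across sizes corresponds to a straight line (linear scaling).
    """
    return {size: timings[size] / size for size in sorted(timings)}


def weak_scaling_efficiency(timings):
    """Weak scaling: problem size grows in proportion to processor count.

    timings: dict {processor_count: wall_time_seconds}.
    Efficiency is measured against the smallest run; 1.0 everywhere
    corresponds to a flat line (perfect scaling).
    """
    base_time = timings[min(timings)]
    return {procs: base_time / timings[procs] for procs in sorted(timings)}


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(cost_per_unit({1000: 2.0, 2000: 4.0, 4000: 8.0}))
    print(weak_scaling_efficiency({1: 100.0, 2: 100.0, 4: 125.0}))
```

With these made-up numbers the single-processor cost per unit is constant, while weak-scaling efficiency drops to 0.8 at four processors, which would show up as a line that bends upward rather than staying flat.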

Outline: What is Data-Intensive Computing? Data-Intensive Computing at Sandia (Physics, Informatics, Architectures). Into the Future.

NGC System Diagram
Column headings: Architectures, Algorithms, Web Services, Applications (Clients)
Applications (Clients): Titan, browser
Trilinos (Algebraic Methods): Clustering, Ranking, High Dimensional Mapping
MTGL (Graph Methods): Subgraph searches, Connection sg’s, Shortest Path, etc.
Specialized Distributed Data Operations
Titan: Analysis Pipelines, Capability Integration, Data Access, Lightweight analysis
Other diagram labels: Highly optimized; Iterative, flexible; Data
“This project seeks to bring these two strengths – a solid reputation for excellence in computing, and our niche expertise in specific classes of intelligence analysis – to bear on a thorny problem: developing advanced informatics capabilities that are both usable and useful to analysts who are drowning in data.” (NGC project proposal)

SQL Service Enables Remote Access to Data Warehouse Appliances (DWA)
SQL Service*
–Provides “bridge” between parallel apps and external DWA
–Runs on Red Storm network nodes
–Titan applications communicate with service through Portals
–External resources (Netezza) communicate through standard interfaces (e.g. ODBC over TCP/IP)
The SQL service enables an HPC application to access a remote DWA.
Diagram labels: Analyst; HPC System (Red Storm); DWA; Tech Area 1; Anywhere; CSRI; Service Nodes (GUI and Database Services); High-Speed Network (Portals); Compute Nodes (Titan Analysis Code); Netezza; LexisNexis; Other ODBC DWA; TCP/IP; SQL
Additional Modifications for Multilingual
–Tokenization support on Netezza (goal is to count unique words)
–Developed a custom UTF-8 word splitter for SPU (snippet processing unit)
–Allows parallel tokenization and counting at storage device
* Results of SQL access from parallel statistics code presented at CUG’2009.
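The benefit of counting words at the storage device is that only the compact result crosses the network. The sketch below illustrates this with Python's built-in `sqlite3` module as a stand-in for a data warehouse appliance. This is an assumption made purely for illustration: it is not Netezza, ODBC, Portals, or the SQL service described above, and the table layout and function names are hypothetical. Tokenization here happens at load time with a Unicode-aware regular expression, so only the counting step is pushed into the database.

```python
import re
import sqlite3

# For str patterns in Python 3, \w matches Unicode word characters,
# so accented and non-Latin words are kept whole.
WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase words."""
    return [word.lower() for word in WORD_RE.findall(text)]


def build_store(documents):
    """Load one row per token into an in-memory database (the stand-in 'appliance')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (doc_id INTEGER, word TEXT)")
    rows = [(doc_id, word)
            for doc_id, text in enumerate(documents)
            for word in tokenize(text)]
    conn.executemany("INSERT INTO tokens VALUES (?, ?)", rows)
    return conn


def count_words_at_store(conn):
    """Push the aggregation into the database: only (word, count) pairs come back."""
    cursor = conn.execute(
        "SELECT word, COUNT(*) FROM tokens GROUP BY word ORDER BY word")
    return dict(cursor.fetchall())


def count_words_at_client(conn):
    """Pull every token row and count locally: same answer, far more data moved."""
    counts = {}
    for (word,) in conn.execute("SELECT word FROM tokens"):
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    store = build_store(["Red Storm red storm", "données massives données"])
    print(count_words_at_store(store))
```

Both functions return the same counts. The difference is the amount of data moved: the first transfers one row per unique word, the second one row per token.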

Outline: What is Data-Intensive Computing? Data-Intensive Computing at Sandia (Physics, Informatics, Architectures). Into the Future.

I don’t care about flops anymore. I care about mops. I want to send more complex requests to the storage system. There is no one perfect architecture.