Combinatorial Scientific Computing: A View to the Future
Bruce Hendrickson, Senior Manager for Math & Computer Science
Sandia National Laboratories, Albuquerque, NM
University of New Mexico, Computer Science Dept.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Combinatorial Scientific Computing
The development, application, and analysis of combinatorial algorithms to enable scientific and engineering computations.
Highlighted areas from a survey talk I composed in 2003:
– Sparse matrices (direct & iterative methods)
– Optimization & derivatives
– Parallel computing
– Mesh generation
– Statistical physics
– Chemistry
– Biology
A Brief History of CSC
Grew out of a series of minisymposia at SIAM meetings.
– Deeper origins in:
  the sparse direct methods community (1950s and onward)
  statistical physics – graphs and Ising models (1940s & 1950s)
  chemical classification (1800s, Cayley)
– Recognition of a common aesthetic, techniques, and goals among researchers who were far apart in the traditional scientific taxonomy.
Name selected in 2002:
– After a lengthy email discussion among ~30 people.
– Now ~3000 hits for “combinatorial scientific computing” on Google.
Previous Milestones
This is the 4th major CSC workshop.
– SIAM ’04 (with Parallel Processing); organizers: J. Gilbert, B. Hendrickson, A. Pothen, H. Simon, S. Toledo
– CERFACS ’05
– SIAM ’07 (with Computational Science & Engineering)
– Coming soon:
  SIAM ’09 (with Applied Linear Algebra)
  SIAM ’11 (with Optimization) (?)
Special issue of ETNA in 2004.
Importance recognized by the scientific community and funding agencies.
Invited Speakers from Earlier CSC Workshops
– Richard Brualdi (combinatorial matrix theory)
– Dan Gusfield (computational biology)
– Shang-Hua Teng (smoothed analysis of algorithms)
– Stan Eisenstat (sparse direct methods)
– Dan Halperin (geometric algorithms)
– Denis Trystram (parallel scheduling)
– Iain Duff (sparse direct methods)
– Phil Duxbury (statistical physics)
Outline
A look back:
– A brief history of a brief history
A look ahead:
– New application opportunities: data-centric computing
  Graph models of information retrieval
  Emerging science of complex networks
– Architectural revolution: challenges and promise
  Challenges of near-future machines
  Potential architectures for discrete problems
Conclusions
Data-Centric Computing
Many science disciplines generate huge data sets:
– Biology, astronomy, high-energy physics, environmental science, social sciences (internet data), etc.
Important scientific knowledge lurks within this data. What abstractions and algorithms are needed?
Claim:
– Combinatorial algorithms have an important role to play.
– “Combinatorial problems generated by challenges in data mining and related topics are now central to computational science.” [I. Beichl & F. Sullivan, 2008]
Example I: Information Retrieval
Consider a document corpus:
– Each document is a “bag of words”.
– Represent the corpus as a non-negative t × d term/document matrix A (t terms, d documents).
– A(i,j) encodes the frequency of term i in document j.
– A set of terms in a query can be thought of as a vector q.
– Large entries in A^T q identify good matches for retrieval (sketched in code below).
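A minimal sketch of the bag-of-words model. The three-document corpus and all names here (docs, terms, A, q) are illustrative, not from the talk:

```python
import numpy as np

# A made-up toy corpus: three documents as bags of words.
docs = ["graph theory and sparse matrices",
        "sparse direct methods for matrices",
        "social network graph analysis"]

# Build the vocabulary and the t x d term/document matrix A,
# where A[i, j] counts occurrences of term i in document j.
terms = sorted({w for doc in docs for w in doc.split()})
index = {term: i for i, term in enumerate(terms)}
A = np.zeros((len(terms), len(docs)))
for j, doc in enumerate(docs):
    for w in doc.split():
        A[index[w], j] += 1

# A query is a vector q over the same term space; the large
# entries of A^T q score documents against the query.
q = np.zeros(len(terms))
for w in "sparse graph".split():
    q[index[w]] += 1
scores = A.T @ q
print("document scores:", scores)  # the best match has the largest score
```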
Latent Semantic Analysis
LSA uses a truncated SVD for dimension reduction:
– A ≈ U_k Σ_k V_k^T
The retrieval query now becomes:
– A^T q ≈ V_k Σ_k U_k^T q
A widely used idea to reduce noise and reduce query expense. [Deerwester et al., 1990]
The basic idea has many applications:
– Image recognition, machine translation, pattern recognition, etc.
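Continuing the toy sketch above (this reuses A and q from it), a minimal illustration of truncated-SVD retrieval; the choice k = 2 is arbitrary:

```python
import numpy as np

# Truncated SVD of the term/document matrix: A ~= U_k @ diag(s_k) @ V_k.T,
# keeping only the k largest singular triplets.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T

# LSA retrieval in the reduced space: A^T q ~= V_k diag(s_k) U_k^T q.
lsa_scores = Vk @ (sk * (Uk.T @ q))
print("LSA document scores:", lsa_scores)
```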
Graph-Based Alternative
View the term-document matrix as a bipartite graph:
– Terms and documents have weighted links if they are related.
Embed the graph in a low-dimensional space using (for example) Laplacian eigenvectors.
Given a query vector, map it to the same space and look for nearby documents.
– Fiedler retrieval [H., 2007]
Algebraically, this involves low eigenvectors of the bipartite graph Laplacian
  L = [ D_t   -A  ]
      [ -A^T  D_d ]
(where D_t and D_d are the diagonal term and document degree matrices). Note that LSA involves low eigenvectors of
      [ 0     -A ]
      [ -A^T   0 ]
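A sketch of the spirit of this approach, again reusing A and q from the toy corpus. The precise Fiedler-retrieval formulation in [H., 2007] differs in details (scaling of the Laplacian, how queries are projected), so treat this as a minimal Laplacian-eigenvector embedding, not the paper's algorithm:

```python
import numpy as np

# Laplacian of the bipartite term/document graph:
# L = [[D_t, -A], [-A^T, D_d]], built from the adjacency blocks.
t, d = A.shape
B = np.block([[np.zeros((t, t)), A],
              [A.T, np.zeros((d, d))]])
L = np.diag(B.sum(axis=1)) - B

# Low eigenvectors of L embed terms and documents in the same
# low-dimensional space; skip the trivial constant eigenvector.
k = 2
vals, vecs = np.linalg.eigh(L)
embed = vecs[:, 1:k + 1]           # rows 0..t-1: terms; rows t..: documents
term_xy, doc_xy = embed[:t], embed[t:]

# Map a query into the space as the average of its term embeddings,
# then rank documents by Euclidean distance.
q_xy = q @ term_xy / max(q.sum(), 1)
dist = np.linalg.norm(doc_xy - q_xy, axis=1)
print("documents, nearest first:", np.argsort(dist))
```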
Advantages of the Graph Representation
Terms & documents live in the same space:
– Principled method for adding doc-doc or term-term similarities (e.g., term-term from a dictionary, doc-doc from citation analysis or hyperlinks).
– Unified text and link analysis.
Supports more complex queries:
– “similar to these documents and these terms”
Supports extensions to more classes of objects:
– E.g., instead of just term-document, could do term-document-author.
Example II: Network Science
Graphs are ideal for representing entities and relationships.
Rapidly growing use in social, environmental, and other sciences.
[Figures: Zachary’s karate club (|V| = 34) – the way it was; a Twitter social network (|V| ≈ 20K) – the way it is now.]
New Questions
New algorithms:
– Community detection, centrality, graph generation, etc.
– The right set of questions and concepts is still emerging.
New issues:
– Noisy, error-filled data: what can we conclude robustly?
– Semantic graphs with edges and vertices of different types (e.g., people, organizations, events): how should this be exploited algorithmically?
– Multilinear instead of linear algebra?
New paradigms:
– E.g., the graph evolves over time.
– Temporal analysis, dynamics, streaming algorithms on graphs, etc.
Enormous opportunities for combinatorial algorithms.
Outline
A look back:
– A brief history of a brief history
A look ahead:
– New application opportunities: data-centric computing
  Graph models of information retrieval
  Emerging science of complex networks
– Architectural revolution: challenges and promise
  Challenges of near-future machines
  Potential architectures for discrete problems
Conclusions
A Renaissance in Architecture Research
Good news:
– Moore’s Law marches on.
– Real estate on a chip is essentially free.
– Major paradigm change – a huge opportunity for innovation.
Bad news:
– Power considerations limit the improvement in clock speed; eventual consequences are unclear.
– Current response: multicore processors.
– The computation/communication ratio will get worse, making life harder for applications.
Applications Also Getting More Complex
Leading-edge scientific applications increasingly include:
– Adaptive, unstructured data structures
– Complex, multiphysics simulations
– Multiscale computations in space and time
– Complex synchronizations (e.g., discrete events)
Significant parallelization challenges on today’s machines:
– Finite degree of coarse-grained parallelism
– Load balancing and memory hierarchy optimization
Dramatically harder on millions of cores.
Huge need for new algorithmic ideas – CSC will be critical.
Architectural Challenges for Graph Algorithms
Runtime is dominated by latency:
– Particularly true for data-centric applications
– Random accesses to a global address space
– Perhaps many at once – fine-grained parallelism
– Essentially no computation to hide access time
The access pattern is data dependent:
– Prefetching is unlikely to help.
– Usually only a small part of each cache line is wanted.
Potentially abysmal locality at all levels of the memory hierarchy (see the microbenchmark sketch below).
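A rough microbenchmark of this effect; exact ratios depend on the machine, and the array size and the random permutation standing in for graph-edge targets are arbitrary choices:

```python
import time
import numpy as np

# The same amount of arithmetic - summing n doubles - is much slower when
# the values are gathered through a data-dependent index array (as in a
# graph traversal) than when they are read sequentially.
n = 10_000_000
data = np.random.rand(n)
perm = np.random.permutation(n)    # stand-in for neighbor/edge targets

t0 = time.perf_counter()
seq_sum = data.sum()               # sequential, prefetch-friendly
t1 = time.perf_counter()
gather_sum = data[perm].sum()      # random gather, cache-hostile
t2 = time.perf_counter()

print(f"sequential: {t1 - t0:.3f}s  random gather: {t2 - t1:.3f}s")
```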
Locality Challenges
[Figure: locality of benchmark codes – “what we traditionally care about,” “what industry cares about,” and emerging codes. From: Murphy and Kogge, On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications, IEEE Trans. on Computers, July 2007.]
Example: AMD Opteron
[Annotated die photo, built up over five slides: most of the chip is devoted to latency avoidance (L1 I-cache, L1 D-cache, L2 cache), latency tolerance (memory controller, I-fetch/scan/align, load/store units, out-of-order execution, load/store memory/coherency), and memory and I/O interfaces (bus, DDR, HT); only the FPU and integer execution units – the part labeled “COMPUTER” – actually compute. Thanks to Thomas Sterling.]
Architectural Wish List for Graphs
– Low latency / high bandwidth – for small messages!
– Latency tolerance
– Light-weight synchronization mechanisms
– Global address space:
  No graph partitioning required
  Avoids a memory-consuming profusion of ghost nodes
  No local/global numbering conversions
One machine with these properties is the Cray MTA-2 – and its successor, the XMT.
How Does the MTA Work?
Latency tolerance via massive multithreading:
– Context switch in a single tick
– Global address space, hashed to reduce hot spots
– No cache or local memory
– Multiple outstanding loads
A remote memory request doesn’t stall the processor:
– Other streams work while your request gets fulfilled.
Light-weight, word-level synchronization:
– Minimizes conflicts, enables parallelism
Flexible dynamic load balancing.
Notes:
– 220 MHz clock
– Largest machine is 40 processors
Case Study I: MTA-2 vs. BlueGene
With LLNL, implemented s-t shortest paths in MPI and ran on the IBM/LLNL BlueGene/L, the world’s fastest computer; a finalist for the 2005 Gordon Bell Prize.
– 4B-vertex, 20B-edge Erdős–Rényi random graph
– Analysis: touches about 200K vertices
– Time: 1.5 seconds on 32K processors
Ran a similar problem on the MTA-2:
– 32 million vertices, 128 million edges
– Measured: touches about 23K vertices
– Time: 0.7 seconds on one processor, 0.09 seconds on 10 processors
Conclusion: 4 MTA-2 processors ≈ 32K BlueGene/L processors. [Berry, H., Kahan, Konecny, 2007]
(A serial sketch of the s-t search appears below.)
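The implementations above are parallel (MPI on BlueGene/L, multithreaded code on the MTA-2), but the core computation is easy to sketch serially. A minimal, hypothetical Dijkstra-style s-t search with early termination – the reason only a tiny fraction of a huge graph's vertices is touched – might look like this (function and variable names are illustrative):

```python
import heapq

def st_shortest_path(adj, s, t):
    """Dijkstra from s, stopping as soon as t is settled.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    Returns (distance, number of vertices touched).
    """
    dist = {s: 0.0}
    heap = [(0.0, s)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == t:                     # early exit: most of the graph untouched
            return d, len(dist)
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf"), len(dist)

# Tiny usage example on a made-up 4-vertex graph:
adj = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0), (3, 5.0)],
       2: [(3, 1.0)], 3: []}
print(st_shortest_path(adj, 0, 3))     # (3.0, 4)
```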
Case Study II: Single-Source Shortest Paths (SSSP)
[Figure: SSSP time (s) vs. # processors for PBGL and the MTA.]
Parallel Boost Graph Library (PBGL):
– Lumsdaine et al., on an Opteron cluster
– Some graph algorithms can scale on some inputs.
PBGL–MTA comparison on SSSP:
– Erdős–Rényi random graph (|V| = 2^28)
– PBGL SSSP can scale on non-power-law graphs.
– An order-of-magnitude speed difference
– Two orders of magnitude efficiency difference
– A big difference in power consumption
[Lumsdaine, Gregor, H., Berry, 2007]
Longer-Term Architectural Opportunities
Near-future trends:
– Multithreading to tolerate latencies
– XMT-like capability on commodity machines?
– Potential big impact on latency-dominated applications (e.g., graphs)
Further out:
– Application-specific circuitry (e.g., hashing, feature detection, etc.)
– Reconfigurable hardware? Adapt circuits to the application at run time.
Lots of new combinatorial problems in these alternative computing models.
Conclusions
CSC is in robust health:
– Growing in breadth, depth, impact, and visibility.
Trends in science play to our strengths:
– The growing complexity of traditional applications requires more CSC: unstructured, adaptive meshes; bigger problems; multiphysics; optimization; etc.
– New science domains with combinatorial needs are emerging: social sciences, ecology, structural biology, etc.
– Many sciences are becoming more data-rich.
– Complex computers require new discrete algorithms: we can help applications on multicore nodes, and maybe influence future architectures. Enormous need for new models and algorithmic improvements.
It’s a great time to be doing CSC!
Thanks
Cevdet Aykanat, Jon Berry, Rob Bisseling, Erik Boman, Bill Carlson, Ümit Çatalyürek, Edmond Chow, Karen Devine, Iain Duff, Danny Dunlavy, Alan Edelman, Jean-Loup Faulon, John Gilbert, Assefaw Gebremedhin, Mike Heath, Paul Hovland, Simon Kahan, Pat Knupp, Tammy Kolda, Gary Kumfert, Fredrik Manne, Mike Merrill, Richard Murphy, Esmond Ng, Ali Pınar, Cindy Phillips, Steve Plimpton, Alex Pothen, Robert Preis, Padma Raghavan, Steve Reinhardt, Suzanne Rountree, Rob Schreiber, Viral Shah, Jonathan Shewchuk, Horst Simon, Dan Spielman, Shang-Hua Teng, Sivan Toledo, Keith Underwood, and many others.