The FLAME Project

Presentation transcript:

The FLAME Project

Faculty: Robert van de Geijn (CS/ICES), Don Batory (CS), Maggie Myers (SDS), John Stanton (Chem), Victor (TACC)
Research Staff: Field Van Zee
Postdocs: Bryan Marker, Devin Matthews
Grad students: Tyler Smith (CS), Martin Schatz (CS), Woody Austin (CS), Scott Rabidoux (CSEM), Jianyu Huang (CS)
Undergrads: David Rosa Tamsen, Dillon Huff, Karen Tsao, Ilya Polkovnichenko, Josh Blair, Nicholas Moreles

The Dense Linear Algebra Software Stack

Layer               Traditional   FLAME *
Primitives          BLAS          BLIS
Sequential          LAPACK        libflame
DAG scheduling      PLASMA        libflame/SuperMatrix
Distributed memory  ScaLAPACK     Elemental (Jack Poulson)

* Sponsored by an NSF Software Infrastructure for Sustained Innovation

Blue Gene/Q, PowerPC A2, 16 cores (figure: ESSL and BLIS performance curves, labeled by 1 core and 16 cores)

Intel Xeon Phi 60 cores/240 threads