Principles/theory matter and can matter more: Big lead of PRAM algorithms on prototype-HW Uzi Vishkin There is nothing more practical than a good theory--

Slides:



Advertisements
Similar presentations
Problem solving skills
Advertisements

ARCHITECTURES FOR ARTIFICIAL INTELLIGENCE SYSTEMS
© 2011 IBM Corporation Improving Reliability and Making Things Cheaper to Run Tuesday 20th September James Linsell-Fraser, Senior Architect & Client Technical.
Modeling the Process and Life Cycle CSCI 411 Advanced Database and Project Management Monday, February 2, 2015.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
IPDPS Looking Back Panel Uzi Vishkin, University of Maryland.
Algorithms-based extension of serial computing education to parallelism Uzi Vishkin - Using Simple Abstraction to Reinvent Computing for Parallelism, CACM,
James Edwards and Uzi Vishkin University of Maryland 1.
Uzi Vishkin.  Introduction  Objective  Model of Parallel Computation ▪ Work Depth Model ( ~ PRAM) ▪ Informal Work Depth Model  PRAM Model  Technique:
SYNAR Systems Networking and Architecture Group CMPT 886: Special Topics in Operating Systems and Computer Architecture Dr. Alexandra Fedorova School of.
Set-Based Design Our operating assumption is that we want to be better designers and we are looking for ways to improve our techniques.
CONCLUSIONS and SUGGESTIONS. The Conclusions and Suggestions drawn up in this section includes the opinions about the general approaches for the applications.
Software Engineering.
1 GENI: Global Environment for Network Innovations Jennifer Rexford On behalf of Allison Mankin (NSF)
Joint UIUC/UMD Parallel Algorithms/Programming Course David Padua, University of Illinois at Urbana-Champaign Uzi Vishkin, University of Maryland, speaker.
Better Speedups for Parallel Max-Flow George C. Caragea Uzi Vishkin Dept. of Computer Science University of Maryland, College Park, USA June 4 th, 2011.
27-Jun-15 Profiling code, Timing Methods. Optimization Optimization is the process of making a program as fast (or as small) as possible Here’s what the.
The Future of Internet Research Scott Shenker (on behalf of many networking collaborators)
Teaching Parallelism Panel, SPAA11 Uzi Vishkin, University of Maryland.
Accelerating Machine Learning Applications on Graphics Processors Narayanan Sundaram and Bryan Catanzaro Presented by Narayanan Sundaram.
XMT-GPU A PRAM Architecture for Graphics Computation Tom DuBois, Bryant Lee, Yi Wang, Marc Olano and Uzi Vishkin.
Programmability and Portability Problems? Time for Hardware Upgrades Uzi Vishkin ~2003 Wall Street traded companies gave up the safety of the only paradigm.
Computational Thinking Related Efforts. CS Principles – Big Ideas  Computing is a creative human activity that engenders innovation and promotes exploration.
CIO Academy Journey to Influential IT Leadership Journey to CIO Academy Strategic Competencies for 21st Century CIO Success Influential IT Leadership:
Susanne Hambrusch Division of Computing and Communication Foundations (CCF) CISE Directorate National Science Foundation June 28, 2012.
1 Framework Programme 7 Guide for Applicants
Why do so many chips fail? Ira Chayut, Verification Architect (opinions are my own and do not necessarily represent the opinion of my employer)
EENG 1920 Chapter 1 The Engineering Design Process 1.
Introduction CSE 1310 – Introduction to Computers and Programming Vassilis Athitsos University of Texas at Arlington 1.
Lab: Introduction to Loop Transformations Tomofumi Yuki EJCP 2015 June 22, Nancy.
Team Skill 6: Building the Right System From Use Cases to Implementation (25)
Who are they?  Mutli-tasker  Net-generation learners  Millennial Students  Generation Y & Z  Digital Natives  Interactive & Networked.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 4 Slide 1 Software Processes.
© 2008 Autodesk Engineering Design with CAD Fusion FIRST training 2011.
Paul E. Sollock1 FOUR DECADES OF EVOLUTION TOWARD FLEXIWARE ( From the perspective of an Apollo, Shuttle, ISS and Shuttle Upgrades Veteran) Definitions.
Example: Sorting on Distributed Computing Environment Apr 20,
Does humans-in-the-service-of-technology have a future Preview of Viewpoint article: Is Multi-Core Hardware for General-Purpose Parallel Processing Broken?
+ Careers in STEM are Everywhere! Science, Technology, Engineering, Math.
What IS a Journeyman Programmer? Why this program?
Lecture 01: Welcome Computer Architecture! Kai Bu
1 Planning for Reuse (based on some ideas currently being discussed in LHCb ) m Obstacles to reuse m Process for reuse m Project organisation for reuse.
HPC User Forum Back End Compiler Panel SiCortex Perspective Kevin Harris Compiler Manager April 2009.
Job scheduling algorithm based on Berger model in cloud environment Advances in Engineering Software (2011) Baomin Xu,Chunyan Zhao,Enzhao Hua,Bin Hu 2013/1/251.
Interactive Supercomputing Update IDC HPC User’s Forum, September 2008.
Joint UIUC/UMD Parallel Algorithms/Programming Course David Padua, University of Illinois at Urbana-Champaign Uzi Vishkin, University of Maryland, speaker.
Most of contents are provided by the website Introduction TJTSD66: Advanced Topics in Social Media Dr.
Design Process … and some design inspiration. Course ReCap To make you notice interfaces, good and bad – You’ll never look at doors the same way again.
Constructivism A learning theory for today’s classroom.
One Day Training Programme for Business Trainers and Mentors.
2007 Science of Design (SoD) PI Meeting – Project Nuggets NSF SoD Award No: NSF SoD-HCER Project Title: Learning Based Programming Investigator.
1. 2 Pipelining vs. Parallel processing  In both cases, multiple “things” processed by multiple “functional units” Pipelining: each thing is broken into.
BSc Honours Project Introduction CSY4010 Amir Minai Module Leader.
MODEL-BASED SOFTWARE ARCHITECTURES.  Models of software are used in an increasing number of projects to handle the complexity of application domains.
Portable and Predictable Performance on Heterogeneous Embedded Manycores (ARTEMIS ) ARTEMIS 3 rd Project Review October 2015 WP6 – Space Demonstrator.
Introduction CSE 1310 – Introduction to Computers and Programming Vassilis Athitsos University of Texas at Arlington 1.
University of Washington Today Quick review? Parallelism Wrap-up 
CSE 403, Spring 2008, Alverson CSE 403 Software Engineering Pragmatic Programmer Tip: Care about Your Craft Why spend your life developing software unless.
MUX-2010 DISCUSSION : W HAT DO WE NEED NEXT IN HEP.
New-School Machine Structures Parallel Requests Assigned to computer e.g., Search “Katz” Parallel Threads Assigned to core e.g., Lookup, Ads Parallel Instructions.
Building a SW Architecture Group Tomer Peretz Chief Software Architect.
Building PetaScale Applications and Tools on the TeraGrid Workshop December 11-12, 2007 Scott Lathrop and Sergiu Sanielevici.
Lecture 01: Welcome Computer Architecture! Kai Bu
Tools and Libraries for Manycore Computing Kathy Yelick U.C. Berkeley and LBNL.
Conclusions on CS3014 David Gregg Department of Computer Science
Thoughts on How to Initiate An Academic Career - Research
Lab: Introduction to Loop Transformations
Reflective writing The Early Years Teacher Programme: Reflective Practice Reflective Writing for the PG Certificate.
Building the foundations for innovation
Lab: Introduction to Loop Transformations
CS 239 – Big Data Systems Fall 2018
Presentation transcript:

Principles/theory matter and can matter more: Big lead of PRAM algorithms on prototype-HW Uzi Vishkin There is nothing more practical than a good theory-- James C. Maxwell

Mainstream many-cores are in trouble Future of Computing Performance - Game Over or Next level, National Academy of Engineering “Only heroic programmers can exploit the vast parallelism in today’s machines” – Parallel HW is way too difficult to program for faster completion time of a single general-purpose task – Yawn factor. Apps innovation over last decade in performance general-purpose desktops low. Compare to internet & mobile – SW spiral (business model!) is broken -Need whole new parallel computing stack [  ‘generalist researchers’] Will advocate a theory-driven parallel computing stack Quiz who was the generalist behind the 1 st general-purpose serial computer?

Where are all the theorists? Many parallel computing theorists in 1980s & early 1990s Why recent rise in other communities, but not theory? My conjecture: – Unstable/Unappetizing programming models Theorists seek robust contributions. Important for business reasons, as well. Don’t blame the messenger. Recall also the “yawn factor” – Collective memory: Trauma of 1990s belittling/bashing theory.  Theory-driven parallel computing stack

Good news We can build on Bandwidth Bandwidth ~= 300 Latency Latency2012 Source [HP-12]

Building Confidence Theory Hypotheses and their Validation Be guided by theory/principles: 1981 – Introduced Work-Depth methodology [& aspiration that WD will become synonym with thinking in parallel] & 2 hypotheses: 1.WD methodology will be sufficient for PRAM [Argued: PRAM is too difficult. Common wisdom: too easy] 1980s-90s – PRAM won battle of ideas on parallel algorithms theory. WD used as description framework + validation of WD 2. PRAM/WD algorithms will be sufficient for (the right) stack [Skill – generalist] Negotiated WD/PRAM with HW constraints 2012 – Full prototype-stack (incl. HW) validation of WD/PRAM Also: validation of generalist skill

Best speedups on non-trivial stress tests “Normal” algorithm to implementation effort. No algorithmic creativity. 1.XMT enables PhD-level programming at High School, May “XMT is an essential component of our Parallel Computing courses because it is the one place where we are able to strip away industrial accidents from the student's mind, in terms of programming necessity, and actually build creative algorithms to solve problems”—national award winning HS teacher Parallel Algorithms course: prog assignment parallel BiConn Can do orders-of-magnitude better on speedups and ease-of-programming XMTGPU/CPU XMT source BiConn33X4X TarjanV85 TriConn129X? RamachandranV88, MillerR MaxFlow108X2.5X ShiloachV82 + GoldbergTarjan88

Not just talking – See CACM, Jan 2011 Algorithms&Software WorkDepth Creativity ends here PRAM validation completed with Bi&Tri-connectivity & Max-flow results – most advanced algs Programming & workflow No ‘parallel programming’ course beyond freshmen Stable compiler PRAM-On-Chip HW Prototypes 64-core, 75MHz FPGA of XMT (Explicit Multi-Threaded) architecture SPAA98..CF core intercon. network IBM 90nm: 9mmX5mm, 400 MHz [HotI07] FPGA design  ASIC IBM 90nm: 10mmX10mm Direct HW ‘orchestration’ Architecture scales to cores on-chip

Research challenge in principles of parallel computing Establish that principles/theory actually matter and can once and for all remedy the ills of the field XMT validated the hypothesis on WD/PRAM algorithms. Now, advance XMT into a serious new stack alternative: – Applications Establish: XMT can provide order-of-magnitude improvements on interesting scientific/market applications – Wall-clock runtime Upgrade current prototype to yield wall-clock speedups  appealing to widely try by others including application people

Need NSF program Enable new stack Must enable: Coherent theme. Prefer: theory- or architecture-driven proposals led by generalists (plus domain experts), rather than ‘check-list collection’ of such experts Development budget, so that research can follow Opportunity to test recent concepts; e.g., min primitive Separate in budget/features from other programs Last 3 arguments in favor of a theory-driven stack Architects: Really good at optimizing HW, given code examples. But, what if 1 st HW draft is not good enough? Jury out for … 5 th decade Primary future beneficiary from good enough 1 st draft - problems they are best at Biggest (?) problem Diminished competition among HW vendors NSF should not limit itself to industry welfare. How about the opposite: compensate for insufficient competition?