Customizable Domain-Specific Computing: Proposal for NSF “Expedition in Computing” Program

Presentation transcript:

1 Customizable Domain-Specific Computing
Proposal for NSF “Expedition in Computing” Program
Point of Contact: Prof. Jason Cong
Participating Universities: UCLA (lead), Rice, Ohio State, and UC Santa Barbara (complete list of PI/Co-PIs available inside)

2 Outline
- Motivation
- Overall approach
- Research plan
- Management and collaboration plan
- Education and outreach plan
- Deliverables and knowledge transfer
- Why an expedition

3 Focus: Power/Energy Efficient Computation
Current solution: parallelization
[Figure label: Parallelization. Source: Shekhar Borkar, Intel]

4 Our Proposal: Beyond Parallelization, Customizable Domain-Specific Computing
Customization: adapt the architecture to the application
[Figure labels: Parallelization; Customization. Source: Shekhar Borkar, Intel]

5 Motivation and Vision
- A few facts
  - We have sufficient computing power for most applications
  - Each user/enterprise needs high computing power for only a limited set of tasks in his/her application domain
  - Application-specific integrated circuits (ASICs) can deliver 10,000x+ better power-performance efficiency, but are too expensive to design and manufacture
- Our vision and approach
  - A general, customizable platform for the given domain(s)
    - Can be customized to a wide range of applications in the domain with novel compilation and runtime systems
    - Can be mass-produced cost-efficiently
    - Can be programmed efficiently
- Goal: a “supercomputer-in-a-box” with 100x performance/power improvement via customization for the intended domain(s)
- Analogy: advance of civilization via specialization/customization
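The performance/power metric behind the 100x goal can be made concrete with a little arithmetic. The sketch below is not from the proposal; it assumes that "performance/power improvement" means speedup divided by relative power draw, and it plugs in the approximate 3D median filter figures quoted on slide 8 of this deck. The function and variable names are illustrative.

```python
def efficiency_gain(speedup, power_w, baseline_power_w):
    """Performance-per-watt improvement relative to the baseline platform.

    Assumes the metric is (speedup) / (power / baseline power).
    """
    return speedup * baseline_power_w / power_w


# (relative performance, approximate power in watts), taken from the
# 3D median filter comparison on slide 8; the power values are rough.
platforms = {
    "CPU": (1, 100),
    "GPU": (70, 140),
    "FPGA": (1200, 3),
}

baseline_power = platforms["CPU"][1]
gains = {
    name: efficiency_gain(speedup, power, baseline_power)
    for name, (speedup, power) in platforms.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:.0f}x performance/power relative to the CPU")
```

Under this definition a platform can be slower than another in raw speed and still come out ahead once power is taken into account, which is the trade-off the customization argument rests on.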

6 Application Domains: Medical Image Processing & Hemodynamic Simulation
- Medical imaging has transformed healthcare
  - An in vivo method for understanding disease development and patient condition
  - Estimated at $100 billion/year
  - More powerful & efficient computation can help
    - Less exposure, using compressive sensing with a lower sampling frequency
    - Better clinical assessment, using improved registration and segmentation algorithms to provide quantitative measures of disease (e.g., cancer)
- Hemodynamic simulation
  - Very useful for surgical procedures involving blood flow and vasculature
- Both may take hours to days to construct
  - Clinical requirement: 1-2 min
[Figures: Intracranial aneurysm reconstruction with hemodynamics; Magnetic resonance (MR) angiography of an aneurysm]

7 Application Domains: Medical Image Processing Pipeline
Pipeline stages: denoising, registration, segmentation, analysis, reconstruction
Algorithms: compressive sensing, level set methods, fluid registration, total variational algorithm, Navier-Stokes equations

8 Application Domains: Medical Image Processing Pipeline
Pipeline stages: denoising, registration, segmentation, analysis, reconstruction
Algorithms: compressive sensing, level set methods, fluid registration, total variational algorithm, Navier-Stokes equations
Computation & communication patterns of the algorithms:
- Non-iterative, highly parallel, local & global communication; sparse linear algebra, structured grid, optimization methods
- Parallel, global communication; dense linear algebra, optimization methods
- Local communication; sparse linear algebra, n-body methods, graphical models
- Local communication; dense linear algebra, spectral methods, MapReduce
- Iterative, local or global communication; dense and sparse linear algebra, optimization methods
Observations:
- These algorithms have diverse computation & communication patterns
- A single, homogeneous system cannot perform very well on all of these algorithms
- Need architecture customization and hardware-software co-optimization
- Include many common computation kernels (“motifs”)
- Applicable to other domains

Bi-harmonic registration (using the same algorithm on all platforms):
Platform             Relative performance   Power
CPU (Xeon 2.0 GHz)   1x                     ~100 W
GPU (Tesla C1060)    93x                    ~150 W
FPGA (xc4vlx100)     11x                    ~5 W

3D median filter (for each voxel, compute the median of the 3 x 3 x 3 neighboring voxels):
Platform             Algorithm                    Relative performance   Power
CPU (Xeon 2.0 GHz)   Quick select                 1x                     ~100 W
GPU (Tesla C1060)    Median of medians            70x                    ~140 W
FPGA (xc4vlx100)     Bit-by-bit majority voting   1200x                  ~3 W
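The slide names bit-by-bit majority voting as the FPGA approach to the 3D median filter but does not spell it out. The following is a minimal software sketch of that general technique, not the authors' hardware design. It assumes unsigned 8-bit voxel values and edge replication at the volume boundary; both are assumptions, as are the function names `majority_median` and `median_filter_3d`.

```python
def majority_median(values, bits=8):
    """Median of an odd number of unsigned integers, found one bit at a time.

    From the most significant bit down, every element casts a vote for the
    current bit of the median. An element whose bit disagrees with the
    majority is known to lie below or above the median, so it keeps casting
    that same fixed vote (0 or 1) for all remaining bit positions.
    """
    n = len(values)
    if n % 2 == 0:
        raise ValueError("majority voting needs an odd number of values")
    # 0 = still a candidate, -1 = known smaller, +1 = known larger
    state = [0] * n
    median = 0
    for b in range(bits - 1, -1, -1):
        bit_at = [(v >> b) & 1 for v in values]
        votes_for_one = sum(
            bit if s == 0 else (1 if s > 0 else 0)
            for bit, s in zip(bit_at, state)
        )
        m = 1 if 2 * votes_for_one > n else 0
        median = (median << 1) | m
        for i in range(n):
            if state[i] == 0 and bit_at[i] != m:
                state[i] = 1 if bit_at[i] == 1 else -1
    return median


def median_filter_3d(volume, bits=8):
    """Apply a 3 x 3 x 3 median filter to a nested-list volume[z][y][x]."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])

    def clamp(i, n):
        # Edge replication: out-of-range indices reuse the border voxel.
        return min(max(i, 0), n - 1)

    out = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                window = [
                    volume[clamp(z + dz, nz)][clamp(y + dy, ny)][clamp(x + dx, nx)]
                    for dz in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                ]
                out[z][y][x] = majority_median(window, bits)
    return out
```

The per-bit work is a population count and a comparison over 27 inputs with no data-dependent control flow, which is one plausible reason the method suits a hardware pipeline better than comparison-based selection does.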

9 Overview of the Proposed Research
[Figure labels: Domain-specific modeling (healthcare applications); Domain characterization; Application modeling; Architecture modeling; Design once; Invoke many times; Customizable Heterogeneous Platform (CHP); Reconfigurable RF-I bus; Reconfigurable optical bus; Transceiver/receiver; Optical interface]

10 CHP Creation: Design Space Exploration
Key questions:
- Optimal trade-off of efficiency & customizability
- Which options to fix at CHP creation? Which to be set by the CHP mapper?
Custom instructions & accelerators:
- Amount of programmable fabric
- Shared vs. private accelerators
- Custom instruction selection
- Choice of accelerators
- …
Core parameters:
- Frequency & voltage
- Datapath bit width
- Instruction window size
- Issue width
- Cache size & configuration
- Register file organization
- # of thread contexts
- …
NoC parameters:
- Interconnect topology
- # of virtual channels
- Routing policy
- Link bandwidth
- Router pipeline depth
- Number of RF-I enabled routers
- RF-I channel and bandwidth allocation
- …
[Figure: Customizable Heterogeneous Platform (CHP) with cache ($) blocks, fixed cores, custom cores, programmable fabric, reconfigurable RF-I bus, reconfigurable optical bus, transceiver/receiver, and optical interface]
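The split the slide asks about, options fixed at CHP creation versus options left to the CHP mapper, can be illustrated by enumerating a toy design space. This is only a sketch: the parameter names echo the slide, but the option values and the particular creation-time/mapping-time split are hypothetical and do not come from the proposal.

```python
from itertools import product
from math import prod

# Hypothetical option lists. Parameter names follow the slide; the values
# and the assignment to "creation" or "mapping" time are made up.
creation_time = {
    "issue_width": [2, 4],
    "cache_kb": [256, 512, 1024],
    "noc_topology": ["mesh", "torus"],
}
mapping_time = {
    "freq_ghz": [1.0, 2.0],
    "rfi_channels": [4, 8],
    "accelerator_sharing": ["shared", "private"],
}


def space_size(options):
    """Number of distinct configurations in one option set."""
    return prod(len(choices) for choices in options.values())


def enumerate_configs(options):
    """Yield every configuration as a dict mapping parameter name to value."""
    names = list(options)
    for values in product(*options.values()):
        yield dict(zip(names, values))


chips = space_size(creation_time)        # distinct platforms that could be built
per_chip = space_size(mapping_time)      # customizations the mapper can pick per platform
print(f"{chips} creation-time designs, {per_chip} mapper settings each, "
      f"{chips * per_chip} points in total")

first_setting = next(enumerate_configs(mapping_time))
print("one mapper setting:", first_setting)
```

Moving a parameter from the mapping-time set to the creation-time set shrinks what the mapper can adapt per application but lets the hardware for that parameter be fixed, which is the efficiency versus customizability trade-off named in the key questions.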

11 CHP Mapping: Compilation and Runtime Software Systems for Customization
Goal: efficient compiler and runtime support to map domain-specific specifications to customizable hardware
- Adapt the CHP to a given application for drastic performance/power efficiency improvement
[Figure: mapping flow. Labels: Domain-specific applications; Programmer; Abstract execution; Domain-specific programming model (domain-specific coordination graph and domain-specific language extensions); Application characteristics; CHP architecture models; Source-to-source CHP mapper; C/C++ code; Analysis annotations; C/C++ front-end; C/SystemC behavioral spec; Reconfiguring and optimizing back-end; RTL synthesizer (xPilot); Binary code for fixed & customized cores; Customized target code; RTL for programmable fabric; Performance feedback; Adaptive runtime (lightweight threads and adaptive configuration)]

12 Center for Domain-Specific Computing (CDSC) Organization
Participants by research thrust (campuses: UCLA, Rice, UCSB, Ohio State):
- Domain-specific modeling: Bui, Reinman, Potkonjak (UCLA); Sarkar, Baraniuk (Rice); Sadayappan (Ohio State)
- CHP creation: Chang, Cong, Reinman (UCLA); Cheng (UCSB)
- CHP mapping: Cong, Palsberg, Potkonjak (UCLA); Sarkar (Rice); Cheng (UCSB); Sadayappan (Ohio State)
- Application modeling: Aberle, Bui, Vese (UCLA); Baraniuk (Rice)
- Experimental systems: All (led by Cong & Bui)
Team: Aberle, Baraniuk, Bui, Chang, Cheng, Cong (Director), Palsberg, Potkonjak, Reinman, Sadayappan, Sarkar (Associate Director), Vese
A diversified & highly accomplished team: 8 in CS&E; 1 in EE; 2 in medical school; 1 in applied math

13 Management and Collaboration Plan
- Director: Jason Cong (UCLA); Associate Director: Vivek Sarkar (Rice)
  - Oversee the center operation
- Research Executive Committee (REC): leaders of the 4 research thrusts + 2 directors
  - Monthly teleconferences to review research progress and facilitate inter-thrust collaboration
- Each thrust will have weekly or biweekly meetings driven by research milestones
  - Leveraging extensive collaboration history among PI/Co-PIs
    - Everyone had/has joint projects/publications with others in the center
  - Inter-campus student exchanges are planned and encouraged
- Three center-wide meetings each year
  - January, May, and September (annual review, with guests from NSF and industry)
  - Research talks + poster sessions + brainstorming sessions + feedback session (at annual review)

14 Milestones
Milestones per thrust, listed in chronological order over Years 1-5:
- Application modeling
  - Form benchmark sets in medical imaging and hemodynamics & establish baseline results
  - Demonstration of benchmark sets on Prototype 1a
  - Model the benchmark sets on DSCG & DSLE and drive the CHP optimizations
  - Demonstration of benchmark sets on optimized CHP runtime environment
  - Evaluation of benchmarks on final CHP and quantification of the impact on real-world clinical data
- Domain-specific specification
  - Develop Domain-Specific Coordination Graph (DSCG) with abstract metrics
  - Implementation of DSCG+DSLE executable models for benchmark sets; identification of abstract execution metrics to guide CHP exploration
  - Refinement of DSCG+DSLE executable models for benchmark sets
  - Public release of DSCG infrastructure and the DSCG+DSLE executable models for benchmark sets
- CHP creation
  - CHP hierarchical simulation infrastructure
  - CHP initial design-space tuning; domain-specific component synthesis & selection
  - Refinement of CHP design-space exploration with detailed simulation
  - CHP design-space exploration with full system simulation
  - System integration
- CHP mapping
  - Source-to-source CHP mapper for Prototype 1a; fine-grained task scheduling system with locality and load-balance adaptations
  - Design of software reliability components
  - Reconfiguring and optimizing back-end transformations; phase-based adaptations in adaptive runtime
  - Support of software reliability
  - Demonstration of the full CHP mapping system on Prototypes 1a & 2
- Experimental systems
  - Initial CHP prototype with COTS components (Prototype 1a)
  - Prototype RF-I chip (Prototype 1b) with traffic generators and multicast
  - CHP testbed (Prototype 2) prototyping on FPGAs
  - CHP testbed tapeout (Prototype 2)
  - Full system integration and demonstration

15 Milestones for Experimental Platforms
- Prototype 1a: heterogeneous integration of off-the-shelf CMPs + GPUs + FPGAs, e.g.,
  - Intel Xeon CPU + Xilinx V5 FPGA (via FSB) + Nvidia Tesla GPU (via PCI-express 2.0)
  - Initial HW platform for CHP compilation and runtime system development
- Prototype 1b: RF-interconnect prototype
  - RF-I implementation at 45nm CMOS with multiple digital cores/traffic generators
  - Performance, power, and reliability study
- Prototype 2: final CHP implementation for the proposed healthcare domains
  - Single-chip integration or 3D integration
[Figure: RF-I tape-out at IBM 90nm CMOS]

16 Integrated Research and Education
- New courses planned based on the research
  - “Architecture and Compilation for Domain-specific Computing”
  - “Computational Techniques for Medical Imaging”
  - “Programming Models and Application Development for Domain-specific Computing”
    - With projects for new domains, e.g., scientific computing, VLSI CAD, and digital entertainment
  - May be jointly taught (multi-disciplinary)
  - Developed and shared via Connexions (cnx.org), an open-access education platform now with over 1M users/month (based at Rice)
- Graduate student training
  - Estimated around 18 students in total across four campuses
  - Seminars and workshops on interdisciplinary research, career development, ethics, entrepreneurship …
- Undergraduate student training
  - 10 summer research fellowships each year, via UCLA FOCUS, Rice AGEP, and similar programs
- Outreach to high-school students
  - 5-7 high-school summer scholarships each year, via UCLA SMARTS programs

17 Outreach Partner: Frontier Opportunities in Computing for Underrepresented Students (FOCUS)
- Aims to increase the number of underrepresented minorities interested in computing disciplines
- Currently has 50 underrepresented undergraduates:
  - 23 in CS
  - 27 in CSE
- Summer research poster competition
[Photo: the first prize winner]

18 Outreach Partner: Science Mathematics Achievement and Research Technology for Students (SMARTS)
- A six-week summer college preparation program at UCLA
  - Engages underrepresented students in science, technology, engineering, and math training
- SMARTS activities
  - Course-related activities
    - Math courses (Intro to Statistics and AP Calculus Readiness)
    - SAT preparation
  - Research activities
- CDSC faculty and graduate students will be involved as mentors and will provide projects
- This year, the SMARTS program has over 80 applicants
  - … will be admitted (due to limited funding)

19 Knowledge Transfer
- Main outcomes of the project
  1. CHP prototypes
  2. Compilation and runtime system for CHP mapping
  3. Application drivers: original source code & modified code with domain-specific modeling
  4. General methodology for customizable computing (mainly through publications)
  Items 1-3 will be shared with the research community via the web as they become available
- Industrial partners
  - Altera, IBM, Intel, Magma, Mentor Graphics, Nvidia, Xilinx
  - More will be contacted and included if the project is officially funded
- Campus partners
  - UCLA Institute of Digital Research and Education (IDRE)
  - Institute of Pure and Applied Mathematics (IPAM)
  - UCLA Wireless Health Institute (WHI)
- Technology transfer experience
  - Impact via industrial partners: IBM, Intel, Xilinx …
  - Startups: Aplus (acquired by Magma in 2003), AutoESL (Magma and Xilinx were investors)

20 Why an Expedition
- Addresses a fundamental problem: energy-efficient computing
  - What’s beyond parallelization?
  - Our proposal: a transformative approach using customization
- Many challenging research topics
  - Domain-specific modeling/specification
  - Novel architecture & microarchitecture for customization
  - Compilation and runtime software to support intelligent customization
  - New research in testing, verification, reliability, etc. in customizable computing
- Integrated effort in modeling, HW, SW, & application development
- Demonstration in a critical application domain
  - Healthcare has a significant impact on the economy and society
  - Can greatly benefit from customizable domain-specific computing