A unique processor architecture meeting LLVM IR and the IoT

Slides:



Advertisements
Similar presentations
Chapter 3 Instruction Set Architecture Advanced Computer Architecture COE 501.
Advertisements

RISC / CISC Architecture By: Ramtin Raji Kermani Ramtin Raji Kermani Rayan Arasteh Rayan Arasteh An Introduction to Professor: Mr. Khayami Mr. Khayami.
Computer Abstractions and Technology
Tuan Tran. What is CISC? CISC stands for Complex Instruction Set Computer. CISC are chips that are easy to program and which make efficient use of memory.
Project Testing; Processor Examples. Project Testing --thorough, efficient, hierarchical --done by “independent tester” --well-documented, repeatable.
Chapter 13 Embedded Systems
State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.
Chapter 2: Impact of Machine Architectures What is the Relationship Between Programs, Programming Languages, and Computers.
11/11/05ELEC CISC (Complex Instruction Set Computer) Veeraraghavan Ramamurthy ELEC 6200 Computer Architecture and Design Fall 2005.
COM181 Computer Hardware Ian McCrumRoom 5B18,
COMPUTER ORGANIZATIONS CSNB123 May 2014Systems and Networking1.
1 Layers of Computer Science, ISA and uArch Alexander Titov 20 September 2014.
An Introduction Chapter Chapter 1 Introduction2 Computer Systems  Programmable machines  Hardware + Software (program) HardwareProgram.
IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.
Computers organization & Assembly Language Chapter 0 INTRODUCTION TO COMPUTING Basic Concepts.
Chapter 4 MARIE: An Introduction to a Simple Computer.
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
1 Instruction Set Architecture (ISA) Alexander Titov 10/20/2012.
Chapter 16 Micro-programmed Control
Computer Architecture And Organization UNIT-II General System Architecture.
1 Text Reference: Warford. 2 Computer Architecture: The design of those aspects of a computer which are visible to the programmer. Architecture Organization.
PART 6: (1/2) Enhancing CPU Performance CHAPTER 16: MICROPROGRAMMED CONTROL 1.
Ted Pedersen – CS 3011 – Chapter 10 1 A brief history of computer architectures CISC – complex instruction set computing –Intel x86, VAX –Evolved from.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
The Instruction Set Architecture. Hardware – Software boundary Java Program C Program Ada Program Compiler Instruction Set Architecture Microcode Hardware.
COMPUTER ORGANIZATIONS CSNB123 NSMS2013 Ver.1Systems and Networking1.
MICROPROGRAMMED CONTROL
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO Session 2 Computer Organization.
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 251 Introduction to Computer Organization.
Basic Concepts Microinstructions The control unit seems a reasonably simple device. Nevertheless, to implement a control unit as an interconnection of.
Addressing modes, memory architecture, interrupt and exception handling, and external I/O. An ISA includes a specification of the set of opcodes (machine.
Computer Organization and Architecture Lecture 1 : Introduction
Software Development Environment
CSC235 Computer Organization & Assembly Language
Computer Operations Part 2.
Computer Organization and Architecture + Networks
Assembly language.
Chapter 1 Introduction.
Microarchitecture.
Micro-programmed Control
A Closer Look at Instruction Set Architectures
COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE
nZDC: A compiler technique for near-Zero silent Data Corruption
Chapter 1 Introduction.
Introduction to Operating System (OS)
Architecture & Organization 1
Overview Introduction General Register Organization Stack Organization
A Closer Look at Instruction Set Architectures
Computer Architecture and Organization Miles Murdocca and Vincent Heuring Chapter 4 – The Instruction Set Architecture.
CISC (Complex Instruction Set Computer)
INTRODUCTION TO MICROPROCESSORS
Micro-programmed Control Unit
CSCI/CMPE 3334 Systems Programming
Superscalar Processors & VLIW Processors
Architecture & Organization 1
Central Processing Unit
The University of Adelaide, School of Computer Science
Processor Organization and Architecture
Overview of Computer Architecture and Organization
William Stallings Computer Organization and Architecture 8th Edition
Computer Architecture
Morgan Kaufmann Publishers Computer Performance
What is Computer Architecture?
Introduction to Microprocessor Programming
Introduction to Computer Systems
What is Computer Architecture?
Course Outline for Computer Architecture
CPU Structure CPU must:
A Level Computer Science Topic 5: Computer Architecture and Assembly
Computer Architecture
Presentation transcript:

A unique processor architecture meeting LLVM IR and the IoT Dávid Juhász david.juhasz@imsystech.com 4/2/2018 FOSDEM 2018 LLVM devroom

Compiling with LLVM LLVM High-Level Programming Language LLVM Front-end LLVM Assembly / IR LLVM Middle-end Start with LLVM 101: formats and steps when compiling with LLVM LLVM Back-end Your Typical Target ISA 4/2/2018 FOSDEM 2018 LLVM devroom

Improving Efficiency by Lifting Target High-Level Programming Language LLVM LLVM Front-end LLVM Assembly / IR LLVM Middle-end What this talk is about ISAs are usually designed with no respect for compiler IR We create an ISA tailor-made for LLVM We expect improved efficiency in several aspects, detailed later… Imsys ISA for LLVM Big gap Complex translation LLVM Back-end Reducing the gap Your Typical Target ISA 4/2/2018 FOSDEM 2018 LLVM devroom

Agenda Imsys Company and the IoT Imsys Lean Processing Technology Imsys ISA for LLVM 4/2/2018 FOSDEM 2018 LLVM devroom

Imsys AB Device Module IC IP Swedish semiconductor SME Products Devices Modules IC IP Device Module 4/2/2018 FOSDEM 2018 LLVM devroom IC IP

Networked Embedded Controllers History as supplier of networked embedded controllers Proven core technology Retargeting for the IoT Good match for our values 4/2/2018 FOSDEM 2018 LLVM devroom

Imsys EMBLA Single Controller Solution for the IoT Single controller board with graphics Various IO capabilities via extension board 4/2/2018 FOSDEM 2018 LLVM devroom

High-Level Software Platform C/C++ Java Application Code Integrated Development Environment LLVM Software platform overview Eclipse-based IDE for connecting to the development board Application Code can be Java C/C++ compiled with LLVM Executable Binary 4/2/2018 FOSDEM 2018 LLVM devroom

Improving Efficiency by Reducing the ISA Gap High-Level Programming Language LLVM LLVM Front-end LLVM Assembly / IR LLVM Middle-end Now let’s look under the hood to see what is the basis of the Imsys lean processing technology: what makes the Imsys processor different from the mainstream and how that allows us to dedicate an ISA for LLVM Assembly. Imsys ISA for LLVM Big gap Complex translation LLVM Back-end Reducing the gap Your Typical Target ISA 4/2/2018 FOSDEM 2018 LLVM devroom

Imsys Instruction Set Support Abstraction Layers Application Code Assembly/C/C++ JAVA ISAL ISAJ Check out the main levels of abstractions – the traditional way Processor core and its supported instruction sets Software code in high-level – System software and application Java with dedicated ISAJ for native execution Assembly/C/C++ targeting ISAL Imsys Instruction Set Support Imsys Processor Core 4/2/2018 FOSDEM 2018 LLVM devroom

A Forgotten Layer of Abstraction Software Microcode Hardware Software Hardware Today’s typical categorization: Hardware & Software There could be an extra layer in between, which gives Imsys processors unique abilities: microcode 4/2/2018 FOSDEM 2018 LLVM devroom

The Processor Architect’s Best Friend Microprogram Microinstruction IO Mult. ALU MEM Ctrl. Addr. Microcode enables architects to master their hardware softly Microprogram consisting of microinstructions in memory Hardware core executes microinstructions A microinstruction consists of fields Each fields controls functional units of the processor core directly Including sequence control, which selects the next microinstruction to execute The microcode runs an Operation-Oriented Architecture focusing on the task at hand Operation-Oriented Hardware Architecture 4/2/2018 FOSDEM 2018 LLVM devroom

Microcode, what is it good for? Minimal Hardware Complete Deterministic Control Maximum Efficiency Microcode, what is it good for? Flexibility List the main characteristics provided by microcoding Complete deterministic control – direct control over all functional parts of the core Minimal hardware – control by microcode, hence no deep pipeline, no out-of-order or speculative execution, no cache hierarchy, no complex state Maximum efficiency – no wasted computation, maximum utilization of hardware Flexibility – implement any computation with high efficiency, making the processor multi-purpose Soft-reconfigurability – microcode in memory may be changed at any time on demand Soft-Reconfigurability 4/2/2018 FOSDEM 2018 LLVM devroom

Abstraction Layers Revisited Application Code Software ISAL Match abstraction levels to our case Processor Core is hardware Instruction Set Support is in microcode Application Code is software Note that Instruction Sets are the lowest level software interface to use the microcoded features, they belong to software A compiler, LLVM, matches the high-level application code to the instruction set Imsys Instruction Set Support Microcode Imsys Processor Core Hardware 4/2/2018 FOSDEM 2018 LLVM devroom

A Unique Balance between Hardware and Software Application Balanced Rich ISA Compiler Application code is made executable by a compiler targeting an ISA ISA is typically defined by hardware Consider an operation-oriented hardware architecture with microcoding A microcode-defined ISA is lifted from the hardware A unique balance between software and hardware supports all parties to improve efficiency Systems architects design in complex domain-specific operations and deliver a compiler-friendly ISA with optimized hardware utilization Compiler designers target a machine supporting a balanced rich ISA with high-level operations Domain-specific Operations Microcode Operation-Oriented Hardware Architecture Instruction Set Architecture Instruction Set Architecture 4/2/2018 FOSDEM 2018 LLVM devroom

Improving Efficiency by Matching LLVM Assembly High-Level Programming Language LLVM LLVM Front-end LLVM Assembly / IR LLVM Middle-end We have just looked into the basics of the Imsys Lean Processing Technology, now turn towards our tailor-made instruction set for LLVM. Imsys ISA for LLVM Big gap Complex translation LLVM Back-end Reducing the gap Your Typical Target ISA 4/2/2018 FOSDEM 2018 LLVM devroom

Matching LLVM Assembly Simple, efficient, general LLVM LLVM Assembly LLVM back-end for ISAL LLVM middle-end How does ISAL match LLVM Assembly? ISAL matches LLVM Assembly by having semantically matching instructions Complexity propagated into ISAL implementation How does the semantic match help? LLVM back-end for ISAL Simple and efficient back-end relying mostly on general LLVM facilities No complex translation as no semantical difference LLVM middle-end applies general optimizations to LLVM Assembly Directly exploiting platform-independent LLVM optimizations due to the semantical match Moreover, general LLVM optimizations are enough, no need for much architecture-specific magic Direct use of optimizations ISAL Semantically matching instructions 4/2/2018 FOSDEM 2018 LLVM devroom

Meeting a Constrained Reality LLVM Assembly ISAL Operations Instructions & Additional system operations intrinsic functions Single Value Types Virtually unlimited Set of integer, floating point, pointer, and vector types Registers Unlimited Register windows Arguments Source and Registers with support of LLVM Assembly is a theoretical model with characteristics that render it impossible to implement directly Need to add special operations for bookkeeping Moving data between and immediate values into registers Managing execution state Handling I/O Need to select single value types to support Need to define actual register set Better to support special kinds of arguments to optimize register usage Better to optimize binary representation to improve code density, execution time, and power consumption destination registers accumulating in source registers, immediate values Binary Representation Bitcode Custom dense binary coding 4/2/2018 FOSDEM 2018 LLVM devroom

Optimized Operation Sequences a ≔ a + b Accumulating in source register add a a b opcode dst src1 src2 add.upd a b opcode src1 src2 a ≔ a + 42 Immediate values Regular instructions with destination and source registers are bloating binary code, so we have special variants. Accumulating in source register – happens quite often Consider example: a = a + b The corresponding regular operation needs 3 arguments A special accumulating (in-place update) variant works only with 2 arguments Binary size is reduced Working with immediate values Consider example: a = a + 42 Using the corresponding regular instruction needs the immediate value to be moved into a register by an extra instruction A special form with one immediate argument can do the same work as one instruction with 3 arguments Even better, combine the two: accumulating variant with one immediate argument does it as one instruction with 2 arguments Binary size is reduced again ISAL has different variants operations to improve code density add.imm a a 42 move b 42 add a a b add.upd.imm a 42 4/2/2018 FOSDEM 2018 LLVM devroom

Optimized Binary Representation Maximize Code Density Optimize Microcode Size Minimize Execution Time Efficiently encoding such a rich set of instructions requires variable-length instructions Instruction lengths between 1 and 10 Average 3.4 Binary format is defined considering the following goals Software perspective makes us maximize code density Frequent operations have shorter instruction The microcode implementation should be economic with optimized size Common instruction formats share decode logic Grouped instructions share operation logic Performance counts, so we need to minimize execution time Time for decoding is dependent on binary representation Simple operation should have short instruction Time for performing operation may be dependent on binary representation It could overlap with argument decoding if similar operations are grouped together Variable-length Instructions 4/2/2018 FOSDEM 2018 LLVM devroom

Average Binary Size of TI Suite Preliminary ISAL code density figures as average binary size of TI Suite benchmarks normalized to ISAL. ARM Cortex requires 35%+ more, Intel 80%+ more program memory. 4/2/2018 FOSDEM 2018 LLVM devroom

Imsys Lean Processing Technology LLVM Assembly LLVM back-end for ISAL Imsys ISA for LLVM ISAL Imsys EMBLA with ISAL Imsys Firmware Imsys Processor Core Imsys Lean Processing Technology Concluding with the main points Imsys Lean Processing Technology Operation-Oriented Hardware Architecture Microcode-defined ISA Imsys ISA for LLVM How ISAL matches LLVM Assembly Some design considerations ISAL and its software ecosystem are currently under development, and planned to be released for Imsys EMBLA in next year Dávid Juhász david.juhasz@imsystech.com Leaner is Meaner 4/2/2018 FOSDEM 2018 LLVM devroom