3/17/20161 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King,

Slides:



Advertisements
Similar presentations
More Intel machine language and one more look at other architectures.
Advertisements

Instruction Set Design
4/14/20151 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King,
ISA Issues; Performance Considerations. Testing / System Verilog: ECE385.
Machine/Assembler Language Putting It All Together Noah Mendelsohn Tufts University Web:
ELEN 468 Advanced Logic Design
ITEC 352 Lecture 27 Memory(4). Review Questions? Cache control –L1/L2  Main memory example –Formulas for hits.
Memory Management. 2 How to create a process? On Unix systems, executable read by loader Compiler: generates one object file per source file Linker: combines.
Overview von Neumann Model Components of a Computer Some Computer Organization Models The Computer Bus An Example Organization: The LC-3.
ARM Core Architecture. Common ARM Cortex Core In the case of ARM-based microcontrollers a company named ARM Holdings designs the core and licenses it.
Princess Sumaya Univ. Computer Engineering Dept. Chapter 4: IT Students.
Princess Sumaya Univ. Computer Engineering Dept. Chapter 4:
Instruction Set Architecture
ITEC 352 Lecture 20 JVM Intro. Functions + Assembly Review Questions? Project due today Activation record –How is it used?
Computer Architecture Instruction Set Architecture Lynn Choi Korea University.
Virtual Memory Review Goal: give illusion of a large memory Allow many processes to share single memory Strategy Break physical memory up into blocks (pages)
Princess Sumaya Univ. Computer Engineering Dept. Chapter 5:
12/5/20151 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King,
Represents different voltage levels High: 5 Volts Low: 0 Volts At this raw level a digital computer is instructed to carry out instructions.
Computer Organization 1 Instruction Fetch and Execute.
Chapter 2 Parts of a Computer System. 2.1 PC Hardware: Memory.
Reminder Bomb lab is due tomorrow! Attack lab is released tomorrow!!
1/31/20161 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King,
CS61C L20 Datapath © UC Regents 1 Microprocessor James Tan Adapted from D. Patterson’s CS61C Copyright 2000.
Lecture 6: Branching CS 2011 Fall 2014, Dr. Rozier.
Lecture 2: Instruction Set Architecture part 1 (Introduction) Mehran Rezaei.
Microprocessor Fundamentals Week 2 Mount Druitt College of TAFE Dept. Electrical Engineering 2008.
Memory Management. 2 How to create a process? On Unix systems, executable read by loader Compiler: generates one object file per source file Linker: combines.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.
Memory Management Chapter 5 Advanced Operating System.
Information Security - 2. CISC Vs RISC X86 is CISC while ARM is RISC CISC is Compiler’s heaven while RISC is Architecture’s heaven Orthogonal ISA in RISC.
E Virtual Machines Lecture 2 CPU Virtualization Scott Devine VMware, Inc.
Embedded Real-Time Systems
Instruction Set Architectures Continued. Expanding Opcodes & Instructions.
Spring 2016Assembly Review Roadmap 1 car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c); Car c = new Car(); c.setMiles(100);
Instruction Set Architecture
CS 140 Lecture Notes: Virtual Memory
Machine-Level Programming I: Basics
Displacement (Indexed) Stack
Credits and Disclaimers
William Stallings Computer Organization and Architecture 6th Edition
Immediate Addressing Mode
Assembly Programming II CSE 351 Spring 2017
Computer Architecture Instruction Set Architecture
Operating Systems Design (CS 423)
Choices in Designing an ISA
MIPS Coding Continued.
Lecture 4: MIPS Instruction Set
ELEN 468 Advanced Logic Design
CSE 351 Section 10 The END…Almost 3/7/12
MIPS Instruction Encoding
The University of Adelaide, School of Computer Science
ECE232: Hardware Organization and Design
Processor Architecture: The Y86-64 Instruction Set Architecture
ECEG-3202 Computer Architecture and Organization
CS 140 Lecture Notes: Virtual Memory
MIPS Instruction Encoding
Computer Architecture and the Fetch-Execute Cycle
Instruction Set Architectures Continued
PZ01C - Machine architecture
Roadmap C: Java: Assembly language: OS: Machine code: Computer system:
MIPS Coding Continued.
Get To Know Your Compiler
Machine-Level Programming II: Basics Comp 21000: Introduction to Computer Organization & Systems Spring 2016 Instructor: John Barr * Modified slides.
William Stallings Computer Organization and Architecture 8 th Edition Chapter 11 Instruction Sets: Addressing Modes and Formats.
Processor Design CS 161: Lecture 1 1/28/19.
Credits and Disclaimers
Credits and Disclaimers
Computer Architecture and System Programming Laboratory
Low level Programming.
Presentation transcript:

3/17/20161 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King, and Andrew S Tanenbaum

Dynamic binary translation Simulate basic block using host instructions. Basic block? Cache translations 3/17/20162 Load r3, 16(r1) Add r4, r3, r2 Jump 0x48074 Load r3, 16(r1) Add r4, r3, r2 Jump 0x48074 Load t1, simRegs[1] Load t2, 16(t1) Store t2, simRegs[3] Load t1, simRegs[1] Load t2, 16(t1) Store t2, simRegs[3] Load t1, simRegs[2] Load t2, simRegs[3] Add t3, t1, t2 Store t3, simregs[4] Load t1, simRegs[2] Load t2, simRegs[3] Add t3, t1, t2 Store t3, simregs[4] Store 0x48074, simPc J dispatch_loop Binary Code Translated Code

This is Binary? We are calling the translation on previous slide dynamic “binary” translation. Where’s the binary? Answer, part 1: Decode translates binary effectively to assembly 3/17/20163

Decode void decode_control_signals(u32 opcode, struct control_signals *control) { control->raw = opcode; control->op = (opcode >> 30) & 0x3; control->rd = (opcode >> 25) & 0x1f; control->op2 = (opcode >> 22) & 0x7; control->op3 = (opcode >> 19) & 0x3f; control->rs1 = (opcode >> 14) & 0x1f; control->i = (opcode >> 13) & 0x1; control->asi = (opcode >> 5) & 0xff; control->simm = sign_extend_13(opcode & 0x1fff); control->imm = opcode & 0x3fffff; control->rs2 = opcode & 0x1f; control->disp22 = opcode & 0x3fffff; control->disp30 = opcode & 0x3fffffff; control->cond = (opcode >> 25) & 0xf; } 3/17/20164

OK, but aren’t we generating assembly? In slides we showed gen_load_T0(guest_reg_no) producing “assembly” This is an abstraction It needs to produce binary, ie bits 3/17/20165

Dynamic Binary Translation r0 (T0) r1 (T1) r2 (T2) r3 r4 r5 3/17/20166 Physical Registers cpu->regs gen_load_T0(guest_r1) Load r0, gr1_off(r5) Guest: Load r3, 16(r1)

Binary code generation What does it mean that gen_load_T0(guest_reg_no) Produces Load r0, gr1_off(r5)? 3/17/20167

Finding the pieces: Load r0, gr1_off(r5) 3/17/20168 We want op = 3, op3 = 0 (LD), i = 1 rd = 0, rs1=5, simm13= 4 * grn

Binary code generation gen_load_T0(guest_reg_no) { u32 op = 0x3 << 30; u32 op3 = 0x0 << 19; u32 i = 0x1 << 13; u32 rd = 0x0 << 25; u32 rs1 = 0x5 << 14; u32 simm13 = guest_reg_no * 4; return op & rd & op3 & rs1 & i & simm13} 3/17/20169

Dynamic binary translation for x86 Goal: translate this basic block Note: x86 assembly dst value is last Mov %rsp, %rbp //move stack ptr -> frame ptr Sub $0x10, %rsp //sub 10 from stack pointer Movq $0x100000, 0xfffffffffffffff8(%rbp) //set value into loc rel to frame ptr Problem: x86 instructions have varying length 3/17/201610

Inst-by-inst simulation: fetch void fetch(struct CPU *cpu, struct control_signals *control){ memset(control, 0, sizeof(struct control_signals)); control->orig_rip = cpu->rip; control->opcode[control->numOpcode++] = fetch_byte(cpu); if(IS_REX_PREFIX(control->opcode[0])) { control->rex_prefix = control->opcode[control->numOpcode-1]; control->rex_W = REX_PREFIX_W(control->rex_prefix); control->opcode[control->numOpcode-1] = fetch_byte(cpu); } //and more of this for multi-byte opcodes 3/17/201611

Inst-by-inst simulation: decode if(HAS_MODRM(control->opcode[0]) { control->modRM = fetch_byte(cpu); control->mod = MODRM_TO_MOD(control->modRM); control->rm = MODRM_TO_RM(control->modRM); control->reg = MODRM_TO_R(control->modRM); // now calculate m addresses if(IS_SIB(control)){ // more address lookup and fetching } else if(IS_DISP32) { // fetch immediate } else { addr = getRegValue(cpu, control) } // now fetch any disp8 or disp32 for addr 3/17/201612

Dynamic translation Amortize the cost of fetch and decode Two different contexts Translator context – fetch and decode Guest context – execute Makes debugging hard 3/17/201613

Dynamic translation for x86 Map guest registers into these host registers T0 -> rax T1 -> rbx T2 -> rcx – use for host addresses Rdi -> holds a pointer to cpu 3/17/201614

Inst-by-inst: Mov %rsp, %rbp cpu->reg[control->rm] = cpu->reg[control->reg]; Where rm == rbp index and reg == rsp index 3/17/201615

DBT: Mov %rsp, %rbp void mov(struct CPU *cpu, struct control_signals *control) { gen_load_T0(cpu->tcb, getRegOffset(cpu, control->reg)); gen_store_T0(cpu->tcb, getRegOffset(cpu, control->rm)); } // tcb = translation code block 3/17/201616

DBT: Mov %rsp, %rbp void gen_load_T0(struct tcb *tcb, uint32_t offset) { uint64_t opcode = offset; // create movq offset(%rdi), %rax opcode <<= 24; opcode |= 0x878b48; // store in cache memcpy(tcb->buffer+tcb->bytesInCache, translation, size); tcb->bytesInCache += size; 3/17/201617

DBT: compile // Mov %rsp, %rbp gen_load_T0(control->reg) gen_store_T0(control->rm) //Sub $0x10, %rsp gen_load_rm32_T0(cpu, control) gen_alu_imm32_T0(tcb, imm, SUB_ALU_OP) gen_store_rm32_T0(cpu, control) // movq $0x100000, -8(%rbp) gen_mov_imm32_T0(cpu, control->imm) gen_store_rm64_T0(cpu, control) 3/17/201618

Switching to guest context Jump_to_tc(cpu) { //will return a pointer to beginning of TC buf void (*foo)(struct CPU *) = lookup_tc(cpu->rip) foo(cpu); } What does the preamble for your TCB look like? How do we return back to the host (or translator) context? 3/17/201619

DBT optimizations Mov %rsp, %rbp Sub $0x10, %rsp movq $0x100000, -8(%rbp) Are there opportunities for optimizations? 3/17/201620

Discussion questions Could you run this raw code on the CPU directly? What assumptions do you have to make? When do you have to invalidate your translation cache Break into groups to discuss this Read embra for discussion about using guest physical vs guest virtual pc values to index tb 3/17/201621