Download presentation
Presentation is loading. Please wait.
1
Virtual Machines: Squeak and Java
Why use a virtual machine? O-O, portability, memory, security How Smalltalk compiles and executes bytecodes Specific details of Squeak’s VM Object memory, garbage collection, primitives How Java’s VM differs Support for native types 9/20/2018 Copyright 2000, Georgia Tech
2
Copyright 2000, Georgia Tech
Why not just use C? Strengths of Virtual Machines Better for O-O Objects are complex VM handles initializations Supports message sending “Ask, don’t touch” Adele Goldberg The compiler can do some of this but then it would produce larger code. Much of this can be put in the virtual machine and low-level support like setting up the object header format and pointers to the class. 9/20/2018 Copyright 2000, Georgia Tech
3
Strengths of Virtual Machines, part 2
Portability Virtual machines allow for binary compatibility between platforms A simple VM implementation allows for easy porting of the VM Memory handling Programmer doesn’t explicitly allocate and de-allocate memory Typically, no pointers VM provides garbage collection Java’s VM is much more complicated than Squeak’s, but it runs faster. Java’s VM has more native libraries which makes porting slower but running faster. 9/20/2018 Copyright 2000, Georgia Tech
4
Strengths of Virtual Machines, part 3
Security Not running on the bare machine, can be safer Explicit execution model Easier to create reflective programs, meta-programming Threads and control structures are available to programmer 9/20/2018 Copyright 2000, Georgia Tech
5
Strengths of Virtual Machines, part 4
More memory efficient Definitely an issue for low-power, low-memory consumer devices Can be faster startup Side effect of more memory efficient For some activities, speed is a wash Floating-point, network... As C programs get more complicated the memory use climbs quickly. A VM memory use starts out higher but grows slowly. Thus VMs are useful in small electronic devices like cell phones. 9/20/2018 Copyright 2000, Georgia Tech
6
Compiling Smalltalk to Bytecodes
Bytecodes are the instructions of the Smalltalk virtual machine The Smalltalk virtual machine is not register-based (like Pentium, Sparc, etc.) There are registers, but they’re not made visible to the programmer Instead, it’s stack-based Parameters are pushed onto the stack, and then are manipulated from there 9/20/2018 Copyright 2000, Georgia Tech
7
Kinds of Bytecodes in the Smalltalk VM
Push bytecodes Put things (like instance variables) on the stack for manipulation Store bytecodes From stack into object variable Send bytecodes Causes a message to be sent Return bytecodes Return top of stack, or nil, true, false, self Jump bytecodes For optimizing branches 9/20/2018 Copyright 2000, Georgia Tech
8
Shhh…It’s not all messages!
Jump bytecodes allow us to optimize some operations This is a method in Rectangle. Notice that some of the methods are not being sent as messages but are implemented as jumps. The first bytecode puts the first instance variable in the Rectangle (the origin) onto the stack. The second pushes the first local variable (aPoint) on the stack. The next bytecode sends the <= method. The next bytecode checks if the result of the last was a false and if so jumps to 13 which pushes a false on the stack and returns. 9/20/2018 Copyright 2000, Georgia Tech
9
Example Compilation of Smalltalk
center ^origin + corner/2 Compute the center point of a Rectangle It’s actually a little different in current Squeak But if you type the above in, it will compile to the same bytecodes Bytecodes: 0, 1, 176, 119, 185, 124 The bytecode numbers map to the hexadecimal values (176 equals B0). 9/20/2018 Copyright 2000, Georgia Tech
10
Copyright 2000, Georgia Tech
Meaning of bytecodes 0: Push the value of self’s first instance variable (origin) onto the stack 1: Push the value of the receiver’s second instance variable (corner) onto the stack 176: Send + (result is left on the stack) 119: Push the SmallInteger 2 onto the stack 185: Send a binary message with the selector / (result is left on the stack) 124: Return the object on top of the stack as the value of the message center 9/20/2018 Copyright 2000, Georgia Tech
11
Constants go in the Literal Frame
Each method gets compiled into a CompiledMethod A CompiledMethod includes a header, the bytecodes, and a “literal frame” The literal frame contains the constants from the method Strings, numbers that are too big to be SmallIntegers, symbols for messages, etc. Obviously, references to the literal frame are a bit slower because it’s another memory access 9/20/2018 Copyright 2000, Georgia Tech
12
An Example with Literals
Transcript and the string are literals that appear (earlier) in the CompiledMethod 9/20/2018 Copyright 2000, Georgia Tech
13
Copyright 2000, Georgia Tech
What does that mean? Push on the stack the literal #Transcript Duplicate it (for that cascade in a moment) Push the constant ‘Hello World!’ Send the (literal) #show: Pop the stack (we don’t care about the result) Send the (literal) #cr Pop stack Return self 9/20/2018 Copyright 2000, Georgia Tech
14
Methods go in a MethodDictionary
Every method is compiled into a CompiledMethod The CompiledMethod is stored in a MethodDictionary (one for every class) with the selector of the method as the key Lookup of a message involves finding the class that can return a CompiledMethod for the sought after message selector 9/20/2018 Copyright 2000, Georgia Tech
15
What the VM Interpreter Knows
The CompiledMethod being executed The location of the next bytecode to execute: Instruction pointer (IP) The receiver (self) and arguments of the message to this method Any temporary variables A stack 9/20/2018 Copyright 2000, Georgia Tech
16
What the VM interpreter does
Fetch a bytecode from the CompiledMethod via the Instruction Pointer (IP) Increment the IP Perform the function specified 9/20/2018 Copyright 2000, Georgia Tech
17
Copyright 2000, Georgia Tech
Contexts When a message is sent, the active method execution has to be saved That’s saved as a context Context is all the state of the VM interpreter MethodContexts differ slightly from BlockContexts BlockContexts are created at runtime during execution of a method Optimizing contexts is a big part of making a VM faster 9/20/2018 Copyright 2000, Georgia Tech
18
Copyright 2000, Georgia Tech
Primitives Some CompiledMethods may just point to primitives Primitives are code inside the VM interpreter for handling things like: Arithmetic Storage management Control Input-Output 9/20/2018 Copyright 2000, Georgia Tech
19
Copyright 2000, Georgia Tech
Object Memory ObjectMemory allows manipulation of objects as unique object pointers (oops) Each oop is 32-bit If bit 1 = 1, then the next 31 bits represent a SmallInteger in 2’s-complement notation If bit 1=0, then it’s an address to an object Some Smalltalks make it an address to an ObjectTable ObjectMemory also implements garbage collection 9/20/2018 Copyright 2000, Georgia Tech
20
Copyright 2000, Georgia Tech
Object Header Each object contains 1-3 bytes of header information 3 bits for garbage collection 12 bits for hash value 5 bits for compact class index or 30 bits for direct class oop Size of object 9/20/2018 Copyright 2000, Georgia Tech
21
Copyright 2000, Georgia Tech
Garbage Collection Many mechanisms exist for GC Reference counting: Keep a count of all references to an object. When it’s zero, remove the object Problems: Cycles Every store to object memory requires incrementing/decrementing (can defer with things on stack) 9/20/2018 Copyright 2000, Georgia Tech
22
Copyright 2000, Georgia Tech
GC Part 2 Mark-sweep: Trace from a root object to everything, marking what’s accessible in the object header someplace Sweep away everything else Benefit: Memory gets compacted Downside: When an object moves, can be hard to fix it everywhere That’s why many Smalltalks use/used Object Tables The pointer to an object is the actual address of an object. If you move the object you have to update the address of that object in all other objects. 9/20/2018 Copyright 2000, Georgia Tech
23
Copyright 2000, Georgia Tech
GC, Part 3 Generation scavenging: Deal with objects in generations. Move from eden to survivor eventually to old or tenured Based on observation that new objects tend to go away quickly, while some objects stick around forever Reduces amount of memory space to mark-sweep By David Ungar, father of Self (Fastest O-O language with all-object semantics) 9/20/2018 Copyright 2000, Georgia Tech
24
VM Optimizations in Squeak
Compact classes are cached for easy access Smalltalk compactClassesArray Methods are cached since they’re often re-used Lookup of messages to methods are cached at two-levels to prevent frequent message re-look-up at: accesses an atCache first to speed Array and similar references 9/20/2018 Copyright 2000, Georgia Tech
25
Garbage Collection in Squeak
Combination of generation scavenging and mark-sweep Two-generation scavenger Normally, just does generation scavenging, but a full gc also does mark-sweep That’s why it takes two Smalltalk garbageCollect to really clear out all of memory On a 400 Mhz Powerbook, 3ms/second for scavenger, 400 ms for full gc 9/20/2018 Copyright 2000, Georgia Tech
26
What if you want to interfere with GC?
Can always fix it: ObjectMemory Can also use WeakArrays WeakArrays don’t prevent object reclamation But if the last reference is from a WeakArray, you can do something first You appoint an executor and finalize methods (see WeakRegistry) Use the WeakArray to do something like a Java finalize. 9/20/2018 Copyright 2000, Georgia Tech
27
Exploring the Squeak VM
VM Interpreter is in Interpreter Can actually run it to simulate Squeak (InterpreterSimulator new openOn: Smalltalk imageName) test Object Memory is in ObjectMemory The VM is generated using the CCodeGenerator Interpreter translate: 'interp.c' doInlining: true. 9/20/2018 Copyright 2000, Georgia Tech
28
Example: Implementation of bytecodes
duplicateTopBytecode self fetchNextBytecode. self internalPush: self internalStackTop. pushConstantMinusOneBytecode self internalPush: ConstMinusOne. 9/20/2018 Copyright 2000, Georgia Tech
29
Making Your Own Primitives
You can use the CCodeGenerator to build your own primitives You can even use it to generate stubs that you later rewrite with your own C The key challenge is dealing with the boxing and unboxing Moving between native types and objects There are issues of “glue”, conversions, etc. 9/20/2018 Copyright 2000, Georgia Tech
30
History of Primitives in Squeak
Primitives were just 1…256 Named primitives allowed for external shared libraries (DLLs) Named primitives can now be generated with SLANG (System Language - minimal Smalltalk that can be easily generated using code generator) Lots more on Squeak Swiki Smalltalk-80 allowed only 256 primitives. 9/20/2018 Copyright 2000, Georgia Tech
31
Making a Primitive, Step 1
For example: Build a primitive that returns the sum of two input integers Step 1: Define the plugin class InterpreterPlugin subclass: #FooPlugin instanceVariableNames: '' classVariableNames: '' poolDictionaries: '' category: 'Werdna-Foostuff' You can write primitives in Smalltalk and output the C for it. 9/20/2018 Copyright 2000, Georgia Tech
32
Making a Primitive, Step 2
Step 2: Define the calling class and calling method Object subclass: #Foo instanceVariableNames: 'myInteger ' classVariableNames: '' poolDictionaries: '' category: 'Werdna-Foostuff' integerSum: firstInteger and: secondInteger "answer the sum of firstInteger and secondInteger" < primitive: 'primFooIntegerSumAnd' module: 'Foo'> ^FooPlugin doPrimitive: 'primFooIntegerSumAnd' This class will call a method of the FooPlugin class. It will first try to execute the C code in the file ‘Foo.dll’ (on windows) and if that fails will call the same method on the plugin class. This allows you to test your method in Smalltalk before you compile it to C. 9/20/2018 Copyright 2000, Georgia Tech
33
Copyright 2000, Georgia Tech
What’s going on here < primitive: 'primitiveName' module: 'moduleName'> This will always fail until the primitive is defined, so execution will fall through to the next line ^FooPlugin doPrimitive: 'primFooIntegerSumAnd’ Which will allow us to test in Squeak 9/20/2018 Copyright 2000, Georgia Tech
34
Making a Primitive, Step 3
Step 3: Define the primitive method primFooIntegerSumAnd ":firstInteger and: secondInteger" "answer sum of (int) firstInteger and (int) secondInteger" |firstInteger secondInteger| self export: true. “Public function for extern use” self inline: false. “Don’t bother inlining for speed” secondInteger := interpreterProxy stackIntegerValue: 0. firstInteger := interpreterProxy stackIntegerValue: 1. interpreterProxy pop: 3. interpreterProxy pushInteger: (firstInteger+secondInteger). This is an instance method of the FooPlugin class. 9/20/2018 Copyright 2000, Georgia Tech
35
Copyright 2000, Georgia Tech
What’s all that code? It’s close enough to Smalltalk to execute! inlining is compiling in the function rather than the call to the function. Optimization technique. InterpreterProxy fills in for the Squeak VM, which we need to manipulate the context (e.g., push/pop operations) What we’re doing is: Get the arguments Pop them and the receiver object off the stack Push back on the stack the result 9/20/2018 Copyright 2000, Georgia Tech
36
Making a Primitive, Step 4
Step 4: Test Believe it or not, it works as-is! The PlugIn code handles arguments and other issues for you inside of Squeak Foo new integerSum: 3 and: 4 9/20/2018 Copyright 2000, Georgia Tech
37
Copyright 2000, Georgia Tech
How did that work? doPrimitive: primitiveName | proxy plugin | proxy := InterpreterProxy new. proxy loadStackFrom: thisContext sender. plugin := self simulatorClass new. plugin setInterpreter: proxy. plugin perform: primitiveName asSymbol. ^proxy stackValue: 0 9/20/2018 Copyright 2000, Georgia Tech
38
Making a Primitive, Step 5
Step 5: Tell the primitive its name and compile to C moduleName "return the name of this plug-in library" ^'Foo' FooPlugin translateDoInlining: true “Generate the C code” moduleName is a method that must be defined in your FooPlugin. It returns the name of the plug-in library (.dll on windows). 9/20/2018 Copyright 2000, Georgia Tech
39
Making a Primitive, Step 6
Step 6: Compile your primitive! You’ll need support files (e.g., .h files) InterpreterSupportCode writeMacSourceFiles Generates them on the Mac 9/20/2018 Copyright 2000, Georgia Tech
40
What our primitive looks like in C
EXPORT(int) primFooIntegerSumAnd(void) { int x; int y; int _return_value; x = interpreterProxy->stackIntegerValue(1); y = interpreterProxy->stackIntegerValue(0); if (interpreterProxy->failed()) { return null;} _return_value = interpreterProxy->integerObjectOf((x + y)); interpreterProxy->popthenPush(3, _return_value); Notice that most of the code is dealing with testing for return values and converting things to and from integers. 9/20/2018 Copyright 2000, Georgia Tech
41
Copyright 2000, Georgia Tech
A Tour of the Java VM While we’re getting our hands dirty with bits, let’s talk about the Java VM How does it differ? What’s the same? 9/20/2018 Copyright 2000, Georgia Tech
42
Java VM also executes bytecodes
Like Squeak, Java VM is stack-based Java VM bytecodes are typed Java supports native data types, as well as objects So we need integer push (iload) as well as float and double (fload, dload) But most of the rest are similar Some special-purpose ones exist to optimize things like switch More Java bytecodes have operands 9/20/2018 Copyright 2000, Georgia Tech
43
Example of Java-to-bytecodes
void spin() { int i; for (i = 0; i < 100; i++) { ; // Loop body is empty } 9/20/2018 Copyright 2000, Georgia Tech
44
Copyright 2000, Georgia Tech
Bytecodes for Example 0 iconst_0 // Push int constant 0 1 istore_1 // Store into local variable 1 (i=0) 2 goto 8 // First time through don't increment 5 iinc 1 1 // Increment local variable 1 by 1 (i++) 8 iload_1 // Push local variable 1 (i) 9 bipush 100 // Push int constant 100 11 if_icmplt 5 // Compare and loop if less than (i < 100) 14 return // Return void when done 9/20/2018 Copyright 2000, Georgia Tech
45
Java VM makes threads more significant
In Java VM, stacks are associated with threads, not with method contexts (frames) in Java Frames are pushed and popped from the stack as methods come and go The same thread always keeps the same stack There are bytecodes for monitorenter and monitorexit for synchronized{} blocks Every object has its own lock (semaphore) 9/20/2018 Copyright 2000, Georgia Tech
46
Copyright 2000, Georgia Tech
Java VM’s Exceptions Exceptions are handled within the VM When an exception occurs, current method frame is searched for handler If not there, pop the current frame, and search the next one on the stack, until one is found Advantage: Fast Disadvantage: Can’t continue 9/20/2018 Copyright 2000, Georgia Tech
47
Copyright 2000, Georgia Tech
Java Security Support Java specifies very exactly a class file format That class file format must be followed for the class to be executed Java VM does some bytecode verification The VM thus guarantees a level of security 9/20/2018 Copyright 2000, Georgia Tech
48
Java’s Garbage Collection
Particular algorithm not specified in JVM spec Exactly like Squeak Java 1.1 used mark-sweep and object tables Java HotSpot VM uses generational scavenging and mark-sweep with direct pointers Default: Two generations Optionally: Three generations (“train”) for fewer gc pauses Just like Squeak 9/20/2018 Copyright 2000, Georgia Tech
49
Java’s JIT (Just In Time Compilation)
JIT compilation turns a method into a native code routine on-the-fly Native code routines are stored in a cache Hard to do well Easy to implement different semantics Not portable Can give you outstanding speed benefits But compilation itself takes time So, on some benchmarks, raw VM is better than JIT Issue: Is it still a VM anymore? Do you still get VM benefits? 9/20/2018 Copyright 2000, Georgia Tech
50
Comparing Squeak and Java VMs
Bytecodes Surprisingly similar! In fact, Java can be compiled to ST bytecodes But not vice-versa easily Java bytecodes are somewhat more complex Typed, more operands Garbage collection Latest version of Java: Dead heat 9/20/2018 Copyright 2000, Georgia Tech
51
Copyright 2000, Georgia Tech
Squeak /Java VMs, Part 2 Exceptions Squeak handles exceptions within Squeak Java handles exceptions within VM Java’s are faster Squeaks are more flexibile (e.g., notification as an exception that then continues) Security Definitely in Java’s favor 9/20/2018 Copyright 2000, Georgia Tech
52
Copyright 2000, Georgia Tech
Squeak/Java VMs, part 3 Threads/Processes With Java, they’re all in the VM Obviously, fast More overhead in the VM With Squeak, they’re done with Squeak and primitives Are they slower? Don’t know-anyone want to measure? 9/20/2018 Copyright 2000, Georgia Tech
53
Conclusion: Squeak/Java VMs
With JIT, faster More secure, better support for threads Much more complex and harder to port to different platforms Squeak Slower More flexible Easier to port 9/20/2018 Copyright 2000, Georgia Tech
54
Copyright 2000, Georgia Tech
Summary VMs make a lot of sense still VMs can ease programmer load, and make sense for modern devices VMs have common architectural pieces Bytecodes, stacks and contexts/frames, object memory management Squeak VM can be extended with primitives, also written in Squeak Java’s VM is more like Squeak’s VM than unlike it But faster, more secure, and more complex 9/20/2018 Copyright 2000, Georgia Tech
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.