Optimizing Squeak
Measuring the Speed of Squeak
  MessageTally and TimeProfileBrowser
Changes to improve speed
  Choose operations appropriately
  Choose collections to improve speed
  How collections work
Build a primitive
  When building a primitive is useful/necessary
Coming soon: How the VM works and how to build primitives...

MessageTally
MessageTally provides a variety of tools for analyzing your code.
time: - Returns the time in milliseconds that it took to do some operation
MessageTally time: [ timesRepeat: [4 * 4]] "44"
MessageTally time: [ timesRepeat: [4.0 * 4.0]] "80"
MessageTally time: [ timesRepeat: [4 * 4.0]] "76"
MessageTally time: [ timesRepeat: [4.0 * 4]] "79"
Notice that floating point operations take much more time than integer operations.
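The repeat counts did not survive in the listings above. The following is a minimal sketch of the same comparison, assuming a repeat count of 1,000,000 (an assumption, not a figure from the slides) and using Time millisecondsToRun: as the timer. Absolute numbers will differ by machine and VM.

```smalltalk
| n intMs floatMs mixedMs |
n := 1000000.  "assumed repeat count, not taken from the slides"
intMs := Time millisecondsToRun: [n timesRepeat: [4 * 4]].
floatMs := Time millisecondsToRun: [n timesRepeat: [4.0 * 4.0]].
mixedMs := Time millisecondsToRun: [n timesRepeat: [4 * 4.0]].
Transcript
    show: 'integer: ', intMs printString, ' ms'; cr;
    show: 'float:   ', floatMs printString, ' ms'; cr;
    show: 'mixed:   ', mixedMs printString, ' ms'; cr.
```

Run it several times in a workspace; the relative ordering matters more than any single reading.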

3 What’s eating up the time?
Part of it: Floats are slower
But the bigger part of it is the unpacking of Object -> Class (to figure out type) -> NativeFormat
MessageTally time: [10000 timesRepeat: [ * ]] "9"
MessageTally time: [10000 timesRepeat: [ * ]] "8"
MessageTally time: [10000 timesRepeat: [4 * ]] "9"
MessageTally time: [10000 timesRepeat: [ * 4]] "7"
The multiply itself isn't taking the most time. Most of the time is taken up by finding the type (such as Float) and getting the value ready for the low-level operation. This is also called boxing and unboxing.

4 A Different Way to Look at Executing Code
At regular intervals, interrupt the executing process with a "spy" process
Figure out which method is executing at that moment
Reports
  The "tree" of which methods called which other methods
  The percentage of time spent (over the whole tree) in each "leaf"
The percentages are only approximate, since the spy samples at intervals and doesn't track exactly when each method started and finished.

MessageTally spyOn:
Does a "spy" on a process
Reports percentages of time
A "primitive" leaf is attributed to its method (one level up)
To trim the tree, <2% is not shown, but can be added into leaves
Example: MessageTally spyOn: [10000 timesRepeat: [ printString]]
By "primitive" we mean a machine language routine.

- 139 tallies, 2407 msec.
**Tree**
100.0 Float(Object)>>printString
74.1 Float(Number)>>printOn:
|74.1 Float>>printOn:base:
| Float>>absPrintOn:base:
| Character class>>digitValue:
| primitives
| False>>|
| Float(Number)>>ceiling
| |12.9 Float(Number)>>floor
| LimitedWriteStream(WriteStream)>>nextPut:
25.9 String class(SequenceableCollection class)>>streamContents:limitedTo:
17.3 LimitedWriteStream(WriteStream)>>contents
|15.1 String(SequenceableCollection)>>copyFrom:to:
| String(Object)>>species
7.2 LimitedWriteStream class(PositionableStream class)>>on:
5.8 LimitedWriteStream(WriteStream)>>on:
3.6 LimitedWriteStream(PositionableStream)>>on:
The numbers are the approximate percentage of time spent in that method. The format is percentage of time in the method then the class and >> then the method.

**Leaves**
18.7 Character class>>digitValue:
18.0 False>>|
16.5 Float>>absPrintOn:base:
15.1 String(Object)>>species
12.9 Float(Number)>>floor
4.3 LimitedWriteStream(WriteStream)>>nextPut:
2.9 SmallInteger(Magnitude)>>max:

Where does the time go?
Notice in that example:
The biggest piece of the printString execution is the conversion of each individual digit to a character: Character class>>digitValue:
But the second biggest is a logical Or: 18.0 False>>|
Where is that happening?
Ask Mark what is happening here.

9 Copyright 2000, Georgia Tech
TimeProfileBrowser
TimeProfileBrowser also does spying, like MessageTally
TimeProfileBrowser onBlock: [10000 timesRepeat: [ printString]]
But it also acts as a code browser so that you can see each piece of code!

TimeProfileBrowser
There is a class False which has a | ("or") method that just returns its argument. This is because the result of or-ing anything with false depends only on the other operand.
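A small workspace sketch of that behavior. The variable names are only for illustration; the | messages are the standard Boolean ones.

```smalltalk
| a b c |
a := false | true.    "answers true: False>>| returns its argument"
b := false | false.   "answers false, again just the argument"
c := true | false.    "answers true: True>>| does not need to look at its argument"
```

Because | is an ordinary message send, the argument is always evaluated before the send; it only looks cheap because the method body is so small.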

The Problem of Spying
Spying is inaccurate
  Run the same test several times: Different results each time!
  Have to run something often enough (e.g., 1000 timesRepeat:…) to catch the right methods
Alternative: accurate counts with tallySends:
  Uses the fact that Squeak's VM is generated from a working simulation of the VM
  Actually simulates the VM to get perfectly accurate counts of how often each method is called.
  Can also be useful for debugging: It's a trace!
Run the test a few times and see that you get different results. Because the spy interrupts the process and checks which method it is in at that moment, it won't always return the same results.

12 MessageTally tallySends: [3.14159 printString]
This simulation took 0.0 seconds.
**Tree**
2 Float(Object)>>printString
1 Float(Number)>>printOn:
|1 Float>>printOn:base:
| 1 Float>>absPrintOn:base:
| |7 SmallInteger>>*
| | |7 SmallInteger(Integer)>>*
| | | 7 Float>>adaptToInteger:andSend:
| |7 LimitedWriteStream(WriteStream)>>nextPut:
| |6 Character class>>digitValue:
The numbers show the number of times the method was called.

13 Measuring Squeak’s Speed
Now that we have tools for measuring Squeak, let's start figuring out what's slow and what's fast.
What's fast:
  Integer arithmetic is faster than floating point (expected)
  Special messages, coded into the bytecode: + - > < at: at:put: bitOr: bitAnd: class = == new value do: size
Each of the special messages above translates into a single bytecode.

The VM and Bytecodes
The VM (e.g., squeak.exe) interprets bytecodes
Bytecodes are the machine language of a virtual machine
The "VM" is, strictly speaking, a "VM simulator" or "interpreter"
You can see bytecodes for a method by doing "show bytecodes" from code pane
Click on the "Source" button and choose "bytecodes" in the menu to see the bytecodes for the method.
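The same listing can also be produced from a workspace. This is a sketch that assumes the image provides CompiledMethod>>symbolic (present in the Squeak images we are aware of, but check yours); Magnitude>>max: is picked only as a short example method.

```smalltalk
| method listing |
method := Magnitude compiledMethodAt: #max:.  "fetch the compiled method"
listing := method symbolic.                    "bytecode listing as a String (assumed API)"
Transcript show: listing; cr.
```

In the listing, look for the comparison send: it should appear as a one-byte special send, not as a send that refers to a literal selector.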

15 Special Messages are fast lookups
Special messages, like +, actually map to a single bytecode
  One memory access, no lookup
Non-special messages involve passing a pointer to a memory location where the message selector is stored

A Word on Primitives
Primitives are the bottommost layer of the method hierarchy
They are not defined in terms of bytecodes, but in terms of the native code
Think of them as subroutine calls into the VM
You can make up your own primitives!
In the latest versions of Squeak, they can even be dynamically loaded

17 But much of speed is Squeak-level choices
Integers vs. floats, Squeak-code vs. primitives are low-level VM decisions
Most of what determines fast or slow code is at the level of your Squeak code
  Choices in collections
  Algorithm coding

18 Brief review of Collections
Dictionary: Takes a key and a value, e.g., aDict at: 'dog' put: 'Rufus'.
Array: Just like any language
OrderedCollection: Like a Java vector
Bag: You can add to it, and it remembers the number of identical elements
Set: You can add to it, and it remembers only the element
Look at the documentation for Bag in Squeak. It stores each distinct object once; if several identical elements are added, it stores only one of them along with a count of how many there are.

Speed of Adding
Dictionaries are the most general indexed collection, but they're also slow to add to.
d := Dictionary new.
MessageTally time: [1 to: do: [:i | d at: i put: i]]. "152"
a := Array new:
MessageTally time: [1 to: do: [:i | a at: i put: i]]. "2"
Arrays are much faster for inserting than Dictionaries.

20 OrderedCollections are only slow to grow
oc := OrderedCollection new:
MessageTally time: [1 to: do: [:i | oc add: i]]. "17"
MessageTally time: [1 to: do: [:i | oc at: i put: i]]. "11"
Once an OrderedCollection is the right size, at:put: is within six times the speed of an Array (2 ms from previous slide)
It's slower because Array's at:put: is a primitive, while OC's checks bounds first

21 Why are OrderedCollections slow to grow?
add: newObject
    ^self addLast: newObject

addLast: newObject
    "Add newObject to the end of the receiver. Answer newObject."
    lastIndex = array size ifTrue: [self makeRoomAtLast].
    lastIndex := lastIndex + 1.
    array at: lastIndex put: newObject.
    ^ newObject

"makeRoomAtLast calls self grow…"

22 OrderedCollections double in size on each grow!
grow
    "Become larger. Typically, a subclass has to override this if the subclass adds instance variables."
    | newArray |
    newArray := Array new: self size + self growSize.
    newArray replaceFrom: 1 to: array size with: array startingAt: 1.
    array := newArray

growSize
    ^ array size max: 2 "returns the maximum of the array size or 2"

The growSize method returns the maximum of the size of the array or 2. In grow, the array will usually double in size.

Is that bad?
Think about the average case of adding to an OrderedCollection
  Most of the time it won't need to grow
  Doubling in size means that you'll not do it very often!
You're trading off space for time, a classic tradeoff
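To put a number on "not very often", here is a simplified model of the doubling policy. It is not the real OrderedCollection code: it assumes a starting capacity of 2 and 10,000 adds (both chosen for illustration) and only counts how many times the storage would grow and how many element copies those grows cost.

```smalltalk
| capacity size grows copies |
capacity := 2.   "assumed starting capacity for this model"
size := 0.
grows := 0.
copies := 0.
10000 timesRepeat: [
    size = capacity ifTrue: [
        "full: copy every element, then grow by (capacity max: 2), as growSize does"
        copies := copies + size.
        capacity := capacity + (capacity max: 2).
        grows := grows + 1].
    size := size + 1].
Transcript
    show: 'grows: ', grows printString; cr;
    show: 'copies: ', copies printString; cr.
```

In this model 10,000 adds trigger only 13 grows and 16,382 element copies, fewer than two copies per add on average, at the price of ending with a capacity of 16,384.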

Speed of Access
MessageTally time: [1 to: do: [:i | d at: i]]. "Dictionary: 60"
MessageTally time: [1 to: do: [:i | a at: i]]. "Array: 2"
MessageTally time: [1 to: do: [:i | oc at: i]]. "OrderedCollection: 9"
For indexed access, Arrays are the fastest, with OrderedCollections second and Dictionaries last.

25 SortedCollections are great but slow
SortedCollections keep their components sorted, but that comes at a cost (note that the counts below are an order of magnitude smaller than in the previous examples)
sc := SortedCollection new.
MessageTally time: [1 to: 1000 do: [:i | sc add: i]]. "12"
MessageTally time: [1 to: 1000 do: [:i | d at: i]]. "4"
SortedCollections are slower to add to than dictionaries.

26 Adding to Non-Sequenced Collections
o := OrderedCollection new.
MessageTally time: [1 to: do: [:i | o add: i]]. "14"
s := Set new.
MessageTally time: [1 to: do: [:i | s add: i]]. "113"
b := Bag new.
MessageTally time: [1 to: do: [:i | b add: i]]. "265"
OrderedCollections are the fastest with Sets being much slower and Bags even slower.

Let's find an element!
MessageTally time: [10 timesRepeat: [o detect: [:n | n >= 5000]]]. "45"
MessageTally time: [10 timesRepeat: [s detect: [:n | n >= 5000]]]. "48"
MessageTally time: [10 timesRepeat: [b detect: [:n | n >= 5000]]]. "256"
Bags look unbearably slow! Why would you ever use one?
detect: walks through the contents one element at a time and returns the first element for which the block answers true. (It is select: that returns all matching elements.)

28 Iteration is the wrong way to find an element!
MessageTally time: [100 timesRepeat: [o includes: 5000]]. "444"
MessageTally time: [100 timesRepeat: [s includes: 5000]]. "0"
MessageTally time: [100 timesRepeat: [b includes: 5000]]. "0"
Sets and bags are great for finding items.
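A self-contained version of this comparison, as a sketch. The collection size of 10,000 is an assumption made here (the sizes in the earlier listings were lost), and Time millisecondsToRun: stands in for MessageTally time:.

```smalltalk
| n o s oMs sMs |
n := 10000.  "assumed collection size, not taken from the slides"
o := OrderedCollection new.
s := Set new.
1 to: n do: [:i | o add: i. s add: i].
"includes: on an OrderedCollection scans linearly; on a Set it hashes"
oMs := Time millisecondsToRun: [100 timesRepeat: [o includes: n // 2]].
sMs := Time millisecondsToRun: [100 timesRepeat: [s includes: n // 2]].
Transcript
    show: 'OrderedCollection: ', oMs printString, ' ms'; cr;
    show: 'Set: ', sMs printString, ' ms'; cr.
```

The gap should widen as n grows, since the OrderedCollection search cost scales with the number of elements while the Set lookup stays roughly constant.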

29 How are Bags so fast? Dictionaries!
Bags are so fast because their implementation is actually a Dictionary (a hashtable)!
Dictionaries are not slow!
  They're slow if you use them as arrays, and they're slow to iterate across
  But for finding a specific element, they are blindingly fast!

30 Implementation of Bags
Bags have one instance variable, a Dictionary named contents

add: newObject
    ^self add: newObject withOccurrences: 1

add: newObject withOccurrences: anInteger
    "Add the element newObject to the receiver. Do so as though the element were added anInteger number of times. Answer newObject."
    contents at: newObject put: (contents at: newObject ifAbsent: [0]) + anInteger.
    ^ newObject
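Seen from the outside, that element-to-count Dictionary behaves like this. The symbols are arbitrary illustration values and the results are held in temporaries only so they are easy to inspect.

```smalltalk
| b dogs fish total distinct |
b := Bag new.
b add: #dog; add: #dog; add: #cat.
b add: #bird withOccurrences: 3.
dogs := b occurrencesOf: #dog.    "2: one entry for #dog with a count of 2"
fish := b occurrencesOf: #fish.   "0: never added"
total := b size.                  "6: the sum of all counts"
distinct := b asSet size.         "3: #dog, #cat, #bird"
```

Both add: and occurrencesOf: are single hashed lookups, which is why includes: on a Bag was as fast as on a Set.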

31 Dictionaries are key to fast lookups
Dictionaries are used heavily in Squeak
  E.g., Smalltalk is a kind of Dictionary
Everything in Smalltalk (or Squeak) knows its own hash
Hash functions need to be
  Fast
  Unique for unique objects
  Captures how objects differ in actual practice
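One property the list above takes for granted is that objects that are equal must answer the same hash; otherwise a Dictionary could not find them again. A short sketch with made-up values:

```smalltalk
| s1 s2 equal identical sameHash d found |
s1 := 'Rufus'.
s2 := 'Ruf', 'us'.              "a different object with the same contents"
equal := s1 = s2.               "true: same characters"
identical := s1 == s2.          "false: two separate objects"
sameHash := s1 hash = s2 hash.  "true: equal objects answer equal hashes"
d := Dictionary new.
d at: s1 put: 'dog'.
found := d at: s2 ifAbsent: [nil].  "'dog': found through hash, then ="
```

The lookup with s2 succeeds even though s2 was never stored, because the Dictionary locates the slot by hash and confirms the match with =.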

32 Some Sample Hash Functions
"Integer"
hash
    ^(self lastDigit bitShift: 8) + (self digitAt: 1)

"Float"
hash
    "Both words of the float are used; 8 bits are removed from each end to clear most of the exponent regardless of the byte ordering. (The bitAnd:'s ensure that the intermediate results do not become a large integer.) Slower than the original version in the ratios 12:5 to 2:1 depending on values. (DNS, 11 May, 1997)"
    ^ (((self basicAt: 1) bitAnd: 16r00FFFF00) + ((self basicAt: 2) bitAnd: 16r00FFFF00)) bitShift: -8

More hash functions
"Character"
hash
    ^value

"Point"
hash
    ^(x hash bitShift: 2) bitXor: y hash

"String"
hash
    | l m |
    (l _ m _ self size) <= 2
        ifTrue: [l = 2
            ifTrue: [m _ 3]
            ifFalse: [l = 1
                ifTrue: [^((self at: 1) asciiValue bitAnd: 127) * 106].
                ^21845]].
    ^(self at: 1) asciiValue * 48 + ((self at: (m - 1)) asciiValue + l)

2/19/2019 Copyright 2000, Georgia Tech

Summary
Lots of ways to time/trace in Squeak
  MessageTally and TimeProfileBrowser
Making things fast in Squeak
  Choose data types wisely
  Use primitives
  Code wisely
  Arrays vs. hashing - for iteration, arrays; for finding, hashing

