Copyright 2000, Georgia Tech


Optimizing Squeak
Measuring the Speed of Squeak
  MessageTally and TimeProfileBrowser
Changes to improve speed
  Choose operations appropriately
  Choose collections to improve speed
    How collections work
  Build a primitive
    When building a primitive is useful/necessary
Coming soon: How the VM works and how to build primitives...

MessageTally
MessageTally provides a variety of tools for analyzing your code.
time: returns the time in milliseconds that it took to do some operation.
    MessageTally time: [100000 timesRepeat: [4 * 4]]      "44"
    MessageTally time: [100000 timesRepeat: [4.0 * 4.0]]  "80"
    MessageTally time: [100000 timesRepeat: [4 * 4.0]]    "76"
    MessageTally time: [100000 timesRepeat: [4.0 * 4]]    "79"
Notice that any operation involving a floating point number takes nearly twice as long as the pure integer one.
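A single reading like the ones above can shift from run to run. The following is a minimal workspace sketch, assuming only the MessageTally time: message shown on this slide, that repeats a measurement and keeps the smallest reading. The run count of 5 and the variable name readings are arbitrary choices for illustration.

```smalltalk
| readings |
"Collect five independent timings of the same block, in milliseconds."
readings := (1 to: 5) collect: [:run |
    MessageTally time: [100000 timesRepeat: [4 * 4]]].
"Report the smallest reading, the one least disturbed by other activity."
Transcript show: 'best of 5: ', readings min printString, ' ms'; cr.
```

Taking the minimum rather than the average is one possible design choice: background activity can only make a run slower, never faster.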

What’s eating up the time?
Part of it: Floats are slower
But the bigger part of it is the unpacking of Object -> Class (to figure out type) -> NativeFormat
    MessageTally time: [10000 timesRepeat: [432432432 * 4324324324.0]]   "9"
    MessageTally time: [10000 timesRepeat: [4324324324.0 * 432432432]]   "8"
    MessageTally time: [10000 timesRepeat: [4 * 4324324324.0]]           "9"
    MessageTally time: [10000 timesRepeat: [4324324324.0 * 4]]           "7"
The multiply isn’t taking the most time: the size of the operands barely changes the result. Most of the time is taken up by finding the type (like Float) and getting the value ready for the low-level operation. This is also called boxing and unboxing.

A Different Way to Look at Executing Code
At regular intervals, interrupt the executing process with a “spy” process
Figure out which method is executing at that moment
Reports
  The “tree” of which methods called which other methods
  The percentage of time spent (over the whole tree) in each “leaf”
The percentages are only approximate, since the spy samples at intervals and doesn’t track exactly when each method started and finished.

MessageTally spyOn:
Does a “spy” on a process
Reports percentages of time
  A “primitive” leaf is attributed to its method (one level up)
  To trim the tree, anything under 2% is not shown, but can be added into leaves
Example:
    MessageTally spyOn: [10000 timesRepeat: [3.14159 printString]]
By “primitive” we mean a machine language routine.

 - 139 tallies, 2407 msec.
**Tree**
100.0 Float(Object)>>printString
  74.1 Float(Number)>>printOn:
  |74.1 Float>>printOn:base:
  |  74.1 Float>>absPrintOn:base:
  |    18.7 Character class>>digitValue:
  |    16.5 primitives
  |    16.5 False>>|
  |    15.1 Float(Number)>>ceiling
  |    |12.9 Float(Number)>>floor
  |    4.3 LimitedWriteStream(WriteStream)>>nextPut:
  25.9 String class(SequenceableCollection class)>>streamContents:limitedTo:
    17.3 LimitedWriteStream(WriteStream)>>contents
    |15.1 String(SequenceableCollection)>>copyFrom:to:
    |  15.1 String(Object)>>species
    7.2 LimitedWriteStream class(PositionableStream class)>>on:
      5.8 LimitedWriteStream(WriteStream)>>on:
        3.6 LimitedWriteStream(PositionableStream)>>on:
The numbers are the approximate percentage of time spent in that method. Each line gives the percentage of time, then the class, then >> and the method.

**Leaves**
18.7 Character class>>digitValue:
18.0 False>>|
16.5 Float>>absPrintOn:base:
15.1 String(Object)>>species
12.9 Float(Number)>>floor
4.3 LimitedWriteStream(WriteStream)>>nextPut:
2.9 SmallInteger(Magnitude)>>max:

Where does the time go?
Notice in that example
  The biggest piece of the printString execution is the conversion of each individual digit to a character
    Character class>>digitValue:
  But the second biggest is a logical Or.
    18.0 False>>|
  Where is that happening?
Ask Mark what is happening here.

TimeProfileBrowser
TimeProfileBrowser spies on a block, like MessageTally
    TimeProfileBrowser onBlock: [10000 timesRepeat: [3.14159 printString]]
But it also acts as a code browser so that you can see each piece of code!

TimeProfileBrowser
There is a class False which has a | (“or”) method that just returns the argument to the method. This is because the result of or’ing anything with false depends only on the other operand.

The Problem of Spying
Spying is inaccurate
  Run the same test several times: Different results each time!
  Have to run something often enough (e.g., 1000 timesRepeat:…) to catch the right methods
Alternative: accurate counts with tallySends:
  Uses the fact that Squeak’s VM is generated from a working simulation of the VM
  Actually simulates the VM to get perfectly accurate counts of how often each method is called
  Can also be useful for debugging: It’s a trace!
Run the test a few times and see that you get different results. Because the spy interrupts the process and checks which method it is in at that moment, it won’t always return the same results.

    MessageTally tallySends: [3.14159 printString]
This simulation took 0.0 seconds.
**Tree**
2 Float(Object)>>printString
  1 Float(Number)>>printOn:
  |1 Float>>printOn:base:
  |  1 Float>>absPrintOn:base:
  |  |7 SmallInteger>>*
  |  |  |7 SmallInteger(Integer)>>*
  |  |  |  7 Float>>adaptToInteger:andSend:
  |  |7 LimitedWriteStream(WriteStream)>>nextPut:
  |  |6 Character class>>digitValue:
The numbers show the number of times the method was called.

Measuring Squeak’s Speed
Now that we have tools for measuring Squeak, let’s start figuring out what’s slow and what’s fast.
What’s fast:
  Integer arithmetic is faster than floating point (expected)
  Special messages, coded into the bytecode
    + - > < at: at:put: bitOr: bitAnd: class = == new value do: size
Each of the special messages above translates into a single bytecode.

The VM and Bytecodes
The VM (e.g., squeak.exe) interprets bytecodes
Bytecodes are the machine language of a virtual machine
  The “VM” is, strictly speaking, a “VM simulator” or “interpreter”
You can see bytecodes for a method by doing “show bytecodes” from the code pane
Click on the “Source” button and choose “bytecodes” in the menu to see the bytecodes for the method.
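The same listing can also be requested from a workspace. This is a small sketch, assuming your image provides compiledMethodAt: on classes and symbolic on compiled methods, as Squeak images commonly do. The choice of OrderedCollection>>addLast: is only an example.

```smalltalk
"Print this expression to get a textual listing of the bytecodes
 of OrderedCollection>>addLast:, one bytecode per line."
(OrderedCollection compiledMethodAt: #addLast:) symbolic
```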

Special Messages are fast lookups
Special messages, like +, actually map to a single bytecode
  One memory access, no lookup
Non-special messages involve passing a pointer to a memory location where the message selector is stored

A Word on Primitives
Primitives are the bottommost layer of the method hierarchy
  They are not defined in terms of bytecodes, but in terms of native code
  Think of them as subroutine calls into the VM
You can make up your own primitives!
  In the latest versions of Squeak, they can even be dynamically loaded
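In source form, a primitive-backed method looks like an ordinary method with a primitive declaration at the top. The sketch below is modeled on how SmallInteger>>+ typically appears in a Squeak image. The comment wording is ours, and the details may differ in your version.

```smalltalk
+ aNumber
    "Ask the VM to add two SmallIntegers using primitive number 1."
    <primitive: 1>
    "Reached only if the primitive fails, for example when the argument
     is not a SmallInteger or the result does not fit in one."
    ^ super + aNumber
```

The Smalltalk code after the declaration is the fallback: it runs only when the native routine cannot handle the arguments.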

But much of the speed comes from Squeak-level choices
Integers vs. floats and Squeak code vs. primitives are low-level VM decisions
Most of what determines fast or slow code is at the level of your Squeak code
  Choices in collections
  Algorithm coding

Brief review of Collections
Dictionary: Takes a key and a value, e.g., aDict at: 'dog' put: 'Rufus'.
Array: Just like an array in any other language
OrderedCollection: Like a Java Vector
Bag: You can add to it, and it remembers the number of identical elements
Set: You can add to it, and it remembers each distinct element only once
Look at the documentation for Bag in Squeak. It stores each different object, but if there are several identical elements it stores only one of them along with a count of how many there are.
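The difference between a Bag and a Set can be seen in a short workspace sketch. The variable names bag and set are just for illustration.

```smalltalk
| bag set |
bag := Bag new.
bag add: 'dog'; add: 'dog'; add: 'cat'.
set := Set new.
set add: 'dog'; add: 'dog'; add: 'cat'.
bag occurrencesOf: 'dog'.   "2: the Bag counts equal elements"
bag size.                   "3"
set size.                   "2: the Set keeps one of each"
```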

Speed of Adding
Dictionaries are the most general indexed collection, but they’re also slow to add to.
    d := Dictionary new.
    MessageTally time: [1 to: 10000 do: [:i | d at: i put: i]].   "152"
    a := Array new: 10000.
    MessageTally time: [1 to: 10000 do: [:i | a at: i put: i]].   "2"
Arrays are much faster for inserting than Dictionaries.

OrderedCollections are only slow to grow
    oc := OrderedCollection new: 10000.
    MessageTally time: [1 to: 10000 do: [:i | oc add: i]].         "17"
    MessageTally time: [1 to: 10000 do: [:i | oc at: i put: i]].   "11"
Once an OrderedCollection is the right size, at:put: is within six times the speed of an Array (2 ms from the previous slide)
It’s slower because Array’s at:put: is a primitive, while OrderedCollection’s checks bounds first
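To see the cost of growing for yourself, one option is to time the same adds against a collection that starts small and one that is presized. This is only a sketch using the MessageTally time: message from the earlier slides. The variable names grown and presized are ours, and the readings will depend on your machine and image, so none are shown.

```smalltalk
| grown presized |
grown := OrderedCollection new.             "starts at the default capacity"
presized := OrderedCollection new: 10000.   "room for every element up front"
"Print each expression separately to see its time in milliseconds."
MessageTally time: [1 to: 10000 do: [:i | grown add: i]].
MessageTally time: [1 to: 10000 do: [:i | presized add: i]].
```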

Why are OrderedCollections slow to grow?
    add: newObject
        ^self addLast: newObject
    addLast: newObject
        "Add newObject to the end of the receiver. Answer newObject."
        lastIndex = array size ifTrue: [self makeRoomAtLast].
        lastIndex := lastIndex + 1.
        array at: lastIndex put: newObject.
        ^ newObject
    "makeRoomAtLast calls self grow…"

OrderedCollections double in size on each grow!
    grow
        "Become larger. Typically, a subclass has to override this if the subclass adds instance variables."
        | newArray |
        newArray := Array new: self size + self growSize.
        newArray replaceFrom: 1 to: array size with: array startingAt: 1.
        array := newArray
    growSize
        ^ array size max: 2    "returns the maximum of the array size or 2"
The growSize method returns the larger of the array size and 2, so in grow the array will usually double in size.

Is that bad?
Think about the average case of adding to an OrderedCollection
  Most of the time it won’t need to grow
  Doubling in size means that you’ll not do it very often!
You’re trading off space for time, a classic tradeoff
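The claim that doubling makes growth rare can be checked with a small model. The sketch below does not touch a real OrderedCollection. It only counts how often a buffer that doubles would have to grow, and how many elements it would copy, over 10000 adds. The starting capacity of 2 and the variable names are assumptions made for the illustration.

```smalltalk
| capacity grows copies |
capacity := 2.    "hypothetical starting capacity"
grows := 0.
copies := 0.
1 to: 10000 do: [:i |
    i > capacity ifTrue: [
        copies := copies + capacity.   "a grow copies every existing element"
        capacity := capacity * 2.
        grows := grows + 1]].
Transcript show: grows printString, ' grows, ', copies printString, ' copies'; cr.
```

Under these assumptions the model reports 13 grows and 16382 copied elements for 10000 adds, which is fewer than two copies per add on average.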

Speed of Access
    MessageTally time: [1 to: 10000 do: [:i | d at: i]].    "Dictionary: 60"
    MessageTally time: [1 to: 10000 do: [:i | a at: i]].    "Array: 2"
    MessageTally time: [1 to: 10000 do: [:i | oc at: i]].   "OrderedCollection: 9"
For access by index, Arrays are the fastest, with OrderedCollections second and Dictionaries last.

SortedCollections are great but slow
SortedCollections keep their components sorted, but that has a cost (note that the loops below run an order of magnitude fewer times than on the previous slides)
    sc := SortedCollection new.
    MessageTally time: [1 to: 1000 do: [:i | sc add: i]].   "12"
    MessageTally time: [1 to: 1000 do: [:i | d at: i]].     "4"
Adding 1000 elements to a SortedCollection takes three times as long as 1000 Dictionary accesses.

Adding to Non-Sequenced Collections
    o := OrderedCollection new.
    MessageTally time: [1 to: 10000 do: [:i | o add: i]].   "14"
    s := Set new.
    MessageTally time: [1 to: 10000 do: [:i | s add: i]].   "113"
    b := Bag new.
    MessageTally time: [1 to: 10000 do: [:i | b add: i]].   "265"
OrderedCollections are the fastest, with Sets being much slower and Bags even slower.

Let’s find an element!
    MessageTally time: [10 timesRepeat: [o detect: [:n | n >= 5000]]].   "45"
    MessageTally time: [10 timesRepeat: [s detect: [:n | n >= 5000]]].   "48"
    MessageTally time: [10 timesRepeat: [b detect: [:n | n >= 5000]]].   "256"
Bags look unbearably slow! Why would you ever use one?
detect: walks through the contents one element at a time and returns the first element for which the block is true.
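For reference, detect: answers a single element, while select: answers every match. A short workspace sketch on a small collection (the variable name o follows the slide, the values are ours):

```smalltalk
| o |
o := (1 to: 10) asOrderedCollection.
o detect: [:n | n >= 5].                   "5: the first match, then it stops"
o select: [:n | n >= 5].                   "an OrderedCollection(5 6 7 8 9 10)"
o detect: [:n | n > 100] ifNone: [nil].    "nil: no element matches"
```

Without the ifNone: block, detect: signals an error when nothing matches.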

Iteration is the wrong way to find an element!
    MessageTally time: [100 timesRepeat: [o includes: 5000]].   "444"
    MessageTally time: [100 timesRepeat: [s includes: 5000]].   "0"
    MessageTally time: [100 timesRepeat: [b includes: 5000]].   "0"
Sets and Bags are great for finding items.

How are Bags so fast? Dictionaries!
Bags are so fast because their implementation is actually a Dictionary (a hashtable)!
Dictionaries are not slow!
  They’re slow if you use them as arrays, and they’re slow to iterate across
  But for finding a specific element, they are blindingly fast!

Implementation of Bags
Bags have one instance variable, a Dictionary named contents
    add: newObject
        ^self add: newObject withOccurrences: 1
    add: newObject withOccurrences: anInteger
        "Add the element newObject to the receiver. Do so as though the element were added anInteger number of times. Answer newObject."
        contents at: newObject put: (contents at: newObject ifAbsent: [0]) + anInteger.
        ^ newObject
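The same counting idiom works directly on a Dictionary in a workspace. This sketch mirrors what the Bag code above does with its contents variable. The name counts and the sample words are made up for the example.

```smalltalk
| counts |
counts := Dictionary new.
#('dog' 'cat' 'dog') do: [:word |
    "Look up the current count (0 if absent) and store it back plus one."
    counts at: word put: (counts at: word ifAbsent: [0]) + 1].
counts at: 'dog'.   "2"
counts at: 'cat'.   "1"
```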

Dictionaries are key to fast lookups
Dictionaries are used heavily in Squeak
  E.g., Smalltalk is a kind of Dictionary
Everything in Smalltalk (or Squeak) knows its own hash
Hash functions need to
  Be fast
  Be unique for unique objects
  Capture how objects differ in actual practice
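One property hashed collections rely on is that objects which are equal answer the same hash, even when they are distinct objects. A brief workspace sketch (the variable names a and b are ours):

```smalltalk
| a b |
a := 'squeak' copy.
b := 'squeak' copy.
a == b.                          "false: two distinct objects"
a = b.                           "true: same characters"
a hash = b hash.                 "true: equal objects answer equal hashes"
(3 @ 4) hash = (3 @ 4) hash.     "true"
```

If a class redefines = without redefining hash to match, Sets and Dictionaries may fail to find equal elements.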

Some Sample Hash Functions
Integer>>hash
    hash
        ^(self lastDigit bitShift: 8) + (self digitAt: 1)
Float>>hash
    hash
        "Both words of the float are used; 8 bits are removed from each end to clear most of the exponent regardless of the byte ordering. (The bitAnd:'s ensure that the intermediate results do not become a large integer.) Slower than the original version in the ratios 12:5 to 2:1 depending on values. (DNS, 11 May, 1997)"
        ^ (((self basicAt: 1) bitAnd: 16r00FFFF00) +
           ((self basicAt: 2) bitAnd: 16r00FFFF00)) bitShift: -8

More hash functions
Character>>hash
    hash
        ^value
Point>>hash
    hash
        ^(x hash bitShift: 2) bitXor: y hash
String>>hash
    hash
        | l m |
        (l := m := self size) <= 2
            ifTrue: [l = 2
                ifTrue: [m := 3]
                ifFalse: [l = 1 ifTrue: [^((self at: 1) asciiValue bitAnd: 127) * 106].
                    ^21845]].
        ^(self at: 1) asciiValue * 48 + ((self at: (m - 1)) asciiValue + l)

Summary
Lots of ways to time/trace in Squeak
  MessageTally and TimeProfileBrowser
Making things fast in Squeak
  Choose data types wisely
  Use primitives
  Code wisely
  Arrays vs. hashing: for iteration, arrays; for finding, hashing