APL Optimization Techniques
Eugene Ying, Senior Software Developer, Fiserv, Inc.
September 14, 2012

Topics
- Component File Fragmentation
- The Match Function
- The Inner Product
- Storing Numbers in a Native File
- The Outer Product
- File I/O Optimization
- CPU Optimization

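A Component File where each Component Contains 100 Rows of Data
[Figure: updating component 2 with 150 rows of data (the file is fragmented) and updating component 2 with 50 rows of data.]
When component 2 grows to 150 rows, it no longer fits in the space that was allocated for 100 rows. It must therefore be written elsewhere, and the file becomes fragmented.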
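The toy Python model below makes the fragmentation mechanism concrete. It is a hypothetical sketch, not Dyalog's actual component file layout. Each component occupies a slot sized for the rows it was first written with. An update that fits is written in place; an update that does not fit is appended at the end of the file and leaves a hole behind. The class name `ToyComponentFile` is invented, and so is the rule that the 50-row update is written in place.

```python
class ToyComponentFile:
    """Hypothetical model: each component lives in a slot sized for the
    rows it was first written with; growing past that forces a move."""

    def __init__(self):
        self.slots = []   # each slot: [component number or None, capacity in rows]
        self.where = {}   # component number -> index of its current slot

    def append(self, rows):
        """Append a new component whose slot holds `rows` rows."""
        number = len(self.where) + 1
        self.where[number] = len(self.slots)
        self.slots.append([number, rows])
        return number

    def replace(self, number, rows):
        """Rewrite a component in place if it fits, otherwise relocate it."""
        slot = self.slots[self.where[number]]
        if rows <= slot[1]:
            return                    # fits in the old slot: no new hole
        slot[0] = None                # the old slot becomes an unused hole
        self.where[number] = len(self.slots)
        self.slots.append([number, rows])

    def holes(self):
        """Number of abandoned slots, a simple measure of fragmentation."""
        return sum(1 for owner, _ in self.slots if owner is None)


f = ToyComponentFile()
for _ in range(5):
    f.append(100)         # each component starts with 100 rows
f.replace(2, 50)          # 50 rows fit in the 100-row slot
print(f.holes())          # 0
f.replace(2, 150)         # 150 rows do not fit: component 2 moves to the end
print(f.holes())          # 1
```

Reserving generous slots up front, as the next slide suggests, keeps later updates on the in-place branch.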

Initializing a Component File
Suppose your data will not have more than 500 rows. To minimize the chance of fragmentation, you allocate 500 rows of data for each component:

      (500 10⍴' ')⎕FAPPEND TIE    ⍝ Component 1
      (500 4⍴0)⎕FAPPEND TIE       ⍝ Component 2
      (500 20⍴' ')⎕FAPPEND TIE    ⍝ Component 3
      (500 5⍴0)⎕FAPPEND TIE       ⍝ Component 4
      (500 15⍴' ')⎕FAPPEND TIE    ⍝ Component 5

Initializing a Component File: Intended vs Actual
[Figure: intended vs actual initialization of components 1 to 5, with character and numeric components distinguished.]
The numeric components are greatly under-allocated in size, because APL stores an array of zeros in its most compact (Boolean) form.
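The Python sketch below is a back-of-the-envelope estimate of how small the numeric components turn out. It assumes 1 byte per character, 8 bytes per floating-point number, 4 bytes per 32-bit integer, and 1 bit per item for an array of zeros stored as Boolean. Headers are ignored, so the results are estimates, not ⎕SIZE figures.

```python
# Assumed storage per item, in bits (headers ignored).
BITS_PER_ITEM = {"char": 8, "bool": 1, "int32": 32, "float64": 64}

# (rows, columns, intended type) for components 1 to 5 on the slide
COMPONENTS = [
    (500, 10, "char"),
    (500, 4, "float64"),   # declared later with 64 DCL
    (500, 20, "char"),
    (500, 5, "int32"),     # declared later with 32 DCL
    (500, 15, "char"),
]


def data_bytes(rows, cols, kind):
    """Bytes of data for a rows-by-cols array, rounded up to whole bytes."""
    return (rows * cols * BITS_PER_ITEM[kind] + 7) // 8


def allocation_report(components):
    """(component, intended bytes, actual bytes) for arrays built from 0 and ' '.

    Assumes a numeric array of zeros is stored as Boolean, one bit per item.
    """
    report = []
    for number, (rows, cols, kind) in enumerate(components, start=1):
        actual = kind if kind == "char" else "bool"
        report.append((number, data_bytes(rows, cols, kind),
                       data_bytes(rows, cols, actual)))
    return report


for number, intended, actual in allocation_report(COMPONENTS):
    print(f"component {number}: intended {intended} bytes, actual {actual} bytes")
```

Under these assumptions, component 2 was meant to reserve about 16,000 bytes but reserves only 250. Component 4 gets 313 bytes instead of 10,000.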

Storage Sizes of APL Numbers

      BOOLEAN←1000⍴1
      ⎕SIZE 'BOOLEAN'
144
      INTEGER1←1000⍴2
      ⎕SIZE 'INTEGER1'
1016
      INTEGER2←1000⍴128
      ⎕SIZE 'INTEGER2'
2016
      INTEGER4←1000⍴32768
      ⎕SIZE 'INTEGER4'
4016
      FLOAT8←1000⍴0.1
      ⎕SIZE 'FLOAT8'
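The sizes above suggest a simple rule: the interpreter appears to choose the narrowest representation that can hold every value. The Python sketch below models that rule. The 16-byte header and the rounding of data up to a multiple of 4 bytes are assumptions fitted to the figures in this deck (144, 1016, 2016, 4016, and 268 bytes for 2,000 zeros); they are not Dyalog's documented memory layout. The model also does not account for the 4032 and 2032 bytes of the ⎕DR-converted arrays shown later.

```python
def narrowest_type(values):
    """Return (name, bits) of the smallest type that holds every value."""
    if all(v == 0 or v == 1 for v in values):
        return "boolean", 1
    if all(float(v).is_integer() for v in values):
        lo, hi = min(values), max(values)
        for name, bits in (("int8", 8), ("int16", 16), ("int32", 32)):
            limit = 2 ** (bits - 1)
            if -limit <= lo and hi < limit:
                return name, bits
    return "float64", 64


def modeled_size(values):
    """Assumed ⎕SIZE: 16-byte header plus data rounded up to 4 bytes."""
    _, bits = narrowest_type(values)
    data = (len(values) * bits + 7) // 8
    return 16 + (data + 3) // 4 * 4


for sample in ([1] * 1000, [2] * 1000, [128] * 1000, [32768] * 1000, [0.0] * 1000):
    print(narrowest_type(sample)[0], modeled_size(sample))
```

In this model, zeros written as 0.0 still select the Boolean type, which matches the next slide.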

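The Default APL Number 0
Every one of these zero vectors occupies 144 bytes, the same as the 1000-item Boolean vector. This holds whether the zeros are written directly or derived from floating-point values:

      X←1000⍴0
      ⎕SIZE 'X'
144
      X←1000⍴
      ⎕SIZE 'X'
144
      X←1000⍴0×0.1
      ⎕SIZE 'X'
144
      X←1000↑0⍴0.1
      ⎕SIZE 'X'
144
      X←0×1000⍴0.1
      ⎕SIZE 'X'
144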

How Do You Create a Vector of Integer Zeros or a Vector of Floating-Point Zeros?
Dyadic ⎕DR with a two-item left argument (the current data type and the requested one) converts the array, and 1⊃ picks the converted array from the result:

      F64_0←1⊃ ⎕DR 1000⍴0          ⍝ Floating pt # 0
      ⎕SIZE 'F64_0'
8016
      B32_999←1⊃ ⎕DR 1000⍴999      ⍝ Binary-32 # 999
      ⎕SIZE 'B32_999'
4032
      B16_2←1⊃ ⎕DR 1000⍴2          ⍝ Binary-16 # 2
      ⎕SIZE 'B16_2'
2032
      B8_0←1⊃11 83 ⎕DR 1000⍴0      ⍝ Binary-8 # 0
      ⎕SIZE 'B8_0'
1016

Declaring Numbers Using a Defined Function to Preserve Numeric Type

      F64←64 DCL 1000⍴0      ⍝ Floating pt # 0
      ⎕SIZE 'F64'
8016
      I32←32 DCL 1000⍴999    ⍝ Binary-32 # 999
      ⎕SIZE 'I32'
4032
      I16←16 DCL 1000⍴2      ⍝ Binary-16 # 2
      ⎕SIZE 'I16'
2032
      I8←8 DCL 1000⍴0        ⍝ Binary-8 # 0
      ⎕SIZE 'I8'

The DCL (Declare) Function

      Z←X DCL Y;D;R
      ⍝ Declare a floating point or integer array so that each
      ⍝ item occupies the number of bits requested by the X argument
      ⍝ X: # of bits that each number in the array will occupy
      ⍝     8 for 8-bit (1-byte) integer (¯128 to 127)
      ⍝    16 for 16-bit (2-byte) integer (¯32768 to 32767)
      ⍝    32 for 32-bit (4-byte) integer (¯2147483648 to 2147483647)
      ⍝    64 for 64-bit (8-byte) floating point #
      ⍝ Y: Numeric array declared
      ⍝ Z: Numeric array that occupies the space you requested

      D←⎕DR Y                ⍝ Current data type of Y
      :Select ⍬⍴X
      :Case 8 ⋄ R←83
      :Case 16 ⋄ R←163
      :Case 32 ⋄ R←323
      :Case 64 ⋄ R←645
      :Else ⋄ ∘              ⍝ Stop if requested data type not supported
      :EndSelect
      →(D>R)↑'∘'             ⍝ Stop if numeric overflow: D>R means Y already needs a wider type than R
      Z←1⊃(D,R)⎕DR Y         ⍝ Convert to requested data type
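For readers outside APL, here is a rough Python analogue of DCL built on the standard `array` module. It stores values in a buffer whose items are exactly the requested width, and it refuses values that do not fit, much as DCL stops on overflow. The names `dcl` and `typecode_for` are hypothetical. The ranges are the ones listed in DCL's comments. The code checks at run time which C typecode has the requested width, because the `array` module only guarantees minimum item sizes.

```python
from array import array

# Signed integer ranges per requested width, as listed in DCL's comments.
INT_RANGES = {8: (-128, 127), 16: (-32768, 32767), 32: (-2**31, 2**31 - 1)}


def typecode_for(bits):
    """Return an array typecode whose items are exactly `bits` bits wide."""
    if bits == 64:
        return "d"                               # C double: 8-byte floating point
    if bits in INT_RANGES:
        for code in ("b", "h", "i", "l", "q"):   # signed C integer types
            if array(code).itemsize * 8 == bits:
                return code
    raise ValueError(f"requested data type not supported: {bits} bits")


def dcl(bits, values):
    """Store `values` so that each item occupies `bits` bits (hypothetical DCL analogue)."""
    code = typecode_for(bits)
    if bits in INT_RANGES:
        lo, hi = INT_RANGES[bits]
        for v in values:
            if v != int(v) or not lo <= v <= hi:
                raise OverflowError(f"{v!r} does not fit in a {bits}-bit integer")
        values = [int(v) for v in values]
    return array(code, values)


f64 = dcl(64, [0] * 1000)
print(f64.itemsize * len(f64))   # 8000 bytes of data
```

Unlike APL, Python's `array` never narrows its storage behind your back, so the declared width persists. Any value outside a width's range, such as 128 for 8 bits, raises OverflowError.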

Initialization as Intended
For more accurate initialization, declare the numeric components with DCL:

      (500 10⍴' ')⎕FAPPEND TIE        ⍝ Component 1
      (64 DCL 500 4⍴0)⎕FAPPEND TIE    ⍝ Component 2
      (500 20⍴' ')⎕FAPPEND TIE        ⍝ Component 3
      (32 DCL 500 5⍴0)⎕FAPPEND TIE    ⍝ Component 4
      (500 15⍴' ')⎕FAPPEND TIE        ⍝ Component 5

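Changing the Floating-Point 0

      Z1000←64 DCL 1000⍴0    ⍝ 1,000 Floating pt 0
      ⎕SIZE 'Z1000'
8016
      Z2000←2000↑Z1000       ⍝ 2,000 Floating pt 0
      ⎕SIZE 'Z2000'
268
      Z2000←64 DCL 2000⍴0    ⍝ 2,000 Floating pt 0
      ⎕SIZE 'Z2000'

The take produces an array of only 268 bytes, far less than 2,000 eight-byte floating-point numbers would need. The result is therefore no longer stored as floating point.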

Precaution
Do not change a declared array and then re-use it. If you need another similar array of different dimensions, declare the new one from scratch.
Reason: the internal representation of the result R←X ⎕DR Y is guaranteed to remain unmodified until it is re-assigned (or partially re-assigned) with the result of any function (ref: Dyalog APL Reference Manual, Chapter 6).

Storing Numbers in a Native File

Storing Numbers as Characters
Blanks and commas are the most frequently used separators for numbers stored in a text file, and the index generator is also frequently used. The character strings are executed to retrieve the numbers:

      N1←' '
      N2←'40001,40002'
      N3←'40000+⍳2'
      :For I :In ⍳10000
          X←⍎N1    ⍝ Elapsed time = 72 ms
          Y←⍎N2    ⍝ Elapsed time = 89 ms
          Z←⍎N3    ⍝ Elapsed time = 94 ms
      :EndFor

Storing 1,000 Numbers as Characters

      N←4000+(1500⍴1 1 0)/⍳1500
      N1←⍕N                      ⍝ space separated
      N2←N1
      ((N2=' ')/N2)←','          ⍝ 4001,4002,4004,4005,... comma separated
      N3←¯1↓,'(',(⍕⍪¯1+(1000⍴1 0)/N),500 5⍴'+⍳2),'
                                 ⍝ (4000+⍳2),(4003+⍳2),... index generated
      :For I :In ⍳100
          X←⍎N1    ⍝ Run time 96 ms
          Y←⍎N2    ⍝ Run time 661 ms
          Z←⍎N3    ⍝ Run time 504 ms
      :EndFor
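The index-generated string N3 can also be produced by a general encoder that finds runs of consecutive integers. The Python sketch below is a hypothetical helper, not the slide's APL expression. It writes each run as (start+⍳count), using 1-origin ⍳ and APL's high minus for negative numbers, and includes a decoder to check the round trip.

```python
import re


def apl_int(n):
    """Format an integer the APL way (high minus ¯ for negatives)."""
    return f"¯{-n}" if n < 0 else str(n)


def runs(numbers):
    """Split a list into runs of consecutive ascending integers."""
    out = []
    for n in numbers:
        if out and n == out[-1][-1] + 1:
            out[-1].append(n)
        else:
            out.append([n])
    return out


def encode_index_generator(numbers):
    """Write each run as (start+⍳count); 1-origin ⍳ yields start+1 .. start+count."""
    parts = []
    for run in runs(numbers):
        if len(run) == 1:
            parts.append(apl_int(run[0]))
        else:
            parts.append(f"({apl_int(run[0] - 1)}+⍳{len(run)})")
    return ",".join(parts)


def decode_index_generator(text):
    """Inverse of encode_index_generator, used to check the round trip."""
    values = []
    for part in text.split(","):
        m = re.fullmatch(r"\((¯?\d+)\+⍳(\d+)\)", part)
        if m:
            start = int(m.group(1).replace("¯", "-"))
            values.extend(range(start + 1, start + int(m.group(2)) + 1))
        else:
            values.append(int(part.replace("¯", "-")))
    return values


# The slide's N←4000+(1500⍴1 1 0)/⍳1500, i.e. 4001,4002,4004,4005,...
N = [4000 + i for i in range(1, 1501) if i % 3 != 0]
N1 = " ".join(map(str, N))   # space separated
N2 = ",".join(map(str, N))   # comma separated
N3 = encode_index_generator(N)
```

For the slide's N, this reproduces (4000+⍳2),(4003+⍳2),... ending in (5497+⍳2). With runs of only two numbers, the encoded text is the same length (4,999 characters) as the space- or comma-separated lists. The saving grows with the run length.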

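Space Wasted by Trailing Blanks
[Figure: a character matrix with 2 records, showing text such as ( ⍳ 2),29106,( ⍳ 2),]
In a character matrix, every record is padded with trailing blanks to the width of the longest record. Record 1 can be compressed a little by the index generator so that record 2 has fewer trailing blanks. In a nested vector, however, record 2 naturally has no trailing blanks.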
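The Python sketch below counts the characters spent on padding when variable-length records are stored as a character matrix rather than as a nested vector. The records are hypothetical. The count covers characters only and ignores the per-item overhead that a nested array carries.

```python
def matrix_chars(records):
    """Characters in a character matrix: every row padded to the longest."""
    return len(records) * max(len(r) for r in records)


def nested_chars(records):
    """Characters in a nested vector: each record keeps its own length."""
    return sum(len(r) for r in records)


def trailing_blanks(records):
    """Characters a character matrix spends on trailing blanks."""
    return matrix_chars(records) - nested_chars(records)


# Hypothetical records of unequal length
records = ["40001 40002 40004 40005", "29106"]
print(trailing_blanks(records))   # 18
```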

File I/O Optimization Suggestions
- Use the DCL function to declare the arrays that initialize the numeric components of a component file. Otherwise the numeric components are under-allocated in size, and the component file becomes fragmented too quickly.
- To store purely numeric data in a native file, do not separate the numbers with commas, even though the CSV format is very popular. When the text is executed with ⍎, each comma is evaluated as the catenate primitive function.

Outer Product

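Replacing Outer Product by Indexing
Both expressions flag the items of Y that occur more than once. The outer product does it by building a ⍴Y by ⍴Y Boolean table.

      Y←⍳32000
      :For I :In ⍳5
          L←1≠+/Y∘.=Y              ⍝ ms
          M←Y∊((⍳⍴Y)≠Y⍳Y)/Y        ⍝ 20 ms, 1,000 times faster
      :EndFor
      ⎕WA
      X←1≠+/D∘.=D←⍳33000
LIMIT ERROR
      ⎕WA
      X←D∊((⍳⍴D)≠D⍳D)/D←⍳33000     ⍝ No LIMIT ERROR, 10,000 times smaller WS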
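The Python sketch below renders the two duplicate tests side by side to show why the indexing form scales. The first mirrors 1≠+/Y∘.=Y by comparing every pair of items, which is quadratic work; the APL version additionally materializes the whole table. The second mirrors Y∊((⍳⍴Y)≠Y⍳Y)/Y, with a dictionary of first positions standing in for Y⍳Y. The dictionary is this sketch's own choice and says nothing about how Dyalog implements ⍳. Positions are 0-origin.

```python
def duplicated_outer(y):
    """Like 1≠+/Y∘.=Y: count the equal partners of every item (quadratic)."""
    return [sum(a == b for b in y) != 1 for a in y]


def duplicated_indexed(y):
    """Like Y∊((⍳⍴Y)≠Y⍳Y)/Y: linear work with a dictionary."""
    first = {}
    for i, v in enumerate(y):
        first.setdefault(v, i)                                  # Y⍳Y: first position
    repeats = {v for i, v in enumerate(y) if first[v] != i}     # ((⍳⍴Y)≠Y⍳Y)/Y
    return [v in repeats for v in y]                            # Y∊repeats


sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(duplicated_indexed(sample) == duplicated_outer(sample))   # True
```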

Replacing Outer Product by Simple Logic

      M←100000↑50000⍴⍳13
      :For I :In ⍳1000
          L←1≠×/×M∘            ⍝ 9210 ms
          N←(M≥1)^M≤12         ⍝ 813 ms, 10 times faster
      :EndFor
      M←100000↑50000⍴⍳13
      ⎕WA
      L←1≠×/×M∘
WS FULL
      ⎕WA
      L←(M≥1)^M≤12             ⍝ No WS FULL, 40 times smaller WS
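The slide's outer-product expression is only partly legible here (1≠×/×M∘). The Python sketch below therefore uses an assumed stand-in for it: each item is compared against every one of 1 to 12, as an outer product with ⍳12 would do. The sketch checks that this stand-in agrees with the two-comparison test (M≥1)^M≤12 on the slide's data.

```python
def in_range_outer(m):
    """Stand-in for an outer product against ⍳12: 12 comparisons per item."""
    return [any(x == k for k in range(1, 13)) for x in m]


def in_range_logic(m):
    """(M≥1)^M≤12: two comparisons per item."""
    return [1 <= x <= 12 for x in m]


# The slide's data: 100000↑50000⍴⍳13 (1 .. 13 repeated, then padded with zeros)
M = [i % 13 + 1 for i in range(50000)] + [0] * 50000
print(in_range_outer(M) == in_range_logic(M))   # True
```

The two agree here only because M holds integers. The comparison form also works for non-integers, where an equality test against ⍳12 would not.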

Replacing Outer Product by a Loop
Both forms compute, for each I, how many of the first I items of B are greater than A[I].

      :For J :In ⍳10
          X←+/((⍳⍴A)∘.≥⍳⍴A)^A∘.<B     ⍝ ms
          Y←⍬
          :For I :In ⍳⍴B
              Y,←+/A[I]<I↑B           ⍝ ms, 3 times faster
          :EndFor
      :EndFor
      A←32800?32800
      B← ?32800
      ⎕WA
      X←+/((⍳⍴A)∘.≥⍳⍴A)^A∘.<B
LIMIT ERROR
      ⎕WA
      X←⍬
      :For I :In ⍳⍴B
          X,←+/A[I]<I↑B
      :EndFor                         ⍝ No LIMIT ERROR, 5,000 times smaller workspace
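The Python sketch below shows the same trade-off. The first function builds the full table that the outer-product expression needs. The second handles one row at a time, as the loop does, so it never holds more than one prefix of B. Indexes are 0-origin here, so the slide's I↑B becomes `b[: i + 1]`.

```python
def counts_outer(a, b):
    """+/((⍳⍴A)∘.≥⍳⍴A)^A∘.<B: builds the full n-by-n table first."""
    n = len(a)
    table = [[i >= j and a[i] < b[j] for j in range(n)] for i in range(n)]
    return [sum(row) for row in table]


def counts_loop(a, b):
    """Loop form: +/A[I]<I↑B for each I, one row at a time."""
    return [sum(a[i] < x for x in b[: i + 1]) for i in range(len(b))]


a, b = [3, 1, 4, 2], [2, 4, 1, 3]
print(counts_loop(a, b))   # [0, 2, 0, 2]
```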

Inner Product

Matrix on the (Wrong) Side of the Expression Requiring a Matrix Transpose

      'ABC'^.=⍉((1↑⍴D),3)↑D       ⍝ Transpose needed
      (((1↑⍴D),3)↑D)^.='ABC'      ⍝ Transpose not needed

The first form has "one less pair of parentheses," but it pays for that with a transpose.

Transposed Inner Product: VECTOR^.=⍉MATRIX vs MATRIX^.=VECTOR

      Y← ⍴⎕A
      :For I :In ⍳10000
          L←'EFGHIJ'^.=⍉Y     ⍝ ms
          M←Y^.='EFGHIJ'      ⍝ 2302 ms
      :EndFor
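The Python sketch below shows that both forms yield one result per row of the matrix; only the transposed form first makes a reorganized copy of the data. The helper `check_lengths` is hypothetical and imitates APL's LENGTH ERROR.

```python
def check_lengths(matrix, vector):
    """Hypothetical helper: refuse rows that do not match the vector's length."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("LENGTH ERROR")


def rows_equal(matrix, vector):
    """MATRIX^.=VECTOR: compare each row with the vector directly."""
    check_lengths(matrix, vector)
    return [all(x == v for x, v in zip(row, vector)) for row in matrix]


def rows_equal_via_transpose(matrix, vector):
    """VECTOR^.=⍉MATRIX: build the transpose first, then compare columns."""
    check_lengths(matrix, vector)
    columns = list(zip(*matrix))        # ⍉MATRIX: an extra copy of the data
    return [all(vector[k] == columns[k][r] for k in range(len(vector)))
            for r in range(len(matrix))]


Y = ["ABCDEF", "EFGHIJ", "EFGHIX"]
print(rows_equal(Y, "EFGHIJ"))   # [False, True, False]
```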

Array Comparisons

Comparing Array Contents with a Scalar

      M← ⍴⎕AV
      ^/M^.=' '
or
      ^/^/M=' '
or
      M≡(⍴M)⍴' '

Character Comparison Efficiency

      M← ⍴⎕AV
      :For I :In ⍳10000
          {}^/M^.=' '       ⍝ 9108 ms
          {}^/^/M=' '       ⍝ 9060 ms
          {}M≡(⍴M)⍴' '      ⍝ 587 ms
      :EndFor

Numeric Comparison Efficiency

      M← ⍴ ⍳10000
      :For I :In ⍳10000
          {}^/M^.=0         ⍝ ms
          {}^/^/M=0         ⍝ ms
          {}M≡(⍴M)⍴0        ⍝ 52 ms
      :EndFor
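The Python sketch below is only a reading aid for what the scalar comparisons compute; Python's timings have nothing to do with the Dyalog measurements above. The two reduction forms compare every item and then reduce. The match form builds an array of the scalar with M's shape and matches the two arrays as wholes. The matrix is represented as a list of rows, which is an assumption of the sketch.

```python
def all_equal_reduce(m, s):
    """^/^/M=s (and ^/M^.=s): compare every item, reduce each row, then reduce again."""
    row_results = [all([x == s for x in row]) for row in m]
    return all(row_results)


def all_equal_match(m, s):
    """M≡(⍴M)⍴s: build an array of s with M's shape and match the two."""
    return m == [[s] * len(row) for row in m]


blank = [[" "] * 5 for _ in range(3)]
print(all_equal_reduce(blank, " "), all_equal_match(blank, " "))   # True True
```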

Comparing Vectors

      A←10000?10000
      B←10000?10000
      C←A^.=B
      C←A≡B
      :For I :In ⍳10000
          {}A^.=B       ⍝ 1244 ms
          {}A≡B         ⍝ 135 ms
      :EndFor

Comparing Vectors of Unequal Lengths

      A←10000?10000
      B←9999?9999
      C←A^.=B
LENGTH ERROR
      C←A^.=B
        ^

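Comparing Vectors of Unequal Lengths
To avoid LENGTH ERROR:

      L←(⍴A)⌈⍴B
      C←(L↑A)^.=L↑B
or
      :If C←(⍴A)=⍴B
      :AndIf C←A^.=B
      :EndIf
or
      C←A≡B

The padded comparison can report a match when padding hides a length difference, for example between 1 2 and 1 2 0. The other two forms do not have this problem.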
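The Python sketch below implements the three ways of avoiding the length error. The `take` helper is a hypothetical stand-in for APL's n↑V, padding numeric vectors with zeros.

```python
def take(v, n, fill=0):
    """n↑V for n ≥ len(v): pad with the fill item."""
    return list(v) + [fill] * (n - len(v))


def equal_padded(a, b):
    """L←(⍴A)⌈⍴B ⋄ (L↑A)^.=L↑B: padding can hide a length difference."""
    n = max(len(a), len(b))
    return all(x == y for x, y in zip(take(a, n), take(b, n)))


def equal_guarded(a, b):
    """:If (⍴A)=⍴B ⋄ :AndIf A^.=B: compare items only when the lengths agree."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def equal_match(a, b):
    """A≡B"""
    return list(a) == list(b)


print(equal_padded([1, 2], [1, 2, 0]), equal_match([1, 2], [1, 2, 0]))   # True False
```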

Checking the Return Code of a Function
Nowadays, many functions return a 2-item nested vector, where one item contains the result and the other contains the return code. But there are still many functions whose result can be either the data or the return code. If, for example, a function returns ¯1 to signal an error, we need to be very careful with the ∊ membership function in code such as:

      →(¯1∊DATA←FUNCTION_1)/ERR

¯1∊DATA is also true when valid data happens to contain ¯1. And when no error occurred, the test has to search through all of DATA.

Example of Function Return Code
A popular IBM APL utility function for reading text files is ∆FM (File Matrix I/O). When ∆FM encounters an error while reading a text file, it returns the error code 28 instead of the data. Many programmers therefore write their text file I/O code like this:

      →(28∊DATA←∆FM 'file.csv')/ERR

Example of Return Code Inefficiency

      Y←∆FM 'file.csv'
      ⎕SIZE'Y'
      ⍴Y
      :For I :In ⍳1000
          {}28∊Y        ⍝ ms
          {}28≡Y        ⍝ 4 ms
      :EndFor
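The Python sketch below shows the 2-item alternative: a reader that always returns a (data, return code) pair. The caller then tests a single scalar instead of searching the data, and valid data can never be mistaken for an error. The function `read_lines` is hypothetical, not the ∆FM utility; it reuses the error code 28 quoted above purely for illustration.

```python
ERROR_CODE = 28   # the ∆FM error code quoted on the slide, reused for illustration


def read_lines(path):
    """Hypothetical reader returning (data, return_code); 0 means success."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines(), 0
    except OSError:
        return [], ERROR_CODE


data, rc = read_lines("file.csv")
if rc != 0:   # one scalar comparison, however large the data is
    print("could not read file.csv")
```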

CPU Optimization Suggestions
- When an elegant outer product generates a sparse matrix that causes LIMIT ERROR, WS FULL, or a computational slowdown, replace the outer product with a simpler, if less elegant, expression. Example of code elegance: 1≠×/×M∘ vs (M≥1)^M≤12.
- Avoid unnecessary matrix transposes when you perform an inner product of a matrix with a vector.
- Remember that in some cases the match function can run much faster than the inner product or the membership function.

The End
Eugene Ying, Fiserv, Inc.