Programming for Geographical Information Analysis: Core Skills

Basic Syntax and Single Value Variables

Review Programs are written in text files with the .py extension (for Python). They contain statements, usually one per line, telling the computer what to do. Statements are formed according to some syntax (~grammar) out of symbols and words. In imperative, stateful programming, data is held in variables. We use control flow statements to direct the program down different paths. One type of control flow is the procedure: a separated-off bit of code you can run by calling it.

Statement atoms The 'atomic' components of a statement are: delimiters (indents, semicolons, etc.); keywords (built into the language); identifiers (names of variables etc.); literals (values like 2 or "hello world"); operators (maths symbols that do stuff).

This lecture Basic syntax Delimiters Keywords
Variables: storing single values Operators We'll come on to variables for storing multiple values next time.

Basic syntax Each statement is generally a single line (statements can be stacked on a line with semicolons, but this is unusual). Standard 'inline' comments (text for humans) start with #:
    # This is an inline comment.
'Block' comments across multiple lines start with """ and end with """, or start with ''' and end with ''':
    """
    This is a block comment.
    """
You can have as many blank lines as you like to structure code. Inline comments are single-line comments. Although they can follow code on a line, you can't have code after them once they've started. As we'll see, there are some other kinds of comments. In general, for beginners, a good rule of thumb is that a quarter of your code wants to be blank lines, and a quarter comments.

Compound statements In many languages, lines have some kind of end-of-line marker (e.g. a semicolon) and code is divided into control flow sections by braces/curly brackets:
    if (a < 10) {
        alert(a);    // JavaScript
    }
    alert("done");
But this allows errors (a missing bracket or semicolon can change the meaning), and requires coders to indent code inside the sections to make it clear where the control flow is. We mention this just in case you're doing one of our courses that uses JavaScript or Java, or the advanced course, where we'll see some C.

Compound statements In Python, lines are generally one statement per line and are grouped in blocks by indenting and dedenting. You can use spaces or tabs, and as many as you like of either, but be consistent. We recommend using 4 spaces. In theory, this means formatting and syntax are one. In practice, the interpreter basically looks at whether indenting is more or less than on previous lines, so in some versions of Python a block indented with a mix of tabs and spaces will run. But it isn't good practice, and Python 3 rejects indentation that mixes tabs and spaces inconsistently. Align blocks at the same level of flow control:
    if a < 10:
        print(a)
    print("done")
In most IDEs, you can indent by selecting a block of code and pushing the tab key, while you can dedent (the opposite) by holding the shift key while pushing the tab key. However, if you use an IDE, it is best to make sure it automatically converts tabs to spaces. There's nothing wrong with tabs, as long as you are consistent in using them, but the Python Software Foundation recommends spaces.

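Compound statements Overall, this causes fewer bugs, but you need to make sure your indenting is correct or the control flow changes. This:
    if a < 10:
        print(a)
    print("done")
is completely different to this:
    if a < 10:
        print(a)
        print("done")
In the first, "done" is always printed; in the second, it is only printed when a is less than 10.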

Delimiters Newlines and indenting/dedenting at the start of a line are special kinds of delimiter-ish things, as are ' " # \ Others (some of which also act as operators): ( ) [ ] { } , : . = -> += -= *= /= //= &= |= ^= >>= <<= **=

Enclosing delimiters Python especially uses three styles of enclosing delimiters. These are what the Python documentation calls them:
    {} braces         # Sometimes called curly brackets elsewhere.
    [] brackets       # Sometimes called square brackets elsewhere.
    () parentheses    # Sometimes called curved brackets elsewhere.
We'll try to use the Python terms. As we'll see in future lectures, these delimiters are especially important in Python.

Line breaks Lines can be joined together into a single line with a backslash, \, provided this isn't part of a piece of non-comment text or followed by a comment. Lines inside {} () [] can also be split (and commented), though non-comment text needs quotemarks around each line. Triple-quoted text may be split across lines, but the formatting (spaces; linebreaks) will be kept in the text when used (i.e. if it's not a comment).
    print \
    ("""
    Daisy, Daisy,
    Give me your answer, do...
    """)
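The three ways of splitting a statement described above can be sketched as follows. This is only an illustrative sketch; the variable names (total, names, message, verse) are made up for the example.

```python
# 1. A backslash joins this line to the next; nothing may follow the backslash.
total = 1 + 2 + \
        3 + 4

# 2. Inside brackets, lines can be split freely and commented.
names = [
    "Daisy",    # A comment on a split line is fine here.
    "Bell",
]

# Text split across lines inside parentheses needs quotemarks on each line;
# the adjacent pieces are joined into one string.
message = ("Give me your answer, "
           "do")

# 3. Triple-quoted text keeps its linebreaks when it is used as a value.
verse = """Daisy, Daisy,
Give me your answer, do..."""

print(total)
print(message)
print(verse)
```

Note that verse contains a real linebreak, whereas message is a single line of text.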

Keywords The following are reserved for special uses and can't be used for anything else:
    False   class     finally   is        return
    None    continue  for       lambda    try
    True    def       from      nonlocal  while
    and     del       global    not       with
    as      elif      if        or        yield
    assert  else      import    pass
    break   except    in        raise
E.g. they can't be used for variable names. Later in the course, we'll see some other things you need to be aware of when naming variables, but for now they can be pretty much anything else.
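If you are unsure whether a name is reserved, one option is to ask Python itself. The sketch below uses the standard library keyword module; the exact list depends on your Python version (for example, async and await were added to it in Python 3.7), so the list above may not match yours exactly.

```python
import keyword

# True for reserved words, False for names you are free to use.
lambda_is_reserved = keyword.iskeyword("lambda")
radius_is_reserved = keyword.iskeyword("radius")

print(lambda_is_reserved)
print(radius_is_reserved)

# The full list of keywords for the Python version you are running.
print(keyword.kwlist)
```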

Example: import Import adds libraries of code (modules in Python) so we can use them. Usually at the top of the file we add: import module_name For example: import builtins From then on we can use code inside the library. We'll come back to imports when we look at the standard library.

Identifiers These are labels stuck on things. The main use of identifiers is in constructing variables, but they are also used for naming, for example, procedures or, as Python calls them, functions.

Builtins As well as keywords, there are some functions (like print()) built into the standard distribution. These are in a module called builtins. The shell loads this up automatically, but hides it with the name __builtins__ . This double-underscore notation is used in Python to hide things so they aren't accidentally altered. It is called 'dunder' notation. It's easier to deal with if you import it normally. To see what's inside, enter:
    >>> import builtins
    >>> dir(builtins)
dir() is one of the builtin functions. Used with a module name, it tells you what functions and variables are available inside the module. Used on its own, it tells you what identifiers are currently being used. In general, we'll write functions with following parentheses, to mark them out from variables.

Basic syntax Delimiters Keywords Variables: storing single values Operators We'll come on to variables for storing multiple values next time.

The what of variables Variables are the combination of an identifying label and a value, often a literal like 2 or "hello world". The label/identifier is attached to the value thus: >>> a = 10 This is called variable assignment. Once assigned, when you use the label, you get the value: >>> a 10 Note that this is not the same as: >>> print("a") a Remember >>> shows working typed at the shell; this allows us to show you the output, which doesn't start >>>.

The what of variables They are called 'variables' as you can change the value:
    >>> a = 10
    >>> print(a)
    10
    >>> a = 20
    >>> print(a)
    20

NB: Shell interaction Note that at the shell prompt, you don't actually need the print command; you can just type the name of the variable or an expression and it will display the value:
    >>> a
    20
    >>>
We'll drop the print when we're dealing with the shell prompt, but keep it for code that's displayed as if it is in a script.

The what of variables The label is just pointing to the value, which is stored somewhere in the computer memory. To show that it is just a label, there's nothing to stop you attaching more than one label to the same value.
    >>> a = 10
    >>> b = a
    >>> b
    10
(Diagram: the labels a and b both attached to the value 10.)
Note that this is not the same as:
    >>> b = 10
which creates a second "10" and attaches it to b only. In fact, what is going on is slightly more complicated. Literals are sometimes reused for multiple variables in some languages. For example, in many languages, text is reused where possible, because it uses up a lot of space, so in:
    a = "hello world"
    b = "hello world"
chances are, it'll be the same "hello world" until b is altered to
    b = "hello globe"
and then it'll be a different literal. This helps keep memory efficiently used.

The what of variables For text and numbers, if you change a variable, it is essentially created anew. >>> a = 10 >>> b = a >>> b 10 >>> b = 20 >>> a # Changing b has no effect on a, even though 10 # they were once attached to the same value, >>> b # because b is a new "b". 20

Values When a value has no label attached to it, it is useless, as we can't talk about it.
    >>> a = 10
    >>> b = 20
    >>> a = b
The initial 10 is now without a label, so we can never talk about it again. How would we? For instance, assigning 10 to a label again creates a new 10 rather than attaching to the old one. You can also remove labels from values by pointing them at the special "null object" None, which denotes a label not attached to anything (else):
    a = None
Or you can remove the label entirely with del(label_name) or del label_name:
    del(a)
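The difference between pointing a label at None and deleting it can be checked directly. In this minimal sketch the names label_is_none and label_exists are made up for illustration: after a = None the label still exists, whereas after del a using it raises a NameError.

```python
a = 10

a = None                     # The label "a" still exists, but is attached to None.
label_is_none = a is None

del a                        # The label itself is removed.
try:
    a                        # Using a deleted label is an error...
    label_exists = True
except NameError:            # ...which we catch here so the script carries on.
    label_exists = False

print(label_is_none)
print(label_exists)
```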

Garbage collection When all the labels pointing at something are removed from it, the object is unreachable. At some point when it can, the Python virtual machine will then do garbage collection and delete it from memory. We don't want very big variables containing whole files hanging around using up memory. If you want to force this, you can with:
    import gc
    gc.collect()
But unless you're dealing with massive files and marginal memory space, it's better to let the virtual machine deal with it.

Variable identifiers Names can be any continuous word/s, but must start with a letter or underscore. There is no relationship between a variable's value or use and its name. In the last simple examples, a and b were fine, but generally you should name variables so you can tell what they are used for: radius = 10 is more meaningful than a = 10 or bob = 10. In general, the more meaningful your names, the easier it will be to understand the code when you come back to it, and the less likely you are to use the wrong variable. In geography, x, y, and z are obviously used a lot for coordinates. You will also see i, j, k used a lot for counting things, for historical reasons. In the first third generation language, FORTRAN, letters had to be used for certain variables, and certain types of number could only be assigned to certain letters. Integer values (whole numbers) could only be assigned to the letters from i (for integer) to n. For this reason, people still use i, j, k for counting things (but not lowercase "L", as it looks too much like a one).

Name style Style conventions aren't syntax, but allow all coders to recognise what an element is. There's a style guide for Python, but it goes out of its way to avoid talking about variable names. The community preference seems to be for lowercase words to be joined with underscores; what is called (coincidentally) snake_case: perimeter_of_a_square. Though where Python is working with C or other code, the more conventional camelCase is sometimes used: perimeterOfASquare. Either way, start with a lowercase letter, as other things start uppercase.

The what of variables You may see variables described as containers for values, but this isn't true and isn't helpful. Think of them as labels and values. As we'll see later on, it is quite easy to get confused about what a variable is referring to, and thinking about them as an identifier/label and value helps.

The why of variables Variables are generally used to hold the result of calculations and user inputs. These are things we can't predict before the code is run. Some things we can predict the value of, for example, the 4 in: perimeter = 4 * length_of_side # perimeter of a square Such literals are hardwired in. Even such literals are often better put in a variable at the top of the file, as changing the variable instantly allows us to change the value throughout: number_of_sides = 4 perimeter = number_of_sides * length_of_side This now works for any regular shape if we change number_of_sides.

Values What kinds of things can we attach to variable labels? Everything! Literals like 1, 1.1, "a", "hello world". But also whole chunks of code. All the code and values in the computer are held in the form of binary data. We don't usually see this, but it is. It has to be: computers are just very complicated sets of on and off switches. If values are just spaces in memory with something in them, and all code and values are binary, then if we can attach a label to a value in memory, we can attach it to code in memory. This is the basis of object oriented computing. Other languages which allow this include JavaScript.

Objects Objects are chunks of code that are wrapped up in a particular way. One thing this format enables is the attaching of labels to them to create variables. Objects can have their own functions and variables, so you can have variables inside other variables. Objects generally do some particular job. Here's an example…

Example
    import tkinter

    def run():
        pass

    root = tkinter.Tk()
    menu = tkinter.Menu(root)
    root.config(menu=menu)
    model_menu = tkinter.Menu(menu)
    menu.add_cascade(label="Model", menu=model_menu)
    model_menu.add_command(label="Run model", command=run)
    tkinter.mainloop()
So, here's the code that would make the example on the previous slide in Python. Note how little code it takes, and how (with the exception maybe of add_cascade) you can kind of understand what it does without really knowing too much about how it does it. These are two key reasons people like object oriented programming languages.

Example Python is a third generation, imperative, procedural, object oriented language. In the tkinter code above, the "dot operator" is used to say "look inside this object and find this code" (in this case a procedure). Every time you see the dot operator, it is object orientation working.

Values Don't worry about how we make objects for the moment; we'll look at this a bit later. But, for example, in Python (but not all other languages), functions themselves are objects that can be given labels:
    >>> a = print
    >>> a("hello world")
    hello world
This makes it incredibly powerful: for example, we can pass one function into another (the core of functional programming). As objects, functions also have their own contents, which you can list with:
    >>> dir(a)
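Passing one function into another can be sketched like this. The functions apply_twice and add_one are hypothetical names invented for the example, and defining functions with def is covered later in the course; the point is only that a function can be handed over like any other value.

```python
def add_one(x):
    """Return x plus one."""
    return x + 1

def apply_twice(function, value):
    """Call whatever function we are given, twice, on the value."""
    return function(function(value))

shout = str.upper                    # A second label for an existing function.

result = apply_twice(add_one, 10)    # add_one is passed in, not called here.
loud = shout("hello world")

print(result)
print(loud)
```

Note that add_one is written without parentheses when it is passed in: with parentheses it would be called straight away and its result passed instead.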

Values How does the computer know what a variable is? For Python, it works it out. This takes it a little time, but means it is much more flexible.

Static vs dynamic languages
Dynamic languages calculate on the fly what static languages do at compilation time. This traditionally makes them more flexible but less efficient. For example, with many static languages, variables are associated with a single data type. This is quite often associated with manifest typing: you have to say what kind of variables you're going to use before/when you first use them. This is sometimes called the variable declaration or description. This allows the system to set aside memory for the value, and means the system can check you're using the type of variable you think you are. Here's a Java declaration and assignment: int a = 20; Dynamic languages allow a variety of types to be associated with a variable. These are therefore often associated with implicit typing, where you don't have to define the type, and frequent type inference, where the system works out which type is being used.

Type inference In general, for type inference languages, the computer will work out the type of the variable. This is a little less efficient but allows any type to be assigned to an identifier, which makes code easier for beginners. However, it does mean that you need to keep track of what kind of thing is in the variable. For example, Python will often also allow different types of value to be used with a function call, but there are limits. Python will generally warn you if you misuse a variable, but this isn't always the case, which is one reason very critical systems are more likely to use manifest typing languages. One efficient solution, for example, is to work out the type when the variable is first used and store the type with the label. Sometimes, this involves making a new label (not that you need to know this):
    a = 10               # First "a" label.
    a = 20               # New "a" label.
    a = "hello world"    # Another new "a" label.
One way some languages deal with a function taking in a variety of types of variable is to compile several different versions of the function for different uses, for example.
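To see the type being worked out, and what happens when a value is misused, here is a small sketch. The names first_type, second_type and misuse_caught are made up for illustration, and len() is a builtin that only works on things that have a length.

```python
a = 10
first_type = type(a).__name__     # Python has worked out this is an int.

a = "hello world"                 # The same label can now hold text.
second_type = type(a).__name__

# There are limits: asking for the length of a number is an error,
# but one that only shows up when the line is actually run.
try:
    len(10)
    misuse_caught = False
except TypeError:
    misuse_caught = True

print(first_type, second_type, misuse_caught)
```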

Assignment order When we assign an expression, the expression on the right is generally calculated first and then assigned the label on the left. This means this is fine:
    a = 2
    a = a + 2
Note that for functions, there is a difference between
    a = dir      # Use "a" as an alias for "dir"
    a()          # which we can then call.
and
    a = dir()    # Run "dir" and get some result back from it,
    print(a)     # assigning the result to "a" we can then print.

Augmented assignment Python has no increment or decrement operators (the ++ and -- of many other languages). Instead, Python has the augmented assignment operators:
    x += 1    # same as x = x + 1
    x -= 1    # same as x = x - 1
As we'll see, you can create some quite complicated assignment statements, but these assignment operators can only be used in very simple ways like these. For the record, specifically they can't use the expansion operators * or ** and they can't be used to assign to a list of several targets at once. We'll see what these mean later in the course.
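A short sketch of augmented assignment in use follows. The final check uses the builtin compile() purely to show, without stopping the script, that Python refuses an augmented assignment to several targets at once; the name multiple_targets_allowed is made up for the example.

```python
x = 10
x += 1     # x is now 11
x -= 3     # x is now 8
x *= 2     # x is now 16
x //= 3    # x is now 5 (floor division)
print(x)

# Augmented assignment only works on a single target.
try:
    compile("a, b += 1, 2", "<example>", "exec")
    multiple_targets_allowed = True
except SyntaxError:
    multiple_targets_allowed = False
print(multiple_targets_allowed)
```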

Primitives Many languages have primitives: core data types built into the language with a definite space set aside for each type in memory. These are usually simple literal holders. Python has core types, but these are not primitives: they're object types with their own functions and variables for storing the literals; you just don't have to know that to set them up.

Built in types
Integrals:
    int: Integers; whole numbers.
    bool: True and False; evaluate to 1 and 0 respectively, except when printed.
Other numbers:
    float: Floating point numbers; decimal numbers (sometimes called 'doubles' in other languages because of how they are stored).
    complex: For imaginary numbers (actually two floats: one real, one imaginary).
Basic sequences:
    str: Strings; text.
    bytes: Bytes; binary.

Built in types as objects
To see that these are stored as objects, look at this code:
    >>> a = "hello world"
    >>> b = a.upper()
    >>> a
    'hello world'
    >>> b
    'HELLO WORLD'
    >>> c = "hello world".upper()
    >>> c
    'HELLO WORLD'
Notice that in the case of "hello world".upper() we haven't even made a variable label; from the start, the string literal is wrapped in an object. Try typing >>> 2. and then hitting the tab key to see what functions and variables are available when numerical literals are wrapped (or use dir(2)).

Ints Ints are written as plain digits, optionally grouped with underscores, e.g. 1_000_000 (3.6 onwards for the underscores), but can't have leading zeros. Since Python 3, the size of int you can have is only limited by the computer memory, and is, therefore, for all practical purposes, unlimited.

Floats Floats may look like:
    3.14    10.    .001    1e100    3.14E-10    0e0    3.14_15_93
Floats are limited, but to a very large (system dependent) number. You can find it with:
    >>> import sys
    >>> sys.float_info
For a 32 bit machine picked at random, the largest float was ~1.8e308 and the smallest positive (normalised) float ~2.2e-308. In the unlikely event you need bigger numbers you can construct them from two ints! Floats have two special values:
    a = float("inf")    # Representing infinity.
    a = float("nan")    # Representing the result of a calculation a computer can't do,
                        # for example, infinity minus infinity.
Note that dividing by zero in Python raises an error rather than giving nan.
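The limits and special values can be inspected as below. This is a sketch that assumes the usual IEEE 754 double-precision floats; the variable names are made up for the example.

```python
import math
import sys

largest = sys.float_info.max      # Largest representable float.
smallest = sys.float_info.min     # Smallest positive normalised float.

infinity = float("inf")
not_a_number = infinity - infinity    # A calculation with no sensible answer.

print(largest, smallest)
print(math.isinf(infinity))           # Tests for infinity.
print(math.isnan(not_a_number))       # Tests for nan.
print(not_a_number == not_a_number)   # nan is not even equal to itself.
```

Because nan never compares equal to anything, math.isnan() is the reliable way to test for it.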

Floats: warning You might think that floats are more exact than ints, because you can have decimal places. However, you'd be wrong: computers are very poor at representing decimal numbers. So poor that the resulting errors have contributed to fatal accidents. Ints, on the other hand, are exact. Python matches the IEEE 754 double-precision number standard, but still has issues. If you intend to use Python for critical systems, especially navigation systems, you need to read up on floating point calculations. We'll mention this again later in the course, looking at the fractions and decimal libraries, which try to make them more exact, and flag when they're not.
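A classic demonstration of the problem is adding 0.1 and 0.2. The sketch below also shows the standard library decimal and fractions modules mentioned above, and math.isclose() as one common way of comparing floats; the variable names are made up for illustration.

```python
import math
from decimal import Decimal
from fractions import Fraction

# 0.1 and 0.2 can't be stored exactly in binary, so the sum is slightly off.
float_sum_exact = (0.1 + 0.2 == 0.3)

# Comparing with a tolerance is the usual workaround.
close_enough = math.isclose(0.1 + 0.2, 0.3)

# Decimal (built from text) and Fraction do this arithmetic exactly.
decimal_sum_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
fraction_sum_exact = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print(0.1 + 0.2)
print(float_sum_exact, close_enough, decimal_sum_exact, fraction_sum_exact)
```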

Imaginary numbers Imaginary numbers are floats followed by capital or lowercase "J".
    3.14j    10.j    10J    .001J    1e100j    3.14e-10j    3.14_15_93j
These are joined to real floats with the "+" operator (not addition when there's a "J") to form complex numbers, which the shell displays in parentheses. We won't deal in detail with complex/imaginary numbers here. "J" is used, rather than the more traditional "i", because this is the convention in electrical engineering where "i" has other meanings, and because capital "I" looks too much like one.

Type To find the type of a variable (for example, one you've got back from a function), use type(variable_name):
    >>> a = 10
    >>> type(a)
    <class 'int'>
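As an illustration of a complex number built from a real and an imaginary part, here is a minimal sketch. The value 3 + 4j is chosen for the example and is not taken from the original slides.

```python
z = 3 + 4j                 # Real part 3, imaginary part 4.

print(z)                   # Displayed in parentheses.
print(z.real, z.imag)      # The two parts are stored as floats.
print(abs(z))              # Magnitude: the square root of 3*3 + 4*4.
print(type(z).__name__)
```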

Basic syntax Delimiters Keywords Variables: storing single values Operators We'll come on to variables for storing multiple values next time.

Operators
    -        Unary minus: negates a number, e.g. -a.
    + - /    Add; subtract; divide.
    *        Multiply.
    **       Power, e.g. a**b is a to the power b.
    //       Floor divide, i.e. give the result as next lowest integer.
    %        Modulus, i.e. remainder of integer division; but note, used sometimes to format strings.
    @        Unused in core, but reserved for matrix operations in libraries.
We'll cover string formatting later.

Comparison operators
    == !=          Equal to; not equal to.
    < > <= >=      Less than; greater than; less than or equal to; greater than or equal to.
    is not  is     For checking whether objects are the same.
These result in two special Boolean values:
    True     # Note capital letter
    False
In common with many languages, these also evaluate to numbers, False being zero and True being one, but also, numbers can evaluate to Booleans, with zero being False and any other number (including negatives) being True.
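The two-way relationship between Booleans and numbers can be checked with a few lines. This is just an illustrative sketch with made-up variable names.

```python
comparison = 3 < 5            # A comparison gives a Boolean.

true_as_number = True + True  # True counts as 1 in arithmetic.
zero_as_bool = bool(0)        # Zero is False...
negative_as_bool = bool(-3)   # ...any other number is True.

nothing = None
is_none = nothing is None     # "is" checks for the very same object.

print(comparison, true_as_number, zero_as_bool, negative_as_bool, is_none)
```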

Bitshift operators
    >> <<    Bitshifts, e.g. 1 >> 1 == 0 (0b01 >> 1 == 0b00); 1 << 1 == 2 (0b01 << 1 == 0b10).
    & ^ |    Bitwise AND, XOR, OR.
    ~        Bitwise inversion.
In general, dealing with binary data is rare, but if you need to, here are the operators.
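Worked examples make the bit operators easier to follow. The values 0b1100 (12) and 0b1010 (10) below are chosen for illustration and are not from the original slides; bin() shows a number in binary.

```python
a = 0b1100    # 12
b = 0b1010    # 10

print(bin(a & b))    # AND: 1 only where both have a 1.
print(bin(a | b))    # OR: 1 where either has a 1.
print(bin(a ^ b))    # XOR: 1 where exactly one has a 1.
print(a >> 1)        # Shift right one place: halves the number.
print(a << 1)        # Shift left one place: doubles the number.
print(~a)            # Inversion: for ints this is -(a + 1).
```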

Useful functions
    abs(x)    Returns the absolute positive value, or the magnitude for a complex number.
    round(number, ndigits)    Rounds number to (optional) ndigits decimal places and returns it. If ndigits is not there or None, returns nearest int.
Watch out for values where a ".5" or equivalent at other magnitudes is being rounded: rounding can be up or down, because of storage precision issues and because Python rounds exact halves to the nearest even number.
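The rounding surprises are easiest to see by trying a few values. This sketch assumes Python 3, where exact halves go to the nearest even number; the 2.675 case is rounded down because the stored float is very slightly less than 2.675.

```python
print(abs(-3.2))          # Absolute value.
print(abs(3 + 4j))        # Magnitude of a complex number.

print(round(3.14159, 2))  # Two decimal places.
print(round(2.5))         # Half rounds to the even neighbour, 2...
print(round(3.5))         # ...and here to 4.
print(round(2.675, 2))    # Stored slightly below 2.675, so rounds down.
```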

Precedence Operators have a specific order they are assessed in, if not put in parentheses. For example:
    a = 3+2*2     == 7     3+(2*2)    Multiplication before addition.
                  != 10    (3+2)*2
    a = -3**2     == -9    -(3**2)    Power raising before negation.
                  != 9     (-3)**2
A full list is in the Python documentation, but always put parentheses around component expressions to clarify for yourself you have what you're after. The negation one is particularly one to look out for, as it is rare to parenthesize negatives normally. This example is from the documentation.
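The two precedence examples above can be checked directly; the variable names here are made up for illustration.

```python
without_parentheses = 3 + 2 * 2       # Multiplication happens first.
forced_addition = (3 + 2) * 2         # Parentheses force the addition first.

negated_power = -3 ** 2               # Power happens before negation.
power_of_negative = (-3) ** 2         # Parentheses negate first.

print(without_parentheses, forced_addition)
print(negated_power, power_of_negative)
```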

Mixed maths What happens if we mix types, for example an int and a float, or, worse:
    a = "2" + 2
that is, "2" the character plus 2 the number? The last case is especially hard for humans to see the problem with, as we treat characters as if they were numbers, but for a computer, they are completely unrelated: "2" is represented (at simplest) as 00110010, while 2 is represented (at simplest) as 00000010. Likewise:
    print("answer = " + 2)
fails, as it expects text not a number. A computer has no idea "2" and 2 are related. One is writing, the other a number. Perhaps more strangely, the binary for "2", when translated into a number, is 50.
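You can see the number behind a character with the builtin ord(), and the binary with format(). In this sketch the name mixing_failed is made up for the example, and the error is caught so that the script carries on.

```python
character_code = ord("2")                  # The number used to store "2".
character_bits = format(ord("2"), "08b")   # That number as eight binary digits.
number_bits = format(2, "08b")             # The number 2 as eight binary digits.

print(character_code, character_bits, number_bits)

# Adding text to a number is an error in Python.
try:
    a = "2" + 2
    mixing_failed = False
except TypeError:
    mixing_failed = True
print(mixing_failed)
```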

Weakly typed vs strongly typed
We can distinguish between weakly typed languages (which allow easy mixing of variable types, like the number 2 and the character “2”) and strongly typed languages. We call the changing of one value type to another “type casting” (or type conversion). We can divide casting into: Implicit casting or coercion: it just happens. Explicit casting by identifier: there's some kind of marker that casting is needed - for example, in Java the type to change to in parentheses: int a = (int)floatB; Explicit casting by procedure/routine: some function is called to do the change. Python tends to use either implicit casting or casting by procedure. When we look at string processing, we'll see that there are some cases where the detailed representation of the string is determined by identifiers in front of them.

Implicit casting Most languages have some degree of implicit casting, usually when the change can be made without losing information. So if a = 1 is used in an expression with a float, Python will convert a to 1.0 (a float) during evaluation. In Python, all numbers are converted to the more complicated type. Caveat coder, though: other languages treat ints differently. In Java, for example, dividing one int by another gives an integer result.
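A few lines show the conversion to the "more complicated" type happening. The values are chosen for illustration only.

```python
a = 1
b = a + 1.5           # int + float: a is treated as 1.0, giving a float.
c = True + 1          # bool + int gives an int.
d = a + 1.5 + 2j      # Anything with a complex number gives a complex.

print(b, type(b).__name__)
print(c, type(c).__name__)
print(d, type(d).__name__)

# In Python 3, / always gives a float; // keeps ints as ints.
print(7 / 2, 7 // 2)
```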

Explicit casting Python jumps straight from implicit casting to explicit casting by routine, using builtin functions, thus:
    a = float(x)    # Where x can include a group of numerical text characters.
    a = int(x)      # Likewise.
    a = str(x)      # Convert to a string.
    a = bool(x)     # Convert to Boolean True or False.
And also, not strictly casting, but:
    a = int(x, base=10)    # Convert text x, written in the given base, to an int.
    a = oct(x)             # Convert an int to an octal (base 8) string prefixed with "0o".
    a = hex(x)             # Convert an int to a hexadecimal (base 16) string prefixed "0x".
In actual fact, these are a special type of function called a constructor, used to make an object of a specific type, rather than a more usual conversion function. Converting fractions to other bases isn't generally done because of the uses in computing bases are put to. Octal is a little obscure these days, but hex is still used a lot, for example to represent web colours.
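The conversion functions can be tried out as below. The input values are made up for illustration. Two things worth noticing are that int() on a float simply drops the decimal part, and that bool() on any non-empty text is True, even the text "False".

```python
print(float("3.14"))         # Text to float.
print(int("42"))             # Text to int.
print(int(3.9))              # Float to int: the decimal part is dropped.
print(str(10))               # Number to text.
print(bool(0), bool("False"))

print(int("ff", base=16))    # Hexadecimal text to an int.
print(hex(255))              # Int to hexadecimal text.
print(oct(8))                # Int to octal text.
```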

print As an example of how shifty this can be:
    print(10)
will convert 10 to "10" and print it. But
    print("here's a number " + 10)
won't; you need:
    print("here's a number " + str(10))
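There is more than one way round this. As a sketch, the lines below build the same text with str(), with an f-string (Python 3.6 onwards), and by handing print() two separate values, which it converts and joins with a space itself. The variable names are made up for the example.

```python
number = 10

joined = "here's a number " + str(number)    # Explicit conversion.
formatted = f"here's a number {number}"      # The f-string converts for us.

print(joined)
print(formatted)
print("here's a number", number)             # print converts each value itself.
```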

Review: Core Python Python is generally written one statement per line. Control Flow clauses are indented, and indents and dedents therefore have syntactic meaning. Care must be taken with them. Variables are combinations of labels and values, assigned thus: a = 10 b = 10.1 c = "hello world" A label can attach to any value, including complicated chunks of code such as functions.

Review: Core Python Here we've just dealt with simple variables attached to one literal. Variables are generally written in snake_case and start with a letter. These are combined with operators. As some operators are calculated before others, always use parentheses to force the order. You can change one variable type to another with int(x); float(x); str(x). You can find the type of a variable with type(variable_name). You can detach a label from its value by assigning it None. You can delete a label using del(variable_name).

Core computing Variables as identifiers and literals or objects. Functions as objects. Unreachable things and garbage collection. Style conventions (variable names; i,j,k) Hardwiring Objects and the dot operator Dynamic vs static languages Manifest vs implicit typing Type inference Operator precedence Primitives vs objects Bit operations Weak vs strong typing Implicit vs explicit casting

Computational thinking
Get the program working for generalities first. Type inference. Types as objects not primitives. Complex numbers. Pythonic goodness
