Programming for Geographical Information Analysis: Core Skills


1 Programming for Geographical Information Analysis: Core Skills
Containers

2 Review: Core Python Python is generally written one statement per line. Control Flow clauses are indented, and indents and dedents therefore have syntactic meaning. Care must be taken with them. Variables are combinations of labels and values, assigned thus: a = 10 b = 10.1 c = "hello world" A label can attach to any value, including complicated chunks of code such as functions.

3 Review: Core Python Here we've just dealt with simple variables attached to one literal. Variables are generally written in snake_case and start with a letter. These are combined with operators. As some operators are calculated before others, use parentheses to make the order explicit. You can change one variable type to another with int(x); float(x); str(x). You can find the type of a variable with type(variable_name). You can detach a label from its value by assigning it None. You can delete a label using del(variable_name).

4 Enclosing delimiters Python uses three styles of special enclosing delimiters. These are what the Python documentation calls them: {} braces # Sometimes called curly brackets elsewhere. [] brackets # Sometimes called square brackets elsewhere. () parentheses # Sometimes called curved brackets elsewhere. We'll try and use the Python terms.

5 This lecture Holding more than one thing that won't change. Holding more than one thing that will change. Strings, revisited. Formatting strings. Sets. Holding things associated with a name. This lecture we'll look at ways to hold multiple items using the same variable name. Broadly speaking, the approach to this is based on whether the things you'll hold will change or not, and how you want to access the values.

6 Holding more than one thing
What happens if we want to store more than one thing? What happens if we want to store data points? We don’t want to have to call each by a new name. Languages generally have variables that are arrays: one variable that can store multiple things with one name and a numerical index. Here’s one in Java: Assignment: array_name[3] = 21; Subscription: System.out.print (array_name[3]);

7 Out of range Depending on the language and data stored, the array will either refer to a space in memory where all the literals are stored, or will contain pointers (memory addresses) or references (linked labels) to their locations in memory (in Python it is generally the latter). Attempts to read a cell that doesn't exist will usually generate some kind of error message (usually "Index out of range/bounds" or similar) and end the program.

8 Arrays As arrays can be very large, they usually require manifest typing – you have to say what they are going to hold (and sometimes how large they will be) to build them. Python arrays need manifest typing. Python arrays are in a special module. You’d make one like this: import array a = array.array('i',[0,0,0,0]) # Signed int type 'i' a.insert(3, 21) print(a[3]) Unlike many arrays, you don't need to say how big it will be.
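As a rough sketch of the array behaviour described above (the float passed to append is just an illustrative value), the following shows the array growing on insert and rejecting a value of the wrong type:

```python
import array

# 'i' is the typecode for signed ints; every element must match it.
a = array.array('i', [0, 0, 0, 0])
a.insert(3, 21)          # Insert 21 at index 3; later values shift right.
print(a[3])              # 21
print(len(a))            # 5: the array grew by one.

try:
    a.append(1.5)        # A float is rejected by an int array.
except TypeError as err:
    print("Rejected:", err)
```

Note that insert adds a cell rather than overwriting one; a[3] = 21 would overwrite in place.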

9 Arrays Arrays are very efficient: they are memory optimised for space and often for searching. However, in general in most languages they aren't very easy to use: You can only put pre-defined data types in them (usually just one). In most languages (though not Python) they have a fixed size (in Python, stuff just gets added to the end whatever the index, though attempts to read non-existent cells still generate errors). Because of this, most languages have wrappers for arrays that make them more flexible. In Python, these objects are called Containers, and are much more used than arrays. Wrappers are code that goes around other code like an exoskeleton to add functionality. Indeed, arrays in Python are actually regarded as an addon, rather than a core datatype, and the wrappers are core, in reverse of the usual situation.

10 Containers The container depends on how data is to be accessed, and whether it can be altered or not.
                                        Data changeable: "Mutables"    Data fixed: "Immutables"
Access by position: "Sequences"         List                           Tuple, String, Bytes
Access by name: "Mappings"              Dictionary                     Named tuple
Access by checking existence: "Sets"    Set                            Frozen set
We will mainly be dealing with those in bold.

11 Mutability Mutability is whether data can be changed after it is stored in an object. Immutable objects are more efficient, as the computer can optimise around them knowing they are guaranteed not to change. Obviously, though, not being able to change them is limiting. Numbers and strings are immutable, though they may not seem it: if you assign a new value to a pre-existing label, the value is created anew. Of the containers, strings and tuples are immutable. We'll see that a) strings are containers, and b) that tuples can be immutable, yet contain mutable objects. Dictionaries, sets, and lists are mutable.

12 Sequences Have the advantage of being a sequence (usually the sequence as added, but potentially ordered). Their length is found with a builtin function, thus: len(sequence_name) a = len(seq) # e.g. Just like arrays, we refer to the values in them using a name and position index: name[i] # Where i is an int. print(seq[2]) # e.g. All indices start with zero and go to len(sequence_name) - 1. (Figure: a row of cells indexed i = 0, 1, 2, 3, ..., len(x) - 1.)

13 Sequences Tuples: immutable sequence of objects, either literal-style or more complicated objects. Lists: mutable sequence of objects, either literal-style or more complicated objects. Both can contain other sequences as their objects, and both can contain mixed types of object, so, for example ints and strings.

14 This lecture Holding more than one thing that won't change: Tuples.

15 Tuples Tuples are produced by the comma operator if used without [] {} enclosers. For example: a = 2,3 Frequently they are inside (), but this is only necessary for making the empty tuple: a = (2,3) a = () The following is a tuple with one element (the documentation calls this a singleton, though beware this has other, more general meanings): a = 2, You can also use a constructor function: a = tuple(some_other_container) Like int(x) and float(x) there is a constructor function for tuples.

16 Tuples You can also add and multiply tuples (but not subtract from them or divide): >>> a = 1,2,3 >>> a = a + (4,) # or a += 4, >>> a (1, 2, 3, 4) >>> a = a*2 # or a *= 2 >>> a (1, 2, 3, 4, 1, 2, 3, 4) Multiplication only works with a single int, obviously - that is, you can't multiply a tuple by another tuple.
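A small sketch of tuple creation, addition, and repetition, which can be pasted into a file and run; the variable names are only illustrative:

```python
a = 1, 2, 3              # The commas make the tuple; parentheses are optional.
a = a + (4,)             # Concatenation builds a new tuple.
print(a)                 # (1, 2, 3, 4)

doubled = a * 2          # Repetition needs an int on the right.
print(doubled)           # (1, 2, 3, 4, 1, 2, 3, 4)

single = 2,              # One-element tuple: the trailing comma matters.
not_a_tuple = (2)        # Just the int 2 in parentheses.
print(type(single), type(not_a_tuple))
```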

17 Subscription You can get values out, thus: a[0] # First a[len(a) - 1] # Last But also: a[-1] # Last a[-len(a)] # First Negatives count back from the end (essentially negatives get len() added to them).

18 Packing Forcing multiple answers into a sequence is known as 'packing'. Example: the divmod() builtin function returns a tuple (a // b, a % b) >>> c = divmod (9,2) >>> type(c) <class 'tuple'> # c is automatically made tuple type >>> c[0] # subscription, cell 0 4 # 9//2 >>> c[1] # subscription, cell 1 1 # 9%2

19 Unpacking Unpacking is splitting a sequence across variables: >>> a = (1,2) >>> b, c = a >>> b 1 >>> c 2 There must be the right number of variables to unpack into. Relatively rare in other languages, so feels Pythonic.

20 Packing/Unpacking This packing/unpacking means you can assign multiple variables at once: a, b = 1, 2 But also means you can do this: a, b = divmod(9,2) If you see two variable names with a comma between, they are usually being assigned some unpacked sequence. You can find out what type with: >>> c = divmod(9,2) >>> type(c) or >>> type(divmod(9,2)) Which returns the sequence (here a tuple) and passes it straight to the type function.
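To illustrate, here is a short sketch of unpacking a returned tuple, swapping two labels, and what happens when the number of names does not match (the names are arbitrary):

```python
quotient, remainder = divmod(9, 2)   # Unpack the returned tuple.
print(quotient, remainder)           # 4 1

a, b = 1, 2
a, b = b, a                          # Swap: the right side packs, the left unpacks.
print(a, b)                          # 2 1

failed = False
try:
    x, y, z = divmod(9, 2)           # Two values cannot fill three names.
except ValueError as err:
    failed = True
    print("Unpacking failed:", err)
```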

21 Ranges Ranges are a special type of immutable sequence. range(start, stop, step) # start and step are optional Start; stop; step should be ints. Default for start = 0 Default for step = 1 Generates numbers up to but not including stop. range(4) generates 0,1,2,3 Note how this is the same as sequence indices - this will come in useful. range(2,8,2) generates 2,4,6 range(0,-5,-1) generates 0,-1,-2,-3,-4 Ranges generate an immutable sequence of numbers. You can use these for making synthetic data, but we'll see other uses for them when we look at processing containers in the next lecture.
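The ranges above can be checked by turning each into a list; this sketch also shows that a range supports len(), subscription, and membership tests like any other sequence:

```python
print(list(range(4)))            # [0, 1, 2, 3]
print(list(range(2, 8, 2)))      # [2, 4, 6]
print(list(range(0, -5, -1)))    # [0, -1, -2, -3, -4]

r = range(2, 8, 2)
print(len(r), r[0], r[-1])       # 3 2 6
print(4 in r, 5 in r)            # True False
```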

22 Tuple constructor a = tuple(x) # Where x is a container or producer of a sequence One way to make a tuple is thus: >>> a = tuple(range(5)) >>> a (0, 1, 2, 3, 4) Note you can't just assign a label, as that gives you the range itself rather than a tuple: >>> a = range(5) >>> a range(0, 5) >>> type(a) <class 'range'>

23 Sequence functions min(a) Smallest item in a. max(a) Largest item in a. a.index(x,i,j) Index of the first occurrence of x in a (at or after optional index i and before index j). a.count(x) Counts of x in a. any(a) For a sequence of Booleans, checks whether any are true Returns as soon as it finds one (likewise, because of conversion, whether any numbers are != 0) One useful element of having data in sequences is that there are functions that work specifically with sequences; for example, the max() function will find the maximum value in the sequence.
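A quick sketch of these functions on a tuple of made-up values:

```python
a = (3, 1, 4, 1, 5, 9, 2, 6)

smallest = min(a)              # 1
largest = max(a)               # 9
first_one = a.index(1)         # 1: first occurrence of the value 1.
next_one = a.index(1, 2)       # 3: first occurrence at or after index 2.
how_many = a.count(1)          # 2

print(smallest, largest, first_one, next_one, how_many)
print(any((0, 0, 3)))          # True: 3 converts to True.
print(any((0, 0)))             # False: every value converts to False.
```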

24 This lecture Holding more than one thing that will change: Lists.

25 Lists Like tuples, but mutable. Formed with brackets: Assignment: >>> a = [1,2,3] Or with a constructor function: a = list(some_other_container) Subscription (again, zero based indexing): >>> a[0] 1 >>> a[len(a) - 1] 3

26 Assignment a = (1,2,3) a[0] = 10 # Raises a TypeError: tuples are immutable. a = [1,2,3] a[0] = 10 # Fine a = [10,20,30] # Assigns a new list, # rather than altering the old one. Essentially the same kind of operation as : b = 100 b = 200

27 For mutables, if you change a variable's content, it changes for all attached labels.
You can still disconnect it and connect it to something new. >>> a = [10] # List containing 10. >>> b = a >>> b[0] 10 >>> b[0] = 20 >>> a[0] # Changing b changes a, as they refer to the same 20 # object. >>> b = [30] # Disconnect b from the shared list and attach it to a new list. Essentially a new "b". >>> a[0] 20 # a is unaffected: it still refers to the original list.
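The same behaviour as a runnable sketch, with one addition that is not on the slide: list.copy() as a way of getting an independent list when shared changes are not wanted.

```python
a = [10]
b = a                # b is a second label on the same list.
b[0] = 20
print(a[0])          # 20: the change is visible through both labels.

b = [30]             # b now labels a different list.
print(a[0])          # 20: a is unaffected.

c = a.copy()         # An independent (shallow) copy of a.
c[0] = 99
print(a[0])          # 20: changing the copy leaves a alone.
```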

28 Slices: extended indexing
Extended indexing is a way of referencing values in a sequence. These are sometimes called a 'slice' (essentially slice objects are generated, containing the indices generated by a range). a[i:j] returns all the elements between i and j, including a[i] but not a[j] a[i:j:k] returns the same, but stepping k numbers each time. Slices must have at least [:] (slice everything), but the rest is optional. If j > len(a), len(a) is used. If i is None or omitted, zero is used. If i > j (with a positive step), the slice is empty. a[:2] # First two values. a[-2:] # Last two values. “For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.”

29 Assigning slices While you can use slices as indices with immutable sequences, they can be used with mutable sequences like lists for assignment as well. >>> a = [0,1,2,3,4] >>> b = [10,20,30] >>> a[1:3] = b >>> a [0, 10, 20, 30, 3, 4] # Note we replace 2 values with 3. >>> b = [10,20] >>> a[1:5] = b >>> a [0, 10, 20, 4] # We replace 4 values with 2.
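A sketch of the slicing rules and of slice assignment on an example list (the values are arbitrary):

```python
a = [0, 1, 2, 3, 4, 5]

print(a[1:4])        # [1, 2, 3]: includes a[1], excludes a[4].
print(a[:2])         # [0, 1]: first two values.
print(a[-2:])        # [4, 5]: last two values.
print(a[::2])        # [0, 2, 4]: every second value.
print(a[4:100])      # [4, 5]: an oversized end is clipped to len(a).
print(a[3:1])        # []: start after end gives an empty slice.

a[1:3] = [10, 20, 30]    # Replace two values with three.
print(a)                 # [0, 10, 20, 30, 3, 4, 5]
```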

30 List related functions
All the same functions as immutable sequences, plus these "in place" functions (i.e. functions that change the original, rather than returning a copy): del(a[:]) Delete a slice. a.clear() Empty the list. a.extend(b) Extends a with the contents of b (i.e. a += b) a.insert(i, x) Inserts x into a at the index position i. a.pop(i) Returns the item at i and removes it from a. a.remove(x) Remove the first item from a equal to x a.reverse() Reverses the items of a.
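The in-place functions can be tried in sequence like this; the snapshot variables are only there to record intermediate states. One point worth noticing is that in-place methods such as reverse() return None, so their result should not be assigned back to the list.

```python
a = [1, 2, 3]
a.extend([4, 5])         # [1, 2, 3, 4, 5]
a.insert(1, 99)          # [1, 99, 2, 3, 4, 5]
popped = a.pop(1)        # Returns 99; a is [1, 2, 3, 4, 5] again.
a.remove(3)              # [1, 2, 4, 5]
returned = a.reverse()   # a is [5, 4, 2, 1]; the method itself returns None.
after_reverse = list(a)  # Snapshot of the current contents.
del a[:2]                # Delete the first two values: [2, 1]
after_del = list(a)
a.clear()                # []

print(popped, returned, after_reverse, after_del, a)
```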

31 2D lists There is nothing to stop sequences containing other sequences. You can therefore build 2D (or more) sequences. a = [[1,2,3],[10,20,30],[100,200,300]] They can be regular (i.e. have all second dimension sequences the same length) or irregular. Reference like this: >>> a[0] [1,2,3] >>> a[0][0] 1

32 NB: 2D sequences Note that tuples can contain lists. This seems odd, as lists are mutable, but it is only the object references/links in the tuple that are immutable, not the objects themselves. Tuples of numbers and strings are immutable because numbers and strings are immutable. >>> a = ([1,2],[10,20]) >>> a[0][0] = 100 >>> a ([100,2],[10,20]) >>> a[0] = [1000,2000] TypeError: 'tuple' object does not support item assignment

33 NB: 2D sequences Finally, you can combine two lists or tuples into a list of tuples like this: >>> a = [1,2,3,4,5] >>> b = [10,20,30,40,50] >>> c = zip(a,b) >>> d = list(c) >>> d [(1,10),(2,20),(3,30),(4,40),(5,50)] As we'll see in the module on control flow, this allows us to process two sets of numbers simultaneously.
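A sketch of zip in use. Two details go slightly beyond the slide: the zip object can only be consumed once, and zip(*pairs) reverses the pairing.

```python
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]

c = zip(a, b)
pairs = list(c)          # [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
again = list(c)          # []: the zip object is used up after one pass.

firsts, seconds = zip(*pairs)    # Unzip back into two tuples.
print(pairs)
print(again)
print(firsts, seconds)
```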

34 Command line arguments
When you run a Python program at the command line, you can pass information into it. This is not unusual. When you click on a Word file in Windows, what is happening in the background is Word is being activated and passed the filename to open. Indeed, when you run a Python program, you are doing this: python helloworld.py you're saying "run the python program, and the thing after its name is the file for it to execute." The elements passed into a program like this are called "command line arguments". They are used to quickly say how a program should work without re-writing the code. One place lists are used is in getting data into a program.

35 Command line arguments
We can actually get our code to record command line arguments. python ourfile.py arg1 arg2 arg3 Inside the program, these are made available in a list called sys.argv argv[0] is actually the script name (and, depending on OS, path) If run using the -c option of the interpreter, argv[0] == "-c". After argv[0] come the other arguments as strings.

36 Example #args.py import sys print ("Hello " + sys.argv[1]) > python args.py Dave > Hello Dave
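The example above fails with an IndexError if no name is given. A slightly more defensive sketch follows; the greet function and the "World" default are hypothetical additions for illustration, not part of the original example.

```python
import sys


def greet(argv):
    """Build a greeting from an argument list shaped like sys.argv."""
    if len(argv) > 1:        # argv[0] is the script name, so check for more.
        name = argv[1]
    else:
        name = "World"       # Hypothetical default when no argument is given.
    return "Hello " + name


if __name__ == "__main__":
    print(greet(sys.argv))
```

Remember that every argument arrives as a string, so numeric arguments need converting with int(x) or float(x) before use.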

37 This lecture Strings, revisited. Formatting strings.

38 Strings As well as tuples and ranges, there are two additional important immutable sequences: Bytes (immutable sequences of bytes, each made of 8 ones or zeros and usually represented as an int between 0 and 255 inclusive, as 11111111 is 255 as an int; Byte Arrays are the mutable version) Strings (text) Many languages have a primitive type which is an individual character. Python doesn't: a str (the string type) is just a sequence of other str, each one character long. “The array module provides an array() object that is like a list that stores only homogeneous data and stores it more compactly. The following example shows an array of numbers stored as two byte unsigned binary numbers (typecode "H") rather than the usual 16 bytes per entry for regular lists of Python int objects:” “Python strings cannot be changed — they are immutable. Therefore, assigning to an indexed position in the string results in an error:”

39 Strings Moreover, it may seem odd that they are immutable, but this helps with memory management. If you change a str the old one is destroyed and a new one created. >>> a = "hello world" >>> a = "hello globe" # New string attached to the same label. >>> b = str(2) # String "2" as text. >>> a[0] # Subscription. 'h' >>> a[0] = "m" # Attempted assignment. TypeError: 'str' object does not support item assignment

40 String Literals String literals are formed 'content' or "content" (inline) or '''content''' or """content""" (multiline). In multiline quotes, line ends are preserved unless the line ends “\” print('''This is \ all one line. This is a second.''') For inline quotes, you need to end the quote and start again on next line (with or without “+” for variables): print("This is all " + "one line.") print("This is a second") # Note the two print statements.

41 String concatenation (joining)
Strings can be concatenated (joined) though: >>> a = "hello" + "world" >>> a = "hello" "world" # "+" optional if just string literals. >>> a 'helloworld' # Note no spaces. To add spaces, do them inside the strings or between them: >>> a = "hello " + "world" >>> a = "hello" + " " + "world" For string variables, need "+" >>> h= "hello" >>> a = h + "world"

42 Immutable concatenation
But, remember that each time you change an immutable type, you make a new one. This is hugely inefficient, so continually adding to an immutable takes a long time. There are alternatives: With tuples, use a list instead, and extend this (a new list isn't created each time). With bytes, use a bytearray mutable. With a string, build a list of strings and then use the str.join() function built into all strings once complete. >>> a = ["x","y","z"] >>> b = " ".join(a) >>> b 'x y z' >>> c = " and ".join(a) >>> c 'x and y and z' "/".join(args) Where args is a sequence of strings: joins them with "/" between.
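The build-a-list-then-join pattern might look like this; the loop and the path components are made up for illustration:

```python
parts = []
for i in range(5):
    parts.append(str(i))     # Cheap: the list grows in place.

joined = ",".join(parts)     # One new string built at the end.
print(joined)                # 0,1,2,3,4

path = "/".join(["data", "2020", "points.csv"])   # Hypothetical path parts.
print(path)                  # data/2020/points.csv
```

Note that join only accepts strings, hence the str(i) conversion before appending.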

43 Parsing Often we'll need to split strings up based on some delimiter. This is known as parsing. For example, it is usual to read data files a line at a time and then parse them into numbers.

44 Split Strings can be split by: a = str.split(string, delimiter) a = some_string.split(delimiter) (There's no great difference) For example: a = "Daisy, Daisy/Give me your answer, do." b = str.split(a," ") As it happens, whitespace is the default.
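As a sketch of parsing a line of data into numbers (the line content is invented), and of the difference between the default whitespace split and an explicit single-space delimiter:

```python
line = "10.5, 20.25, 30.0"           # A made-up line from a data file.
fields = line.split(",")             # ['10.5', ' 20.25', ' 30.0']

values = []
for field in fields:
    values.append(float(field))      # float() tolerates surrounding spaces.
print(values)                        # [10.5, 20.25, 30.0]

text = "Daisy,  Daisy"               # Two spaces in the middle.
print(text.split())                  # ['Daisy,', 'Daisy']: runs of whitespace count once.
print(text.split(" "))               # ['Daisy,', '', 'Daisy']: each space splits.
```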

45 Search and replace str.startswith(strA, 0, len(string)) Checks whether a string starts with strA. str.endswith(suffix, 0, len(string)) Checks whether a string ends with suffix. Second two params are optional start and end search locations. str.find(strA, 0, len(string)) Gives index position or -1 if not found str.index(strA, 0, len(string)) Raises a ValueError if not found. rfind and rindex do the same from right-to-left Once an index is found, you can use slices to extract substrings. strB = strA[index1:index2] lstrip()/rstrip() Removes whitespace from the left/right end strip([chars]) As above, but both sides, and with optional characters to strip str.replace(substringA, substringB, int) Replace all occurrences of A with B. The optional final int arg will control the max number of replacements.
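A few of these in action on an example string; the text is arbitrary:

```python
s = "  Give me your answer, do.  "
trimmed = s.strip()                      # Whitespace removed from both ends.

print(trimmed.startswith("Give"))        # True
print(trimmed.endswith("do."))           # True

pos = trimmed.find("your")               # 8
word = trimmed[pos:pos + 4]              # Slice the match back out: 'your'
missing = trimmed.find("pink")           # -1: find signals failure with -1.

swapped = trimmed.replace("e", "E", 2)   # Only the first two matches change.
print(pos, word, missing)
print(swapped)                           # GivE mE your answer, do.
print("xxhixx".strip("x"))               # hi
```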

46 Escape characters What if we want quotes in our strings? Use double inside single, or vice versa: a = "It's called 'Daisy'." a = 'You invented "Space Paranoids"?' If you need to mix them, though, you have problems as Python can't tell where the string ends: a = 'It's called "Daisy".' Instead, you have to use an escape character, a special character that is interpreted differently from how it looks. All escape characters start with a backslash, for a single quote it is simply: a = 'It\'s called "Daisy".'

47 Escape characters
\newline Backslash and newline ignored
\\ Backslash (\)
\' Single quote (')
\" Double quote (")
\b ASCII Backspace (BS)
\f ASCII Formfeed (FF)
\n ASCII Linefeed (LF)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\ooo Character with octal value ooo
\xhh Character with hex value hh
\N{name} Character named name in the Unicode database
\uxxxx Character with 16-bit hex value xxxx
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx

48 String Literals Going back to our two line example: print("This is all " + "one line.") print("This is a second") # Note the two print statements. Note that we can now rewrite this as: print("This is all " + "one line. \n" + "This is a second")

49 String Literals There are some cases where we want to display the escape characters as characters rather than escaped characters when we print or otherwise use the text. To do this, prefix the literal with "r": >>> a = r"This contains a \\ backslash escape" From then on, the backslashes are interpreted as two literal backslashes. Note that if we then display this at the prompt, we get: >>> a 'This contains a \\\\ backslash escape' Note that each backslash is itself escaped in the displayed form; print(a) shows the two backslashes as typed. String literal markups: R or r is a "raw" string, escaping escapes to preserve their appearance. F or f is a formatted string (we'll come to these). U or u marks a Unicode string, a Python 2 legacy that has no effect in Python 3. B or b marks a sequence of bytes; br or rb (in any capitalisation) is a raw sequence of bytes. An example of when we might want to use raw strings is when constructing a regex statement. These search text on the basis of patterns which include quite a lot of backslashes, and escaping each one can make for very confusing patterns. "Why can't raw strings (r-strings) end with a backslash? More precisely, they can't end with an odd number of backslashes: the unpaired backslash at the end escapes the closing quote character, leaving an unterminated string."
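A sketch comparing raw and ordinary literals, with a small regex as the motivating case. The path and the searched text are invented for the example.

```python
import re

normal = "C:\\temp\\new"         # Each backslash has to be escaped.
raw = r"C:\temp\new"             # Raw: backslashes are taken literally.
print(normal == raw)             # True: two spellings of the same string.

print(len("\n"), len(r"\n"))     # 1 2: one newline versus backslash plus n.

pattern = r"\d+\.\d+"            # Digits, a literal dot, digits.
match = re.search(pattern, "x = 10.25 metres")
found = match.group()
print(found)                     # 10.25
```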

50 Formatting strings There are a wide variety of ways of formatting strings. print( "{0} has: {1:10.2f} pounds".format(a,b) ) print('%(a)s has: %(b)10.2f pounds'%{'a':'Bob','b': }) See website for examples.
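As an illustration of the two styles above plus the f-string form mentioned earlier, here is a sketch with stand-in values; the name and amount are hypothetical, chosen only so the code runs. In each case 10.2f means a float padded to a width of 10 with 2 decimal places.

```python
name = "Bob"        # Hypothetical values for illustration.
amount = 1234.5

with_format = "{0} has: {1:10.2f} pounds".format(name, amount)
with_percent = "%(a)s has: %(b)10.2f pounds" % {"a": name, "b": amount}
with_fstring = f"{name} has: {amount:10.2f} pounds"

print(with_format)
print(with_percent)
print(with_fstring)
print(with_format == with_percent == with_fstring)   # True
```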

51 This lecture Sets.

52 Sets Unordered collections of unique objects. Main type is mutable, but there is a FrozenSet: a = {"red", "green", "blue"} a = set(some_other_container) Can have mixed types and contain other containers, as long as those are immutable (e.g. tuples or frozen sets). Note you can't use a = {} to make an empty set (as this is an empty dictionary), have to use: a = set() Implement the classic sets you know of from set mathematics.

53 Add/Remove a.add("black") a.remove("blue") # Raises a KeyError if item doesn't exist. a.discard("pink") # Silent if item doesn't exist. a.clear() # Discard everything.

54 Operators | or a.union(b) # Union of sets a and b. & or a.intersection(b) # Intersection. - or a.difference(b) # Difference (elements of a not in b). ^ or a.symmetric_difference(b) # Elements in a or b but not both (the inverse of intersection). x in a # Checks if item x in set a. x not in a # Checks if item x is not in set a. a <= b or a.issubset(b) # If a is contained in b. a < b # a is a proper subset (i.e. not equal to) a >= b or a.issuperset(b) # If b is contained in a. a > b # a is a proper superset Operators only work on sets; functions work on (some) other containers. Specifically, the functions work on containers that are iterable, that is, you can ask for their contents one at a time. We'll come on to this.
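A sketch of the operators on two small example sets, including the difference between an operator and its function form when the other operand is a list:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)             # {1, 2, 3, 4, 5}
print(a & b)             # {3, 4}
print(a - b)             # {1, 2}
print(a ^ b)             # {1, 2, 5}
print({3, 4} <= a)       # True: subset.
print(a < a)             # False: a set is not a proper subset of itself.

from_list = a.union([5, 6])      # The function accepts any iterable.
operator_failed = False
try:
    a | [5, 6]                   # The operator insists on a set.
except TypeError:
    operator_failed = True
print(from_list, operator_failed)
```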

55 Other functions Most of the functions have partners that adjust the set, for example: a &= b or a.intersection_update(b) Updates a so it is just its previous intersection with b. For a complete list, see:

56 This lecture Holding things associated with a name: Dict.

57 Mappings Mappings link (map) one set of data to another, so requests for the first get the second. The main mapping class is dict (dictionary; in other languages these are sometimes called associative arrays, or ~hashtables) They're composed of a table of keys and values. If you ask for the key you get the value. An example would be people's names and their addresses. Keys have to be unique. Keys have to be immutable objects (we don't want them changing after they're used). Dictionaries are not sorted by key, and can't be accessed by position (though since Python 3.7 they do remember the order in which keys were inserted).

58 Dict
a = {1:"Person One", 2:"Person Two", 3:"Person 3"} If the keys are strings you can also do: a = dict(one="Person One", two="Person Two") a = {} # Empty dictionary. keys = (1,2,3) values = ("Person One", "Person Two", "Person 3") a = dict(zip(keys, values)) a[key] = value # Set a new key and value. print(a[key]) # Gets a value given a key.

59 Useful functions del a[key] # Remove a key and value. a.clear() # Clear all keys and values. a.get(key, default) # Get the value, or if not there, returns default. (normally access would give an error) a.keys() a.values() a.items() # Return a "view" of keys, values, or pairs. These are essentially live windows onto the dictionary. To use these as sequences, turn them into a list: list(a.items()) list(a.keys()) Again, there are update methods. See:
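A short sketch of these dictionary functions using the example data from the previous slide; the "Unknown" default is an arbitrary choice:

```python
a = {1: "Person One", 2: "Person Two"}
a[3] = "Person 3"                    # Set a new key and value.

print(a.get(1))                      # Person One
fallback = a.get(4, "Unknown")       # No key 4, so the default comes back.
print(fallback)                      # Unknown

keys_before = list(a.keys())         # [1, 2, 3]
del a[2]                             # Remove a key and its value.
keys_after = list(a.keys())          # [1, 3]
pairs = list(a.items())              # [(1, 'Person One'), (3, 'Person 3')]

print(keys_before, keys_after, pairs)
print(2 in a)                        # False: "in" checks the keys.
```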

60 Dictionaries Dictionaries are hugely important as, not that you’d know it, objects are stored as dictionaries of attributes and methods.

61 Review Tuples (,), lists [], sets {}, and dictionaries {:}. {} is the empty dict; set() is the empty set. Lists and tuples are accessed with an int index: list_or_tuple[int]. Dictionaries are accessed with a key: dict[key]. Sequences are indexed from 0 to len(c)-1. Sets and Frozen Sets are mutable and immutable sets of unstructured data. They can’t be accessed by the [index] notation but you can iterate over them. Created using set() or frozenset().

62 Review A lot of errors are due to confusing mutable and immutable objects. a = 2 # a is a reference to an immutable object 2. a = 3 # The 2 is destroyed and a new 3 referenced instead. b = [1,2] # Mutable list of immutable objects. b[1] = 3 # The 2 is destroyed and a new 3 referenced instead. b = (1,2) # Immutable tuple of immutable objects. b[1] = 3 # Error b = ([1,2],) # Immutable tuple containing a mutable list (note the trailing comma; without it this is just a list). b[0] = 3 # Error, you can't change what a tuple references. b[0][1] = 3 # Fine: you can change the contents of the list referenced, as long as the tuple still references the same list. The 2 is destroyed and a new 3 referenced instead.

63 Review This kind of thing is very pythonic: a = 1,2 # Tuple packing a,b = 1,2 a,b = some_function() # Container unpacking Test the type of objects returned with type(x).

