Computer Science 121 Scientific Computing Winter 2016 Chapter 4 Collections and Indexing
We've seen two kinds of collection:
–Array (sequence of numbers)
–Text/string (sequence of characters)
Two main issues:
–How to access individual elements of a collection
–How to group related elements together (even when their types differ)
4.1 Indexing
Consider census data for a single street:
>>> elmstreet = array([3, 5, 2, 0, 4, 5, 1])
NumPy can give us various stats about this data:
>>> sum(elmstreet) # total residents
20
>>> mean(elmstreet) # mean household size
2.857142857142857
>>> max(elmstreet) # largest household size
5
>>> min(elmstreet) # smallest household size
0
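The summary statistics above can be reproduced as a standalone script. This is a minimal sketch: the slides call array, mean and the rest without a prefix, while the code below assumes the more common `import numpy as np` convention, and the variable names total, average, largest and smallest are illustrative only.

```python
import numpy as np

# Household sizes for the seven houses on the street (data from the slide).
elmstreet = np.array([3, 5, 2, 0, 4, 5, 1])

total = np.sum(elmstreet)       # total residents
average = np.mean(elmstreet)    # mean household size
largest = np.max(elmstreet)     # largest household size
smallest = np.min(elmstreet)    # smallest household size

print(total, average, largest, smallest)
```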
4.1 Indexing
Some data may be bogus:
>>> min(elmstreet) # smallest size
0
Need to know which values are bogus, and where they “live”.
In general, need to know:
–Value of an element
–Position (index) of the element
4.1 Indexing: where
Comparison operators on arrays give us arrays of Booleans:
>>> elmstreet == 0
array([False, False, False, True, False, False, False], dtype=bool)
The where function tells us the indices of the True elements:
>>> where(elmstreet == 0)
(array([3]),)
>>> where(elmstreet > 2)
(array([0, 1, 4, 5]),)
>>> where(elmstreet < 0)
(array([], dtype=int64),)
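As a rough sketch of how the result of where might be used, the example below unpacks the one-element tuple that where returns and also indexes with the Boolean array directly. It assumes `import numpy as np` rather than the prefix-free style of the slides; is_empty, empty_idx and occupied are names made up for this illustration.

```python
import numpy as np

elmstreet = np.array([3, 5, 2, 0, 4, 5, 1])

is_empty = elmstreet == 0             # Boolean array, one entry per household
empty_idx = np.where(is_empty)[0]     # where returns a tuple; take its only item
print(empty_idx)                      # positions of the suspicious zero entries

# A Boolean array can also serve as an index: keep only the non-zero households.
occupied = elmstreet[~is_empty]
print(occupied)
```

Indexing with the Boolean array skips the detour through where when only the values, not their positions, are needed.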
4.1 Indexing: First and Last Elements
First element has index 0:
>>> elmstreet
array([3, 5, 2, 0, 4, 5, 1])
>>> elmstreet[0]
3
Last element can be referenced by the special -1 index:
>>> elmstreet[-1]
1
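A small sketch of how negative indices relate to ordinary ones, assuming `import numpy as np`; the variable names are illustrative only.

```python
import numpy as np

elmstreet = np.array([3, 5, 2, 0, 4, 5, 1])

first = elmstreet[0]                        # index 0 is the first element
last = elmstreet[-1]                        # -1 counts back from the end
same_last = elmstreet[len(elmstreet) - 1]   # the same element, the long way
second_to_last = elmstreet[-2]              # -2 is the element before the last

print(first, last, same_last, second_to_last)
```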
4.1 Indexing: Subsequences
Can use a list of indices instead of a single index:
>>> elmstreet[[0,2,4]]
array([3, 2, 4])
>>> elmstreet[[0,2,4]] = -1
>>> elmstreet
array([-1, 5, -1, 0, -1, 5, 1])
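The two uses of an index list, reading and assigning, can be sketched as follows. The code assumes `import numpy as np`; picked is a hypothetical name. Reading with an index list produces a new array, so the later assignment does not change it.

```python
import numpy as np

elmstreet = np.array([3, 5, 2, 0, 4, 5, 1])

picked = elmstreet[[0, 2, 4]]    # new array holding elements 0, 2 and 4
elmstreet[[0, 2, 4]] = -1        # overwrite those three positions in place

print(picked)       # unaffected by the assignment above
print(elmstreet)
```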
4.1 Indexing: Extending an Array
Use append to add an element at the end of an array:
>>> elmstreet = append(elmstreet,8)
Can append more than one element:
>>> elmstreet = append(elmstreet,[9,10,11])
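One detail worth seeing in code is that append returns a new array and leaves its argument alone, which is why the slide assigns the result back to elmstreet. A minimal sketch, assuming `import numpy as np` and the original seven household sizes; longer and even_longer are illustrative names.

```python
import numpy as np

elmstreet = np.array([3, 5, 2, 0, 4, 5, 1])

longer = np.append(elmstreet, 8)               # one extra element
even_longer = np.append(longer, [9, 10, 11])   # several at once

print(len(elmstreet))   # the original array still has 7 elements
print(longer)
print(even_longer)
```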
Fibonacci Redux
With arrays, we only need a single variable and a single line (versus three) to do Fibonacci:
>>> fib = arange(2)
>>> fib
array([0, 1])
>>> fib = append(fib, fib[-1] + fib[-2])
Each run of that line appends one new term. After running it four times:
>>> fib
array([0, 1, 1, 2, 3, 5])
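Repeating the append line by hand can be replaced by a loop. The sketch below assumes `import numpy as np`; the loop count of 4 is chosen only to reach the six-element array shown on the slide.

```python
import numpy as np

fib = np.arange(2)                            # start with array([0, 1])
for _ in range(4):                            # each pass appends one new term
    fib = np.append(fib, fib[-1] + fib[-2])   # sum of the last two elements

print(fib)
```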
4.2: 2D Arrays, a.k.a. Matrices Lots of data are best represented as tables:
4.2 Matrices
We can store such data in a matrix:
>>> elmstreet = array([[3,2,1,35000],\
                       [5,2,3,41000],\
                       [2,1,1,25000],\
                       [2,2,0,56000],\
                       [4,2,2,62000],\
                       [5,3,2,83000],\
                       [1,1,0,52000]])
Backslash says "continued on next row..."
Household index is implicit (as row number)
4.2 Matrices
Like the len function for 1D arrays (a.k.a. vectors), the shape function reports the size of a matrix:
>>> shape(elmstreet)
(7, 4)
With matrices, we use two indices (instead of one) for referencing values:
>>> elmstreet[2,3]
25000
>>> elmstreet[3,2]
0
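A short sketch of shape and two-index access on the census matrix, assuming `import numpy as np`. It uses the array's shape attribute, which holds the same tuple the shape function returns; n_rows and n_cols are illustrative names.

```python
import numpy as np

# One row per household; the numbers are taken from the slide.
elmstreet = np.array([[3, 2, 1, 35000],
                      [5, 2, 3, 41000],
                      [2, 1, 1, 25000],
                      [2, 2, 0, 56000],
                      [4, 2, 2, 62000],
                      [5, 3, 2, 83000],
                      [1, 1, 0, 52000]])

n_rows, n_cols = elmstreet.shape   # (rows, columns)
print(n_rows, n_cols)

print(elmstreet[2, 3])   # row 2, column 3
print(elmstreet[3, 2])   # row 3, column 2
```

The row index always comes first, so swapping the two indices generally picks a different element.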
4.2 Matrices
As with 1D, we can access part of a matrix by using an array of indices:
>>> elmstreet[[3,4,6], 3]
array([56000, 62000, 52000])
Grab a whole row using colon notation:
>>> elmstreet[0,:] # whole first row
array([3, 2, 1, 35000])
4.2 Matrices
Also works for columns:
>>> elmstreet[:, 0] # whole first col
array([3, 5, 2, 2, 4, 5, 1])
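Row, column and index-list access can be put side by side in one sketch. It assumes `import numpy as np`; first_row, first_col and some_incomes are names invented for the example, and calling column 3 "income" follows the variable names used later in the slides.

```python
import numpy as np

elmstreet = np.array([[3, 2, 1, 35000],
                      [5, 2, 3, 41000],
                      [2, 1, 1, 25000],
                      [2, 2, 0, 56000],
                      [4, 2, 2, 62000],
                      [5, 3, 2, 83000],
                      [1, 1, 0, 52000]])

first_row = elmstreet[0, :]               # every column of row 0
first_col = elmstreet[:, 0]               # every row of column 0
some_incomes = elmstreet[[3, 4, 6], 3]    # column 3 for rows 3, 4 and 6

print(first_row)
print(first_col)
print(some_incomes)
```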
As with a vector, we can do operations on a scalar and a matrix:
>>> elmstreet*2
array([[6,4,2,70000],
       [10,4,6,82000],
       [4,2,2,50000],
       [4,4,0,112000],
       [8,4,4,124000],
       [10,6,4,166000],
       [2,2,0,104000]])
... and element-by-element on two matrices:
>>> a = array([[1,2,3],[4,5,6],[7,8,9]])
>>> b = array([[2,4,6],[0,1,0],[0,3,1]])
>>> a + b
array([[ 3, 6, 9],
       [ 4, 6, 6],
       [ 7, 11, 10]])
>>> a * b
array([[ 2, 8, 18],
       [ 0, 5, 0],
       [ 0, 24, 9]])
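The scalar and element-by-element operations can be checked in a single sketch, assuming `import numpy as np`. The names doubled, total and product are illustrative. Note that * multiplies matching elements; it is not the matrix product from linear algebra.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[2, 4, 6], [0, 1, 0], [0, 3, 1]])

doubled = a * 2    # scalar applied to every element
total = a + b      # element-by-element sum
product = a * b    # element-by-element product

print(doubled)
print(total)
print(product)
```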
Of course, the matrices must be the same size:
>>> a + array([[1,2],[0,5]])
ValueError: operands could not be broadcast together with shapes (3,3) (2,2)
And your socks don’t match either.
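In a script, as opposed to the interactive prompt, a shape mismatch like this stops the program unless the error is caught. A minimal sketch, assuming `import numpy as np`; small and failed are hypothetical names used only here.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
small = np.array([[1, 2], [0, 5]])

print(a.shape, small.shape)    # compare shapes before combining

try:
    a + small                  # 3x3 plus 2x2 cannot be lined up
    failed = False
except ValueError as err:
    failed = True
    print("cannot add:", err)
```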
We can get a lot of mileage by combining colon and other operations:
>>> children = elmstreet[:, 2]
>>> children
array([1, 3, 1, 0, 2, 2, 0])
>>> nokidshouses = where(children == 0)
>>> nokidshouses
(array([3, 6]),)
>>> incomenokids = elmstreet[nokidshouses, 3]
>>> incomenokids
array([[56000, 52000]])
>>> mean(incomenokids)
54000.0
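The same question, the mean income of households without children, can also be sketched with a Boolean index instead of where. This is an alternative to the slide's approach, not a replacement for it. It assumes `import numpy as np`; no_kids, income_no_kids and mean_income are illustrative names.

```python
import numpy as np

elmstreet = np.array([[3, 2, 1, 35000],
                      [5, 2, 3, 41000],
                      [2, 1, 1, 25000],
                      [2, 2, 0, 56000],
                      [4, 2, 2, 62000],
                      [5, 3, 2, 83000],
                      [1, 1, 0, 52000]])

children = elmstreet[:, 2]          # number of children per household
income = elmstreet[:, 3]            # income per household

no_kids = children == 0             # True for households with no children
income_no_kids = income[no_kids]    # 1D array of the matching incomes
mean_income = np.mean(income_no_kids)

print(income_no_kids, mean_income)
```

Indexing with the Boolean array gives a flat 1D result, whereas passing the tuple from where as a row index gives the nested array([[56000, 52000]]) seen on the slide.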
Can get rows and cols at the same time with where:
>>> r,c = where(elmstreet >3)
>>> r
array([0, 1, 1, 2, 3, 4, 4, 5, 5, 6])
>>> c
array([3, 0, 3, 3, 3, 0, 3, 0, 3, 3])
For Boolean ops, use the logical_ functions (logical_and, logical_or, logical_not):
>>> r,c = where(logical_and(elmstreet >3, elmstreet <= 5))
>>> r
array([1,4,5])
>>> c
array([0,0,0])
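The row and column arrays returned by where line up pairwise, which the following sketch makes explicit by zipping them into (row, column) pairs and by using them to pull out the matching values. It assumes `import numpy as np`; cells and values are names made up for the example.

```python
import numpy as np

elmstreet = np.array([[3, 2, 1, 35000],
                      [5, 2, 3, 41000],
                      [2, 1, 1, 25000],
                      [2, 2, 0, 56000],
                      [4, 2, 2, 62000],
                      [5, 3, 2, 83000],
                      [1, 1, 0, 52000]])

# Entries greater than 3 and at most 5.
in_range = np.logical_and(elmstreet > 3, elmstreet <= 5)
r, c = np.where(in_range)

cells = list(zip(r.tolist(), c.tolist()))   # (row, column) pairs
values = elmstreet[r, c]                    # the entries at those positions

print(cells)
print(values)
```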
4.3 Dictionaries: Mixed Data Types, with Names as Indices
Dictionaries (a.k.a. Data Structures) allow us to put different types of data into the same collection:
>>> pt = {}
>>> pt["x"] = 3
>>> pt["name"] = "Henry"
>>> pt
{'x': 3, 'name': 'Henry'}
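A brief sketch of a few other things one might do with such a dictionary: build it in one step, add a key, test for a key, and loop over the contents. The "y" key and its value are hypothetical additions for illustration and do not come from the slide.

```python
# Same contents as the slide, written as a single literal.
pt = {"x": 3, "name": "Henry"}

pt["y"] = 4            # hypothetical extra key, added the same way as on the slide

has_z = "z" in pt      # membership test on keys; no "z" was ever stored

for key, value in pt.items():   # visit each key together with its value
    print(key, value)
```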
4.3 Arrays of arbitrary types: Lists
Arrays are great for numerical computing.
For other types of elements in a sequence, use "bare" lists:
>>> friends = ["Sally", "Bob", "Jane"]
>>> friends
['Sally', 'Bob', 'Jane']
>>> friends[2]
'Jane'
Unlike arrays, lists can mix types (not recommended):
>>> stuff = [ , "Einstein", arange(5)]
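A runnable sketch of both kinds of list, assuming `import numpy as np`. The extra friend "Ravi" and the number 3.5 are hypothetical values chosen for illustration; they are not taken from the slide.

```python
import numpy as np

friends = ["Sally", "Bob", "Jane"]
friends.append("Ravi")      # hypothetical extra name; lists grow in place
print(friends[2], friends[-1])

# A list mixing a number, a string and an array (3.5 is an illustrative value).
stuff = [3.5, "Einstein", np.arange(5)]
kinds = [type(item).__name__ for item in stuff]
print(kinds)
```

Unlike append for arrays, the list method append changes the list itself and returns nothing, so there is no need to assign the result back.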