

String and Data Processing

Sequential Processing
Processing each element in a sequence:
    for e in [1,2,3,4]:
        print e
    for c in "hello":
        print c
    for e in (1,2,3,"Name"):
        print e

List Comprehension
Use a list comprehension to create a new list by filtering with a condition or by applying a mapping:
    [ x for x in [1,2,3] ]  →  [1,2,3]
    [ x*x for x in [1,2,3] ]  →  [1,4,9]
    [ x for x in [1,2,3,…,10] if x%2 == 0 ]  →  [2,4,6,8,10]
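A comprehension is shorthand for a loop that appends to a list. The sketch below builds the same list both ways; the variable names are illustrative, and print is called with parentheses so the snippet also runs under Python 3.

```python
# Filtering with a comprehension: keep the even numbers from 1..10.
evens = [x for x in range(1, 11) if x % 2 == 0]

# The equivalent explicit loop.
evens_loop = []
for x in range(1, 11):
    if x % 2 == 0:
        evens_loop.append(x)

# Mapping with a comprehension: square every element.
squares = [x * x for x in [1, 2, 3]]

print(evens == evens_loop)
```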

Operation with data records
    L = [ (1, 3), (1, 4), (1, 5), (2, 1), (2, 2) ]
Each tuple has the form (group, number).
How do we count the number of records in group 1?
  Create a list with the data from group 1:
    L0 = [ x for x in L if x[0] == 1 ]
  Then count its elements with len(L0).
How do we sum the numbers in group 1?

Operation with data records
Create another list that holds only the numbers of the records in group 1:
    N0 = [ x[1] for x in L if x[0] == 1 ]
Then apply the built-in sum() function:
    sum(N0)  →  12
What other built-in functions work on a sequence?
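The filter-then-aggregate pattern can be wrapped in a small helper. This is only a sketch: the function name group_values is hypothetical and does not appear in the slides.

```python
# Records are (group, number) pairs, as in the slide.
L = [(1, 3), (1, 4), (1, 5), (2, 1), (2, 2)]

def group_values(records, group):
    """Return the numbers of all records that belong to `group`."""
    return [number for (g, number) in records if g == group]

N0 = group_values(L, 1)
count_1 = len(N0)   # how many records are in group 1
total_1 = sum(N0)   # 3 + 4 + 5

print("%d records, total %d" % (count_1, total_1))
```

The same helper answers both questions (count and sum) for any group without repeating the comprehension.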

Built-in Functions for a sequence
To compute the maximum, use the max(L) function:
    max([1,2,3,4,5,4,3,2,1])  →  5
To compute the minimum, use the min(L) function:
    min([1,2,3,4,5,4,3,2,1])  →  1
To create a sorted sequence, use the sorted(L) function:
    sorted([1,2,3,4,5,4,3,2,1])  →  [1,1,2,2,3,3,4,4,5]

Built-in Functions for a sequence
What if we need to compute an inner product?
The zip(L1,L2,…,Ln) function helps by pairing up the elements at the same position:
    zip([1,2,3],[4,5,6])  →  [ (1,4), (2,5), (3,6) ]

Built-in Functions for a sequence
How do we use zip() in a loop? Use packing and unpacking!
    for (a,b) in zip([1,2,3],[4,5,6]):
        print a,b
Within the loop, (a,b) takes the values (1,4), (2,5), and (3,6), so this prints
    1 4
    2 5
    3 6

Built-in Functions for a sequence
What if you need an index in a for statement? Use the enumerate() function:
    for (i, e) in enumerate(["Tom", "Jack", "Bob"]):
        print i, e
Indices start at 0, so this will print
    0 Tom
    1 Jack
    2 Bob
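enumerate() counts from 0 by default. If 1-based numbering is wanted, its optional second argument sets the starting index. A minimal sketch (print is written with parentheses so it also runs under Python 3):

```python
names = ["Tom", "Jack", "Bob"]

# Default: indices start at 0.
pairs = list(enumerate(names))

# Optional start value: indices start at 1.
pairs_from_1 = list(enumerate(names, 1))

for (i, e) in enumerate(names, 1):
    print("%d %s" % (i, e))
```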

Built-in Functions for a sequence
Functions:
    min(sequence)  →  the minimum element
    max(sequence)  →  the maximum element
    sum(sequence[, start])  →  the sum of the elements
    zip(seq1, seq2, …)  →  tuples pairing the elements of sequences with the same length
    enumerate(sequence)  →  tuples, each holding an index (0,1,2,…) and an element from the given sequence

Built-in Functions for a sequence
Packing and unpacking are useful for handling multiple values in one operation.
Collecting each element whose index is a multiple of 2:
    [ x for (j,x) in enumerate([4,5,6]) if j%2 == 0 ]  →  [4, 6]
Computing the inner product:
    sum( [ a*b for (a,b) in zip((1,2),(1,2)) ] )  →  5
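The inner product one-liner can be given a name and a length check. This is a sketch; the function name inner_product and the decision to raise ValueError on mismatched lengths are assumptions, not part of the slides (zip() itself silently stops at the shorter sequence).

```python
def inner_product(u, v):
    """Sum of element-wise products of two equally long sequences."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a * b for (a, b) in zip(u, v))

p_small = inner_product((1, 2), (1, 2))        # 1*1 + 2*2
p_large = inner_product([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6

# Elements at even indices, using enumerate() with unpacking.
even_indexed = [x for (j, x) in enumerate([4, 5, 6]) if j % 2 == 0]
```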

Playing with Real Data

File
Hopefully, you didn't forget how to read a file:
    file = open(<filename>, <mode>)
    lines = file.readlines()
    file.close()
Data processing is essentially dealing with a list of lines. However, you should have a clear picture of the structure of your data.


Data Processing
Pre-existing data is mostly the subject of statistical analysis:
  Basic description: count, sum, min, max, set operations
  Descriptive statistics such as mean and variance
  Information visualization
  Comparative analysis
    Cross correlation, hypothesis testing
  Modeling and validation
    Linear regression via least squares
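As an illustration of the descriptive statistics mentioned above, here is a minimal sketch of mean and population variance using only built-ins. The function names and the sample values are made up for the example and are not the experiment's data.

```python
def mean(xs):
    """Arithmetic mean; float() avoids integer division under Python 2."""
    return sum(xs) / float(len(xs))

def variance(xs):
    """Population variance: the average squared deviation from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / float(len(xs))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values
sample_mean = mean(sample)
sample_var = variance(sample)
```

Dividing by len(xs) - 1 instead would give the sample variance; which one is appropriate depends on the analysis.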

Pythonic Way for Descriptive Statistics

Sternberg's experiment
Do people process a set of numbers in parallel or sequentially?

Data that we have
In Excel, we have: [spreadsheet figure not available]

Data that we have
You can download the previous data as a text file. Download the file into your project directory.
Let's make a list of tuples of the form (id, response time, number of digits, inclusion):
  0th element: the id of each trial
  1st element: the response time in 1/100 sec
  2nd element: the number of digits
  3rd element: 1 if the digit is included, 2 if not

Transforming data
Let's make the data more readable. For example, [figure not available]

The structure of data
The first (uppermost) group: the number of digits (1/3/5)
The second group: the presence of the digit in the given number (Y/N)
Let's make a hierarchical structure.


The structure of data
Make a tuple of (1st level data, 2nd level data, 3rd level data):
    (1,1,40)  (3,1,73)  (5,1,39)
    (1,2,45)  (3,2,73)  (5,2,66)
    …         …         …

The structure of data
1. Read a list of strings from the file
2. Strip whitespace using the strip() function
3. Separate each line into a list of four words using split()
4. Create a list of tuples with a list comprehension

Reading data step by step
    file = open(" ", "r")
    lines = file.readlines()
    file.close()
    lines = [ l.strip() for l in lines ]
    words = [ l.split() for l in lines ]

Reading data step by step
    data = [ (int(w[1]), int(w[2]), int(w[3]))
             for w in words ]
    print data
    [ (1,1,40), (1,1,41), …, ]
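The reading steps can be collected into one function and tried on in-memory lines before touching the real file. This is a sketch under assumptions: the function name parse_lines is hypothetical, the sample lines are made up, and each line is assumed to hold four whitespace-separated integers of which the first (the trial id) is dropped.

```python
def parse_lines(lines):
    """Turn raw text lines into a list of 3-tuples of integers."""
    stripped = [l.strip() for l in lines]
    # Skip blank lines so int() never sees an empty word list.
    words = [l.split() for l in stripped if l]
    return [(int(w[1]), int(w[2]), int(w[3])) for w in words]

# Made-up lines standing in for file.readlines().
sample_lines = ["1 1 1 40\n", "2 1 1 41\n", "\n", "3 3 2 73\n"]
data = parse_lines(sample_lines)
```

With a real file, the same function would be applied to the list returned by readlines().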


Now, we have data
Now, let's make a list of strings, each of which contains 15 reaction-time values. For example,
    L = [ 1,2,3,4,5,…,100 ]
    D = [ [1,2,3,4,…,15], [16,17,18,…,30], [31,32,…,45], … ]
How could we do that? There are many ways.

Collection by counting
1. Create a counter variable and collect a sublist for every 15 elements.
    D = []
    for j in range(0, len(L), 15):
        S = []
        for k in range(j, min(j+15, len(L))):   # stop at the end of L
            S.append(L[k])                      # copy the k-th element
        D.append(S)

Collection by counting
2. Use list comprehension
    D = [ [ L[k] for k in range(j, min(j+15, len(L))) ] \
          for j in range(0, len(L), 15) ]
3. Use list slicing to replace the inner loop
    D = [ L[j:j+15] for j in range(0, len(L), 15) ]
4. Convert each inner list into a string (join() accepts only strings, so convert each number with str() first)
    output = [ " ".join([ str(x) for x in sublist ]) for sublist in D ]
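The slicing version handles a final, shorter group without any index arithmetic, because a slice past the end of a list is simply truncated. A small sketch with the group size as a parameter (the function name chunks is hypothetical):

```python
def chunks(L, size):
    """Split L into consecutive sublists of at most `size` elements."""
    return [L[j:j + size] for j in range(0, len(L), size)]

L = list(range(1, 101))   # 1, 2, …, 100
D = chunks(L, 15)         # six full groups and one group of ten

# join() needs strings, so each number is converted with str().
output = [" ".join(str(x) for x in sublist) for sublist in D]
```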

Repeat for each group
Before repeating the process, define a function to give the piece of code a name.
    def format_data(L):
        D = [ L[j:j+15] for j in range(0, len(L), 15) ]
        return " ".join([ " ".join([ str(x) for x in sublist ]) for sublist in D ])
The signature of our function is format_data(L): it takes a list and returns a string.

Creating a report
Let's make an HTML report.
    html_template = " … %s … %s … %s … "
    fo = open(" ", "w")
    o1 = format_data(L1)
    o2 = format_data(L2)
    …
    fo.write(html_template % (o1, o2, o3, …, ))
    fo.close()
Open the file with your browser.
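The slide's actual template and file name are not shown, so the following is only a sketch of the same idea with stand-ins: the template text, the helper name write_report, and the file name report.html are all assumptions made for illustration. It writes into a temporary directory so nothing in the project directory is overwritten.

```python
import os
import tempfile

# Hypothetical template; each %s is replaced by one formatted group.
html_template = "<html><body><pre>%s</pre><pre>%s</pre></body></html>"

def write_report(path, sections):
    """Fill the template with the given strings and write it to `path`."""
    with open(path, "w") as fo:
        fo.write(html_template % tuple(sections))

report_path = os.path.join(tempfile.mkdtemp(), "report.html")
write_report(report_path, ["1 2 3", "4 5 6"])

with open(report_path) as f:
    report_text = f.read()
```

The number of strings passed in must match the number of %s placeholders in the template; otherwise the % operator raises TypeError.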

Textual Visualization
A more informative visualization comes from frequency counting. Make a histogram!

Multiple countings
A classic problem that calls for another data structure: the dictionary, or associative array.
Make a tuple (key, value).
A list holds multiple such tuples while keeping each key unique within the list.

Dictionary
Whenever inserting a tuple, test whether the key already exists:
  If the key exists, overwrite the value.
  If not, append the tuple to the list.
Inserting (1, 3), (2, 4), (3, 1), (1, 4) in turn leaves
    (1,4) (2,4) (3,1)
because (1, 4) overwrites (1, 3).
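The insertion rule above can be written out directly on a list of (key, value) tuples. This sketch only illustrates the idea; the function name insert_pair is hypothetical, and a real Python dictionary does not store its data this way.

```python
def insert_pair(pairs, key, value):
    """Insert (key, value) into pairs, keeping each key unique."""
    for i, (k, _) in enumerate(pairs):
        if k == key:
            pairs[i] = (key, value)   # key exists: overwrite the value
            return
    pairs.append((key, value))        # new key: append the tuple

table = []
for (k, v) in [(1, 3), (2, 4), (3, 1), (1, 4)]:
    insert_pair(table, k, v)
```

Every insertion scans the whole list, so the cost grows with the number of keys; that is the inefficiency a built-in dictionary avoids.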

Dictionary
A dictionary is created with the {} constructor. For example,
    data = { }              # an empty dictionary
    data = { 1: 3, 2: 4 }
Each key and value pair is written as key:value within the constructor.

Dictionary
Accessing an element requires its key. The bracket [] operator takes a key:
    data = { 1: 3, 2: 4 }
    print data[1]   # will print 3
    print data[2]   # will print 4
Don't confuse this with sequence indexing: data[1] looks up the key 1, not the element at position 1.

Dictionary
A value can be of any type, and a key can be of any hashable (immutable) type, such as a number, a string, or a tuple!
    data = { "Tom": "Cruise", 1:3, (0,2):(3,4) }
This is different from a sequence.
    print data["Tom"]    # will print Cruise
    print data[1]        # will print 3
    print data[(0,2)]    # will print (3,4)

Dictionary
Accessing a key that does not exist raises KeyError (here d is bound to the dictionary above):
    print d["Nicole"]
    KeyError: 'Nicole'
The in operator tests whether a key exists:
    print "Tom" in d      # True
    print "Nicole" in d   # False
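Testing with in before indexing avoids the KeyError. The sketch below spells that out in a hypothetical helper called lookup and compares it with the built-in dict.get(), which does the same thing in one call.

```python
d = {"Tom": "Cruise", 1: 3, (0, 2): (3, 4)}

def lookup(table, key, default=None):
    """Return table[key] if the key exists, otherwise `default`."""
    if key in table:
        return table[key]
    return default

found = lookup(d, "Tom")
fallback = lookup(d, "Nicole", "unknown")
same_as_get = d.get("Nicole", "unknown")   # built-in equivalent

# Catching the error is the other common option.
try:
    d["Nicole"]
except KeyError as err:
    missing_key = err.args[0]
```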

Dictionary
A list of its keys is obtained with the keys() method:
    print d.keys()
    >>> [ 1, (0,2), "Tom" ]
Note that the insertion order is not preserved here. The order depends on the implementation of Python. (Since Python 3.7, dictionaries do keep insertion order.)

Dictionary
A list of its values is obtained with the values() method:
    print d.values()
    >>> [ 3, (3,4), "Cruise" ]
Note that the insertion order is not preserved here. The order depends on the implementation of Python.

Dictionary
A for loop works well with the dictionary type; it iterates over the keys:
    for k in d:
        print "(", k, ",", d[k], ")"
There is dictionary comprehension as well!
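Looping over a dictionary yields its keys, so the value has to be fetched with d[k]. The items() method yields (key, value) pairs directly, which combines well with unpacking. A minimal sketch (print is written with parentheses so it also runs under Python 3):

```python
d = {"Tom": "Cruise", 1: 3, (0, 2): (3, 4)}

# Iterating over keys and looking up each value.
by_key = [(k, d[k]) for k in d]

# Iterating over (key, value) pairs with unpacking.
by_item = [(k, v) for (k, v) in d.items()]

for (k, v) in d.items():
    print("( %r , %r )" % (k, v))
```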

Dictionary
    { j:j+1 for j in range(0,3) }
    >>> { 0:1, 1:2, 2:3 }
    { j:e for (j,e) in enumerate(L) }
    >>> { 0:L[0], 1:L[1], …, n:L[n] }
    { e:[] for e in L }   # assume L = [ 1, 1, 2 ]
    >>> { 1:[], 2:[] }
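The last pattern, one empty list per distinct element, is a convenient starting point for grouping. As a sketch, the following records the positions at which each value occurs; the variable name positions is illustrative.

```python
L = [1, 1, 2]

# One empty list per distinct value; duplicate keys collapse into one.
positions = {e: [] for e in L}

# Append each index to the list that belongs to its value.
for (j, e) in enumerate(L):
    positions[e].append(j)
```

Each comprehension iteration creates a fresh list, so the lists for different keys are independent of one another.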

Frequency counting
Make each response time value a key of the count dictionary:
    L = [ (1,1,40), (1,1,41), … ]
    F = { e[2]: 0 for e in L }
Each value is initialized with 0. Duplicate keys are ignored automatically!

Frequency counting
    L = [ (1,1,40), (1,1,41), … ]
    F = { 36:0, 37:0, …, }
Loop over each element in L and update F[key]:
    for e in L:
        F[ e[2] ] += 1
    F = {36: 1, 37: 1, … }
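Put together, the counting loop leads straight to a textual histogram. The sketch below uses made-up records, not the experiment's data, and also shows collections.Counter from the standard library, which performs the same counting in one step.

```python
from collections import Counter

# Made-up records of the form (digits, presence, response time).
L = [(1, 1, 40), (1, 1, 41), (3, 1, 40), (5, 2, 66), (1, 2, 40)]

# Initialize every observed response time with 0, then count.
F = {e[2]: 0 for e in L}
for e in L:
    F[e[2]] += 1

# Standard-library shortcut for the same counts.
C = Counter(e[2] for e in L)

# One row per response time, in sorted key order, one star per occurrence.
rows = ["%3d | %s" % (t, "*" * F[t]) for t in sorted(F)]
for row in rows:
    print(row)
```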

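Frequency counting
Interestingly, the keys come out in increasing order here! The reason is the implementation, but the order is not guaranteed.
A Python dictionary is a hash table, not a sorted structure. Lookup is efficient because the hash of a key leads directly to its slot, so no search through sorted data is needed.
Small integers hash to themselves, so they tend to land in the table in increasing order.
When you need the keys in order, ask for it explicitly with sorted(F).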