
String and Data Processing

Sequential Processing
Processing each element in a sequence:
    for e in [1,2,3,4]: print e
    for c in "hello": print c
    for e in (1,2,3,"Name"): print e

List Comprehension
Use a list comprehension to create a new list by applying a condition or a mapping:
    [ x for x in [1,2,3] ]  ->  [1,2,3]
    [ x*x for x in [1,2,3] ]  ->  [1,4,9]
    [ x for x in [1,2,3,…,10] if x%2 == 0 ]  ->  [2,4,6,8,10]

Operation with data records
    L = [ (1, 3), (1, 4), (1, 5), (2, 1), (2, 2) ]
Each tuple has (<group>, <number>).
How do we count the number of data (records) in group 1? Create a list with the data from group 1:
    L0 = [ x for x in L if x[0] == 1 ]
Then count the number of elements with len(L0).
How do we sum the numbers from group 1?

Operation with data records
Create another list whose elements all come from group 1 and hold only the number:
    N0 = [ x[1] for x in L if x[0] == 1 ]
Then apply the sum() built-in function:
    sum(N0)  ->  12
What other built-in functions work on a sequence?
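The filter-then-count and filter-then-sum steps above can be sketched as one runnable piece. This is a minimal sketch in Python 3 syntax (the slides use Python 2 print statements); the names group_rows and group_values are illustrative, and treating the first item of each pair as the group is an interpretation of the slide.

```python
# Each record is a pair; the first item is treated as the group,
# the second as the number (an interpretation of the slide).
L = [(1, 3), (1, 4), (1, 5), (2, 1), (2, 2)]

# Keep only the records that belong to group 1.
group_rows = [x for x in L if x[0] == 1]

# Keep only the number part of those records.
group_values = [x[1] for x in L if x[0] == 1]

count = len(group_rows)      # number of records in group 1
total = sum(group_values)    # sum of the numbers in group 1

print(count, total)
```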

Built-in Functions for a sequence
To compute the maximum, use the max(L) function:
    max([1,2,3,4,5,4,3,2,1])  ->  5
To compute the minimum, use the min(L) function:
    min([1,2,3,4,5,4,3,2,1])  ->  1
To create a sorted sequence, use the sorted(L) function:
    sorted([1,2,3,4,5,4,3,2,1])  ->  [1,1,2,2,3,3,4,4,5]

Built-in Functions for a sequence
What if we have to deal with the inner product? The zip(L1,L2,…,Ln) function will help!
    zip([1,2,3],[4,5,6])  ->  [ (1,4), (2,5), (3,6) ]

Built-in Functions for a sequence
How do we use it with a loop? Use packing & unpacking!
    for (a,b) in zip([1,2,3],[4,5,6]): print a,b
This will print 1 4, 2 5, and 3 6, one pair per iteration.

Built-in Functions for a sequence
What if you need an index in a for statement? Use the enumerate() function:
    for (i, e) in enumerate(["Tom", "Jack", "Bob"]): print i, e
This will print (the index starts at 0):
    0 Tom
    1 Jack
    2 Bob

Built-in Functions for a sequence
Functions:
    min(<sequence>)  ->  the minimum element
    max(<sequence>)  ->  the maximum element
    sum(<sequence>[, <start>])  ->  the sum of elements
    zip(a list of sequences with the same length)
    enumerate(<sequence>)  ->  a sequence of tuples, each holding an index (0,1,2,…) and an element from the given sequence

Built-in Functions for a sequence
Packing and unpacking are useful for handling multiple values in one operation.
Collecting each element whose index is a multiple of 2:
    [ x for (j,x) in enumerate([4,5,6]) if j%2 == 0 ]  ->  [4, 6]
Computing the inner product:
    sum( [ a*b for (a,b) in zip((1,2),(1,2)) ] )  ->  5
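As a runnable illustration of zip, enumerate, and unpacking together, here is a small sketch in Python 3 syntax (the slides use Python 2). The lists a, b, and names are example values, not data from the lecture.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# zip pairs up elements position by position; unpacking names each half.
inner_product = sum(x * y for (x, y) in zip(a, b))

# enumerate yields (index, element) pairs, with the index starting at 0.
names = ["Tom", "Jack", "Bob"]
indexed = list(enumerate(names))

# Keep the elements whose index is a multiple of 2.
even_positions = [e for (i, e) in enumerate(names) if i % 2 == 0]

print(inner_product)
print(indexed)
print(even_positions)
```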

Playing with Real Data

File
Hopefully, you didn't forget how to read a file.
    file = open(<filename>, <mode>)
    lines = file.readlines()
    file.close()
Data processing is essentially dealing with a list of lines. However, you should have a clear picture of the structure of your data.

Data Processing
Pre-existing data is mostly the subject of statistical analysis.
    Basic description: count, sum, min, max, set operations
    Descriptive statistics such as mean, variance
    Information visualization
    Comparative analysis: cross correlation, hypothesis testing
    Modeling and validation: linear regression via least squares
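The basic description and descriptive statistics listed above need nothing beyond the built-ins already introduced. The following is a minimal sketch in Python 3 syntax; the values list is made-up example data, and the variance shown is the population variance (dividing by n), which is one of several conventions.

```python
values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements

n = len(values)                      # count
total = sum(values)                  # sum
smallest = min(values)
largest = max(values)

mean = total / n
# Population variance: the average squared deviation from the mean.
variance = sum((v - mean) ** 2 for v in values) / n

print(n, total, smallest, largest, mean, variance)
```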

Pythonic Way for Descriptive Statistics

Sternberg's experiment
Do people process a set of numbers in parallel or sequentially?

Data that we have
In Excel, we have: [figure: spreadsheet of the data]

Data that we have
You can download the previous data as a text file.
Download the file into your project directory.
Let's make a list of four-element tuples:
    0th element is the id of each trial
    1st element is the response time in 1/100 sec
    2nd element is the number of digits
    3rd element is 1 if the digit is included, 2 if not

Transforming data
Let's make the data more readable. For example: [figure]

The structure of data
The first (uppermost) group: the number of digits, 1/3/5
The second group: the presence of the digit in a given number, Y/N
Let's make a hierarchical structure.

The structure of data
Make a tuple of (1st level data, 2nd level data, 3rd level data):
    (1,1,40)  (3,1,73)  (5,1,39)
    (1,2,45)  (3,2,73)  (5,2,66)
    …         …         …

The structure of data
Read a list of strings from the file.
Strip whitespace using the strip() function.
Separate the data into a list of four words using split().
Create a list of tuples with a list comprehension.

Reading data step by step
    file = open("<filename>", "r")
    lines = file.readlines()
    file.close()
    lines = [ line.strip() for line in lines ]
    words = [ line.split() for line in lines ]

Reading data step by step
    data = [ (int(w[1]), int(w[2]), int(w[3])) for w in words ]
    print data
    [ (1,1,40), (1,1,41), … ]
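The whole read, strip, split, and convert pipeline can be tried without the course data file. The sketch below, in Python 3 syntax, uses io.StringIO as a stand-in for the opened file; the three sample lines are invented, and the column layout (an id followed by three integer fields) is an assumption based on the indexing w[1], w[2], w[3] in the slide.

```python
import io

# Hypothetical file contents: a trial id, then three integer fields per line.
sample_text = "1 1 1 40\n2 1 1 41\n3 3 2 73\n"

# io.StringIO stands in for an opened text file so the sketch runs as-is.
with io.StringIO(sample_text) as f:
    lines = f.readlines()

lines = [line.strip() for line in lines]    # drop surrounding whitespace
words = [line.split() for line in lines]    # four words per line

# Skip the id (w[0]) and convert the remaining fields to integers.
data = [(int(w[1]), int(w[2]), int(w[3])) for w in words]

print(data)
```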

Now, we have data
Now, let's make a list of strings, each of which contains 15 reaction-time values. For example,
    L = [ 1,2,3,4,5,…,100 ]
    D = [ [1,2,3,4,…,15], [16,17,18,…,30], [31,32,…,45], … ]
How could we do that? There are many ways.

Collection by counting
1. Create a counter variable and collect a sublist for every 15 elements.
    D = []
    for j in range(0, len(L), 15):
        S = []
        for k in range(j, min(j+15, len(L))):   # stop at the end of L
            S.append(L[k])                       # copy the k-th element
        D.append(S)

Collection by counting
2. Use list comprehension
    D = [ [ L[k] for k in range(j, min(j+15, len(L))) ] \
          for j in range(0, len(L), 15) ]
3. Use list slicing to replace the inner loop
    D = [ L[j:j+15] for j in range(0, len(L), 15) ]
4. Convert each inner list into a string (join() needs strings, so convert each number with str())
    output = [ " ".join(str(x) for x in sublist) for sublist in D ]
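The slicing approach can be checked end to end with the 1..100 example list. This is a sketch in Python 3 syntax; the helper name chunk is illustrative and does not come from the slides. Note that the last group is shorter because 100 is not a multiple of 15.

```python
def chunk(values, size):
    """Split values into consecutive sublists of at most `size` items."""
    return [values[j:j + size] for j in range(0, len(values), size)]

L = list(range(1, 101))      # 1..100, as in the slide's example
D = chunk(L, 15)

# join() needs strings, so each number is converted with str() first.
output = [" ".join(str(x) for x in sublist) for sublist in D]

print(len(D))
print(output[-1])
```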

Repeat for each group
Before repeating the process, define a function to give the piece of code a name.
    def format_data(L):
        D = [ L[j:j+15] for j in range(0, len(L), 15) ]
        return " ".join([ " ".join(str(x) for x in sublist) for sublist in D ])
The signature of our function is format_data(<list>) -> <string>.

Creating a report
Let's make an HTML report:
    html_template = "… %s … %s … %s …"
    fo = open("<filename>", "w")
    o1 = format_data(L1)
    o2 = format_data(L2)
    …
    fo.write(html_template % (o1, o2, o3, …))
    fo.close()
Open the file with your browser.
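The slide's template markup did not survive in the transcript, so the following is only a sketch of the %-formatting idea with a made-up template. The HTML, the strings o1 and o2 (stand-ins for format_data results), and the file name report.html are all assumptions; the code is Python 3 syntax and writes into a temporary directory.

```python
import os
import tempfile

# Hypothetical template; the slide's actual markup is not shown.
html_template = "<html><body><p>%s</p><p>%s</p></body></html>"

# Stand-ins for format_data(L1) and format_data(L2).
o1 = "40 41 45"
o2 = "73 73 66"

# Each %s is replaced, in order, by one item of the tuple.
report = html_template % (o1, o2)

# Write the report to a temporary directory (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "report.html")
with open(path, "w") as fo:
    fo.write(report)

print(path)
```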

Textual Visualization
More informative visualization by frequency counting. Make a histogram!

Multiple counting
A classic problem that uses another data structure, called a dictionary or an associative array.
Make a tuple (key, value).
A list holds multiple tuples while keeping each key unique within the list.

Dictionary
Whenever inserting a tuple, test whether the key already exists.
If the key exists, overwrite the value.
If not, append the tuple to the list.
Inserting (1, 3), (2, 4), (3, 1), (1, 4) in turn: (1, 3) is overwritten by (1, 4), leaving (1, 4), (2, 4), (3, 1).
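The insert-or-overwrite rule described above can be written directly on a list of tuples. This is a conceptual sketch in Python 3 syntax, with an illustrative function name insert; it is not how Python's built-in dict is implemented.

```python
def insert(pairs, key, value):
    """Insert (key, value) into a list of pairs, keeping each key unique."""
    for i, (k, _) in enumerate(pairs):
        if k == key:
            pairs[i] = (key, value)   # key exists: overwrite the value
            return
    pairs.append((key, value))        # new key: append the pair

pairs = []
for key, value in [(1, 3), (2, 4), (3, 1), (1, 4)]:
    insert(pairs, key, value)

print(pairs)
```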

Dictionary
A dictionary is created with the {} constructor. For example:
    data = { }             # an empty dictionary
    data = { 1: 3, 2: 4 }
Each key and value pair is written as key:value within the constructor.

Dictionary
Accessing an element requires its key. The bracket [] operator takes a key.
    data = { 1: 3, 2: 4 }
    print data[1]   # will print 3
    print data[2]   # will print 4
Don't confuse this with sequence indexing.

Dictionary
A value can be of any type. A key can be any hashable (immutable) object, such as a string, a number, or a tuple.
    data = { "Tom": "Cruise", 1:3, (0,2):(3,4) }
This is different from a sequence.
    print data["Tom"]   # will print Cruise
    print data[1]       # will print 3
    print data[(0,2)]   # will print (3, 4)

Dictionary
Accessing a key that does not exist raises KeyError:
    print d["Nicole"]
    KeyError: 'Nicole'
The in operator tests whether a key exists:
    print "Tom" in d      # True
    print "Nicole" in d   # False
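There are a few common ways to avoid an unhandled KeyError. The sketch below, in Python 3 syntax, shows a membership test, the dict.get method with a default, and an explicit try/except; the dictionary contents reuse the earlier example, and the variable names found, fallback, and raised are illustrative.

```python
d = {"Tom": "Cruise", 1: 3, (0, 2): (3, 4)}

# Option 1: test membership before indexing.
if "Nicole" in d:
    found = d["Nicole"]
else:
    found = None

# Option 2: dict.get returns a default instead of raising KeyError.
fallback = d.get("Nicole", "unknown")

# Option 3: catch the exception explicitly.
try:
    d["Nicole"]
    raised = False
except KeyError:
    raised = True

print(found, fallback, raised)
```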

Dictionary
A list of its keys is obtained with the keys() method:
    print d.keys()     # e.g. [ 1, (0,2), "Tom" ]
Note that the insertion order is not preserved. It depends on the implementation of Python. (Since Python 3.7, dictionaries do preserve insertion order.)

Dictionary
A list of its values is obtained with the values() method:
    print d.values()   # e.g. [ 3, (3,4), "Cruise" ]
Note that the insertion order is not preserved. It depends on the implementation of Python.

Dictionary
A for loop works well with the dictionary type; it iterates over the keys:
    for k in d: print "(", k, ",", d[k], ")"
There is dictionary comprehension as well!

Dictionary
    { j:j+1 for j in range(0,3) }  ->  { 0:1, 1:2, 2:3 }
    { j:e for (j,e) in enumerate(L) }  ->  { 0:L[0], 1:L[1], …, n:L[n] }   # n is the last index of L
    { e:[] for e in L }   # assume L = [ 1, 1, 2 ]  ->  { 1:[], 2:[] }

Frequency counting
Make each response time value a key of the counts:
    L = [ (1,1,40), (1,1,41), … ]
    F = { e[2]: 0 for e in L }
Each value is initialized with 0. Duplicate keys are ignored automatically!

Frequency counting
    L = [ (1,1,40), (1,1,41), … ]
    F = { 36:0, 37:0, … }
Loop over each element in L and update F[key]:
    for e in L: F[ e[2] ] += 1
Afterwards, F = { 36: 1, 37: 1, … }
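Putting the two steps together, and going one step further to the textual histogram mentioned earlier, gives the sketch below in Python 3 syntax. The five records are invented example data, and the row format (value followed by one asterisk per occurrence) is just one possible layout.

```python
# Hypothetical records; the third field is the response time.
L = [(1, 1, 40), (1, 1, 41), (1, 2, 40), (3, 1, 41), (3, 2, 40)]

# Step 1: one key per distinct response time, each count starting at 0.
F = {e[2]: 0 for e in L}

# Step 2: increment the count for each record's response time.
for e in L:
    F[e[2]] += 1

# One row per response time, in sorted key order, one '*' per occurrence.
histogram = ["%d %s" % (key, "*" * F[key]) for key in sorted(F)]
print("\n".join(histogram))
```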

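Frequency counting
Interestingly, the keys here appear in increasing order! The reason is the implementation, but it is not a guarantee. Python's dictionary is a hash table: a key is found through its hash value, not by searching sorted data. Small integers hash to themselves, so they often come out in increasing order. When sorted keys are needed, ask for them explicitly with sorted(F).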