
1 Week3: Files and Strings

2 List
A list is a sequence of values of any type, for example [ "Hello", 1, 3.7, None, True, "You" ].
Elements of a list are accessed with the bracket [] operator. Using an index, we can access any element of a list.
We can read a value: L[0], L[1], …
We can change a value: L[0] = 1, L[2] = "Mine"

3 Tuple
A tuple is like a list, but it is not modifiable. Once a tuple is created, it can't be mutated.
    T = ("Samsung", "2013/04", 30)
Elements are accessed by index: T[0], T[1], …
There is no way to change an element: T[0] = "Apple" raises a TypeError.
Don't confuse this with packing and unpacking!
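The difference between a list and a tuple can be checked directly. The following is a minimal sketch in Python 3 syntax (the slides use Python 2 print); the names assignment_failed, maker, date, and amount are made up for illustration.

```python
# Lists are mutable: elements can be read and reassigned by index.
L = ["Hello", 1, 3.7, None, True, "You"]
L[0] = 1
L[2] = "Mine"

# Tuples are immutable: item assignment raises TypeError.
T = ("Samsung", "2013/04", 30)
try:
    T[0] = "Apple"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Unpacking copies the elements into new names; T itself is unchanged.
maker, date, amount = T
maker = "Apple"  # rebinds the name maker only, not T[0]

print(L)
print(assignment_failed, T, maker)
```

Rebinding maker after unpacking looks like a change, but the tuple still holds "Samsung".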

4 String
The string type behaves like a tuple of characters.
    Name = "Tom"
    print Name[0], Name[1]   # will print "T o"
Strings provide many convenient functions.
split(): separate a sentence into words
strip(): remove the leading and trailing whitespace
find(): find a word in a sentence
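The three string functions named above can be tried on one sentence. This is a small sketch in Python 3 syntax; the sample sentence is made up for illustration.

```python
sentence = "  Tom likes honey  \n"

trimmed = sentence.strip()        # remove leading and trailing whitespace
words = trimmed.split()           # separate the sentence into words
position = trimmed.find("likes")  # index where "likes" starts

print(repr(trimmed))
print(words)
print(position)
```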

5 String data type

6 String
A string is a tuple of characters. Instead of the () construction operator, use " or '. Each character in a string can be accessed with the [] operator.
    S = "TOM"
    print S[0]   # will print 'T'
    print S[2]   # will print 'M'

7 Substring

8 Substring
A substring is a part of a string; the string itself and the empty string "" are also substrings. A substring of a string can be accessed with the [] operator and a slice representation. A slice is similar to the range function but has a simpler form.
range(1,3) → [1:3], range(1,len(S)) → [1:]
    S = "abcdefg"
    S[1:3] → "bc"
    S[1:] → "bcdefg"


10 Indexing for List/Tuple/String

11 More Substring Examples
    S = "I love honey"
    S[-1] → "y"            # a negative index counts from the end
    S[0:5:2] → "Ilv"       # we can use steps: the 0th, 2nd, and 4th characters
    S[-1:-6:-1] → "yenoh"  # using a negative step
    S[:5] → "I lov"        # an empty beginning means 0
    S[2:] → "love honey"   # an empty limit means the length of the string

12 List and tuple indexing
Indexing works the same way for a list and a tuple.
    L = [ 4, 3, 2, "hello" ] or T = ( 4, 3, 2, "hello" )
    L[1:] → [ 3, 2, "hello" ]
    T[:2] → ( 4, 3 )
    L[::2] → [ 4, 2 ]
    T[::-1] → ( "hello", 2, 3, 4 )

13 Indexing Summary
An index used in the bracket [ ] operator has the form begin:limit[:step]
begin: the beginning index, or 0 if omitted
limit: the limit of slicing, or the length of the given sequence if omitted
step: the step of slicing, or 1 if omitted
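The relation between a slice and the range function can be sketched as follows. This is only an illustration in Python 3 syntax; slice_by_range is a hypothetical helper, and it matches a slice only when the indices are non-negative and within the sequence.

```python
def slice_by_range(seq, begin, limit, step=1):
    """Hypothetical helper: collect seq[i] for each i in range(begin, limit, step)."""
    return [seq[i] for i in range(begin, limit, step)]

S = "I love honey"

via_slice = S[0:5:2]
via_range = "".join(slice_by_range(S, 0, 5, 2))

# Omitted parts fall back to the defaults: begin 0, limit len(S), step 1.
whole = S[:]
same_whole = S[0:len(S):1]

# A negative step walks the sequence backwards.
backwards = S[::-1]

print(via_slice, via_range)
print(backwards)
```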

14 Special Characters

15 Whitespaces
Tab '\t': advances to the next tab stop (commonly shown as up to 8 spaces)
Carriage return '\r': moves to the beginning of the current line
New line '\n': moves to the next line
Space ' '
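Whitespace characters are invisible when printed, so it helps to look at them with repr(). A short sketch in Python 3 syntax, using a made-up sample string:

```python
text = "\tName\r\n"

shown = repr(text)       # escape sequences become visible
length = len(text)       # each of \t, \r, \n counts as one character
stripped = text.strip()  # strip() removes all of them at both ends

# A tab advances to the next tab stop, so its width depends on the column.
expanded = "a\tb".expandtabs(8)

print(shown, length)
print(repr(stripped))
print(repr(expanded))
```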

16 String Comparisons

17 String comparisons
The equality operators ('==' and '!=') work on strings as well, in a case-sensitive manner.
    "Hello" == "hello" → False
    "Abc" == "Abcd" → False
    "Hello" != "hello" → True
    "Abc" != "Abcd" → True
    a = "hello"
    a == "" → False
    a == "hello" → True

18 String comparisons
The comparison operators (<, >, <=, >=) compare a pair of strings in lexicographical order. A string that appears earlier in a dictionary is smaller than one that appears later.
    "abc" < "bcd" → True
    "Abc" < "abc" → True (uppercase letters come first)
    "abc" < "abcd" → True (if one string is the beginning of the other, the shorter one comes first)
    "1abc" < "abcd" → True (digits come before letters)
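The ordering rules above follow from the character codes, which ord() reveals. A small sketch in Python 3 syntax; the list of names is made up for illustration.

```python
# Digits, uppercase letters, and lowercase letters have increasing codes.
codes = (ord("1"), ord("A"), ord("a"))

names = ["Tom", "batty", "Kim", "1abc"]

case_sensitive = sorted(names)                  # plain lexicographical order
case_insensitive = sorted(names, key=str.lower) # compare lowercased copies

print(codes)
print(case_sensitive)
print(case_insensitive)
```

Sorting with key=str.lower is one way to get dictionary-like order when the case of the letters should not matter.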

19 String comparisons
We can use the min2(x,y) function from the previous quiz.
    min2(min2("Tom", "Batty"), "Kim") → "Batty"
Similarly, a max2(x,y) function could be used.
    max2(max2("Tom", "Batty"), "Kim") → "Tom"
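The quiz definitions of min2 and max2 are not part of this transcript, so the versions below are assumptions: one plausible way to write them, in Python 3 syntax, using only the < and > operators.

```python
def min2(x, y):
    """Assumed definition: return the smaller of two values."""
    if x < y:
        return x
    else:
        return y

def max2(x, y):
    """Assumed definition: return the larger of two values."""
    if x > y:
        return x
    else:
        return y

smallest = min2(min2("Tom", "Batty"), "Kim")
largest = max2(max2("Tom", "Batty"), "Kim")
print(smallest, largest)
```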

20 String comparisons
count_Tom(count, x) could be defined:
    def count_Tom(count, x):
        if x == "Tom":
            return count + 1
        else:
            return count
    count_if(["Tom", "Batty", "Tom", "Kim"], count_Tom) → 2

21 Python Feature: Ternary If-Else Statement

22 Quick If-Else Operation
    def count_Tom(count, x):
        if x == "Tom":
            return count + 1
        else:
            return count
The above function is too long compared to its logic. In a short form, we can use
    count = count + 1 if x == "Tom" else count
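The short form can be used directly in a return statement. In the sketch below (Python 3 syntax), count_if is a hypothetical stand-in for the function used in class, whose real definition is not shown in this transcript; it is assumed to start at 0 and pass the running count through the given function for every element.

```python
def count_if(items, counter):
    """Hypothetical stand-in: fold counter(count, x) over items, starting at 0."""
    count = 0
    for x in items:
        count = counter(count, x)
    return count

def count_Tom(count, x):
    # value_if_true if condition else value_if_false
    return count + 1 if x == "Tom" else count

total = count_if(["Tom", "Batty", "Tom", "Kim"], count_Tom)
print(total)
```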

23 Data as a string or a list of strings

24 Number and String
Numbers have string representations, such as 1 → "1", 103 → "103", 32 → "032".
Converting an integer to a string is done by the str() function.
    str(1) → "1", str(103) → "103", str(32) → "32", …

25 Number and String
A string is converted back to a number by the int() or float() function. Whether to use int or float is decided by the programmer, by you.
    int("32") → 32
    int("-33") → -33
    "32" + "33" → "3233"
    "32" + 33 → will raise a TypeError
    int("32") + 33 → 65

26 Number and String
The int() function cannot parse a string that holds a float number.
    int("3.24") → ValueError: invalid literal for int() with base 10: '3.24'
If '.' is in a string, use the float() function.
    float("3.24") → 3.24
These conversion functions only work with valid strings.
    int(" 3 4 "), float("3.24 3.25"), int(" x3") → ValueError
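When we don't know in advance whether a string holds an integer, a float, or neither, the ValueError can be caught. The helper to_number below is hypothetical, a sketch in Python 3 syntax rather than a function from the class.

```python
def to_number(text):
    """Hypothetical helper: return an int if possible, else a float, else None."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None

results = [to_number(s) for s in ["32", "-33", "3.24", " 3 4 ", " x3"]]
print(results)
```

Trying int() first keeps "32" an integer instead of turning it into 32.0.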

27 String as a record
A string is a wonderful record.
Storing a sequence of integers:
    my_nums = "1,2,3,4,5"
    your_nums = "1 2 3 4 5"
Heterogeneous data tuple:
    me = "Joohwi 10/24 5.11"
    you = "Tom 3/2 5.6"

28 String as a collection
A string can contain multiple records.
    class_info = "Tom 732 Dave 733 Dorothy 734 … "
    another_class_info = "Tom 732, Dave 733, Dorothy 734, …"
    maybe_another_info = "Tom 732\nDave 733\nDorothy 734\n…"
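Records stored in strings like these can be taken apart again with split(). The sketch below uses Python 3 syntax and shortened sample strings without the "…" part; the name entries is made up, and what the numbers 732, 733, 734 stand for is not stated in the slides.

```python
my_nums = "1,2,3,4,5"
numbers = [int(n) for n in my_nums.split(",")]

maybe_another_info = "Tom 732\nDave 733\nDorothy 734\n"

# One record per line, fields separated by a space.
records = [line.split() for line in maybe_another_info.splitlines()]
entries = [(name, int(number)) for name, number in records]

print(numbers)
print(entries)
```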

29 Find a substring from a string
    S = "Romeo, Juliet, Mulan, Fiona"
I want to know if "Mickey" is included in the data of S. Easy! Use the in operator.
    "Romeo" in S → True
    "Jul" in S → True
However, the in operator doesn't tell you where the substring is. Instead of the in operator, use the find() function.

30 'find()' function
Ask the given string S whether it contains w.
    S.find(w) → returns the index where w starts, or -1 if w is not a substring of S
    "Mickey Mouse".find("Mouse") → 7
    S = "Mini Mouse"
    S.find("Mouse") → 5
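find() reports only the first position, and -1 when nothing is found. Its optional second argument, the index to start searching from, lets us collect every position. find_all below is a hypothetical helper, sketched in Python 3 syntax.

```python
S = "Romeo, Juliet, Mulan, Fiona"

has_mickey = "Mickey" in S          # membership only, no position
mickey_at = S.find("Mickey")        # -1 means "not found"
juliet_at = S.find("Juliet")

def find_all(text, word):
    """Hypothetical helper: list every index where word starts in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)  # continue after the last hit
    return positions

mouse_positions = find_all("Mini Mouse, Mickey Mouse", "Mouse")
print(has_mickey, mickey_at, juliet_at, mouse_positions)
```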

31 'replace()' function
How to correct a string? Use the replace(u, v) function. The replace() function of the string type replaces a substring u with another substring v.
    S = "Hello"
    S.replace("ello", "ELLO") → produces "HELLO"
Note that the replace() function returns its result as a new string; S itself is not changed.

32 String Formatting
Formatting output is one of the most frequently used operations. From a tuple T = ("Tom", "Jack", "Kim"), let's make a greeting for each name.
Old way:
    print "Hello,", T[0] → Hello, Tom
    print "Hello,", T[1] → Hello, Jack

33 String Formatting
A new way! Prepare a format string (output pattern):
    message_template = "Hello, {name}"
Replace the placeholder with an actual value: Tom, Jack, …
    message_template.replace("{name}", "Tom") → "Hello, Tom"

34 String Formatting
Assumption? The substring {name} cannot itself be part of the desired output.
Another example:
    form = "{Name}'s score is {Score}"
    data = [ ("Tom", 100), ("Jack", 99) ]
The replace() function will help. We would like the following to produce "Tom's score is 100":
    form.replace("{Name}", data[0][0]).replace("{Score}", data[0][1])

35 String Formatting
    form.replace("{Name}", data[0][0]).replace("{Score}", data[0][1])
does not produce "Tom's score is 100"; it will raise a TypeError. The type of data[0][1] is an integer, and an integer cannot replace a substring. Instead, use the str() function to convert the number into a string.
    form.replace("{Name}", data[0][0]).replace("{Score}", str(data[0][1]))
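The replace-and-str() pattern can be wrapped in a function so that it works for every record. fill below is a hypothetical helper, sketched in Python 3 syntax. The last lines show the built-in str.format() method, which fills {Name}-style placeholders without any helper.

```python
def fill(template, values):
    """Hypothetical helper: replace each {key} placeholder with str(value)."""
    result = template
    for key, value in values.items():
        result = result.replace("{" + key + "}", str(value))
    return result

form = "{Name}'s score is {Score}"
data = [("Tom", 100), ("Jack", 99)]

messages = [fill(form, {"Name": name, "Score": score}) for name, score in data]

# The built-in str.format() does the same job and converts numbers itself.
built_in = form.format(Name="Tom", Score=100)

print(messages)
print(built_in)
```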

36 String Formatting
It is inconvenient to use the str() function for every value. What if a marker in the format string gave a hint of the data type? We have this feature already: the '%' operator for a string will do that.
    "%s's score is %d" % ("Tom", 100) → "Tom's score is 100"

37 String Formatting
The placeholders in a given format string are replaced by the values of the tuple given to the '%' operator together with the format string.
    "%s %s %s" % ("I", "am", "a boy") → "I am a boy"
    "%d %s %f" % (3, ">", 2.9) → "3 > 2.900000"
'%s', '%d', and '%f' take a string, a decimal integer, and a float number, respectively. '%f' prints six digits after the decimal point unless a precision is given.
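A placeholder can also carry a precision or a width. The following sketch (Python 3 syntax) shows a few common variations of the '%' operator; the values are made up for illustration.

```python
default_float = "%d %s %f" % (3, ">", 2.9)    # six digits after the point
one_digit = "%d %s %.1f" % (3, ">", 2.9)      # precision of one digit
columns = "%5d|%-8s|" % (32, "Tom")           # right- and left-aligned fields

# '%d' needs a number; giving it a string raises TypeError.
try:
    "%d" % "32"
    wrong_type_failed = False
except TypeError:
    wrong_type_failed = True

print(default_float)
print(one_digit)
print(columns)
```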

38 File

39 File is a string stored in external storage.
A file is another source of input. It is useful for reading huge data automatically; otherwise, we would have to type the data into our Python code.
A file is another destination for output. You can store your data permanently on the HDD.

40 Data Processing
Data flows from a source to a destination. Common practice for data processing:
1. Read a file and make a string, or strings separated by lines.
2. Transform each line into a tuple. Numeric strings are transformed into a float or an int. Date and time strings are transformed into a datetime object.
3. Process those tuples and produce outputs.
4. Format the processed data into a string.

41 Reading a list of strings from a file

42 Example Data: CSV format
    Name, Date, Amount
    Galaxy S5, 2014/05, 32
    iPhone 5s, 2014/05, 108
    Galaxy Note, 2014/05, 12
    iPhone 4, 2014/05, 7
    Galaxy S5, 2014/04, 98
    Galaxy Note, 2014/04, 1
    Moto X, 2014/04, 16
    iPhone 5s, 2014/04, 99

43 Reading a file
A file is an external resource. In order to read a file, the operating system must help. The behavior of a file might differ between Mac and Windows. Locating a file is done by a path name. Basic knowledge of file systems is assumed in this class.

44 How to read a file
Python provides the open() function.
    open(filename, mode, …) → a file object
The file object provides a set of functions to access a file.

45 Example of file reading
List → Tuple → String → File
    f = open("test.csv", "r")
    lines = f.readlines()
    f.close()
The lines variable holds a list of strings, one for each line of the file 'test.csv'.
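The open/readlines/close sequence can also be written with a with statement, which closes the file automatically even if an error occurs. The sketch below is in Python 3 syntax; so that it runs on its own, it first writes a two-line sample of test.csv into a temporary directory, which is an assumption made only for this illustration.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w") as f:
    f.write("Name, Date, Amount\nGalaxy S5, 2014/05, 32\n")

# The file is closed automatically when the with block ends.
with open(path, "r") as f:
    lines = f.readlines()

print(lines)
```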

46 Readlines()
    print lines
    >>> ['Name, Date, Amount\n', 'Galaxy S5, 2014/05, 32\n', 'iPhone 5s, 2014/05, 108\n', 'Galaxy Note, 2014/05, 12\n', 'iPhone 4, 2014/05, 7\n', 'Galaxy S5, 2014/04, 98\n', 'Galaxy Note, 2014/04, 1\n', 'Moto X, 2014/04, 16\n', 'iPhone 5s, 2014/04, 99\n']

47 File Processing

48 First Processing
Remove the leading and trailing whitespace (\n). The strip() function will do this. We have to apply the strip() function to every element of the list. How? This is an example of mapping!
    [ x0, x1, x2, …, xn ] → [ x0.strip(), x1.strip(), x2.strip(), …, xn.strip() ]
Use a list comprehension, or collect_mapping_if()!

49 Remove Whitespaces
    lines = [ l.strip() for l in lines ]

50 String to Words
After removing whitespace, each line should be separated into words at the delimiter, which is ', ' here. Let's do this.
    words = [ l.split(", ") for l in lines ]

51 String to int
We have a list of words for each line. They are all strings. To compute with the amount as a number, a type conversion from string to int is required. The int() function will convert a string to an integer.

52 String to int
Let's do this:
    tuples = [ (w[0], w[1], int(w[2])) for w in words[1:] ]
Why words[1:]? What does it mean?
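The steps so far (strip, split, convert) can be run together on a few of the lines shown earlier. This is a sketch in Python 3 syntax; the names header and total are made up. It also shows what words[1:] is for: the first row holds the column names, and int("Amount") would raise a ValueError.

```python
lines = [
    "Name, Date, Amount\n",
    "Galaxy S5, 2014/05, 32\n",
    "iPhone 5s, 2014/05, 108\n",
    "Galaxy S5, 2014/04, 98\n",
]

lines = [l.strip() for l in lines]       # remove the trailing "\n"
words = [l.split(", ") for l in lines]   # one list of words per line

header = words[0]                        # ['Name', 'Date', 'Amount']
# words[1:] skips the header row, which has no number to convert.
tuples = [(w[0], w[1], int(w[2])) for w in words[1:]]

total = sum(amount for name, date, amount in tuples)
print(header)
print(tuples)
print(total)
```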

53 Ready for Processing
Now we have the data in a proper form. Each individual item has been separated out of its string. Each number has been converted to an integer so that we can perform arithmetic operations. We are ready for further analysis.

54 More Problems
What if a file is too large to load into our memory?
What if data is stored in a different format? What other formats do we need? In this class, we will deal with CSV, XML, and Excel.
What if data is scattered across many different files?
What if items of data are related to one another? How could we represent relationships such as a social network?
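For the first question, one common approach (not necessarily the one taken later in this class) is to loop over the file object itself, which reads one line at a time instead of loading everything with readlines(). The sketch below uses Python 3 syntax and a small temporary sample file created only for this illustration.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w") as f:
    f.write("Name, Date, Amount\n"
            "Galaxy S5, 2014/05, 32\n"
            "iPhone 5s, 2014/05, 108\n"
            "Galaxy Note, 2014/05, 12\n")

total = 0
with open(path, "r") as f:
    next(f)              # skip the header line
    for line in f:       # only one line is held in memory at a time
        name, date, amount = line.strip().split(", ")
        total += int(amount)

print(total)
```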

55 Writing strings into a file

56 How to write a file?
Let's write the contents we have back into a file. In the same way, open a file.
    f = open("testout.csv", "w")
"w" states that the file is opened for writing.

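57 How to write a file?
Use the write function of a file and give it a string, formatted via the % operator.
    "%s, %s, %d" % (t[0], t[1], t[2])
When the % operator is used with a string, it is the format operator, not modulo. Write the converted tuples rather than words, because %d needs an integer and words still holds strings and the header row.
    for t in tuples:
        f.write("%s, %s, %d\n" % (t[0], t[1], t[2]))
    f.close()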
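Putting the writing steps together: the sketch below (Python 3 syntax) writes a header line and two sample tuples, then reads the file back to check the result. The temporary directory and the sample data are assumptions made so that the example runs on its own.

```python
import os
import tempfile

tuples = [("Galaxy S5", "2014/05", 32), ("iPhone 5s", "2014/05", 108)]

path = os.path.join(tempfile.mkdtemp(), "testout.csv")
with open(path, "w") as f:
    f.write("Name, Date, Amount\n")       # the header is written separately
    for t in tuples:
        f.write("%s, %s, %d\n" % t)       # t already is a 3-element tuple

# Read the file back to see what was stored.
with open(path, "r") as f:
    written = f.read()

print(written)
```

Note that write() does not add a newline by itself, so "\n" is part of the format string.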

