Computation with strings 3 Day 4 - 9/07/16 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University
Course organization http://www.tulane.edu/~howard/NLP/ 1.1.7. Schedule of assignments Is there anyone here that wasn't here last week? NLP, Prof. Howard, Tulane University 07-Sep-2016
Scripts for the example code 2.2.5. How to use a script in Spyder
§4. Computation with strings
What is a string?
A string is a sequence of characters delimited by single or double quotes.
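The definition can be checked at the console. The following is a minimal sketch; the example words are assumptions chosen for illustration and do not come from the course scripts.

```python
# Single and double quotes delimit the same kind of object.
single = 'computation'
double = "computation"
same = single == double          # the delimiters are not part of the string

# The other kind of quote can appear inside without escaping.
contraction = "wasn't"
length = len(contraction)        # counts characters, including the apostrophe

# A string is a sequence, so its characters can be indexed in order.
first, last = single[0], single[-1]

print(same, length, first, last)
```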
4.3.4. Practice 2
1. Write the code to perform the changes given below on these two strings, but try to use slicing rather than stripping:
>>> S = 'ABCDEFGH'
>>> s = 'abcdefgh'
a. Make the first 3 characters of S lowercase.
>>> S[:3].lower()
b. Make the last 4 characters of s uppercase.
>>> s[4:].upper()
c. Create a string from the first 4 characters of S and the last 4 characters of s and then switch its case. Without parentheses, swapcase() applies only to s[4:]:
>>> S[:4]+s[4:].swapcase()
'ABCDEFGH'
>>> (S[:4]+s[4:]).swapcase()
'abcdEFGH'
d. Concatenate both strings and slice out every even character.
>>> (S+s)[1::2]
e. Concatenate both strings and reverse the order of the characters.
>>> (S+s)[::-1]
f. Retrieve the index of ‘E’ and ‘h’.
>>> S.index('E')
>>> s.index('h')
4.3.4. Practice 2, cont.
Slice the string ‘Nov. 1, 1957’ into month (a string), day (an integer), and year (an integer).
>>> date = 'Nov. 1, 1957'
>>> month = date[:3]
>>> day = int(date[5])
>>> year = int(date[-4:])
Use this sort of embedded index/find() to slice the following strings out of E:
E[E.find('ab'):2]
E[E.rfind('bc'):E.rfind('d')]
E[E.find('d'):]
4.3.4. Practice 2, cont.
5) a) s[:s.find('tion')]
   b) p[p.find('anti'):] returns the whole word
   c) s.strip('tion'), p.strip('anti')
6) w2[len('anti'):], w1[:-len('tion')]
7) What is the longest sequence of operators that you can make?
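The difference between stripping and slicing in answers 5 and 6 may be easier to see side by side. This is only a sketch: the words 'nation' and 'antitank' are hypothetical stand-ins, since the slide does not show what s, p, w1 and w2 hold.

```python
# Hypothetical example words; the slide's s, p, w1 and w2 are not shown.
w1 = 'nation'
w2 = 'antitank'

# strip() removes any of the listed characters from both ends,
# so it keeps eating letters past the affix.
stripped_suffix = w1.strip('tion')
stripped_prefix = w2.strip('anti')

# Slicing by the affix length removes exactly that many characters.
sliced_suffix = w1[:-len('tion')]
sliced_prefix = w2[len('anti'):]

# find() locates the affix, so the slice stops where it starts.
found_suffix = w1[:w1.find('tion')]

print(stripped_suffix, stripped_prefix)
print(sliced_suffix, sliced_prefix, found_suffix)
```

With these words strip() leaves only 'a' and 'k', while the slices leave 'na' and 'tank', which is why the exercise asks for slicing.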
4.4.4. Practice 3
1. Show which of len(), sorted(), and set() takes the most time to process.
>>> timeit.timeit(f3)
0.1531529426574707
>>> timeit.timeit(f4)
0.9779191017150879
>>> timeit.timeit(f5)
0.8416359424591064
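The functions f3, f4 and f5 are not shown on the slide. One way they might look is sketched below; the function names, the test string and the lowered number of repetitions are all assumptions, and the timings will differ from machine to machine.

```python
import timeit

# Hypothetical test string; the slide's f3, f4 and f5 are not shown.
text = 'computation with strings'

def time_len():
    return len(text)

def time_sorted():
    return sorted(text)

def time_set():
    return set(text)

# timeit.timeit accepts a callable; number is the count of calls
# (the default is one million, lowered here to keep the run short).
timings = {
    'len': timeit.timeit(time_len, number=10000),
    'sorted': timeit.timeit(time_sorted, number=10000),
    'set': timeit.timeit(time_set, number=10000),
}

slowest = max(timings, key=timings.get)
print(timings, slowest)
```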
4.4.4. Practice 3, cont.
2. In Operator iteration there is a line with the expression L[2:6].capitalize().upper(). Show whether it is faster for Python to process it as it is, or broken into its more easily readable parts.
>>> timeit.timeit(f6)
0.5537698268890381
>>> timeit.timeit(f7)
0.6827390193939209
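A possible shape for the two functions being timed is sketched here. The value of L and the function names are assumptions; only the expression L[2:6].capitalize().upper() comes from the slide.

```python
import timeit

# Hypothetical stand-in for the L of the text.
L = 'natural language processing'

def chained():
    return L[2:6].capitalize().upper()

def stepwise():
    piece = L[2:6]
    capitalized = piece.capitalize()
    return capitalized.upper()

# Both versions must return the same string for the timing to be fair.
same_result = chained() == stepwise()

t_chained = timeit.timeit(chained, number=10000)
t_stepwise = timeit.timeit(stepwise, number=10000)
print(same_result, t_chained, t_stepwise)
```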
4.4.4. Practice 3, cont.
3. Recall the discussion of the default precedence of * and + in Operator precedence. Perhaps * comes first because it is quicker to process than +. Test this hypothesis by writing a function for each ordering of * and +, then time them to see which runs faster.
>>> timeit.timeit(f8)
0.55234694480896
>>> timeit.timeit(f9)
0.6955819129943848
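One way to set up the comparison is sketched below, assuming string operands; the names and values are hypothetical because f8 and f9 are not shown. The operands are stored in variables because an expression built only from literals may be computed once at compile time, which would make the timing meaningless.

```python
import timeit

# Hypothetical operands, kept in variables so that the interpreter
# cannot precompute the result from literals.
a, b, n = 'ab', 'cd', 3

def star_first():
    return a + b * n        # default precedence: b * n, then +

def plus_first():
    return (a + b) * n      # parentheses force + before *

default_result = star_first()
forced_result = plus_first()

t_star = timeit.timeit(star_first, number=10000)
t_plus = timeit.timeit(plus_first, number=10000)
print(default_result, forced_result, t_star, t_plus)
```

Note that the two functions build different strings, so any difference in time reflects different work as well as a different order of operations.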
4.5.1. Practice 4
# 1. Combine them into a single string with proper spacing for two sentences. Check it with print.
>>> lim = L1+' '+L2+' '+L3+' '+L4+' '+L5
>>> print lim
# 2. Now combine them into a single string so that each one prints out on its own line. This is very hard to do, but the answer is hidden in line 20 above.
>>> lim2 = L1+' '+L2+'\n'+L3+' '+L4+' '+L5
>>> print lim2
# Finally, modify the string you developed in #2 so it prints out in the conventional form of limericks, five lines with the third and fourth indented by two spaces.
>>> lim3 = L1+'\n'+L2+'\n  '+L3+'\n  '+L4+'\n'+L5
>>> print lim3
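The five-line layout can be checked with placeholder text. In this sketch the values of L1 to L5 are hypothetical, since the limerick itself is not on the slide, and print is written with parentheses so that the code also runs under Python 3.

```python
# Hypothetical placeholder lines; the course's L1 to L5 are not shown.
L1, L2, L3, L4, L5 = 'line one', 'line two', 'line three', 'line four', 'line five'

# '\n' ends a line; the two spaces after it indent the line that follows.
lim3 = L1 + '\n' + L2 + '\n  ' + L3 + '\n  ' + L4 + '\n' + L5

# join() builds the same string from a list of already-indented lines.
joined = '\n'.join([L1, L2, '  ' + L3, '  ' + L4, L5])

print(lim3)
```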
4.6.4. Practice 5
Which of these are illegitimate names?:
Immutability prevents you from slicing a character into a string, but there is still a sneaky way of transforming ‘name’ into ‘game’. Do you recall it?
>>> 'name'.replace('n','g')
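The restriction and the workaround can be seen together in a short sketch. The variable names are chosen for illustration; the concatenation with a slice is an alternative that the slide does not mention.

```python
word = 'name'

# Item assignment fails because strings are immutable.
try:
    word[0] = 'g'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# replace() returns a new string and leaves the original untouched.
replaced = word.replace('n', 'g')

# Concatenating a new first letter with a slice does the same job.
rebuilt = 'g' + word[1:]

print(assignment_worked, replaced, rebuilt, word)
```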
4.7.4. Practice 6
Take a look at English terms with diacritical marks, select a (lowercase) word with a non-ASCII character in it, and see if you can print it to the console as uppercase.
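A sketch of one possible attempt follows, assuming the word 'café' as the example. The u prefix is an assumption about how the string is entered: it is needed for Unicode strings in Python 2 and is accepted but optional in Python 3.3 and later.

```python
# The u prefix marks a Unicode string (required in Python 2, optional
# in Python 3.3+); \u00e9 is e with an acute accent.
word = u'caf\u00e9'

upper = word.upper()

# The accented letter is uppercased along with the ASCII ones.
is_expected = upper == u'CAF\u00c9'

print(is_expected, len(upper))
```

Whether print(upper) then displays correctly depends on the encoding of the console, which is the part of the exercise left to try out.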
Next time Finish §4 and remaining practices.