Computation with strings 2 Day 3 - 9/02/16 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University
Course organization http://www.tulane.edu/~howard/NLP/ 1.1.7. Schedule of assignments Is there anyone here that wasn't here on Monday? NLP, Prof. Howard, Tulane University 02-Sep-2016
Computer hygiene Refresh your browser!
Scripts for the example code 2.2.5. How to use a script in Spyder
There are two, so far
4.1. How to create a string
Note: The code for this and the next section can be downloaded as the script nlp4sec1-2.py, presumably to your Downloads folder, and then moved to pyScripts, from where you can open it in Spyder and run each line one by one.
4.3. How to find your way around a string
Note: The code for this section is found at nlp4sec3.py.
§4. Computation with strings
What is a string? A string is a sequence of characters delimited by single or double quotes.
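As a minimal sketch of this definition, the lines below show that the kind of quote does not change the string. The variable names are illustrative only and do not come from the course scripts.

```python
# Single and double quotes delimit the same kind of object.
single = 'balloon'
double = "balloon"
same = (single == double)      # the quotes are not part of the string

# One kind of quote can appear inside a string delimited by the other.
contraction = "isn't"
length = len(contraction)      # counts characters, apostrophe included

print(same, length)
```

Using double quotes around a string that contains an apostrophe avoids having to escape it.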
Changes to text
4.1.2. Object type
>>> type('one') == type(1)
>>> str(1)
>>> str(1.0)
>>> '1' == str(1)
>>> '1.0' == str(1.0)
>>> type('1') == type(str(1))
>>> type('1.0') == type(str(1.0))
>>> int('1')
>>> float('1.0')
>>> 1 == int('1')
>>> 1.0 == float('1.0')
>>> type(1) == type(int('1'))
>>> type(1.0) == type(float('1.0'))
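The comparisons above can be collected into a short script. This is only a sketch, and the variable names are assumptions made for illustration. It also adds one case the slide does not list: int() rejects a string that contains a decimal point.

```python
# Converting between strings and numbers changes the type of the object.
as_text = str(1)             # the string '1'
as_int = int('1')            # the integer 1
as_float = float('1.0')      # the float 1.0
same_type = (type(as_text) == type('one'))

# A string and a number do not compare equal, even if they look alike.
looks_alike = ('1' == 1)

# int() does not accept a string that contains a decimal point.
try:
    int('1.0')
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(as_text, as_int, as_float, same_type, looks_alike, conversion_failed)
```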
4.2.5. Practice 1
1. What types are output by len(), sorted(), and set()?
>>> B = 'balloon'
>>> type(len(B))
<type 'int'>
>>> type(sorted(B))
<type 'list'>
>>> type(set(B))
<type 'set'>
4.2.5. Practice 1, cont.
2. Write the code to perform the changes given below on these two strings:
>>> S = 'ABCDEFGH'
>>> s = 'abcdefgh'
a. Extract the first 3 characters of S and make them lowercase.
>>> S.strip('DEFGH').lower()
b. Extract the last 4 characters of s and make them uppercase.
>>> s.strip('abcd').upper()
c. Create a string from the first 4 characters of S and the last 4 characters of s and then switch its case.
>>> S.strip('EFGH')+s.strip('abcd').swapcase()
'ABCDEFGH'
>>> (S.strip('EFGH')+s.strip('abcd')).swapcase()
4.2.5. Practice 1, cont.
3. Here are two real-life strings to work with:
>>> mail = 'howard@tulane.edu'
>>> url = 'http://www.tulane.edu/~howard/NLP/'
a. How would you strip out the user name and the server name from my email address?
>>> mail.strip('@tulane.edu')
'howar'
(strip() removes any of the listed characters from both ends, so the final d of howard is lost as well.)
>>> mail.strip('tulane.edu').strip('@')
b. Internet addresses start with the transfer protocol that the site uses. For web pages, this is usually the hypertext transfer protocol, http. How would you strip this information out to leave just the address of the book?
>>> url.strip('http://')
c. Following up on (b), how would you extract just Tulane’s server address?
>>> url.lstrip('http://').rstrip('/~howard/NLP')
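Because strip() works on a set of characters rather than on a substring, the answers above depend on which letters happen to sit at the edges of the string. The sketch below shows the pitfall and one possible alternative. It relies on the built-in string methods partition() and split(), which this section has not introduced, so treat it as a preview and not as the expected answer.

```python
mail = 'howard@tulane.edu'
url = 'http://www.tulane.edu/~howard/NLP/'

# strip() treats its argument as a set of characters, so it also
# removes the final 'd' of the user name.
too_much = mail.strip('@tulane.edu')

# partition() splits once on a separator and keeps both sides intact.
user, _, server = mail.partition('@')

# split() on '/' separates the protocol, the host and the path pieces.
parts = url.split('/')
host = parts[2]

print(too_much, user, server, host)
```

Splitting on a separator does not depend on which characters the user name or the path happen to contain.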
4.2.5. Practice 1, cont.
Now we mix types:
>>> day = 1
>>> month = 'Sept'
>>> year = 2016
Concatenate them into the string ‘Sept. 1, 2016’.
>>> month+'. '+str(day)+', '+str(year)
Then split the string ‘Nov. 1, 1957’ into its three parts.
>>> date = 'Nov. 1, 1957'
>>> m = date.strip('. 1, 1957')
>>> d = int(date.strip('Nov.').strip('1957').strip().strip(','))
>>> y = int(date.strip('Nov. 1').strip(',').strip())
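The chained strip() calls above work for this particular date, but they are hard to read. As a hedged alternative, the sketch below breaks the date at whitespace with split(), a string method not yet covered in this section, and then trims the punctuation from each piece. The names month_part, day_part, year_part and rebuilt are illustrative.

```python
date = 'Nov. 1, 1957'

# split() with no argument breaks the string at whitespace.
month_part, day_part, year_part = date.split()

m = month_part.rstrip('.')       # drop the trailing period
d = int(day_part.rstrip(','))    # drop the trailing comma, then convert
y = int(year_part)

# Rebuilding the string checks that nothing was lost along the way.
rebuilt = m + '. ' + str(d) + ', ' + str(y)

print(m, d, y, rebuilt)
```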
In a picture
4.3.1.3. How to slice a string
>>> E[0:3]
>>> E[1:4]
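The slide does not show the value of E, so the sketch below assumes a hypothetical value, 'example', purely to make the slices concrete. A slice [start:stop] begins at index start and ends just before index stop.

```python
# Hypothetical value; the string bound to E in the lecture is not shown here.
E = 'example'

first_three = E[0:3]     # indices 0, 1, 2
middle = E[1:4]          # indices 1, 2, 3

# Leaving out a bound means "from the beginning" or "to the end".
same_start = E[:3]
last_three = E[-3:]

print(first_three, middle, same_start, last_three)
```

The length of a slice with non-negative bounds inside the string is stop minus start, which is why both slices above have three characters.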
4.3.4. Practice 2
1. Write the code to perform the changes given below on these two strings, but try to use slicing rather than stripping:
>>> S = 'ABCDEFGH'
>>> s = 'abcdefgh'
a. Make the first 3 characters of S lowercase.
>>> S[:3].lower()
b. Make the last 4 characters of s uppercase.
>>> s[4:].upper()
c. Create a string from the first 4 characters of S and the last 4 characters of s and then switch its case.
>>> S[:4]+s[4:].swapcase()
'ABCDEFGH'
>>> (S[:4]+s[4:]).swapcase()
'abcdEFGH'
d. Concatenate both strings and slice out every even character.
>>> (S+s)[1::2]
e. Concatenate both strings and reverse the order of the characters.
>>> (S+s)[::-1]
f. Retrieve the index of ‘E’ and ‘h’.
>>> S.index('E')
>>> s.index('h')
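Index lookups are case-sensitive, which is easy to get wrong in exercise (f). The sketch below, with illustrative variable names, shows what happens when the character is absent: index() raises a ValueError, while the related string method find() returns -1.

```python
S = 'ABCDEFGH'
s = 'abcdefgh'

pos_E = S.index('E')      # position of uppercase 'E' in S
pos_h = s.index('h')      # position of lowercase 'h' in s

# Lowercase 'e' does not occur in S, so find() reports -1 ...
missing = S.find('e')

# ... and index() raises an error instead.
try:
    S.index('e')
    raised = False
except ValueError:
    raised = True

print(pos_E, pos_h, missing, raised)
```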
Others 4.3.4. Practice 2
Next time §4. practices 3, 4, 5, 6, 7 -- not all of which are finished yet.