LING 388: Computers and Language Lecture 11
Administrivia Homework 5 graded
List Comprehensions
From last time:
    >>> import re
    >>> paragraph1 = 'Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do. Once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?"'
    >>> paragraph2 = [re.sub("[\"',.?]", "", word) for word in paragraph1.split()]
List Comprehensions
Recall also:
    >>> len(paragraph1)
    303
    >>> len(paragraph2)
    57
    >>> len(paragraph1)/len(paragraph2)
    5.315789473684211
But that includes spaces and punctuation!
List Comprehensions
How do we remove spaces and punctuation from paragraph1?
    >>> len(re.sub("[ \"',.?]", "", paragraph1))
    236
    >>> len(re.sub("[ \"',.?]", "", paragraph1))/len(paragraph2)
    4.140350877192983
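The average-word-length calculation above can be wrapped in a small function. This is only a sketch: the name `average_word_length`, its `punctuation` parameter, and the sample sentence are illustrative assumptions, not part of the lecture. It averages over the cleaned tokens directly, which gives the same result as the two-step calculation whenever no token consists solely of punctuation.

```python
import re

def average_word_length(text, punctuation="\"',.?"):
    """Return the mean number of characters per word, ignoring punctuation."""
    # Build a character class from the punctuation characters to strip.
    pattern = "[" + re.escape(punctuation) + "]"
    # Split on whitespace, then strip punctuation from each token.
    words = [re.sub(pattern, "", token) for token in text.split()]
    # Drop tokens that were nothing but punctuation.
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)

sample = 'The cat sat, "purring," on the mat.'
print(average_word_length(sample))  # 24 characters over 7 words
```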
List Comprehensions
    >>> [x.lower() for x in paragraph2]
    ['alice', 'was', 'beginning', 'to', 'get', 'very', 'tired', 'of', 'sitting', 'by', 'her', 'sister', 'on', 'the', 'bank', 'and', 'of', 'having', 'nothing', 'to', 'do', 'once', 'or', 'twice', 'she', 'had', 'peeped', 'into', 'the', 'book', 'her', 'sister', 'was', 'reading', 'but', 'it', 'had', 'no', 'pictures', 'or', 'conversations', 'in', 'it', 'and', 'what', 'is', 'the', 'use', 'of', 'a', 'book', 'thought', 'alice', 'without', 'pictures', 'or', 'conversations']
Lowercasing can be folded into the comprehension over paragraph1:
    >>> [re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()]
But the loop variable must be something assignable, such as a plain name, not a method call:
    >>> [re.sub("[.,?'\"]", "", x) for x.lower() in paragraph1.split()]
      File "<stdin>", line 1
    SyntaxError: can't assign to function call
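The loop target in a comprehension has to be something Python can assign to, so `.lower()` cannot go there. Two placements that do work are sketched below on an illustrative sentence; the sample text and the names `words_a` and `words_b` are assumptions made for this example.

```python
import re

text = 'Alice asked, "Why?"'

# Option 1: call lower() in the output expression, once per token.
words_a = [re.sub("[.,?'\"]", "", x.lower()) for x in text.split()]

# Option 2: lowercase the whole string once, before splitting.
words_b = [re.sub("[.,?'\"]", "", x) for x in text.lower().split()]

print(words_a)             # ['alice', 'asked', 'why']
print(words_a == words_b)  # True
```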
List Comprehensions
What does this do?
    >>> [re.sub("[.,?'\"]", "", x) for x in paragraph1.split() if x[0].isupper()]
    >>> from collections import Counter
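A trailing `if` clause in a list comprehension acts as a filter: only items for which the condition is true reach the output expression. The sketch below shows the same logic as a comprehension and as an explicit loop, using a made-up sentence (the text and variable names are assumptions for illustration).

```python
import re

text = 'Alice met Bob, then "Carol" left.'

# Comprehension with a filter: keep tokens whose first character is uppercase.
capitalized = [re.sub("[.,?'\"]", "", x) for x in text.split() if x[0].isupper()]

# The same logic written as an explicit loop.
capitalized_loop = []
for x in text.split():
    if x[0].isupper():  # tested on the raw token, before punctuation is stripped
        capitalized_loop.append(re.sub("[.,?'\"]", "", x))

print(capitalized)  # ['Alice', 'Bob']
```

Note that `"Carol"` is dropped: the test runs on the raw token, whose first character is a quotation mark rather than an uppercase letter.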
Counter object
Compute counts of words:
    >>> Counter([re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()])
    Counter({'of': 3, 'or': 3, 'the': 3, 'conversations': 2, 'her': 2, 'sister': 2, 'book': 2, 'was': 2, 'alice': 2, 'had': 2, 'pictures': 2, 'to': 2, 'it': 2, 'and': 2, 'is': 1, 'what': 1, 'in': 1, 'twice': 1, 'she': 1, 'on': 1, 'peeped': 1, 'sitting': 1, 'a': 1, 'tired': 1, 'once': 1, 'bank': 1, 'beginning': 1, 'do': 1, 'thought': 1, 'get': 1, 'by': 1, 'no': 1, 'nothing': 1, 'having': 1, 'use': 1, 'into': 1, 'reading': 1, 'without': 1, 'very': 1, 'but': 1})
    >>> c = Counter([re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()])
    >>> c.most_common(5)
    [('of', 3), ('or', 3), ('the', 3), ('conversations', 2), ('her', 2)]
    >>> c.most_common(6)
    [('of', 3), ('or', 3), ('the', 3), ('conversations', 2), ('her', 2), ('sister', 2)]
Elements with equal counts are ordered arbitrarily in older Python versions; from Python 3.7 on, they appear in the order first encountered, so your output may differ from the above.
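Because the order of ties from `most_common()` is not something to rely on across Python versions, a reproducible ranking can be obtained by sorting the items explicitly. The following is one possible sketch; the word list and the name `ranked` are illustrative assumptions, and breaking ties alphabetically is just one reasonable choice.

```python
from collections import Counter

c = Counter("b a b a c".split())

# Sort by descending count, then alphabetically to break ties deterministically.
ranked = sorted(c.items(), key=lambda item: (-item[1], item[0]))

print(ranked)      # [('a', 2), ('b', 2), ('c', 1)]
print(ranked[:2])  # a deterministic "top 2"
```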
Counter object
Enter this:
    >>> from collections import Counter
    >>> c = Counter()
    >>> c['alice'] += 1
    >>> c['a'] += 1
    >>> c
    >>> c.most_common()
See also https://docs.python.org/3/library/collections.html#collections.Counter
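Incrementing works on an empty Counter because a missing key is treated as having count 0. A short sketch of that behaviour, using a made-up word list:

```python
from collections import Counter

c = Counter()
for word in ["alice", "a", "alice"]:
    c[word] += 1  # a missing key counts as 0, so there is no KeyError

print(c["alice"])        # 2
print(c["rabbit"])       # 0
print("rabbit" in c)     # False: looking up a missing key does not add it
print(c.most_common(1))  # [('alice', 2)]
```

With a plain `dict`, the first `c[word] += 1` would raise a `KeyError`, which is why Counter is convenient for tallying.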
Counter object Alternatively (from stackoverflow.com):
Homework 5 revisited
http://www.bbc.com/earth/story/20160826-why-our-ancestors-drilled-holes-in-each-others-skulls
Homework 5 revisited
Assume file handling:
    >>> f = open("trepanation.txt")
    >>> s = f.read()
    >>> f.close()
Let's use what we've learnt today in class …
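One way today's pieces might fit together is sketched below. The function names `word_counts` and `word_counts_from_file` and the demo sentence are hypothetical, and this is not the homework solution; it simply combines the list comprehension, `re.sub`, and `Counter` from the earlier slides.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lowercased words after stripping the punctuation used in class."""
    words = [re.sub("[.,?'\"]", "", x.lower()) for x in text.split()]
    # Skip tokens that consisted only of punctuation.
    return Counter(w for w in words if w)

def word_counts_from_file(path):
    # A with-block closes the file automatically, even if reading fails.
    with open(path) as f:
        return word_counts(f.read())

demo = word_counts("The skull, the drill, and the patient.")
print(demo.most_common(1))  # [('the', 3)]
```

Assuming `trepanation.txt` is in the working directory, `word_counts_from_file("trepanation.txt").most_common(10)` would list its ten most frequent words.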