1
LING 388: Computers and Language
Lecture 11
2
Administrivia
Homework 5 graded
3
List Comprehensions
From last time:
import re
paragraph1 = 'Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do. Once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?"'
paragraph2 = [re.sub("[\"',.?]", "", word) for word in paragraph1.split()]
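To see what the character class [\"',.?] does token by token, here is a minimal sketch on a shorter, made-up sentence (not from the slides): split() breaks only on whitespace, so punctuation stays attached to the words, and re.sub then deletes it.

```python
import re

# A short, made-up sentence (illustrative only, not from the slides)
sentence = 'Alice said, "what is the use of a book?"'

# split() breaks on whitespace only, so punctuation stays attached to the tokens
tokens = sentence.split()
print(tokens)  # ['Alice', 'said,', '"what', 'is', 'the', 'use', 'of', 'a', 'book?"']

# Delete every double quote, single quote, comma, period and question mark
words = [re.sub("[\"',.?]", "", token) for token in tokens]
print(words)  # ['Alice', 'said', 'what', 'is', 'the', 'use', 'of', 'a', 'book']
```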
4
List Comprehensions
Recall also:
len(paragraph1)
303
len(paragraph2)
57
len(paragraph1)/len(paragraph2)
5.315789473684211
But that count includes spaces and punctuation!
5
List Comprehensions
How do we remove spaces and punctuation from paragraph1?
len(re.sub("[ \"',.?]", "", paragraph1))
236
len(re.sub("[ \"',.?]", "", paragraph1))/len(paragraph2)
(that is, 236/57, or about 4.14 characters per word)
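As a cross-check, the same letter count can be obtained by adding up the lengths of the already-cleaned words in paragraph2. A minimal sketch (the variable names letters and letters_alt are our own):

```python
import re

# The same paragraph as on the slides, split over several lines for readability
paragraph1 = ('Alice was beginning to get very tired of sitting by her sister on the bank, '
              'and of having nothing to do. Once or twice she had peeped into the book her '
              'sister was reading, but it had no pictures or conversations in it, "and what '
              'is the use of a book," thought Alice, "without pictures or conversations?"')
paragraph2 = [re.sub("[\"',.?]", "", word) for word in paragraph1.split()]

# Way 1: delete spaces and punctuation from the whole string, then count characters
letters = len(re.sub("[ \"',.?]", "", paragraph1))

# Way 2: add up the lengths of the already-cleaned words
letters_alt = sum(len(word) for word in paragraph2)

print(letters, letters_alt)       # 236 236
print(letters / len(paragraph2))  # about 4.14 characters per word
```

Both approaches agree here because the cleaned words contain exactly the characters that remain after spaces and punctuation are deleted.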
6
List Comprehensions
[x.lower() for x in paragraph2]
['alice', 'was', 'beginning', 'to', 'get', 'very', 'tired', 'of', 'sitting', 'by', 'her', 'sister', 'on', 'the', 'bank', 'and', 'of', 'having', 'nothing', 'to', 'do', 'once', 'or', 'twice', 'she', 'had', 'peeped', 'into', 'the', 'book', 'her', 'sister', 'was', 'reading', 'but', 'it', 'had', 'no', 'pictures', 'or', 'conversations', 'in', 'it', 'and', 'what', 'is', 'the', 'use', 'of', 'a', 'book', 'thought', 'alice', 'without', 'pictures', 'or', 'conversations']
[re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()]
[re.sub("[.,?'\"]", "", x) for x.lower() in paragraph1.split()]
File "<stdin>", line 1
SyntaxError: can't assign to function call
The variable after for must be a plain name that receives each item; x.lower() is a function call, so it cannot be assigned to. Apply .lower() in the expression instead, as in the previous line.
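A list comprehension behaves like an explicit for loop that appends the expression for each item, which is why the loop variable has the same restrictions as an ordinary assignment target. A minimal sketch with made-up sample tokens:

```python
import re

tokens = ['Alice', 'was', 'reading,', 'but']  # made-up sample tokens

# Comprehension: the transformation (.lower(), re.sub) goes in the expression on the left
cleaned = [re.sub("[.,?'\"]", "", x.lower()) for x in tokens]

# The equivalent explicit loop: x is an assignment target, just like in the comprehension,
# which is why "for x.lower() in ..." is rejected (a function call cannot be assigned to)
cleaned_loop = []
for x in tokens:
    cleaned_loop.append(re.sub("[.,?'\"]", "", x.lower()))

print(cleaned)  # ['alice', 'was', 'reading', 'but']
```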
7
List Comprehensions
What does this do?
[re.sub("[.,?'\"]", "", x) for x in paragraph1.split() if x[0].isupper()]
from collections import Counter
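One way to check your answer: the if x[0].isupper() clause keeps only tokens whose first character is an uppercase letter, and re.sub then strips punctuation from those. Because the test looks at the raw token, a capitalized word preceded by a quotation mark is skipped. A minimal sketch on a made-up sentence:

```python
import re

sentence = 'Once upon a time, "Alice" met the Queen.'  # made-up example

# Filter first, then clean: x[0] is checked on the raw token,
# so '"Alice"' is skipped because its first character is a quotation mark
caps = [re.sub("[.,?'\"]", "", x) for x in sentence.split() if x[0].isupper()]
print(caps)  # ['Once', 'Queen']

# Clean first, then filter; "w and" skips tokens that became empty strings
cleaned = [re.sub("[.,?'\"]", "", x) for x in sentence.split()]
caps2 = [w for w in cleaned if w and w[0].isupper()]
print(caps2)  # ['Once', 'Alice', 'Queen']
```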
8
Counter object
Compute counts of words:
Counter([re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()])
Counter({'of': 3, 'or': 3, 'the': 3, 'conversations': 2, 'her': 2, 'sister': 2, 'book': 2, 'was': 2, 'alice': 2, 'had': 2, 'pictures': 2, 'to': 2, 'it': 2, 'and': 2, 'is': 1, 'what': 1, 'in': 1, 'twice': 1, 'she': 1, 'on': 1, 'peeped': 1, 'sitting': 1, 'a': 1, 'tired': 1, 'once': 1, 'bank': 1, 'beginning': 1, 'do': 1, 'thought': 1, 'get': 1, 'by': 1, 'no': 1, 'nothing': 1, 'having': 1, 'use': 1, 'into': 1, 'reading': 1, 'without': 1, 'very': 1, 'but': 1})
c = Counter([re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()])
c.most_common(5)
[('of', 3), ('or', 3), ('the', 3), ('conversations', 2), ('her', 2)]
c.most_common(6)
[('of', 3), ('or', 3), ('the', 3), ('conversations', 2), ('her', 2), ('sister', 2)]
In older versions of Python, elements with equal counts are ordered arbitrarily; since Python 3.7 they are listed in the order first encountered, so your output may differ from the one above.
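A small check of the tie ordering, assuming Python 3.7 or later, plus one way to get an order that does not depend on where words first appear (the sample words are made up):

```python
from collections import Counter

words = 'or the of the of or to'.split()  # made-up sample
c = Counter(words)

# Python 3.7+: ties keep the order in which words were first seen
print(c.most_common())  # [('or', 2), ('the', 2), ('of', 2), ('to', 1)]

# An order that does not depend on first appearance: highest count first, then alphabetical
ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)  # [('of', 2), ('or', 2), ('the', 2), ('to', 1)]
```

Sorting on the key (-count, word) is a common choice when results must be reproducible, for example when comparing homework output against an expected answer.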
9
Counter object
Enter this:
from collections import Counter
c = Counter()
c['alice'] += 1
c['a'] += 1
c
c.most_common()
See also
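A Counter behaves like a dictionary whose missing keys count as 0, which is why c['alice'] += 1 works before 'alice' has ever been seen. The same idea builds up counts one word at a time in a loop; a minimal sketch with made-up sample words:

```python
from collections import Counter

c = Counter()
for word in ['alice', 'a', 'alice']:  # made-up sample words
    c[word] += 1  # missing keys start at 0, so no KeyError

print(c)                # Counter({'alice': 2, 'a': 1})
print(c['rabbit'])      # 0: looking up a missing key does not raise (and does not add it)
print(c.most_common())  # [('alice', 2), ('a', 1)]
```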
10
Counter object Alternatively (from stackoverflow.com):
11
Homework 5 revisited
holes-in-each-others-skulls
12
Homework 5 revisited
Assume file handling:
f = open("trepanation.txt")
s = f.read()
f.close()
Let's use what we've learnt today in class …
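One possible way to combine today's tools on the homework text, sketched under assumptions: the helper name top_words and its parameter n are hypothetical, and because the contents of trepanation.txt are not reproduced here, the sketch runs on a made-up stand-in string. With the file loaded as above, you would call top_words(s) instead.

```python
import re
from collections import Counter

def top_words(text, n=5):
    """Return the n most frequent lowercase words in text (hypothetical helper)."""
    # Lowercase each whitespace-separated token and strip the same punctuation as in class
    words = [re.sub("[.,?'\"]", "", x.lower()) for x in text.split()]
    # Drop tokens that were pure punctuation and are now empty
    words = [w for w in words if w]
    return Counter(words).most_common(n)

sample = 'The skull, the drill, and the patient. "The end?"'  # made-up stand-in text
print(top_words(sample, 2))  # [('the', 4), ('skull', 1)]
```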