LING 388: Computers and Language


1 LING 388: Computers and Language
Lecture 11

2 Administrivia
Homework 5 graded

3 List Comprehensions
From last time:
import re
paragraph1 = 'Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do. Once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?"'
paragraph2 = [re.sub("[\"',.?]", "", word) for word in paragraph1.split()]

4 List Comprehensions
Recall also:
len(paragraph1)
303
len(paragraph2)
57
len(paragraph1)/len(paragraph2)
5.315789473684211
but that's including spaces and punctuation!

5 List Comprehensions
How to remove spaces and punctuation from paragraph1?
len(re.sub("[ \"',.?]", "", paragraph1))
236
len(re.sub("[ \"',.?]", "", paragraph1))/len(paragraph2)
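A minimal sketch of the same average-word-length calculation on a shorter string. The sample sentence and the names `sample`, `words`, `letters`, and `avg_word_len` are illustrative assumptions, not part of the lecture:

```python
import re

# Illustrative sample text (not the lecture's paragraph1)
sample = 'Alice was tired, "and what is the use of a book?"'

# Words with punctuation stripped, as in paragraph2
words = [re.sub("[\"',.?]", "", w) for w in sample.split()]

# The same text with spaces AND punctuation removed, so only letters remain
letters = re.sub("[ \"',.?]", "", sample)

# Average word length counts letters only
avg_word_len = len(letters) / len(words)
print(len(letters), len(words), avg_word_len)
```

Adding the space to the character class is the only change from the earlier pattern; it makes the numerator count letters rather than every character in the string.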

6 List Comprehensions
[x.lower() for x in paragraph2]
['alice', 'was', 'beginning', 'to', 'get', 'very', 'tired', 'of', 'sitting', 'by', 'her', 'sister', 'on', 'the', 'bank', 'and', 'of', 'having', 'nothing', 'to', 'do', 'once', 'or', 'twice', 'she', 'had', 'peeped', 'into', 'the', 'book', 'her', 'sister', 'was', 'reading', 'but', 'it', 'had', 'no', 'pictures', 'or', 'conversations', 'in', 'it', 'and', 'what', 'is', 'the', 'use', 'of', 'a', 'book', 'thought', 'alice', 'without', 'pictures', 'or', 'conversations']
[re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()]
[re.sub("[.,?'\"]", "", x) for x.lower() in paragraph1.split()]
  File "<stdin>", line 1
SyntaxError: can't assign to function call
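The error arises because the part after `for` in a comprehension is an assignment target, so it must be something that can be assigned to (such as a plain name), while any transformation belongs in the output expression. A small sketch, using an illustrative sample string and `compile()` to show that the invalid form is rejected before it ever runs:

```python
import re

# Illustrative sample (not the lecture's paragraph1)
words = 'Alice said, "Hello."'.split()

# Valid: x is the loop target; lower() is applied in the output expression
cleaned = [re.sub("[.,?'\"]", "", x.lower()) for x in words]

# Invalid: a function call cannot be a loop target.
# compile() raises SyntaxError without executing anything.
bad_source = "[x for x.lower() in words]"
try:
    compile(bad_source, "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(cleaned, rejected)
```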

7 List Comprehensions
What does this do?
[re.sub("[.,?'\"]", "", x) for x in paragraph1.split() if x[0].isupper()]
from collections import Counter
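One way to explore a comprehension with an `if` clause is to try it on a short string. In this sketch (the sample sentence is an illustrative assumption), the condition is tested on the raw token before punctuation is stripped, so a capitalized word that begins with a quotation mark is skipped:

```python
import re

# Illustrative sample (not the lecture's paragraph1)
sample = 'Alice asked, "Once more?" and Bob nodded.'

# Keep only tokens whose FIRST character is an uppercase letter,
# then strip punctuation from the tokens that pass the filter
capitalized = [re.sub("[.,?'\"]", "", x) for x in sample.split() if x[0].isupper()]
print(capitalized)  # ['Alice', 'Bob']: '"Once' starts with a quote, so it is filtered out
```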

8 Counter object
Compute counts of words:
Counter([re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()])
Counter({'of': 3, 'or': 3, 'the': 3, 'conversations': 2, 'her': 2, 'sister': 2, 'book': 2, 'was': 2, 'alice': 2, 'had': 2, 'pictures': 2, 'to': 2, 'it': 2, 'and': 2, 'is': 1, 'what': 1, 'in': 1, 'twice': 1, 'she': 1, 'on': 1, 'peeped': 1, 'sitting': 1, 'a': 1, 'tired': 1, 'once': 1, 'bank': 1, 'beginning': 1, 'do': 1, 'thought': 1, 'get': 1, 'by': 1, 'no': 1, 'nothing': 1, 'having': 1, 'use': 1, 'into': 1, 'reading': 1, 'without': 1, 'very': 1, 'but': 1})
c = Counter([re.sub("[.,?'\"]", "", x.lower()) for x in paragraph1.split()])
c.most_common(5)
[('of', 3), ('or', 3), ('the', 3), ('conversations', 2), ('her', 2)]
c.most_common(6)
[('of', 3), ('or', 3), ('the', 3), ('conversations', 2), ('her', 2), ('sister', 2)]
Elements with equal counts are ordered arbitrarily (in Python 3.7 and later, they appear in the order first encountered, so the output may differ from the one shown here)
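If a reproducible ordering among tied words is wanted, one option is to sort the counts explicitly rather than rely on `most_common()`. A minimal sketch, assuming ties should be broken alphabetically (that tie-breaking rule and the sample sentence are illustrative choices, not from the lecture):

```python
from collections import Counter

# Illustrative sample text
words = "the cat and the dog and the bird".split()
c = Counter(words)

# The top entries are unambiguous here because their counts differ
top_two = c.most_common(2)

# Sort by descending count, then alphabetically, for a stable order among ties
ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
print(top_two)
print(ranked)
```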

9 Counter object
Enter this:
from collections import Counter
c = Counter()
c['alice'] += 1
c['a'] += 1
c
c.most_common()
See also
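Incrementing works on an empty Counter because a missing key reads as 0 instead of raising `KeyError`. A short sketch of building counts in a loop (the word list and the lookup of `'rabbit'` are illustrative):

```python
from collections import Counter

c = Counter()
for word in ["alice", "a", "alice"]:
    c[word] += 1  # a missing key is treated as 0, so no initialization is needed

# Looking up a word that was never counted returns 0 and does not add the key
missing = c["rabbit"]
print(c, missing, "rabbit" in c)
```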

10 Counter object Alternatively (from stackoverflow.com):

11 Homework 5 revisited holes-in-each-others-skulls

12 Homework 5 revisited
Assume file handling:
f = open("trepanation.txt")
s = f.read()
f.close()
Let's use what we've learnt today in class …
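A hedged sketch of how the file-handling step might be combined with a Counter. It writes a small temporary file so that it runs on its own; the file contents are made up for illustration and stand in for the homework text, and the `with` statement is an alternative to calling `close()` by hand:

```python
import os
import re
import tempfile
from collections import Counter

# Hypothetical stand-in for the homework file: a temporary file with made-up text
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("The skull, the drill and the hole.")
    path = tmp.name

# 'with' closes the file automatically, even if reading raises an error
with open(path, encoding="utf-8") as f:
    s = f.read()
os.remove(path)  # clean up the temporary file

# Lowercase each word, strip punctuation, and count
counts = Counter(re.sub("[.,?'\"]", "", w.lower()) for w in s.split())
top = counts.most_common(1)
print(top)
```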

