Download presentation
Presentation is loading. Please wait.
1
LING 388: Computers and Language
Lecture 19
2
nltk book: Language Processing and Python
2 A Closer Look at Python: Texts as Lists of Words: Assuming sent1,..,sent9 from nltk.book import *
3
nltk book: Language Processing and Python
sent2, sent3 + for concatenation
4
nltk book: Language Processing and Python
.append() to the end of the list (mutates the list) .append() vs .extend() to the end of the list: from stackoverflow.com
5
nltk book: Language Processing and Python
Indexing [<index>]: Slices [<index>:<index>]: (can omit either <index>, default value)
6
nltk book: Language Processing and Python
We know indexing works on strings (as well as lists): Repetition (*), Concatenation (+): .join() .split()
7
nltk book: Language Processing and Python
Understanding check: Answer: Last two words by alphabetic sorting…
8
nltk book: Language Processing and Python
3.1 Frequency Distributions methods: .plot() .most_common() .hapaxes()
9
nltk book: Language Processing and Python
10
nltk book: Language Processing and Python
specifically relevant to Moby Dick; other reported words are generic "English plumbing"
11
nltk book: Language Processing and Python
12
nltk book: Language Processing and Python
Extract long words (using list comprehension):
13
nltk book: Language Processing and Python
text5: chat corpus Pick out all the words longer than 7 characters that occur more than 7 times (using list comprehension) and sort them:
14
nltk book: Language Processing and Python
Classes: FreqDist vs. Text
15
nltk book: Language Processing and Python
Word length distribution (3.4 Counting Other Things)
16
nltk book: Language Processing and Python
fdistl1.plot() fdistl1.plot(cumulative=True)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.