Published by Eustacia Singleton. Modified over 9 years ago.
1
NLTK & BASIC TEXT STATS
DAY 19 - 10/08/14
LING 3820 & 6820 Natural Language Processing
Harry Howard, Tulane University
2
Course organization
08-Oct-2014, NLP, Prof. Howard, Tulane University
http://www.tulane.edu/~howard/LING3820/
The syllabus is under construction.
http://www.tulane.edu/~howard/CompCultEN/
Chapter numbering
3.7. How to deal with non-English characters
4.5. How to create a pattern with Unicode characters
6. Control
3
Review of scripts & functions
The quiz as a function in a script
4
Open Spyder
5
NLTK
Could you download the archive?
6
Loading the book's texts
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G. K. Chesterton 1908
>>>
7
Searching text
Show every token of a word in context, called concordance view:
>>> text1.concordance('monstrous')
Show the words that appear in a similar range of contexts:
>>> text1.similar('monstrous')
Show the contexts that two words share:
>>> text1.common_contexts(['whale', 'man'])
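To make the idea of a concordance view concrete, here is a minimal sketch in plain Python. It is not NLTK's implementation: the function name concordance_lines, the width parameter, and the toy token list are all assumptions made for illustration.

```python
from __future__ import print_function  # harmless in Python 3, needed for print() in Python 2


def concordance_lines(tokens, word, width=3):
    """Return (left context, hit, right context) for each occurrence of `word`.

    Hypothetical helper for illustration; matching ignores case.
    """
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Toy token list, made up for this example (not taken from text1).
tokens = "the whale was a monstrous size and a most monstrous fish".split()

for left, hit, right in concordance_lines(tokens, "monstrous"):
    print(left, "[" + hit + "]", right)
```

Each hit is shown with a few tokens on either side, which is the essence of what text1.concordance() displays for a full text.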
8
Searching text, cont.
Plot how far each token of a word is from the beginning of a text:
>>> text1.dispersion_plot(['monstrous'])
Generate random text:
>>> text1.generate()
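A dispersion plot is built from the positions at which a word occurs. The sketch below computes those positions for a toy token list. The helper name word_offsets and the data are assumptions for illustration, and no plotting is attempted.

```python
def word_offsets(tokens, word):
    """Return the index of every token equal to `word` (hypothetical helper)."""
    return [i for i, tok in enumerate(tokens) if tok == word]


# Toy token list, made up for this example.
tokens = "the whale the sea the ship and the whale".split()

whale_positions = word_offsets(tokens, "whale")  # [1, 8]
the_positions = word_offsets(tokens, "the")      # [0, 2, 4, 7]
```

Each offset corresponds to one mark for that word on the horizontal axis of a dispersion plot.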
9
Counting vocabulary
Count the word and punctuation tokens in a text:
>>> len(text1)
List the unique words, i.e. the word types, in a text:
>>> set(text1)
Count how many types there are in a text:
>>> len(set(text1))
Count the tokens of a word type:
>>> text1.count('smote')
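The same four operations can be tried without loading NLTK, since they rely only on len(), set() and count(). The token list below is made up for illustration and stands in for text1.

```python
# Toy token list (an assumption for this example); punctuation counts as tokens too.
tokens = ["the", "whale", "smote", "the", "ship", ",",
          "and", "the", "ship", "sank", "."]

n_tokens = len(tokens)             # all word and punctuation tokens: 11
types = set(tokens)                # unique tokens, i.e. the types
n_types = len(types)               # number of types: 8
n_the = tokens.count("the")        # tokens of the type 'the': 3
n_smote = tokens.count("smote")    # tokens of the type 'smote': 1
```

Note that set() is case-sensitive, so 'The' and 'the' would count as two different types.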
10
Lexical richness or diversity
The lexical richness or diversity of a text can be estimated as the average number of tokens per type:
>>> len(text1) / len(set(text1))
The relative frequency of a type can be estimated as its token count as a percentage of all tokens. In Python 2, '/' does integer division on integers (which also truncates the ratio above), so import true division first:
>>> from __future__ import division
>>> 100 * text1.count('a') / len(text1)
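The two calculations can be wrapped in small functions so they are easy to reuse on any text. This is only a sketch: the names lexical_diversity and percentage and the toy token list are assumptions made for illustration.

```python
from __future__ import division  # true division in Python 2; no effect in Python 3


def lexical_diversity(tokens):
    """Average number of tokens per type."""
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total


# Toy token list, made up for this example.
tokens = ["the", "whale", "smote", "the", "ship", ",",
          "and", "the", "ship", "sank", "."]

diversity = lexical_diversity(tokens)                         # 11 / 8 = 1.375
the_percent = percentage(tokens.count("the"), len(tokens))    # 300 / 11, about 27.27
```

With nltk.book loaded, the same functions could be called on text1, e.g. lexical_diversity(text1), because they use only len(), set() and count().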
11
Next time
There is no quiz for Monday.
We will learn how to get our own text into Python & NLTK.