NLTK & BASIC TEXT STATS DAY 19 - 10/08/14 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University.



Course organization
- The syllabus is under construction.
- Chapter numbering
  - 3.7. How to deal with non-English characters
  - 4.5. How to create a pattern with Unicode characters
  - 6. Control

Review of scripts & functions
The quiz as a function in a script

Open Spyder

NLTK
Could you download the archive?

Loading the book's texts
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G. K. Chesterton 1908

Searching text
- Show every token of a word in context, called concordance view:
>>> text1.concordance('monstrous')
- Show the words that appear in a similar range of contexts:
>>> text1.similar('monstrous')
- Show the contexts that two words share:
>>> text1.common_contexts(['whale','man'])
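As a rough illustration of what a concordance view computes, the sketch below finds each token of a word in a plain Python list and returns it with a few tokens of context on either side. The function name `kwic` and the sample sentence are made up for this example; it is not how NLTK implements `concordance`, only the same idea in miniature.

```python
def kwic(tokens, word, width=3):
    """Return (left context, token, right context) for each match of `word`.

    Matching ignores case; `width` is the number of context tokens per side.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():
            left = ' '.join(tokens[max(0, i - width):i])
            right = ' '.join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text, split on whitespace.
sample = 'the whale is a monstrous fish and a most monstrous size it has'.split()

for left, token, right in kwic(sample, 'monstrous'):
    # Right-align the left context so the keyword lines up in a column.
    print('{:>20} [{}] {}'.format(left, token, right))
```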

Searching text, cont.
- Plot how far each token of a word is from the beginning of a text:
>>> text1.dispersion_plot(['monstrous'])
- Generate random text:
>>> text1.generate()
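A dispersion plot marks the position of each token of a word, counted from the start of the text. The sketch below computes those positions for a plain list of tokens without drawing anything. The helper `offsets` and the sample sentence are hypothetical, used only to show what is being plotted.

```python
def offsets(tokens, word):
    """Return the index of every token equal to `word`, counting from 0."""
    return [i for i, token in enumerate(tokens) if token == word]


# Hypothetical sample text, split on whitespace.
sample = 'a whale and a man saw a whale'.split()

whale_positions = offsets(sample, 'whale')
print(whale_positions)
```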

Counting vocabulary
- Count the word and punctuation tokens in a text:
>>> len(text1)
- List the unique words, i.e. the word types, in a text:
>>> set(text1)
- Count how many types there are in a text:
>>> len(set(text1))
- Count the tokens of a word type:
>>> text1.count('smote')
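The same counting operations can be tried on an ordinary Python list, which makes the difference between tokens and types easy to check by hand. The token list below is invented for illustration and stands in for a text such as text1.

```python
# Hypothetical token list; punctuation marks count as tokens too.
tokens = ['the', 'whale', 'smote', 'the', 'ship', ',',
          'and', 'the', 'ship', 'sank', '.']

n_tokens = len(tokens)        # every word and punctuation token
types = set(tokens)           # each distinct token once, i.e. the types
n_types = len(types)          # how many types there are
n_the = tokens.count('the')   # tokens of the type 'the'

print(n_tokens, n_types, n_the)
```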

Lexical richness or diversity
- The lexical richness or diversity of a text can be estimated as the average number of tokens per type:
>>> len(text1) / len(set(text1))
- The relative frequency of a type can be estimated as its tokens as a percentage of all tokens. In Python 2, '/' does integer division on integers (this also affects the ratio above), so import true division before dividing:
>>> from __future__ import division
>>> 100 * text1.count('a') / len(text1)
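The two ratios can be wrapped in small functions so they can be reused on any text. This is a minimal sketch that assumes Python 3, where '/' is already true division and the `__future__` import is not needed. The function names and the sample tokens are chosen for illustration.

```python
def lexical_diversity(tokens):
    """Average number of tokens per type."""
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Express `count` as a percentage of `total`."""
    return 100 * count / total


# Hypothetical token list standing in for a text such as text1.
tokens = ['the', 'whale', 'smote', 'the', 'ship', ',',
          'and', 'the', 'ship', 'sank', '.']

diversity = lexical_diversity(tokens)                     # 11 tokens over 8 types
share_of_the = percentage(tokens.count('the'), len(tokens))

print(diversity, share_of_the)
```

With a Text object loaded from nltk.book, the same calls would be lexical_diversity(text1) and percentage(text1.count('a'), len(text1)).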

Next time
There is no quiz for Monday. We will learn how to get our own text into Python & NLTK.