Published by Hector Young. Modified over 9 years ago.
1
Natural Language Toolkit
Examples taken from: nltk.sourceforge.net/tutorial/introduction/index.html
2
Overview
The NLTK is a set of Python modules for carrying out many common natural language tasks. Access it at nltk.sourceforge.net.
There are versions for Windows, OS X, Unix, and Linux. Detailed instructions are under the Installation tab.
In addition to the toolkit you will need two other modules: tkinter and Numeric. We haven't been able to get Numeric to install smoothly with Python 2.4 under Windows, only with 2.3.
You also want the contrib and data packages. Pay attention to what INSTALL.TXT in the data package says about the NLTK_CORPORA path.
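A quick way to check the corpora path before starting is sketched below. It assumes that NLTK_CORPORA is read as an environment variable; INSTALL.TXT remains the authority on what the toolkit actually expects. The helper name corpora_path_status is made up for this example and is not part of NLTK. The sketch uses modern Python 3 and the standard library only.

```python
import os

def corpora_path_status(environ=None):
    """Return (path, exists) for NLTK_CORPORA, or (None, False) if it is unset.

    Hypothetical helper: it only inspects the environment and does not
    reproduce how NLTK itself locates its corpora.
    """
    env = os.environ if environ is None else environ
    path = env.get("NLTK_CORPORA")
    if not path:
        return None, False
    return path, os.path.isdir(path)

path, exists = corpora_path_status()
if path is None:
    print("NLTK_CORPORA is not set")
elif not exists:
    print("NLTK_CORPORA points to a missing directory:", path)
else:
    print("NLTK_CORPORA ->", path)
```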
3
Accessing NLTK
Standard Python import command:
>>> from nltk.corpus import gutenberg
>>> gutenberg.items()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt',
 'bible-kjv.txt', 'blake-poems.txt', 'blake-songs.txt',
 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt',
 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt',
 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
Or:
>>> import nltk.corpus
>>> nltk.corpus.gutenberg.items()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt',
 'bible-kjv.txt', 'blake-poems.txt', 'blake-songs.txt',
 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt',
 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt',
 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
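The two import styles differ only in which name ends up in your namespace; both reach the same object. The sketch below shows this with the standard library's string module as a stand-in, so it runs without NLTK installed. The choice of string.ascii_lowercase is purely illustrative.

```python
# Style 1: bind the name directly, as in "from nltk.corpus import gutenberg".
from string import ascii_lowercase

# Style 2: import the module and use the qualified name,
# as in "import nltk.corpus" followed by "nltk.corpus.gutenberg".
import string

# Both names refer to the very same object.
same_object = ascii_lowercase is string.ascii_lowercase
print(same_object)  # True
```

The first style is shorter to type at the interpreter; the second keeps it visible where each name comes from.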
4
Modules
The NLTK modules include:
– token: classes for representing and processing individual elements of text, such as words and sentences
– probability: classes for representing and processing probabilistic information
– tree: classes for representing and processing hierarchical information over text
– cfg: classes for representing and processing context free grammars
– fsa: finite state automata
– tagger: tagging each word with a part-of-speech, a sense, etc.
– parser: building trees over text (includes chart, chunk and probabilistic parsers)
– classifier: classify text into categories (includes feature, featureSelection, maxent, naivebayes)
– draw: visualize NLP structures and processes
– corpus: access (tagged) corpus data
We will cover some of these explicitly as we reach topics.
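To give a feel for the kind of "probabilistic information" the probability module is described as handling, here is a minimal sketch of a word frequency count. It uses only Python 3's standard library and does not reproduce NLTK's own classes; the sample sentence and the name relative_frequency are made up for illustration.

```python
from collections import Counter

# Illustrative sample text, already split into words.
words = "the cat sat on the mat".split()

# Count how often each word occurs.
counts = Counter(words)
total = sum(counts.values())

def relative_frequency(word):
    """Estimate a word's probability as its count divided by the total word count."""
    return counts[word] / total

print(counts["the"], relative_frequency("the"))
```

Words that never occur get a count of zero from Counter, so their relative frequency is 0.0 rather than an error.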
5
One Simple Example
IDLE 1.0.3
>>> from nltk.tokenizer import *
>>> text_token = Token(TEXT='Hello world. This is a test file.')
>>> print text_token
<Hello world. This is a test file.>
>>> WhitespaceTokenizer(SUBTOKENS='WORDS').tokenize(text_token)
>>> print text_token
<[<Hello>, <world.>, <This>, <is>, <a>, <test>, <file.>]>
>>> print text_token['TEXT']
Hello world. This is a test file.
>>> print text_token['WORDS']
[<Hello>, <world.>, <This>, <is>, <a>, <test>, <file.>]
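The session above uses the Python 2 era NLTK interface. The same idea can be sketched in plain Python 3 without NLTK: a dict stands in for the Token, and a small function stands in for WhitespaceTokenizer. Both stand-ins, including the function name whitespace_tokenize, are hypothetical and only mimic the behavior shown on the slide.

```python
def whitespace_tokenize(token, subtokens="WORDS"):
    """Split token['TEXT'] on whitespace and store the pieces under `subtokens`.

    Hypothetical stand-in for WhitespaceTokenizer(SUBTOKENS='WORDS').tokenize(...);
    like the original call, it modifies the token in place.
    """
    token[subtokens] = token["TEXT"].split()
    return token

# A plain dict plays the role of the Token and its named properties.
text_token = {"TEXT": "Hello world. This is a test file."}
whitespace_tokenize(text_token)

print(text_token["TEXT"])
print(text_token["WORDS"])
```

Note that splitting on whitespace leaves punctuation attached to the neighboring word, so "world." and "file." stay single tokens, just as in the session above.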
6
LAB
Detailed documentation and tutorials are under the Documentation tab at the Sourceforge site. Work through the "gentle introduction" and "elementary language processing" tutorials on the NLTK: nltk.sourceforge.net/tutorial/introduction/index.html