NLTK (Natural Language Tool Kit) Unix for Poets (without Unix) Unix → Python.


1 NLTK (Natural Language Tool Kit)
http://www.nltk.org/
Unix for Poets (without Unix)
Unix → Python

2 Homework #4
No need to buy the book
– Free online at http://www.nltk.org/book
Read Chapter 1
– http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html
Start with exercise 22 and go as far as you can
– Exercise 23: Solve however you like (no need to use for and if)
Due Tuesday at sunrise
– Send email to Kenneth.Church@jhu.edu

3 Installing
Chapter 01: pp. 1–4
– Python
– NLTK
– Data

4 George Miller’s Example: Erode
Exercise: Use “erode” in a sentence:
– My family erodes a lot. (written from the definition alone)
Definition: to eat into or away; destroy by slow consumption or disintegration
Examples:
– Battery acid had eroded the engine.
– Inflation erodes the value of our money.
Miller’s Conclusion:
– Dictionary examples are more helpful than definitions
George Miller: Chomsky’s Mentor & WordNet

5 Introduction to Programming
Traditional (Start with Definitions)
– Constants: 1
– Variables: x
– Objects: lists, strings, arrays, matrices
– Expressions: 1+x
– Statements: Side Effects
    print 1+x;
– Conditionals:
    if (x <= 1) return 1;
– Iteration: for loops
– Functions
– Recursion
– Streams
Non-Traditional (Start with Examples)
– Recursion
    def fact(x):
        if x <= 1: return 1
        else: return x * fact(x-1)
– Streams: Unix Pipes
Briefly mentioned
– Everything else

6 Python
Recursion:
    def fact(x):
        if x <= 1:
            return 1
        else:
            return x * fact(x-1)
Iteration:
    def fact2(x):
        result = 1
        for i in range(x):
            result *= (i+1)
        return result
Exercise: Fibonacci in Python
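One possible answer to the Fibonacci exercise, written in Python 3 in the same two styles as fact and fact2. It assumes the convention fib(0) = 0 and fib(1) = 1; the names fib and fib2 are hypothetical, not from the slides.

```python
def fib(n):
    """Recursive Fibonacci (assumed convention: fib(0) = 0, fib(1) = 1)."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


def fib2(n):
    """Iterative Fibonacci, in the style of fact2: carry the last two values."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib(n) for n in range(10)])    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print([fib2(n) for n in range(10)])   # same list
```

The recursive version mirrors the definition but recomputes the same values over and over (exponential time); the iterative version keeps only the last two values, much as fact2 keeps a running result.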

7 Flatten: List → String
>>> def flatten(words):
...     if len(words) == 0:
...         return ''
...     elif len(words) == 1:
...         return words[0]
...     else:
...         return words[0] + ' ' + flatten(words[1:])
...
First: words[0]    Rest: words[1:]
flatten = split⁻¹ (the inverse of split)
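A Python 3 sketch checking the claim flatten = split⁻¹: flatten undoes str.split and agrees with the built-in ' '.join. This only round-trips for text with single spaces between words and no leading or trailing whitespace, since split throws that information away.

```python
def flatten(words):
    """Join a list of words with single spaces, using First/Rest recursion."""
    if not words:                        # empty list -> empty string
        return ''
    if len(words) == 1:                  # one word: nothing left to join
        return words[0]
    return words[0] + ' ' + flatten(words[1:])   # First + ' ' + flatten(Rest)


sentence = 'Call me Ishmael .'
words = sentence.split()                 # split: String -> List
print(words)                             # ['Call', 'me', 'Ishmael', '.']
print(flatten(words))                    # Call me Ishmael .

# flatten undoes split (for single-spaced text) and matches str.join
print(flatten(words) == sentence)        # True
print(flatten(words) == ' '.join(words)) # True
```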

8 Python Objects
>>> from nltk.book import *
Lists
>>> sent1
['Call', 'me', 'Ishmael', '.']
>>> type(sent1)
<type 'list'>
>>> sent1[0]                     # First
'Call'
>>> sent1[1:len(sent1)]          # Rest
['me', 'Ishmael', '.']
Strings
>>> sent1[0]
'Call'
>>> type(sent1[0])
<type 'str'>
>>> sent1[0][0]                  # First
'C'
>>> sent1[0][1:len(sent1[0])]    # Rest
'all'
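The First/Rest pattern is polymorphic: because lists and strings both support indexing and slicing, one recursive function can handle either. A minimal Python 3 sketch; length and reverse are hypothetical helper names.

```python
def length(seq):
    """Count items with First/Rest recursion; works for lists and strings."""
    if not seq:                  # [] and '' are both empty (falsy)
        return 0
    return 1 + length(seq[1:])   # count First, recurse on Rest


def reverse(seq):
    """Reverse by appending First to the end of reverse(Rest)."""
    if len(seq) <= 1:
        return seq
    return reverse(seq[1:]) + seq[:1]


sent1 = ['Call', 'me', 'Ishmael', '.']
print(length(sent1), length(sent1[0]))   # 4 4
print(reverse(sent1))                    # ['.', 'Ishmael', 'me', 'Call']
print(reverse(sent1[0]))                 # llaC
```

reverse uses seq[:1] rather than seq[0] so that the concatenation stays list + list or str + str.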

9 Types & Tokens Polymorphism

10 Polymorphism (From Wikipedia)
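A minimal Python 3 sketch of polymorphism: the built-in len and the + operator each do the appropriate thing for several types, and a user-defined class joins in by implementing __len__ and __add__. The class Bag is a hypothetical example, not part of NLTK.

```python
class Bag:
    """A hypothetical container; defining __len__ is all len() needs."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __add__(self, other):
        return Bag(self.items + other.items)


tokens = 'to be or not to be'.split()

print(len('to be'))                    # 5: characters in a string
print(len(tokens))                     # 6: items in a list
print(len(set(tokens)))                # 4: distinct items in a set
print(len({'car': 1, 'auto': 1}))      # 2: keys in a dict
print(len(Bag(tokens)))                # 6: len() calls Bag.__len__

print('to' + 'be')                     # tobe (strings concatenate)
print(['to'] + ['be'])                 # ['to', 'be'] (lists concatenate)
print(1 + 2)                           # 3 (numbers add)
print(len(Bag(tokens) + Bag(['!'])))   # 7: + calls Bag.__add__
```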

11 Tokens Types
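Tokens vs. types in a few lines of Python 3, assuming a naive whitespace tokenizer (real tokenizers such as nltk.word_tokenize also split off punctuation). The ratio at the end is the lexical diversity measure from Chapter 1 of the NLTK book.

```python
text = 'to be or not to be'
tokens = text.split()      # every occurrence counts: tokens
types = set(tokens)        # each distinct word counts once: types

print(len(tokens))         # 6
print(len(types))          # 4
print(sorted(types))       # ['be', 'not', 'or', 'to']

# lexical diversity: types per token
print(len(types) / len(tokens))
```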

12 FreqDist Tokens Types
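FreqDist maps each type to its number of tokens. In NLTK 3, FreqDist is a subclass of Python's collections.Counter, so this Python 3 sketch uses Counter directly, then reproduces the same counts the Unix way, by sorting and counting runs as sort | uniq -c does. sort_uniq_c is a hypothetical helper name.

```python
from collections import Counter
from itertools import groupby

tokens = 'to be or not to be'.split()

fdist = Counter(tokens)            # type -> number of tokens
print(fdist['to'], fdist['or'])    # 2 1
print(fdist.most_common(2))        # the two most frequent types


def sort_uniq_c(lines):
    """Emulate `sort | uniq -c`: sort, then count each run of equal lines."""
    return [(len(list(run)), line) for line, run in groupby(sorted(lines))]


print(sort_uniq_c(tokens))         # [(2, 'be'), (1, 'not'), (1, 'or'), (2, 'to')]
```

Both routes give the same counts; sort | uniq -c needs the sort because uniq only merges adjacent duplicates, which is exactly what groupby does here.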

13

14 Concordances
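A concordance lists every occurrence of a word together with its surrounding context (keyword in context, KWIC). This Python 3 sketch is not NLTK's Text.concordance (which formats by character width and prints directly); it is a simplified stand-in that keeps a fixed number of tokens on each side, with width as a hypothetical parameter.

```python
def concordance(tokens, word, width=5):
    """Return one keyword-in-context line per (case-insensitive) match of word."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = ' '.join(tokens[max(0, i - width):i])
            right = ' '.join(tokens[i + 1:i + 1 + width])
            # right-align the left context so the keywords line up in a column
            lines.append(f'{left:>40} {tok} {right}')
    return lines


text = 'to be or not to be , that is the question'.split()
for line in concordance(text, 'be', width=3):
    print(line)
```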

15 URLs (Chapter 3)

16

17 HTML

18 Works with almost any URL!
>>> import nltk
>>> from urllib import urlopen
>>> url = "https://jshare.johnshopkins.edu/kchurch4/public_html/teaching/103/Lecture07/WebProgramming/javascript_example_with_sounds.html"
>>> def url2text(url):
...     html = urlopen(url).read()
...     raw = nltk.clean_html(html)
...     tokens = nltk.word_tokenize(raw)
...     return nltk.Text(tokens)
...
>>> text = url2text(url)
>>> text.concordance('Nonsense')
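nltk.clean_html was removed in NLTK 3 (it now raises NotImplementedError and points users to BeautifulSoup). Below is a standard-library sketch of the same pipeline in Python 3, assuming a crude regex tokenizer in place of nltk.word_tokenize; html2tokens and url2tokens are hypothetical names. The example parses an inline HTML string, so it runs without network access.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html2tokens(html):
    """Strip markup, then split the text into word and punctuation tokens."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw = ' '.join(parser.chunks)
    # crude tokenizer (an assumption): runs of word characters, or single symbols
    return re.findall(r"\w+|[^\w\s]", raw)


def url2tokens(url):
    """Hypothetical helper: fetch a page and tokenize its visible text."""
    html = urlopen(url).read().decode('utf-8', errors='replace')
    return html2tokens(html)


page = ('<html><body><h1>Nonsense!</h1><script>var x = 1;</script>'
        '<p>Some nonsense here.</p></body></html>')
print(html2tokens(page))   # ['Nonsense', '!', 'Some', 'nonsense', 'here', '.']
```

With NLTK installed, nltk.Text(url2tokens(url)) would give back an object with a .concordance method, as on the slide.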

19 An Equivalence Relation (=_R)
A Partition of S ≡ a set of subsets of S
– Mutually exclusive & exhaustive
Equivalence Classes ≡ a partition such that
– All the elements in a class are equivalent (with respect to =_R)
– No element from one class is equivalent to an element from another
Example: Partition integers into evens & odds
– Even integers: 2, 4, 6, …
– Odd integers: 1, 3, 5, …
– x =_R y ⇔ x has the same parity as y
Three Properties
– Reflexive: a =_R a
– Symmetric: a =_R b ⇒ b =_R a
– Transitive: a =_R b & b =_R c ⇒ a =_R c
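The three properties can be checked mechanically on a finite set. A Python 3 sketch (is_equivalence, classes and same_parity are hypothetical names): the partition step compares each element with just one representative per class, which is enough precisely because the relation is symmetric and transitive.

```python
from itertools import product


def is_equivalence(S, rel):
    """Check reflexive, symmetric and transitive on a finite set S."""
    reflexive = all(rel(a, a) for a in S)
    symmetric = all(rel(b, a) for a, b in product(S, S) if rel(a, b))
    transitive = all(rel(a, c)
                     for a, b, c in product(S, S, S)
                     if rel(a, b) and rel(b, c))
    return reflexive and symmetric and transitive


def classes(S, rel):
    """Partition S into equivalence classes (assumes rel is an equivalence)."""
    parts = []
    for x in S:
        for part in parts:
            if rel(x, part[0]):      # one representative per class suffices
                part.append(x)
                break
        else:                        # no class matched: start a new one
            parts.append([x])
    return parts


def same_parity(x, y):
    return x % 2 == y % 2


S = range(1, 9)
print(is_equivalence(S, same_parity))           # True
print(classes(S, same_parity))                  # [[1, 3, 5, 7], [2, 4, 6, 8]]
print(is_equivalence(S, lambda x, y: x <= y))   # False: <= is not symmetric
```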

20 WordNet (Ch2): An Equivalence Relation
>>> from nltk.corpus import wordnet as wn
>>> for s in wn.synsets('car'): print s.lemma_names
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
>>> for s in wn.synsets('car'): print flatten(s.lemma_names) + ': ' + s.definition
car auto automobile machine motorcar: a motor vehicle with four wheels; usually propelled by an internal combustion engine
car railcar railway_car railroad_car: a wheeled vehicle adapted to the rails of railroad
car gondola: the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant
car elevator_car: where passengers ride up and down
cable_car car: a conveyance for passengers or freight on a cable railway

21 Synonymy: An Equivalence Relation?
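One way to see why the slide asks the question: treat two words as synonyms when some synset contains both, using the five lemma lists for car shown above, and search for a transitivity failure. A Python 3 sketch; SYNSETS and synonymous are hypothetical names.

```python
from itertools import product

# Lemma lists for the five senses of "car", copied from the WordNet output above.
SYNSETS = [
    ['car', 'auto', 'automobile', 'machine', 'motorcar'],
    ['car', 'railcar', 'railway_car', 'railroad_car'],
    ['car', 'gondola'],
    ['car', 'elevator_car'],
    ['cable_car', 'car'],
]


def synonymous(a, b):
    """Two words count as synonyms if some synset contains both."""
    return any(a in s and b in s for s in SYNSETS)


words = sorted({w for s in SYNSETS for w in s})

# Counterexamples to transitivity: a ~ b and b ~ c, but not a ~ c.
counterexamples = [(a, b, c) for a, b, c in product(words, repeat=3)
                   if synonymous(a, b) and synonymous(b, c)
                   and not synonymous(a, c)]

print(synonymous('auto', 'car'), synonymous('car', 'railcar'))   # True True
print(synonymous('auto', 'railcar'))                             # False
print(len(counterexamples) > 0)                                  # True
```

The failure comes from the ambiguous word car, which sits in several synsets. Over word senses (a word paired with one synset) the same-synset relation is an equivalence, and the synsets are its classes.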

22 Comments

23 A Partial Order (≤_R)
Powerset({x,y,z})
– Subsets ordered by inclusion
– a ≤_R b ⇔ a ⊆ b
Three properties
– Reflexive: a ≤ a
– Antisymmetric: a ≤ b & b ≤ a ⇒ a = b
– Transitive: a ≤ b & b ≤ c ⇒ a ≤ c
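A Python 3 check that subset inclusion on Powerset({x,y,z}) has the three properties, plus a check that it is not a total order (some pairs are incomparable). Python's <= on sets already means "is a subset of"; powerset and leq are hypothetical helper names.

```python
from itertools import chain, combinations, product


def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def leq(a, b):
    return a <= b                # for sets: a is a subset of b


P = powerset({'x', 'y', 'z'})

reflexive = all(leq(a, a) for a in P)
antisymmetric = all(a == b for a, b in product(P, P) if leq(a, b) and leq(b, a))
transitive = all(leq(a, c) for a, b, c in product(P, P, P)
                 if leq(a, b) and leq(b, c))
total = all(leq(a, b) or leq(b, a) for a, b in product(P, P))

print(len(P))                                 # 8
print(reflexive, antisymmetric, transitive)   # True True True
print(total)                                  # False: {x} and {y} are incomparable
```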

24 WordNet: A Partial Order
>>> for h in wn.synsets('car')[0].hypernym_paths()[0]: print h.lemma_names
['entity']
['physical_entity']
['object', 'physical_object']
['whole', 'unit']
['artifact', 'artefact']
['instrumentality', 'instrumentation']
['container']
['wheeled_vehicle']
['self-propelled_vehicle']
['motor_vehicle', 'automotive_vehicle']
['car', 'auto', 'automobile', 'machine', 'motorcar']
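The path above can be replayed as a chain of parent links: a is-a b exactly when b lies on a's path to the root, entity. A Python 3 sketch using only the first lemma of each synset on that path; HYPERNYM, hypernym_path and is_a are hypothetical names. Real WordNet synsets can have more than one hypernym, which is why hypernym_paths() returns a list of paths.

```python
# child -> hypernym, first lemma of each synset on the path shown above
HYPERNYM = {
    'car': 'motor_vehicle',
    'motor_vehicle': 'self-propelled_vehicle',
    'self-propelled_vehicle': 'wheeled_vehicle',
    'wheeled_vehicle': 'container',
    'container': 'instrumentality',
    'instrumentality': 'artifact',
    'artifact': 'whole',
    'whole': 'object',
    'object': 'physical_entity',
    'physical_entity': 'entity',
}


def hypernym_path(word):
    """Follow parent links up to the root; return root-first, like the slide."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path[::-1]


def is_a(a, b):
    """a <= b in the partial order: b lies on a's path to the root."""
    return b in hypernym_path(a)


print(hypernym_path('car'))
print(is_a('car', 'artifact'), is_a('artifact', 'car'))   # True False
```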

25 Help
>>> s = wn.synsets('car')[0]
>>> s.name
'car.n.01'
>>> s.pos
'n'
>>> s.lemmas
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
>>> s.examples
['he needs a car to get to work']
>>> s.definition
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> s.hyponyms()[0:3]
[Synset('stanley_steamer.n.01'), Synset('hardtop.n.01'), Synset('loaner.n.02')]
>>> s.hypernyms()
[Synset('motor_vehicle.n.01')]

26 CFGs: Context Free Grammars (Ch8)

27 Ambiguity

28 The Chomsky Hierarchy
– Type 0 > Type 1 > Type 2 > Type 3
– Recursively Enumerable > CS > CF > Regular
Examples
– Type 3: Regular (Finite State): Grep & Regular Expressions
    Right-Branching: A → a A
    Left-Branching: B → B b
– Type 2: Context-Free (CF):
    Center-Embedding: C → … → x C y
    Parenthesis Grammars: balanced ( )
    w w^R (w followed by its reversal)
– Type 1: Context-Sensitive (CS): w w
– Type 0: Recursively Enumerable
– Beyond Type 0: Halting Problem
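Sketch recognizers for three of the example languages, in Python 3 (function names hypothetical). Balanced parentheses use recursive descent over the context-free grammar S → ( S ) S | ε, where the recursive call between the parentheses is the center embedding; w w^R uses M → a M a | b M b | ε over an assumed alphabet {a, b}; w w is not context-free, yet a program can check it directly by comparing halves.

```python
def balanced(s):
    """Recognize balanced parentheses with S -> ( S ) S | empty."""
    def parse_S(i):
        # return the index just past one S starting at i, or None on failure
        if i < len(s) and s[i] == '(':
            j = parse_S(i + 1)                  # center-embedded S
            if j is None or j >= len(s) or s[j] != ')':
                return None
            return parse_S(j + 1)               # trailing S
        return i                                # S -> empty
    return parse_S(0) == len(s)


def mirror(s):
    """Recognize w w^R with M -> a M a | b M b | empty (alphabet {a, b})."""
    if s == '':
        return True
    return len(s) >= 2 and s[0] == s[-1] and s[0] in 'ab' and mirror(s[1:-1])


def copy(s):
    """Recognize w w: beyond context-free, but easy to check procedurally."""
    half, odd = divmod(len(s), 2)
    return not odd and s[:half] == s[half:]


print(balanced('(())()'), balanced('(()'))   # True False
print(mirror('abba'), mirror('abab'))        # True False
print(copy('abab'), copy('abba'))            # True False
```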

29 Summary
Chapter 1
– NLTK (Natural Language Toolkit)
    Unix for Poets without Unix
    Unix → Python
– Object-Oriented
    Polymorphism: “len” applies to lists, sets, etc.
    Ditto for: +, help, print, etc.
– Types & Tokens
    “to be or not to be”: 6 tokens & 4 types
– FreqDist: sort | uniq -c
– Concordances
Chapters 2-8
– Chapter 3: URLs
– Chapter 2
    Equivalence Relations: Parity, Synonymy (?)
    Partial Orders: WordNet Ontology
– Chapter 8: CF Parsing
    Chomsky Hierarchy: CS > CF > Regular

