NLTK (Natural Language Toolkit): Unix for Poets (without Unix), Unix → Python



Homework #4
– No need to buy the book
  – Free online at
– Read Chapter 1
  – Start with exercise 22 and go as far as you can
  – Exercise 23: Solve however you like (no need to use for and if)
– Due Tuesday at sunrise
  – Send to

Installing
Chapter 01: pp
– Python
– NLTK
– Data

George Miller’s Example: Erode
Exercise: Use “erode” in a sentence:
– My family erodes a lot.
Definition:
– to eat into or away; destroy by slow consumption or disintegration
Examples:
– Battery acid had eroded the engine.
– Inflation erodes the value of our money.
Miller’s Conclusion:
– Dictionary examples are more helpful than definitions
George Miller: Chomsky’s Mentor & Wordnet

Introduction to Programming
Traditional (Start with Definitions)
– Constants: 1
– Variables: x
– Objects: lists, strings, arrays, matrices
– Expressions: 1+x
– Statements: Side Effects
  – print 1+x;
– Conditionals:
  – If (x<=1) return 1;
– Iteration: for loops
– Functions
– Recursion
– Streams
Non-Traditional (Start with Examples)
– Recursion
    def fact(x):
        if(x <= 1): return 1
        else: return x * fact(x-1)
– Streams:
  – Unix Pipes
– Briefly mentioned
  – Everything else

Python
Recursion:
    def fact(x):
        if x <= 1:
            return 1
        else:
            return x * fact(x-1)
Iteration:
    def fact2(x):
        result = 1
        for i in range(x):
            result *= (i+1)
        return result
Exercise: Fibonacci in Python
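The Fibonacci exercise can be approached the same two ways as the factorial. The following is one possible sketch, not the slide author's solution. The names fib and fib2 and the indexing convention (fib(0) = 0, fib(1) = 1) are assumptions made here for illustration.

```python
def fib(n):
    """Recursive Fibonacci, assuming fib(0) = 0 and fib(1) = 1."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


def fib2(n):
    """Iterative Fibonacci with the same indexing as fib."""
    a, b = 0, 1
    for _ in range(n):
        # Slide the window one step along the sequence.
        a, b = b, a + b
    return a


print([fib(i) for i in range(10)])
print([fib2(i) for i in range(10)])
```

The recursive version recomputes the same values many times, so it slows down quickly as n grows; the iterative version does one pass.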

Flatten: List → String
>>> def flatten(list):
...     if(len(list) == 1): return list[0];           # First
...     else: return list[0] + ' ' + flatten(list[1:len(list)]);   # First + Rest
flatten = split^-1
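The claim that flatten is the inverse of split can be checked directly. This is a minimal Python 3 sketch; the parameter is renamed to words (a name chosen here) so it does not shadow the built-in list, and the round trip is only expected to hold when no token contains a space.

```python
def flatten(words):
    """Join a non-empty list of strings with single spaces (First + Rest)."""
    if len(words) == 1:
        return words[0]
    return words[0] + ' ' + flatten(words[1:])


sent1 = ['Call', 'me', 'Ishmael', '.']
flat = flatten(sent1)

print(flat)
print(flat.split(' ') == sent1)   # split undoes flatten
print(' '.join(sent1) == flat)    # the built-in join does the same job
```

In everyday code, ' '.join(words) is the usual way to write this; the recursive version is kept here because it mirrors the First/Rest pattern on the slide.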

Python Objects
Lists
>>> sent1
['Call', 'me', 'Ishmael', '.']
>>> type(sent1)
<type 'list'>
>>> sent1[0]                    # First
'Call'
>>> sent1[1:len(sent1)]         # Rest
['me', 'Ishmael', '.']
Strings
>>> sent1[0]
'Call'
>>> type(sent1[0])
<type 'str'>
>>> sent1[0][0]                 # First
'C'
>>> sent1[0][1:len(sent1[0])]   # Rest
'all'

Types & Tokens Polymorphism

Polymorphism (From Wikipedia)
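In Python, polymorphism shows up when one name, such as len or +, works across several types (the summary slide gives len on lists and sets as its example). A small Python 3 sketch, with variable names invented here for illustration:

```python
sent = ['to', 'be', 'or', 'not', 'to', 'be']
word = 'Ishmael'

# len applies to lists, sets and strings alike.
sizes = {
    'list': len(sent),
    'set': len(set(sent)),
    'string': len(word),
}
print(sizes)

# + concatenates lists and strings with the same syntax.
joined_list = sent[:2] + sent[2:4]
joined_str = word[:3] + word[3:]
print(joined_list, joined_str)
```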

Tokens Types

FreqDist Tokens Types
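The distinction can be made concrete with the summary slide's example phrase. Tokens are the running words and types are the distinct words. The sketch below uses collections.Counter from the standard library as a stand-in for NLTK's FreqDist, so it runs without NLTK installed; whitespace splitting is likewise a simplification of real tokenization.

```python
from collections import Counter

text = 'to be or not to be'
tokens = text.split()      # every running word
types = set(tokens)        # distinct words only
freq = Counter(tokens)     # roughly what `sort | uniq -c` reports

print(len(tokens), len(types))
for word, count in freq.most_common():
    print(count, word)
```

For this phrase there are 6 tokens and 4 types, with "to" and "be" each counted twice.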

Concordances
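A concordance lists every occurrence of a word together with its surrounding context. The function below is a toy sketch of that idea in plain Python 3, not NLTK's Text.concordance; the name concordance, the width parameter, and the tuple layout are all assumptions made for illustration.

```python
def concordance(tokens, target, width=2):
    """Return (left, keyword, right) context triples for each match of target."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = ' '.join(tokens[max(0, i - width):i])
            right = ' '.join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


tokens = 'to be or not to be'.split()
for left, kw, right in concordance(tokens, 'be'):
    # Right-align the left context so the keywords line up in a column.
    print('%15s [%s] %s' % (left, kw, right))
```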

URLs (Chapter 3)

HTML

Works with almost any URL!
>>> import nltk
>>> from urllib import urlopen
>>> url=" _html/teaching/103/Lecture07/WebProgramming/java script_example_with_sounds.html"
>>> def url2text(url):
...     html = urlopen(url).read()          # fetch the page
...     raw = nltk.clean_html(html)         # strip the HTML markup
...     tokens = nltk.word_tokenize(raw)    # split into word tokens
...     return nltk.Text(tokens)
>>> text=url2text(url)
>>> text.concordance('Nonsense')
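The slide's code targets Python 2 and an early NLTK; newer NLTK releases no longer implement nltk.clean_html and point users to an HTML parser instead. The sketch below shows only the markup-stripping step using the standard library's html.parser, applied to an inline sample string rather than a live URL. TextExtractor, html2tokens, and the sample HTML are hypothetical names and data introduced here, and whitespace splitting stands in for nltk.word_tokenize.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside script/style.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html2tokens(html):
    """Strip markup from an HTML string and split the text on whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks).split()


sample = ('<html><body><h1>Nonsense</h1><script>var x = 1;</script>'
          '<p>More nonsense here.</p></body></html>')
print(html2tokens(sample))
```

To process a real page in Python 3, the HTML string would come from urllib.request.urlopen(url).read().decode(...), with the encoding chosen to match the page.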

An Equivalence Relation (=_R)
A Partition of S ≡ Set of Subsets of S
– Mutually Exclusive & Exhaustive
Equivalence Classes ≡ A Partition such that
– All the elements in a class are equivalent (with respect to =_R)
– No element from one class is equivalent to an element from another
Example: Partition integers into evens & odds
– Even integers: 2,4,6…
– Odd integers: 1,3,5…
– x =_R y ⇔ x has the same parity as y
Three Properties
– Reflexive: a =_R a
– Symmetric: a =_R b ⇒ b =_R a
– Transitive: a =_R b & b =_R c ⇒ a =_R c
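The parity example can be checked by brute force on a small finite set. This Python 3 sketch assumes S = {1, …, 10} (a range picked here only to keep the check small) and verifies the three properties, then builds the partition into odds and evens.

```python
from itertools import product

S = range(1, 11)


def same_parity(x, y):
    """x =_R y iff x and y have the same parity."""
    return x % 2 == y % 2


reflexive = all(same_parity(a, a) for a in S)
symmetric = all(same_parity(b, a)
                for a, b in product(S, S) if same_parity(a, b))
transitive = all(same_parity(a, c)
                 for a, b, c in product(S, S, S)
                 if same_parity(a, b) and same_parity(b, c))

# Group elements by parity: each group is one equivalence class.
classes = {}
for x in S:
    classes.setdefault(x % 2, []).append(x)
partition = list(classes.values())

print(reflexive, symmetric, transitive)
print(partition)
```

The two classes are mutually exclusive (no integer is both even and odd) and exhaustive (every element of S lands in one of them).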

Word Net (Ch2): An Equivalence Relation
>>> from nltk.corpus import wordnet as wn
>>> for s in wn.synsets('car'): print s.lemma_names
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
>>> for s in wn.synsets('car'): print flatten(s.lemma_names) + ': ' + s.definition
car auto automobile machine motorcar: a motor vehicle with four wheels; usually propelled by an internal combustion engine
car railcar railway_car railroad_car: a wheeled vehicle adapted to the rails of railroad
car gondola: the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant
car elevator_car: where passengers ride up and down
cable_car car: a conveyance for passengers or freight on a cable railway

Synonymy: An Equivalence Relation?
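One way to probe the question mark is to test whether the five 'car' synsets printed by wn.synsets('car') are mutually exclusive, as equivalence classes must be. The sketch below hard-codes those lemma lists so that it runs without NLTK. It treats synsets as sets of word forms, which is only one possible reading; at the level of word senses the picture may differ.

```python
from itertools import combinations

# Lemma names of the five synsets for 'car', copied from the WordNet output.
synsets = [
    ['car', 'auto', 'automobile', 'machine', 'motorcar'],
    ['car', 'railcar', 'railway_car', 'railroad_car'],
    ['car', 'gondola'],
    ['car', 'elevator_car'],
    ['cable_car', 'car'],
]

# Every pair of synsets that shares at least one word form.
overlaps = [
    (i, j, sorted(set(a) & set(b)))
    for (i, a), (j, b) in combinations(enumerate(synsets), 2)
    if set(a) & set(b)
]
mutually_exclusive = not overlaps

print(mutually_exclusive)
print(len(overlaps))
```

Because 'car' belongs to every synset, the classes overlap and do not partition the word forms. Put differently, if synonymy between word forms were transitive, 'auto' and 'gondola' would have to be synonyms via 'car'.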

Comments

A Partial Order (≤_R)
Powerset({x,y,z})
– Subsets ordered by inclusion
– a ≤_R b ⇔ a ⊆ b
Three properties
– Reflexive: a ≤ a
– Antisymmetric: a ≤ b & b ≤ a ⇒ a = b
– Transitive: a ≤ b & b ≤ c ⇒ a ≤ c
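The powerset example is small enough to verify exhaustively. This Python 3 sketch builds Powerset({x, y, z}) with frozensets, uses Python's built-in subset operator <= as the ordering, and checks the three properties; the helper name powerset is chosen here for illustration.

```python
from itertools import combinations, product


def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


P = powerset({'x', 'y', 'z'})

# On sets, a <= b means "a is a subset of b".
reflexive = all(a <= a for a in P)
antisymmetric = all(a == b
                    for a, b in product(P, P) if a <= b and b <= a)
transitive = all(a <= c
                 for a, b, c in product(P, P, P) if a <= b and b <= c)

# Pairs where neither set contains the other: the order is only partial.
incomparable = [(a, b) for a, b in combinations(P, 2)
                if not (a <= b or b <= a)]

print(len(P), reflexive, antisymmetric, transitive)
print(len(incomparable))
```

The incomparable pairs (for instance {x} and {y}) are what distinguish a partial order from a total one.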

Wordnet: A Partial Order
>>> for h in wn.synsets('car')[0].hypernym_paths()[0]: print h.lemma_names
['entity']
['physical_entity']
['object', 'physical_object']
['whole', 'unit']
['artifact', 'artefact']
['instrumentality', 'instrumentation']
['container']
['wheeled_vehicle']
['self-propelled_vehicle']
['motor_vehicle', 'automotive_vehicle']
['car', 'auto', 'automobile', 'machine', 'motorcar']

Help
>>> s = wn.synsets('car')[0]
>>> s.name
'car.n.01'
>>> s.pos
'n'
>>> s.lemmas
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
>>> s.examples
['he needs a car to get to work']
>>> s.definition
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> s.hyponyms()[0:3]
[Synset('stanley_steamer.n.01'), Synset('hardtop.n.01'), Synset('loaner.n.02')]
>>> s.hypernyms()
[Synset('motor_vehicle.n.01')]

CFGs: Context Free Grammars (Ch8)

Ambiguity

The Chomsky Hierarchy
– Type 0 > Type 1 > Type 2 > Type 3
– Recursively Enumerable > CS > CF > Regular
Examples
– Type 3: Regular (Finite State):
  – Grep & Regular Expressions
  – Right-Branching: A → a A
  – Left-Branching: B → B b
– Type 2: Context-Free (CF):
  – Center-Embedding: C → … → x C y
  – Parenthesis Grammars: … → ( … )
  – w w^R (a string followed by its reverse)
– Type 1: Context-Sensitive (CS):
  – w w (a string followed by a copy of itself)
– Type 0: Recursively Enumerable
– Beyond Type 0: Halting Problem
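The example languages at each level can be made concrete with small recognizers. The Python 3 sketch below is illustrative only: Python itself is Turing-complete, so these functions show which strings belong to each language, not the restricted machines that define the levels. The base cases (A → a, C → empty) are assumptions, since the slide gives only the recursive rules, and all function names are invented here.

```python
import re


def is_right_branching(s):
    """Type 3: a+, from A -> a A (base case A -> a assumed)."""
    return re.fullmatch(r'a+', s) is not None


def is_center_embedded(s):
    """Type 2: x^n y^n, from C -> x C y (base case C -> empty assumed)."""
    if s == '':
        return True
    return (len(s) >= 2 and s[0] == 'x' and s[-1] == 'y'
            and is_center_embedded(s[1:-1]))


def is_w_wr(s):
    """Type 2: a string followed by its reverse (w w^R)."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:][::-1]


def is_w_w(s):
    """Not context-free: a string followed by a copy of itself (w w)."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


for s in ['aaa', 'xxyy', 'abba', 'abab']:
    print(s, is_right_branching(s), is_center_embedded(s),
          is_w_wr(s), is_w_w(s))
```

The contrast between 'abba' and 'abab' is the one the slide draws: mirrored dependencies nest and stay context-free, while copied dependencies cross and need more power.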

Summary
Chapter 1
– NLTK (Natural Lang Toolkit)
  – Unix for Poets without Unix
  – Unix → Python
– Object-Oriented
  – Polymorphism: “len” applies to lists, sets, etc.
  – Ditto for: +, help, print, etc.
– Types & Tokens
  – “to be or not to be”
  – 6 tokens & 4 types
– FreqDist: sort | uniq -c
– Concordances
Chapters 2-8
– Chapter 3: URLs
– Chapter 2
  – Equivalence Relations: Parity, Synonymy (?)
  – Partial Orders: Wordnet Ontology
– Chapter 8: CF Parsing
  – Chomsky Hierarchy: CS > CF > Regular