Methods in Computational Linguistics II Queens College Lecture 5: List Comprehensions.

Split into words

sent = "That isn't the problem, Bob."
sent.split() vs. nltk.word_tokenize(sent)
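
A quick sketch of the difference (the NLTK output shown in the comment assumes NLTK's default English tokenizer):

```python
# str.split breaks on whitespace only, so punctuation stays attached to words.
sent = "That isn't the problem, Bob."
tokens = sent.split()
# tokens -> ['That', "isn't", 'the', 'problem,', 'Bob.']

# A word tokenizer such as nltk.word_tokenize (assuming NLTK is installed)
# additionally separates punctuation and contractions, roughly:
# ['That', 'is', "n't", 'the', 'problem', ',', 'Bob', '.']
```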

List Comprehensions

Compact way to process every item in a list.

[x for x in array]

dest = []
for x in array:
    dest.append(x)
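
A runnable version of the equivalence above (the sample words are illustrative):

```python
array = ["the", "cat", "sat"]

# List comprehension: build a new list from each item of array.
copy1 = [x for x in array]

# Equivalent explicit loop.
copy2 = []
for x in array:
    copy2.append(x)

# Both produce the same list: ["the", "cat", "sat"]
```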

Methods

Methods can be applied to the iterating variable, x; the result is stored in the resulting list.

[len(x) for x in array]

dest = []
for x in array:
    dest.append(len(x))

Conditionals

Elements from the original list can be omitted from the resulting list, using conditional statements.

[x for x in array if len(x) == 3]

dest = []
for x in array:
    if len(x) == 3:
        dest.append(x)

Building up

These can be combined to build up complicated lists.

[x.upper() for x in array if len(x) > 3 and x.startswith('t')]

dest = []
for x in array:
    if len(x) > 3 and x.startswith('t'):
        dest.append(x.upper())
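
A concrete run of the combined filter-and-transform pattern (the word list is made up for illustration):

```python
array = ["this", "toy", "token", "word", "tree"]

# Keep words longer than 3 characters that start with 't', uppercased.
result = [x.upper() for x in array if len(x) > 3 and x.startswith("t")]
# result: ["THIS", "TOKEN", "TREE"]
```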

Lists Containing Lists

Lists can contain lists:

[[a, 1], [b, 2], [d, 4]]

...or tuples:

[(a, 1), (b, 2), (d, 4)]

[[d, d*d] for d in array if d < 4]
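
The last comprehension above, run over a sample numeric list (values are illustrative):

```python
array = [1, 2, 3, 5]

# Build a list of [value, value squared] pairs for values under 4.
pairs = [[d, d * d] for d in array if d < 4]
# pairs: [[1, 1], [2, 4], [3, 9]]

# The same idea with tuples as the inner elements.
tuple_pairs = [(d, d * d) for d in array if d < 4]
```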

Using multiple lists

Multiple lists can be processed simultaneously in a list comprehension:

[x*y for x in array1 for y in array2]
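
Note that two for clauses iterate over the full cross product of the lists, as this small sketch shows (sample values are mine):

```python
array1 = [1, 2, 3]
array2 = [10, 100]

# Every x is paired with every y, so the result has
# len(array1) * len(array2) elements.
products = [x * y for x in array1 for y in array2]
# products: [10, 100, 20, 200, 30, 300]
```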

List Comprehension Exercises

Make a list of the first ten multiples of ten (10, 20, ..., 100) using a list comprehension.
Make a list of the first ten cubes (1, 8, ...) using a list comprehension.
Store five names in a list. Make a second list that adds the phrase "is awesome!" to each name, using a list comprehension.
Write out the following code without using a list comprehension:

plus_thirteen = [number + 13 for number in range(1, 11)]

Exercises from:

Lists within lists are often called 2-d arrays. This is another way we store tables. Similar to nested dictionaries.

a = [[0, 1], [1, 0]]
a[1][1]
a[0][0]
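
Indexing the 2-d list above, step by step:

```python
# A 2-by-2 table as a list of row lists.
a = [[0, 1], [1, 0]]

# The first index selects the row, the second selects the column.
row1 = a[1]       # the whole second row: [1, 0]
corner = a[0][0]  # 0
other = a[1][1]   # 0
```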

Numpy & Arrays

Numpy is a commonly used package for numerical calculations in Python. Its main object is a multidimensional array.

A[1]             List
A[1][2]          'Rectangular' 2-d Matrix
A[1][2][3]       'Cube/Prism' 3-d Matrix
A[1][2][3][4]    4-d Matrix
etc.

Numpy arrays

from numpy import *
a = array([1, 2, 3, 4])
a = array([[1, 2], [3, 4]])

a.ndim     Number of dimensions
a.shape    Length of each dimension
a.size     Total number of elements
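
A quick sketch of these attributes, assuming NumPy is installed (using the conventional `import numpy as np` rather than the slide's star import):

```python
import numpy as np

# Note: a 2-d array takes one nested list, not two separate list arguments.
a = np.array([[1, 2], [3, 4]])

a.ndim   # 2: number of dimensions
a.shape  # (2, 2): length of each dimension
a.size   # 4: total number of elements
```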

numpy array initialization

>>> zeros( (3,4) )
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

>>> ones( (2,3,4), dtype=int16 )
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],
       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

>>> empty( (2,3) )
(uninitialized: the contents are whatever arbitrary values were already in memory)

Content Types

arrays are homogeneous (ndarray)
–array([1, 3, 4], dtype=int16)
lists are not homogeneous
–['abc', 123, [list1, list2]]
dtype describes the "type" of object in the array
–str, tuple, int, etc.
–numpy.int16, numpy.int32, numpy.float64, etc.

zip

zip allows you to "zip" two lists together, creating a list of tuples.

names = ['Andrew', 'Beth', 'Charles']
ages = [35, 34, 33]
name_age = zip(names, ages)
–[('Andrew', 35), ('Beth', 34), ('Charles', 33)]
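
The same example as runnable code. One caveat not on the slide: in Python 3, zip returns a lazy iterator rather than a list, so you wrap it in list() to see the pairs:

```python
names = ['Andrew', 'Beth', 'Charles']
ages = [35, 34, 33]

# list() forces the lazy zip iterator into an actual list of tuples.
name_age = list(zip(names, ages))
# name_age: [('Andrew', 35), ('Beth', 34), ('Charles', 33)]
```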

foreach vs. indexed for loops

"More pythonic":

for n, a in zip(names, ages):
    print "%s -- %s" % (n, a)

vs.

for i in xrange(len(names)):
    print "%s -- %s" % (names[i], ages[i])

map

map allows you to apply the same function to a list of objects.

a = ['1', '2', '4']
map(int, a)
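
In runnable form (again, Python 3's map is lazy, hence the list() call):

```python
a = ['1', '2', '4']

# Apply int to every element of the list.
numbers = list(map(int, a))
# numbers: [1, 2, 4]
```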

map

Any function can be 'map'ed over a list, but the elements of the list must be valid arguments to the function.

def uppercase(s):
    return s.upper()

a = ['abc', 'def', 'ghi']
map(uppercase, a)

Functions as objects

A function name can be assigned to a variable. map is an example of this: the first argument to map is a function object.

a = [1, 3, 4]
len(a)
sum(a)

functions = [len, sum]
for fn in functions:
    print str(fn), fn(a)
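
A minimal runnable sketch of functions stored in a list (using a comprehension instead of the slide's print loop so the results can be inspected):

```python
a = [1, 3, 4]

# Built-in functions are ordinary objects and can be stored in a list.
functions = [len, sum]

# Call each stored function on the same list.
results = [fn(a) for fn in functions]
# results: [3, 8]
```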

lambda

Lambda functions are single-use functions that do not need to be 'def'ed. Using the uppercase example again:

def uppercase(s):
    return s.upper()

a = ['abc', 'def', 'ghi']
map(uppercase, a)

lambda

Lambda functions are single-use functions that do not need to be 'def'ed. These are "anonymous" functions. Using the uppercase example again:

a = ['abc', 'def', 'ghi']
map(lambda s: s.upper(), a)

By design, a lambda is limited to a single expression.
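
The lambda version of the uppercase example, runnable in Python 3:

```python
a = ['abc', 'def', 'ghi']

# The lambda plays the role of the one-line uppercase function.
upper = list(map(lambda s: s.upper(), a))
# upper: ['ABC', 'DEF', 'GHI']
```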

Aside: Glob

Construct a list of all filenames matching a pattern.

from glob import glob
glob('*.txt')
glob('/Users/andrew/Documents/*/*.ppt')
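
A small sketch; the pattern is illustrative, and the matches depend entirely on which files exist in the current directory:

```python
from glob import glob

# Match all .txt files in the current working directory.
matches = glob('*.txt')

# glob always returns a list of path strings (possibly empty).
```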

Linguistic Annotation

Text only takes us so far. People are reliable judges of linguistic behavior. We can model with machines, but for "gold-standard" truth, we ask people to make judgments about linguistic qualities.

Example Linguistic Annotations

Sentence Boundaries
Part of Speech Tags
Phonetic Transcription
Syntactic parse trees
Speaker Identity
Semantic Role
Speech Act
Document Topic
Argument structure
Word Sense
...and many, many more

We need…

Techniques to process these. Every corpus has its own format for linguistic annotation. So… we need to parse annotation formats.

Constructing a linguistic corpus

Decisions that need to be made:
–Why are you doing this?
–What material will be collected?
–How will it be collected? Automatically? Manually? Found material vs. laboratory language?
–What meta information will be stored?
–What manual annotations are required?
  How will each annotation be defined?
  How many annotators will be used?
  How will agreement be assessed?
  How will disagreements be resolved?
–How will the material be disseminated? Is this covered by your IRB if the material is the result of a human subject protocol?

Part of Speech Tagging

Task: Given a string of words, identify the part of speech for each word.

Part of Speech tagging

Surface level syntax. A primary operation for:
–Parsing
–Word Sense Disambiguation
–Semantic Role labeling
–Segmentation (Discourse, Topic, Sentence)

How is it done?

Learn from Data.
Annotated Data:
Unlabeled Data:

Learn the association from Tag to Word

Limitations

Unseen tokens
Uncommon interpretations
Long-term dependencies

Format conversion exercise

The/DET Dog/NN is/VB fast/JJ ./.

The dog is fast.

1, 3, DET
5, 7, NN
9, 10, VB
12, 15, JJ
16, 16, .
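
One possible solution sketch: split the slash-tagged string, then locate each token in the plain sentence to produce 1-based, inclusive (start, end, tag) triples. The variable names and the case-insensitive search (so "Dog" lines up with "dog") are my choices, not part of the exercise:

```python
tagged = "The/DET Dog/NN is/VB fast/JJ ./."
sentence = "The dog is fast."

spans = []
pos = 0
for pair in tagged.split():
    # rsplit from the right so a token like "./." splits into word "." and tag ".".
    word, tag = pair.rsplit("/", 1)
    # Case-insensitive search for the token, starting after the previous match.
    start = sentence.lower().find(word.lower(), pos)
    spans.append((start + 1, start + len(word), tag))  # 1-based, inclusive
    pos = start + len(word)

# spans: [(1, 3, 'DET'), (5, 7, 'NN'), (9, 10, 'VB'), (12, 15, 'JJ'), (16, 16, '.')]
```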

Parsing

Generate a parse tree.

Parsing

Generate a Parse Tree from:
–The surface form (words) of the text
–Part of Speech Tokens

Parsing Styles

Context Free Grammars for Parsing

S → VP
S → NP VP
NP → Det Nom
Nom → Noun
Nom → Adj Nom
VP → Verb Nom
Det → "A", "The"
Noun → "I", "John", "Address"
Verb → "Gave"
Adj → "My", "Blue"
Adv → "Quickly"
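
One way to make this grammar concrete is to store it as a dict and randomly expand nonterminals into words. This is a generation sketch of my own (the slides do not prescribe a representation); any symbol without a rule is treated as a terminal:

```python
import random

# The slide's grammar: each nonterminal maps to a list of possible expansions.
grammar = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"]],
    "Nom": [["Noun"], ["Adj", "Nom"]],
    "VP": [["Verb", "Nom"]],
    "Det": [["A"], ["The"]],
    "Noun": [["I"], ["John"], ["Address"]],
    "Verb": [["Gave"]],
    "Adj": [["My"], ["Blue"]],
    "Adv": [["Quickly"]],
}

def generate(symbol):
    """Expand a symbol into a list of terminal words by random derivation."""
    if symbol not in grammar:  # no rule: it's a terminal word
        return [symbol]
    words = []
    for part in random.choice(grammar[symbol]):
        words.extend(generate(part))
    return words

sentence = " ".join(generate("S"))
```

Note the Nom → Adj Nom rule is recursive, so generated sentences can contain chains of adjectives; the recursion terminates because the Noun expansion is chosen with equal probability at each step.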

Limitations

The grammar must be built by hand.
Can't handle ungrammatical sentences.
Can't resolve ambiguity.

Probabilistic Parsing

Assign each transition a probability. Find the parse with the greatest "likelihood".
Build a table and count:
–How many times does each transition happen?
Structured learning.
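
The "build a table and count" step can be sketched as maximum-likelihood estimation: count how often each rule fires, then divide by the total count for its left-hand side. The toy list of observed rule applications below is invented for illustration:

```python
from collections import Counter

# Toy observations: (left-hand side, expansion) pairs seen in annotated parses.
observed = [
    ("NP", ("Det", "Nom")),
    ("NP", ("Det", "Nom")),
    ("NP", ("Noun",)),
    ("VP", ("Verb", "Nom")),
]

rule_counts = Counter(observed)
lhs_counts = Counter(lhs for lhs, _ in observed)

# P(rule) = count(rule) / count(all rules with the same left-hand side)
probs = {rule: count / lhs_counts[rule[0]]
         for rule, count in rule_counts.items()}
# probs[("NP", ("Det", "Nom"))] == 2/3
```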

Segmentation

Sentence Segmentation
Topic Segmentation
Speaker Segmentation
Phrase Chunking
–NP, VP, PP, SubClause, etc.