LING/C SC 581: Advanced Computational Linguistics


LING/C SC 581: Advanced Computational Linguistics Lecture Notes Jan 11th

Course Webpage
Lecture slides and Panopto recordings: http://elmo.sbs.arizona.edu/~sandiway/#courses
  available from just before class time (afterwards, look again for corrections/updates)
  in .pptx (good for animations) and .pdf formats
Meeting information
  Tues/Thurs 3:30-4:45pm, McClelland Park, Room 102
  not guaranteed!

Course Objectives
Follow-on course to LING/C SC/PSYC 538 Computational Linguistics (pre-requisite: 538)
  continue with selected material from the 538 textbook (J&M): 25 chapters, a lot of material not covered in 438/538
And gain more extensive experience
  with new stuff not in textbook
  dealing with natural language software packages: installation, input data formatting, operation
  project exercises
  useful “real-world” computational experience
  abilities gained will be of value to employers

Computational Facilities
Use your own laptop/desktop
  you can also use the computers in this lab, but you don’t have installation rights on them
Platforms
  Windows is maybe possible, but you really should run some variant of Unix…
  Linux (e.g. Ubuntu, on a separate bootable partition or via virtualization software if you use Windows)
  OSX: not quite Linux, some porting issues, especially with C programs
    can use VirtualBox (Linux under OSX)
    or MacPorts or Homebrew (no need for virtualization)

Grading
Completion of all homework tasks will result in a satisfactory grade (A)
Tasks typically should be completed before the corresponding class next week
  email me your work (sandiway@email.arizona.edu)
  also be prepared to come up and present your work (if called upon)
Office hours: by appointment, also after class

Syllabus
Homeworks
  you may discuss questions with other students
  however, you must write it up yourself (in your own words, your own code etc.)
  cite (web) references and your classmates (in the case of discussion)
  Student Code of Academic Integrity: plagiarism etc. http://deanofstudents.arizona.edu/codeofacademicintegrity
Revisions to the syllabus
  “the information contained in the course syllabus, other than the grade and absence policies, may be subject to change with reasonable advance notice, as deemed appropriate by the instructor.”

Python 3
Grab from python.org
  non-standard on MacOS (Apple: Python 2.7.10)
    which python
    /usr/bin/python
    which python3
    /Library/Frameworks/Python.framework/Versions/3.5/bin/python3
    printenv PATH
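The same check can be made from inside Python, which helps when several interpreters are installed. This is a minimal sketch using only the standard library; the helper name describe_interpreter is made up for illustration.

```python
import shutil
import sys


def describe_interpreter():
    """Report which Python is running and what 'python3' resolves to on PATH."""
    return {
        # Full path of the interpreter executing this code
        "executable": sys.executable,
        # Version as major.minor.micro, e.g. 3.5.2
        "version": "%d.%d.%d" % sys.version_info[:3],
        # True when this interpreter is Python 3 or later
        "is_python3": sys.version_info[0] >= 3,
        # Path a shell would find for 'python3', or None if not on PATH
        "python3_on_path": shutil.which("python3"),
    }


info = describe_interpreter()
for key, value in info.items():
    print(key, "=", value)
```

If executable and python3_on_path differ, the shell and the running session are using different installations, and pip3 may be installing packages into the other one.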

www.nltk.org

NLTK 3.2.5 Install
See http://www.nltk.org/install.html
Use pip (pip3 for python3) to install packages from the Python Package Index (PyPI)
  sudo pip3 install -U nltk
  (updated my nltk 3.2.4 to 3.2.5)

NLTK Data Install
See http://www.nltk.org/data.html
  python3
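The transcript stops at the python3 prompt. One possible way to continue from there is nltk.download(), sketched below. The resource identifiers are an assumption about what Homework 1 needs and are not taken from the slides; newer NLTK releases may ask for differently named resources, in which case the LookupError message names the one to fetch. The helper download_resources is hypothetical.

```python
# Assumed resource identifiers (not from the slides).
HOMEWORK_RESOURCES = [
    "treebank",                    # Penn Treebank sample (Part 2)
    "gutenberg",                   # Alice in Wonderland text (Part 3)
    "punkt",                       # sentence/word tokenizer models
    "averaged_perceptron_tagger",  # model used by nltk.pos_tag
]


def download_resources(names, download=None):
    """Try to download each named resource; return the names that failed."""
    if download is None:
        import nltk  # imported lazily so this helper loads without NLTK
        download = nltk.download
    # The downloader is expected to return a true value on success
    return [name for name in names if not download(name)]
```

Calling download_resources(HOMEWORK_RESOURCES) once from the interpreter should be enough; an empty list back means nothing failed. Calling nltk.download() with no arguments opens the interactive downloader instead.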

Homework 1
Part 1: Install (if you haven't already) the latest or a recent version of Python 3, NLTK, and NLTK Data on your laptop
Part 2: The Penn Treebank is partially installed as a corpus in NLTK Data (Sections 00 and 01: wsj_0001.mrg to wsj_0199.mrg)
  from nltk.corpus import treebank

Homework 1 Part 2 (contd.)
Submit relevant parts of your python console + a screenshot of a random tree from the treebank
Parsed sentence (example):
  t = treebank.parsed_sents("wsj_0199.mrg")[0]
Tree drawing:
  t.draw()
Random number generation:
  import random
  random.seed()
  random.randrange(1,200)
  returns some x: 1 <= x < 200
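The pieces above can be combined as follows. This is only a sketch: the helper names are made up, and it assumes the file ids follow the zero-padded wsj_NNNN.mrg pattern given for the NLTK sample (wsj_0001.mrg to wsj_0199.mrg).

```python
import random


def random_treebank_fileid(rng=random):
    """Pick a file id between wsj_0001.mrg and wsj_0199.mrg."""
    n = rng.randrange(1, 200)  # 1 <= n < 200
    return "wsj_%04d.mrg" % n  # zero-pad to four digits


def random_tree(rng=random):
    """Return (fileid, first parsed sentence of that file).

    Needs NLTK and the treebank data, so the import is kept local.
    """
    from nltk.corpus import treebank
    fileid = random_treebank_fileid(rng)
    return fileid, treebank.parsed_sents(fileid)[0]
```

Usage at the console would be something like fileid, t = random_tree() followed by t.draw(), which opens a window for the screenshot. Recording fileid in the write-up makes the choice reproducible.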

Homework 1 Part 3: Alice in Wonderland
  import nltk
  nltk.corpus.gutenberg.fileids()
  ['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
  raw = nltk.corpus.gutenberg.raw('carroll-alice.txt')
  sents = nltk.sent_tokenize(raw)
  len(sents)
  1625

Homework 1 Part 3 (contd.): Alice in Wonderland
Part-of-speech tag a random sentence from the book and comment on the accuracy of the tagging. Submit your console.
Use:
  nltk.tokenize.word_tokenize(string)
  nltk.pos_tag(list_of_words)
Example:
  [('The', 'DT'), ('Mouse', 'NNP'), ('only', 'RB'), ('growled', 'VBD'), ('in', 'IN'), ('reply', 'NN'), ('.', '.')]
Penn Treebank tags: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
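Part 3 end to end might look like the sketch below. It uses only the NLTK calls named on the slides, and it assumes the gutenberg corpus, tokenizer models, and tagger model are already installed. The helper names and the word/TAG output format are illustrative choices, not part of the assignment.

```python
import random


def format_tagged(tagged):
    """Render [(word, tag), ...] as 'word/TAG word/TAG ...' for a write-up."""
    return " ".join("%s/%s" % (word, tag) for word, tag in tagged)


def tag_random_sentence(rng=random):
    """Pick a random sentence from Alice in Wonderland and POS tag it.

    Needs NLTK plus its data, so the import is kept local.
    """
    import nltk
    raw = nltk.corpus.gutenberg.raw("carroll-alice.txt")
    sents = nltk.sent_tokenize(raw)          # list of sentence strings
    sentence = rng.choice(sents)             # one random sentence
    words = nltk.tokenize.word_tokenize(sentence)
    return sentence, nltk.pos_tag(words)     # list of (word, tag) pairs


# Formatting the start of the example from the slide:
print(format_tagged([("The", "DT"), ("Mouse", "NNP"), ("only", "RB")]))
```

Printing the sentence next to format_tagged(tagged) makes it easier to compare each tag against the Penn Treebank tag list when commenting on accuracy.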

Homework 1
Due date: next Monday (midnight)
  by email (Subject: 581 Homework 1)
  submit one PDF file (put your name at the top of the file)