LING/C SC 581: Advanced Computational Linguistics Lecture Notes Jan 10th
Today's Lecture unfortunately will be short.. Homework 1: install Python 3 and nltk/nltk_data on your own computer unfortunate throat issues/coughing…
Administrivia No lectures on I'm away on business on Th 3/7 (Spring Recess) I'm away on business on Tu 1/29 Th 1/31 Tu 2/5 Th 2/7 (What to do: TBD) Guest lectures (more to be announced): 2/14 Tatjana Scheffler ….
Course Webpage for lecture slides and Panopto recordings: http://elmo.sbs.arizona.edu/~sandiway/#courses available from just before class time (afterwards, look again for corrections/updates) in .pptx (good for animations) and .pdf formats Meeting information Tues/Thurs 3:30-4:45pm. McClellandPark, Room 102. not guaranteed!
Course Objectives Follow-on course to LING/C SC/PSYC 538 Computational Linguistics: pre-requisite: 538 continue with selected material from the 538 textbook (J&M): 25 chapters, a lot of material not covered in 438/538 And gain more extensive experience with new stuff not in textbook dealing with natural language software packages Installation, input data formatting operation project exercises useful “real-world” computational experience abilities gained will be of value to employers
Computational Facilities Use your own laptop/desktop can also make use of the computers in this lab but you don’t have installation rights on these computers Platforms Windows is maybe possible but you really should run some variant of Unix… Linux (e.g. Ubuntu, separate bootable partition or via virtualization software if you use Windows) OSX Not quite Linux, some porting issues, especially with C programs, can use Virtual Box (Linux under OSX) Or Macports or Homebrew (no need for virtualization)
Grading Office hours (by appointment): Completion of all homework tasks will result in a satisfactory grade (A) Tasks typically should be completed before the corresponding class next week. email me your work (sandiway@email.arizona.edu). also be prepared to come up and present your work (if called upon). Office hours (by appointment): also after class
Syllabus Revisions to the syllabus Homeworks you may discuss questions with other students however, you must write it up yourself (in your own words, your own code etc.) cite (web) references and your classmates (in the case of discussion) Student Code of Academic Integrity: plagiarism etc. http://deanofstudents.arizona.edu/codeofacademicintegrity Revisions to the syllabus “the information contained in the course syllabus, other than the grade and absence policies, may be subject to change with reasonable advance notice, as deemed appropriate by the instructor.”
www.python.org
Windows 10: 32 or 64 bit system? Choose correct file: Look for System in Control Panel: Choose correct file: Windows x86-64 executable installer (for 64 bit systems) Windows x86 executable installer (for 32 bit systems)
Running the Python 3 installer
Finish installation, and start Python Four versions present: Python command line (2.7.x) IDLE Python (2.7.x) Python 3.6 IDLE Python 3.6
Python resources python.org
Why Python? Installed by default on OSX but you should install and use Python 3
Python on Ubuntu sudo apt install python3-pip
Python on Ubuntu
NLTK 3.2.5 Install See http://www.nltk.org/install.html Use pip3 (for python3) to install packages from the Python Package Index (PyPI) sudo pip3 install -U nltk updated my nltk 3.2.4 to 3.2.5
NLTK Data Install See http://www.nltk.org/data.html python3 If you get an SSL certificate error message, run: /Applications/Python 3.6/Install Certificates.command
Windows 10: setup Environment variable PATH should be set correctly to point to Python 3 install directory Type in search: Edit environment variables for your account
Windows 10: install nltk On the command line: pip3 install pyyaml nltk Package pyyaml must be used somewhere in nltk … Source: http://www.pitt.edu/~naraehan/python2/faq.html
Windows 10: install numpy and test nltk On the command line: pip3 install numpy (the chunking algorithm uses it) Let's test nltk: .word_tokenize() converts a string into words .pos_tag() does part-of-speech tagging .ne_chunk() does named entity recognition
Windows 10: test nltk .draw() takes a Tree object and draws it in a pop-up window
Windows 10: install nltk data Install corpus data (from inside Python) using nltk.download()
Windows 10: test nltk data There is a sample of the well-known Penn Treebank Wall Street Journal (WSJ) corpus included 3,914 parsed sentences 49,000+ parsed sentences in the full corpus
Example:
nltk: where is it installed?