Published by Hillary Willis. Modified over 5 years ago.
1
LING/C SC 581: Advanced Computational Linguistics
Lecture Notes Jan 10th
2
Today's Lecture
Unfortunately, today's lecture will be short (unfortunate throat issues/coughing…).
Homework 1: install Python 3 and nltk/nltk_data on your own computer.
3
Administrivia
No lectures on:
  Th 3/7 (Spring Recess)
I'm away on business on:
  Tu 1/29, Th 1/31, Tu 2/5, Th 2/7 (What to do: TBD)
Guest lectures (more to be announced):
  2/14 Tatjana Scheffler ….
4
Course Webpage
For lecture slides and Panopto recordings:
  available from just before class time (afterwards, look again for corrections/updates)
  in .pptx (good for animations) and .pdf formats
Meeting information
  Tues/Thurs 3:30-4:45pm. McClelland Park, Room 102.
not guaranteed!
5
Course Objectives
Follow-on course to LING/C SC/PSYC 538 Computational Linguistics:
  prerequisite: 538
  continue with selected material from the 538 textbook (J&M): 25 chapters, a lot of material not covered in 438/538
And gain more extensive experience with:
  new stuff not in the textbook
  dealing with natural language software packages: installation, input data formatting, operation
  project exercises
  useful “real-world” computational experience: abilities gained will be of value to employers
6
Computational Facilities
Use your own laptop/desktop
  you can also make use of the computers in this lab, but you don’t have installation rights on these computers
Platforms
  Windows is maybe possible, but you really should run some variant of Unix…
  Linux (e.g. Ubuntu, on a separate bootable partition, or via virtualization software if you use Windows)
  OSX: not quite Linux, some porting issues, especially with C programs
    can use VirtualBox (Linux under OSX)
    or MacPorts or Homebrew (no need for virtualization)
7
Grading
Completion of all homework tasks will result in a satisfactory grade (A).
Tasks typically should be completed before the corresponding class next week.
  Email me your work
  also be prepared to come up and present your work (if called upon)
Office hours (by appointment): also after class
8
Syllabus
Homeworks
  you may discuss questions with other students
  however, you must write it up yourself (in your own words, your own code etc.)
  cite (web) references and your classmates (in the case of discussion)
  Student Code of Academic Integrity: plagiarism etc.
Revisions to the syllabus
  “the information contained in the course syllabus, other than the grade and absence policies, may be subject to change with reasonable advance notice, as deemed appropriate by the instructor.”
10
Windows 10: 32 or 64 bit system?
Look for System in Control Panel.
Choose correct file:
  Windows x86-64 executable installer (for 64 bit systems)
  Windows x86 executable installer (for 32 bit systems)
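Once Python is installed, the 32/64 bit choice can be double-checked from inside the interpreter. The following is a minimal sketch using only the standard library; the function name `interpreter_bits` is made up for illustration. Note that it reports the bitness of the Python build, which may be 32 bit even on a 64 bit Windows system if the x86 installer was chosen.

```python
import platform
import struct


def interpreter_bits():
    """Return 32 or 64: the pointer width of the running Python build."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


print("Python build:", interpreter_bits(), "bit")
print("Machine type reported by the OS:", platform.machine())
```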
11
Running the Python 3 installer
12
Finish installation, and start Python
Four versions present:
  Python command line (2.7.x)
  IDLE Python (2.7.x)
  Python 3.6
  IDLE Python 3.6
13
Python resources
python.org
14
Why Python?
Installed by default on OSX, but you should install and use Python 3
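With more than one Python on the same machine, it is easy to start the wrong one. A small sketch (standard library only; the helper name `is_python3` is hypothetical) that reports which version is actually running:

```python
import sys


def is_python3(version_info=sys.version_info):
    """True if the given version tuple belongs to Python 3 or later."""
    return version_info[0] >= 3


print("Running Python", sys.version.split()[0])
if not is_python3():
    print("This is Python 2; start the interpreter with 'python3' instead.")
```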
15
Python on Ubuntu
sudo apt install python3-pip
16
Python on Ubuntu
17
NLTK 3.2.5 Install
See http://www.nltk.org/install.html
Use pip3 (for python3) to install packages from the Python Package Index (PyPI):
  sudo pip3 install -U nltk
(updated my nltk to 3.2.5)
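To confirm what pip3 installed, the version can be queried from Python. This is a sketch assuming Python 3.8 or later, since `importlib.metadata` is newer than the Python 3.6 shown on these slides; on older versions, `import nltk` followed by `nltk.__version__` gives the same information. The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a PyPI package, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed for this interpreter.
        return None


print("nltk version:", installed_version("nltk"))
```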
18
NLTK Data Install
See http://www.nltk.org/data.html
python3
If you get an SSL certificate error message, run:
  /Applications/Python 3.6/Install Certificates.command
19
Windows 10: setup
Environment variable PATH should be set correctly to point to the Python 3 install directory.
Type in search: Edit environment variables for your account
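If Python is already running (for example, from IDLE), a quick way to see what PATH currently contains is to ask Python itself. A minimal sketch using only the standard library; the helper names and the list of command names are assumptions chosen for illustration:

```python
import os
import shutil


def path_entries():
    """Split the PATH environment variable into its directories."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


def find_commands(names=("python", "python3", "pip3")):
    """Map each command name to the executable PATH resolves it to (or None)."""
    return {name: shutil.which(name) for name in names}


for name, location in find_commands().items():
    print(name, "->", location)
```

A result of `None` for a command suggests its directory is missing from PATH.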
20
Windows 10: install nltk
On the command line:
  pip3 install pyyaml nltk
Package pyyaml must be used somewhere in nltk …
Source:
21
Windows 10: install numpy and test nltk
On the command line:
  pip3 install numpy (the chunking algorithm uses it)
Let's test nltk:
  .word_tokenize() converts a string into words
  .pos_tag() does part-of-speech tagging
  .ne_chunk() does named entity recognition
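The three calls chain naturally: the output of each is the input of the next. A minimal sketch, assuming nltk, numpy, and the required nltk data packages are already installed; the function name `analyze` and the example sentence are made up for illustration.

```python
def analyze(sentence):
    """Tokenize, POS-tag, and NE-chunk one sentence; returns an nltk Tree."""
    # Imported inside the function so that a missing install shows up
    # as an ImportError at call time.
    import nltk

    tokens = nltk.word_tokenize(sentence)  # list of word strings
    tagged = nltk.pos_tag(tokens)          # list of (word, tag) pairs
    return nltk.ne_chunk(tagged)           # Tree with named-entity subtrees


# Usage (needs the nltk data packages to be present):
# tree = analyze("Mary works for the University of Arizona in Tucson.")
# print(tree)
```

If the data packages are missing, nltk raises a LookupError whose message names the resource to download.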
22
Windows 10: test nltk
.draw() takes a Tree object and draws it in a pop-up window
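A Tree does not have to come from the chunker; it can also be built from a bracketed string. The sketch below assumes nltk is installed; the function name `show_tree` and the example bracketing are hypothetical. `pretty_print()` is offered as a text-only alternative for machines without a display.

```python
def show_tree(bracketed, gui=False):
    """Build an nltk Tree from a bracketed string and display it."""
    import nltk  # imported here so a missing install fails at call time

    tree = nltk.Tree.fromstring(bracketed)
    if gui:
        tree.draw()          # opens a pop-up window; blocks until it is closed
    else:
        tree.pretty_print()  # ASCII rendering in the terminal
    return tree


# Usage:
# show_tree("(S (NP Mary) (VP (V sleeps)))", gui=True)
```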
23
Windows 10: install nltk data
Install corpus data (from inside Python) using nltk.download()
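Called with no arguments, `nltk.download()` opens an interactive downloader. It can also be given a package identifier directly. The sketch below is only an assumption about which packages the examples in these slides need, and the identifiers are those used around NLTK 3.2.5; newer NLTK releases have renamed some of them. The name `download_packages` is made up for illustration.

```python
# Assumed package identifiers (not listed on the slides).
PACKAGES = [
    "punkt",                       # tokenizer models for word_tokenize()
    "averaged_perceptron_tagger",  # model for pos_tag()
    "maxent_ne_chunker",           # model for ne_chunk()
    "words",                       # word list used by the chunker
    "treebank",                    # Penn Treebank WSJ sample
]


def download_packages(packages=PACKAGES):
    """Download each package; returns {identifier: True/False for success}."""
    import nltk  # imported here so a missing install fails at call time

    return {name: nltk.download(name) for name in packages}


# Usage (needs a network connection):
# print(download_packages())
```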
24
Windows 10: test nltk data
A sample of the well-known Penn Treebank Wall Street Journal (WSJ) corpus is included:
  3,914 parsed sentences in the sample
  49,000+ parsed sentences in the full corpus
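The size of the sample can be checked directly once the corpus data is installed. A minimal sketch, assuming nltk and its `treebank` data package are present; the function name `treebank_sample_size` is hypothetical.

```python
def treebank_sample_size():
    """Count the parsed sentences in the Penn Treebank sample shipped with nltk."""
    # Imported here so that a missing install or missing data fails at call time.
    from nltk.corpus import treebank

    return len(treebank.parsed_sents())


# Usage (needs the treebank data package):
# print(treebank_sample_size())
# from nltk.corpus import treebank
# print(treebank.parsed_sents()[0])   # first parsed sentence as a Tree
```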
25
Example:
26
nltk: where is it installed?
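One way to answer this without leaving Python is to ask the import system where it would load the package from. A sketch using only the standard library; the helper name `package_location` is made up for illustration. If nltk imports successfully, `nltk.__file__` gives the same path.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print("nltk is installed at:", package_location("nltk"))
```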