
1 Introduction
LING 575, Week 1: 1/08/08

2 Plan for today
– General information
– Course plan
– HMM and n-gram tagger (recap)
– EM and forward-backward algorithm

3 Before next time
Select papers that you’d like to present
– Reply to the 1st message at GoPost by noon Saturday
Read M&S 9.3.3
– Remember to hand in your questions next time.

4 General information

5 General info
Course url: http://courses.washington.edu/ling575x
– Syllabus (incl. slides, assignments, and papers): updated every week
– GoPost
– CollectIt
Please check your email at least once per day.

6 Office hour
Email:
– Email address: fxia@u.washington.edu
– Subject line should include “ling575”
– The 48-hour rule: it works both ways
Office hour:
– Time: Fri 10:30-11:30am
– Location: Padelford A-210G

7 Slides
The slides will be online before class if possible. The final version will be uploaded a few hours after class.

8 Prerequisites
– CS 326 (Data Structures) or equivalent
– Stat 391 (Prob. and Stats for CS) or equivalent: basic concepts in probability and statistics
– Programming in Perl, C, C++, Java, or Python
– LING570 and LING572
– Being comfortable with formulas

9 Grades for LING575
No midterm or final exams.
Graded:
– Assignments (5): 45-60%
– Presentation: 15-25%
Not graded:
– Reading: 5-10%
– Class participation: 10-20%

10 Assignments
Assignments:
– Due at 2:30pm on Tuesdays
– 1% penalty for each hour after the due date; nothing accepted after 4 days
– Submit via CollectIt
Reading:
– Papers should be read before class.
– Bring at least two questions to class.
– Your answers will be checked but not graded.

11 Presentation
Select your week by noon this Saturday (1/12) by replying to the GoPost message:
– First come, first served.
If, for whatever reason, the week you selected no longer works for you, it is your responsibility to find someone to switch with.
For your week, email Fei the slides by noon on the Monday (i.e., the day before your presentation).
– 1% penalty for each hour after the due date.

12 Patas
If you need a patas account, email linghelp@u.washington.edu right away.
The directory for LING575: ~/dropbox/07-08/575x/
– hw1/, hw2/, …: assignments and solutions
– hmm/: a pre-existing HMM package
– misc_slides/: solutions to exams and misc slides that are not on the course url

13 Course plan

14 Machine learning
Supervised learning: LING572
Semi-supervised learning:
– Some annotated data, plus a large amount of unannotated data
– Ex: self-training, co-training, transductive SVM
Unsupervised learning:
– No annotated data
– Ex: EM

15 Unsupervised learning
No annotated data, but the knowledge has to come from somewhere:
– Dictionary / lexicon
– Seed examples
– …
 We choose unsupervised POS tagging as a case study.

16 Supervised POS tagging
It is a sequence labeling problem.
Statistical approaches:
– Sequence labeling algorithms: HMM, MEMM, CRF, …
– Classification algorithms: decision tree, naïve Bayes, MaxEnt, SVM, Boosting, …
Most unsupervised POS tagging algorithms use EM to estimate HMM parameters.
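To make the HMM-as-sequence-labeler idea concrete, here is a minimal sketch of Viterbi decoding over a toy HMM. The tag set and the hand-set transition/emission probabilities are made up for illustration; they are not from the course materials or the hmm/ package.

```python
# A minimal sketch of HMM POS tagging via Viterbi decoding.
# All probabilities below are illustrative toy values, not trained ones.

def viterbi(words, tags, pi, trans, emit):
    """Return the most likely tag sequence for `words` under the HMM."""
    # V[t][s] = (prob of best path for words[0..t] ending in tag s, that path)
    V = [{t: (pi[t] * emit[t].get(words[0], 1e-8), [t]) for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            # Extend the best previous path into tag t.
            prob, path = max(
                (V[-1][s][0] * trans[s][t] * emit[t].get(w, 1e-8), V[-1][s][1] + [t])
                for s in tags
            )
            row[t] = (prob, path)
        V.append(row)
    return max(V[-1].values())[1]

tags = ["DT", "NN", "VB"]
pi = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans = {"DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
         "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
         "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1}}
emit = {"DT": {"the": 0.9},
        "NN": {"book": 0.5, "flight": 0.5},
        "VB": {"book": 0.8}}

best = viterbi(["book", "the", "flight"], tags, pi, trans, emit)
# → ['VB', 'DT', 'NN']: "book" is disambiguated as a verb by the context.
```

The same dynamic program underlies decoding whether the parameters come from supervised counts or from EM.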

17 Major approaches to unsupervised tagging
All assume a large amount of unannotated data.
Approach #1: use EM to estimate HMM
– No lexicon
– With full lexicon
– With filtered lexicon
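The EM approach re-estimates HMM parameters from expected counts, and those expectations come from the forward-backward pass (the E-step of Baum-Welch). Below is a sketch of that pass computing per-position state posteriors; the two-state model and its probabilities are toy values of my own, and the M-step re-estimation is omitted.

```python
# A minimal sketch of the forward-backward pass used in the E-step of
# Baum-Welch (EM for HMMs). Toy probabilities, illustrative only.

def forward_backward(obs, states, pi, trans, emit):
    """Return per-position posteriors P(state_t | obs) and the data likelihood."""
    n = len(obs)
    # Forward: alpha[t][s] = P(obs[0..t], state_t = s)
    alpha = [{s: pi[s] * emit[s][obs[0]] for s in states}]
    for t in range(1, n):
        alpha.append({s: emit[s][obs[t]] *
                         sum(alpha[t - 1][r] * trans[r][s] for r in states)
                      for s in states})
    # Backward: beta[t][s] = P(obs[t+1..] | state_t = s), beta[n-1] = 1
    beta = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {s: sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r]
                          for r in states)
                   for s in states}
    likelihood = sum(alpha[n - 1][s] for s in states)
    # Posteriors: gamma[t][s] = alpha[t][s] * beta[t][s] / P(obs)
    gamma = [{s: alpha[t][s] * beta[t][s] / likelihood for s in states}
             for t in range(n)]
    return gamma, likelihood

states = [0, 1]
pi = {0: 0.6, 1: 0.4}
trans = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
emit = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}
gamma, likelihood = forward_backward(["a", "b", "a"], states, pi, trans, emit)
```

In full Baum-Welch, the gamma values (and the analogous pairwise posteriors) supply the expected counts from which trans and emit are re-estimated, and the two steps alternate until the likelihood converges.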

18 Major approaches (cont)
Approach #2: clustering the words based on
– distributional cues
– morphological cues
Approach #3: cross-lingual approach
– It requires parallel data.
– Seeds are created by projecting POS info from one language to the other.

19 Major approaches (cont)
Approach #4: prototype learning
– It requires a small number of prototypes: e.g., “book” is a noun, “the” is a determiner.
– Prototypes help to label other words.
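One simple way prototypes can propagate labels is via distributional similarity: give each word the tag of its most similar prototype. The sketch below uses context-count vectors and cosine similarity on a four-sentence toy corpus; the helper names (context_vectors, label_by_prototype) and the corpus are my own, not from any particular paper.

```python
# A hypothetical sketch of prototype-driven labeling: each word gets the
# nearest prototype (cosine over context-word counts). Toy data only.
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=1):
    """Map each word to a Counter of words seen within `window` positions."""
    vecs = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = vecs.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[sent[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def label_by_prototype(vecs, prototypes):
    """Map each word to its most distributionally similar prototype word."""
    return {w: max(prototypes, key=lambda p: cosine(vecs[w], vecs[p]))
            for w in vecs}

sents = [["the", "book", "is", "here"], ["the", "pen", "is", "here"],
         ["a", "book", "is", "there"], ["a", "pen", "is", "there"]]
protos = {"book": "NN", "the": "DT"}
near = label_by_prototype(context_vectors(sents), protos)
word_tags = {w: protos[near[w]] for w in near}
```

Here "pen" ends up nearest "book" (so NN) and "a" nearest "the" (so DT), because their toy contexts match; real prototype-learning systems use richer features and soft similarity, but the propagation idea is the same.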

20 In this course
We will
– discuss the papers in each category
– explore various methods aiming at improving the state of the art.
Compared to last year’s ling573, this course focuses
– more on machine learning
– less on search and rule writing

