Course Summary
LING 572
Fei Xia
03/06/07
Outline
– Problem description
– General approach
– ML algorithms
– Important concepts
– Assignments
– What’s next?
Problem descriptions
Two types of problems
– Classification problems
– Sequence labeling problems

In both cases:
– A predefined set of labels: C = {c_1, c_2, …, c_n}
– Training data: {(x_i, y_i)}, where y_i ∈ C and y_i is known or unknown
– Test data
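A tiny illustration (my own, not from the slides) of how one training instance might look in each setting, with made-up features and labels: one label per instance for classification, one label per token for sequence labeling.

```python
# Classification: x_i is a set of feature:value pairs, y_i is a single class label.
classification_instance = ({"word=free": 1, "word=offer": 1}, "spam")

# Sequence labeling: x_i is a token sequence, y_i is a tag sequence of the same length.
sequence_instance = (["the", "cat", "sat"], ["DT", "NN", "VBD"])
```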
NLP tasks
Classification problems:
– Document classification
– Spam detection
– Sentiment analysis
– …
Sequence labeling problems:
– POS tagging
– Word segmentation
– Sentence segmentation
– NE detection
– Parsing
– IGT detection
– …
General approach
Step 1: Preprocessing
– Convert the NLP task to a classification or sequence labeling problem
– Create the attribute-value table:
  – Define feature templates
  – Instantiate the feature templates and select features (see the sketch below)
  – Decide what kind of feature values to use (e.g., whether to binarize them)
– Convert a multi-class problem to a binary problem (optional)
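As an illustration of building one row of the attribute-value table, here is a minimal sketch assuming a bag-of-words feature template; the function name doc_to_features and the binary flag are my own, not part of the course code.

```python
from collections import Counter

def doc_to_features(tokens, binary=False):
    """One row of the attribute-value table: feature name -> feature value."""
    counts = Counter(tokens)
    # Feature template "word=X", instantiated once per word type in the document.
    return {f"word={w}": (1 if binary else c) for w, c in counts.items()}

doc = "the cat sat on the mat".split()
print(doc_to_features(doc))        # word frequencies as feature values
print(doc_to_features(doc, True))  # binarized feature values
```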
Feature selection
Dimensionality reduction:
– Feature selection
  – Wrapper methods
  – Filtering methods: mutual information, χ², information gain, …
– Feature extraction
  – Term clustering: latent semantic indexing (LSI)
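A minimal sketch (my own, not from the course materials) of one filtering method, the χ² score of a single feature for a single class, computed from a 2×2 contingency table of feature presence vs. class membership; the example counts are invented.

```python
def chi_square(n11, n10, n01, n00):
    """n11: feature & class; n10: feature, not class;
    n01: class, not feature; n00: neither."""
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return numerator / denominator if denominator else 0.0

# Higher scores mean the feature and the class are more strongly associated.
print(chi_square(n11=49, n10=141, n01=27652, n00=774106))
```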
Multiclass → binary
– One-vs-all
– All-pairs
– Error-correcting output codes (ECOC)
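A minimal sketch (not the course's implementation) of the one-vs-all reduction: train one binary classifier per class and pick the class whose classifier is most confident. The train_binary function and the returned scorers are assumed interfaces.

```python
def train_one_vs_all(train_binary, data, labels, classes):
    """train_binary(data, binary_labels) -> scorer; returns one scorer per class."""
    return {c: train_binary(data, [1 if y == c else 0 for y in labels])
            for c in classes}

def predict_one_vs_all(models, x):
    # Each scorer returns a confidence for "this class vs. the rest".
    return max(models, key=lambda c: models[c](x))
```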
Step 2: Training and decoding
– Choose a ML learner
– Train and test on the development set with different settings of the non-model parameters
– Choose the setting that works best on the development set
– Run the learner on the test data with that setting
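A minimal sketch (my own illustration) of that tuning loop, assuming generic train and evaluate functions; the settings could be, for example, values of k for kNN or beam widths.

```python
def tune(train, evaluate, train_data, dev_data, settings):
    """Return (dev accuracy, setting, model) for the best-scoring setting."""
    best = None
    for setting in settings:
        model = train(train_data, setting)
        accuracy = evaluate(model, dev_data)
        if best is None or accuracy > best[0]:
            best = (accuracy, setting, model)
    return best

# e.g. tune(train_knn, accuracy_on, train_data, dev_data, settings=[1, 3, 5, 10])
```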
Step 3: Post-processing
– Label sequence → the output we want
– System combination:
  – Voting: majority voting, weighted voting
  – More sophisticated models
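A minimal sketch (my own illustration) of system combination by voting: each classifier contributes one prediction, optionally with a per-classifier weight for weighted voting.

```python
from collections import defaultdict

def vote(predictions, weights=None):
    """predictions: one label per classifier; weights: optional classifier weights."""
    weights = weights or [1.0] * len(predictions)
    score = defaultdict(float)
    for label, weight in zip(predictions, weights):
        score[label] += weight
    return max(score, key=score.get)

print(vote(["NN", "VB", "NN"]))                           # majority voting
print(vote(["NN", "VB", "VB"], weights=[0.9, 0.3, 0.4]))  # weighted voting
```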
Supervised algorithms
Main ideas
– kNN and Rocchio: finding the nearest neighbors / prototypes
– DT and DL: finding the right group
– NB, MaxEnt: calculating P(y | x)
– Bagging: reducing the instability
– Boosting: forming a committee
– TBL: improving the current guess
ML learners
– Modeling
– Training
– Testing (a.k.a. decoding)
Modeling
– NB: assume the features are conditionally independent given the class:
  P(x, y) = P(y) ∏_j P(f_j | y)
– MaxEnt:
  P(y | x) = exp(∑_j λ_j f_j(x, y)) / Z(x), where Z(x) is the normalization factor
Training
– kNN: no training
– Rocchio: calculate the prototypes
– DT: build a decision tree
  – Choose a feature and then split the data
– DL: build a decision list
  – Choose a decision rule and then split the data
– TBL: build a transformation list
  – Choose a transformation and then update the current labels
Training (cont)
– NB: calculate P(c_i) and P(f_j | c_i) by simple counting.
– MaxEnt: calculate the weights of the feature functions by iteration.
– Bagging: create bootstrap samples and learn base classifiers.
– Boosting: learn base classifiers and their weights.
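A minimal sketch (my own, not the assignment code) of NB training by simple counting; the add-one smoothing is an assumed design choice, not something stated on the slide.

```python
from collections import Counter, defaultdict

def train_nb(instances):
    """instances: list of (feature_list, label); returns P(c) and P(f | c)."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in instances:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    total = sum(class_counts.values())
    prior = {c: class_counts[c] / total for c in class_counts}
    def cond(f, c):
        # P(f | c) by counting, with add-one smoothing (an assumption).
        return (feature_counts[c][f] + 1) / (sum(feature_counts[c].values()) + len(vocab))
    return prior, cond
```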
Testing
– kNN: calculate the distances between x and each x_i; find the closest neighbors.
– Rocchio: calculate the distances between x and the prototypes.
– DT: traverse the tree.
– DL: find the first matched decision rule.
– TBL: apply the transformations one by one.
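A minimal sketch (my own illustration) of kNN decoding with Euclidean distance and a majority vote among the k nearest training instances; the dense-vector representation is an assumption.

```python
import math
from collections import Counter

def knn_classify(x, train, k=3):
    """train: list of (vector, label); x: a vector of the same length."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbors = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

print(knn_classify([1.0, 0.0],
                   [([0.9, 0.1], "A"), ([0.2, 0.8], "B"), ([1.1, 0.0], "A")]))
```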
Testing (cont)
– NB: calculate argmax_c P(c) ∏_j P(f_j | c).
– MaxEnt: calculate argmax_y P(y | x).
– Bagging: run the base classifiers and choose the class with the most votes.
– Boosting: run the base classifiers and calculate the weighted sum.
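Continuing the NB sketch above: decoding picks the class that maximizes log P(c) + ∑_j log P(f_j | c), using the (hypothetical) prior and cond returned by train_nb.

```python
import math

def classify_nb(prior, cond, features):
    def score(c):
        return math.log(prior[c]) + sum(math.log(cond(f, c)) for f in features)
    return max(prior, key=score)

# prior, cond = train_nb(training_instances)
# classify_nb(prior, cond, ["word=free", "word=offer"])
```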
Sequence labeling problems
With classification algorithms:
– Having features that refer to previous tags
– Using beam search to find good sequences (sketch below)
With sequence labeling algorithms:
– HMM
– TBL
– MEMM
– CRF
– …
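A minimal sketch (my own, not the assignment's beam search) of beam search over tag sequences, assuming a score function that returns the log-probability of a tag given the current word and the previously chosen tags.

```python
def beam_search(words, tags, score, beam_size=3):
    """score(word, prev_tags, tag) -> log-probability; returns the best tag sequence."""
    beam = [([], 0.0)]  # (partial tag sequence, cumulative log-probability)
    for word in words:
        candidates = [(prev + [tag], logp + score(word, prev, tag))
                      for prev, logp in beam for tag in tags]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_size]   # keep only the top hypotheses
    return beam[0][0]
```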
Semi-supervised algorithms
– Self-training
– Co-training
– …
Common idea: adding some unlabeled data to the labeled data
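A minimal sketch (my own illustration) of self-training: train on the labeled data, label the unlabeled pool, move the most confident predictions into the labeled set, and repeat. The train and predict_with_conf interfaces and the confidence threshold are assumptions.

```python
def self_train(train, predict_with_conf, labeled, unlabeled, rounds=5, threshold=0.9):
    """predict_with_conf(model, x) -> (label, confidence)."""
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in unlabeled:
            label, confidence = predict_with_conf(model, x)
            (confident if confidence >= threshold else rest).append((x, label))
        if not confident:
            break
        labeled = labeled + confident        # add the newly labeled instances
        unlabeled = [x for x, _ in rest]     # keep the rest for later rounds
    return train(labeled)
```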
Unsupervised algorithms
– MLE
– EM:
  – The general algorithm: E-step, M-step
  – EM for PM models:
    – Forward-backward for HMM
    – Inside-outside for PCFG
    – IBM models for MT
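A minimal sketch (my own, much simpler than the HMM/PCFG cases) of the general E-step/M-step cycle: EM for a mixture of two biased coins, where which coin produced each trial is the hidden variable. The starting values and iteration count are arbitrary.

```python
def em_two_coins(trials, theta=(0.6, 0.5), iterations=20):
    """trials: list of (heads, tails) counts; returns estimated P(heads) per coin."""
    for _ in range(iterations):
        expected = [[0.0, 0.0], [0.0, 0.0]]   # expected (heads, tails) per coin
        for heads, tails in trials:
            # E-step: posterior probability that each coin generated this trial
            likelihood = [p ** heads * (1 - p) ** tails for p in theta]
            posterior = [l / sum(likelihood) for l in likelihood]
            for c in (0, 1):
                expected[c][0] += posterior[c] * heads
                expected[c][1] += posterior[c] * tails
        # M-step: re-estimate each coin's heads probability from the expected counts
        theta = tuple(e[0] / (e[0] + e[1]) for e in expected)
    return theta

print(em_two_coins([(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]))
```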
Important concepts
Concepts
– Attribute-value table
– Feature templates vs. features
– Weights:
  – Feature weights
  – Classifier weights
  – Instance weights
  – Feature values
Concepts (cont)
– Maximum entropy vs. maximum likelihood
– Maximize likelihood vs. minimize training error
– Training time vs. test time
– Training error vs. test error
– Greedy algorithm vs. iterative approach
Concepts (cont)
– Local optima vs. global optima
– Beam search vs. the Viterbi algorithm
– Sample vs. resample
– Model parameters vs. non-model parameters
Assignments
Read code:
– NB: binary features?
– DT: the difference between DT and C4.5
– Boosting: AdaBoost and AdaBoost.M2
– MaxEnt: binary features?
Write code:
– Info2Vectors
– BinVectors
– χ²
Complete two projects
Projects
Steps:
– Preprocessing
– Training and testing
– Postprocessing
Two projects:
– Project 1: Document classification
– Project 2: IGT detection
Project 1: Document classification
– A typical classification problem
– The data are prepared already
  – Feature template: the words that appear in the document
  – Feature value: word frequency
Project 2: IGT detection
Can be framed as a sequence labeling problem:
– Preprocessing: define the label set
– Postprocessing: tag sequence → spans
A sequence labeling problem solved with a classification algorithm plus beam search. To use classifiers:
– Preprocessing: define features, choose feature values, …
Project 2 (cont)
Preprocessing:
– Define the label set
– Define feature templates
– Decide on feature values
Training and decoding:
– Write beam search
Postprocessing:
– Convert the label sequence into spans (sketch below)
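A minimal sketch (my own, not the project's code) of that postprocessing step, assuming BIO-style tags (B-X / I-X / O); the actual label set for IGT detection may differ.

```python
def labels_to_spans(labels):
    """['B-IGT', 'I-IGT', 'O', ...] -> [(start, end, 'IGT'), ...], end exclusive."""
    spans, start, kind = [], None, None
    for i, label in enumerate(labels + ["O"]):          # sentinel flushes the last span
        if label.startswith("B-") or label == "O" or (kind and label != "I-" + kind):
            if start is not None:
                spans.append((start, i, kind))
                start, kind = None, None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    return spans

print(labels_to_spans(["O", "B-IGT", "I-IGT", "I-IGT", "O", "B-IGT"]))
# [(1, 4, 'IGT'), (5, 6, 'IGT')]
```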
Project 2 (cont)
– Presentation
– Final report: a typical conference paper
  – Introduction
  – Previous work
  – Methodology
  – Experiments
  – Discussion
  – Conclusion
Using Mallet
Difficulties:
– Java
– A large package
Benefits:
– Java
– A large package
– Many learning algorithms: comparing the implementation with “standard” algorithms
Bugs in Mallet?
In Hw9, include a new section:
– Bugs
– Complaints
– Things you like about Mallet
Course summary
– 9 weeks: 18 sessions
– 2 kinds of problems
– 9 supervised algorithms
– 1 semi-supervised algorithm
– 1 unsupervised algorithm
– 4 related issues: feature selection, multiclass → binary, system combination, beam search
– 2 projects
– 1 well-known package
– 9 assignments, including 1 presentation and 1 final report
– N papers
What’s next?
– Learn more about the algorithms covered in class.
– Learn new algorithms: SVM, CRF, regression algorithms, graphical models, …
– Try new tasks: parsing, spam filtering, reference resolution, …
Misc
– Hw7: due tomorrow 11pm
– Hw8: due Thursday 11pm
– Hw9: due 3/13 11pm
– Presentation: no more than 15+5 minutes
What must be included in the presentation?
– Label set
– Feature templates
– Effect of beam search
– 3+ ways to improve the system, and results on dev data (test_data/)
– Best system: results on dev data and the setting
– Results on test data (more_test_data/)
Grades, etc.
9 assignments + class participation
Hw1-Hw6:
– Total: 740
– Max: 696.56
– Min: 346.52
– Ave: 548.74
– Median: 559.08