1 Final review LING 572 Fei Xia 03/07/06

2 Misc
Parts 3 and 4 were due at 6am today.
Presentation: email me the slides by 6am on 3/9.
Final report: email me by 6am on 3/14.
Group meetings: 1:30-4:00pm on 3/16.

3 Outline Main topics Applying to NLP tasks Tricks

4 Main topics

5 Supervised learning
– Decision tree
– Decision list
– TBL
– MaxEnt
– Boosting
Semi-supervised learning
– Self-training
– Co-training
– EM
– Co-EM

6 Main topics (cont)
Unsupervised learning
– The EM algorithm
– The EM algorithm for PM models: forward-backward, inside-outside, IBM models for MT
Others
– Two dynamic models: FSA and HMM
– Re-sampling: bootstrap
– System combination
– Bagging

7 Main topics (cont)
Homework
– Hw1: FSA and HMM
– Hw2: DT, DL, CNF, DNF, and TBL
– Hw3: Boosting
Project
– P1: Trigram (learn to use Carmel, relation between HMM and FSA)
– P2: TBL
– P3: MaxEnt
– P4: Bagging, boosting, system combination, SSL

8 Supervised learning

9 A classification problem

| District | House type    | Income | Previous customer | Outcome |
|----------|---------------|--------|-------------------|---------|
| Suburban | Detached      | High   | No                | Nothing |
| Suburban | Semi-detached | High   | Yes               | Respond |
| Rural    | Semi-detached | Low    | No                | Respond |
| Urban    | Detached      | Low    | Yes               | Nothing |
| …        |               |        |                   |         |

10 Classification and estimation problems
Given
– x: input attributes
– y: the goal
– training data: a set of (x, y) pairs
Predict y given a new x:
– y is a discrete variable → classification problem
– y is a continuous variable → estimation problem

11 Five ML methods Decision tree Decision list TBL Boosting MaxEnt

12 Decision tree
Modeling: tree representation
Training: top-down induction, greedy algorithm
Decoding: find the path from the root to a leaf node where the tests along the path are satisfied

13 Decision tree (cont)
Main algorithms: ID3, C4.5, CART
Strengths:
– Ability to generate understandable rules
– Ability to clearly indicate the best attributes
Weaknesses:
– Data splitting
– Trouble with non-rectangular regions
– The instability of top-down induction → bagging
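
To make the decoding step described above concrete, here is a minimal Python sketch (not from the slides): the Node layout and the example tree are illustrative assumptions, with the tree chosen to be consistent with the four customer rows shown earlier.

```python
class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # test attribute at an internal node
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class label at a leaf

def classify(node, instance):
    # Follow the path whose tests the instance satisfies until a leaf is reached.
    while node.label is None:
        node = node.children[instance[node.attribute]]
    return node.label

# Hypothetical tree consistent with the customer table above.
tree = Node("HouseType", {"Detached": Node(label="Nothing"),
                          "Semi-detached": Node(label="Respond")})
print(classify(tree, {"HouseType": "Semi-detached"}))   # -> Respond
```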

14 Decision list
Modeling: a list of decision rules
Training: greedy, iterative algorithm
Decoding: find the 1st rule that applies
Each decision is based on a single piece of evidence, in contrast to MaxEnt, boosting, and TBL.
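
For contrast with the decision tree, a decision-list decoder just scans the ordered rules and returns the class of the first rule that fires. A toy sketch follows; the rule conditions and the catch-all default are illustrative assumptions, not a list learned from the data.

```python
# Ordered rules: (condition, class); the last rule is a catch-all default.
rules = [
    (lambda x: x.get("HouseType") == "Semi-detached", "Respond"),
    (lambda x: True,                                   "Nothing"),
]

def classify(x):
    for condition, label in rules:
        if condition(x):        # the 1st rule that applies wins
            return label

print(classify({"HouseType": "Detached", "Income": "High"}))   # -> Nothing
```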

15 TBL
Modeling: a list of transformations (similar to decision rules)
Training:
– Greedy, iterative algorithm
– The concept of a current state
Decoding: apply the learned transformations, in order, to the data

16 TBL (cont)
Strengths:
– Minimizes the error rate directly
– Can handle dynamic problems (e.g., POS tagging) and non-classification problems (e.g., parsing)
Weaknesses:
– Transformations are hard to interpret, as they interact with one another
– Not inherently probabilistic (probabilistic TBL: TBL-DT)
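
A rough sketch of the greedy training loop, under the assumption that each candidate transformation is a function t(example, current_label) -> new_label; this interface and the stopping threshold are illustrative, not the course's actual implementation.

```python
def tbl_train(examples, gold, initial, candidates, min_gain=1):
    """Greedy TBL training sketch: keep a current labeling (the "current
    state") and repeatedly apply the transformation with the largest net
    error reduction."""
    def errors(pred):
        return sum(p != g for p, g in zip(pred, gold))
    current, learned = list(initial), []
    while candidates:
        scored = []
        for t in candidates:
            proposed = [t(x, y) for x, y in zip(examples, current)]
            scored.append((errors(current) - errors(proposed), t, proposed))
        gain, best, labels = max(scored, key=lambda s: s[0])
        if gain < min_gain:
            break                       # stop when no transformation helps enough
        current, learned = labels, learned + [best]
    return learned                      # ordered list of transformations
```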

17 Boosting
[Diagram: the training sample is repeatedly reweighted; a weak classifier f_1, f_2, …, f_T is trained on each weighted sample with ML, and the classifiers are combined into the final classifier f.]

18 Boosting (cont)
Modeling: combine a set of weak classifiers to produce a powerful committee
Training: learn one weak classifier at each iteration
Decoding: use the weighted majority vote of the weak classifiers

19 Boosting (cont)
Strengths:
– It comes with theoretical guarantees (e.g., bounds on training and test error).
– It only needs to find weak classifiers.
Weaknesses:
– It is susceptible to noise.
– The actual performance depends on the data and the base learner.
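
As an illustration of "learn one classifier at each iteration" plus the weighted vote, here is a hedged AdaBoost-style sketch. The interface weak_learn(X, y, w), returning a classifier h(x) in {-1, +1}, is an assumption, and the update follows the standard AdaBoost weighting rather than any specific variant from class.

```python
import math

def adaboost(X, y, weak_learn, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                         # uniform initial weights
    ensemble = []                             # list of (alpha, classifier)
    for _ in range(rounds):
        h = weak_learn(X, y, w)
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10) # keep the log well-defined
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # up-weight the examples this round got wrong, then renormalize
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    def combined(x):                          # weighted majority vote
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return combined
```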

20 MaxEnt
The task: find p* = argmax_{p ∈ P} H(p), where P is the set of models whose feature expectations E_p[f_j] match the empirical expectations observed in the training data.
If p* exists, it has the form of a log-linear (exponential) model.

21 MaxEnt (cont)
If p* exists, then
  p*(x) = exp( Σ_j λ_j f_j(x) ) / Z,
where Z = Σ_x exp( Σ_j λ_j f_j(x) ) is the normalization constant.

22 MaxEnt (cont)
Training: GIS, IIS
Feature selection:
– Greedy algorithm
– Select one (or more) feature at a time
In general, MaxEnt achieves good performance on many NLP tasks.
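
A small sketch of decoding with a conditional log-linear model of the form sketched above, p(y|x) ∝ exp(Σ_j λ_j f_j(x, y)); the feature functions and weights here are made-up illustrations, not trained values.

```python
import math

def maxent_probs(x, labels, features, weights):
    # Score each label, then normalize by Z(x) to get a distribution.
    scores = {y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
              for y in labels}
    z = sum(scores.values())                  # normalizer Z(x)
    return {y: s / z for y, s in scores.items()}

# Toy example: two binary features over the customer data used earlier.
features = [lambda x, y: 1.0 if x["HouseType"] == "Semi-detached" and y == "Respond" else 0.0,
            lambda x, y: 1.0 if x["Income"] == "High" and y == "Nothing" else 0.0]
weights = [1.2, 0.8]                          # hypothetical weights
print(maxent_probs({"HouseType": "Detached", "Income": "High"},
                   ["Respond", "Nothing"], features, weights))
```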

23 Common issues
Objective function / quality measure:
– DT, DL: e.g., information gain
– TBL, Boosting: minimize training errors
– MaxEnt: maximize entropy while satisfying the constraints

24 Common issues (cont)
Avoiding overfitting:
– Use development data
– Two strategies: stop early, or post-prune

25 Common issues (cont)
Missing attribute values:
– Assume a "blank" value
– Assign the most common value among all "similar" examples in the training data
– (DL, DT): assign a fraction of the example to each possible class
Continuous-valued attributes:
– Choose thresholds by checking the training data

26 Common issues (cont)
Attributes with different costs:
– DT: change the quality measure to include the costs
Continuous-valued goal attribute:
– DT, DL: each "leaf" node is marked with a real value or a linear function
– TBL, MaxEnt, Boosting: ??

27 Comparison of supervised learners

|                 | DT         | DL                    | TBL                             | Boosting                     | MaxEnt                    |
|-----------------|------------|-----------------------|---------------------------------|------------------------------|---------------------------|
| Probabilistic   | PDT        | PDL                   | TBL-DT                          | Confidence                   | Y                         |
| Parametric      | N          | N                     | N                               | N                            | Y                         |
| Representation  | Tree       | Ordered list of rules | Ordered list of transformations | List of weighted classifiers | List of weighted features |
| Each iteration  | Attribute  | Rule                  | Transformation                  | Classifier & weight          | Feature & weight          |
| Data processing | Split data | Split data*           | Change cur_y                    | Reweight (x, y)              | None                      |
| Decoding        | Path       | 1st rule              | Sequence of rules               | Calc f(x)                    | Calc f(x)                 |

28 Semi-supervised Learning

29 Semi-supervised learning Each learning method makes some assumptions about the problem. SSL works when those assumptions are satisfied. SSL could degrade the performance when mistakes reinforce themselves.

30 SSL (cont)
We have covered four methods: self-training, co-training, EM, and co-EM.

31 Co-training
The original paper: (Blum and Mitchell, 1998)
– Two "independent" views: split the features into two sets.
– Train a classifier on each view.
– Each classifier labels data that can be used to train the other classifier.
Extensions:
– Relax the conditional independence assumptions
– Instead of using two views, use two or more classifiers trained on the whole feature set
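
A hedged sketch of the loop described above; the train() function, the classifier's predict/confidence methods, and the choice of k examples per round are all assumed interfaces for illustration.

```python
def cotrain(labeled, unlabeled, view1, view2, train, rounds=10, k=5):
    """Each classifier is trained on its own view and labels the unlabeled
    examples it is most confident about, growing the shared labeled set."""
    L, U = list(labeled), list(unlabeled)
    c1 = c2 = None
    for _ in range(rounds):
        c1 = train([(view1(x), y) for x, y in L])    # classifier on view 1
        c2 = train([(view2(x), y) for x, y in L])    # classifier on view 2
        for clf, view in ((c1, view1), (c2, view2)):
            ranked = sorted(U, key=lambda x: -clf.confidence(view(x)))
            for x in ranked[:k]:
                L.append((x, clf.predict(view(x))))  # newly labeled example
                U.remove(x)
        if not U:
            break
    return c1, c2
```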

32 Unsupervised learning

33 EM is a method for estimating parameters in the MLE framework. It finds a sequence of parameter estimates, each improving the likelihood of the training data.

34 The EM algorithm
Start with an initial estimate θ^0.
Repeat until convergence:
– E-step: calculate Q(θ | θ^t) = E_{z ~ P(z | x, θ^t)} [ log P(x, z | θ) ]
– M-step: find θ^{t+1} = argmax_θ Q(θ | θ^t)

35 The EM algorithm (cont)
The optimal solution for the M-step exists for many classes of problems → a number of well-known methods are special cases of EM.
The EM algorithm for PM models:
– Forward-backward algorithm
– Inside-outside algorithm
– …
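
To ground the E-step/M-step pattern, here is a self-contained toy example (not from the course): EM for a mixture of two biased coins, where the hidden variable is which coin produced each observed count of heads.

```python
def em_two_coins(heads, m, iters=50):
    """EM for a two-component binomial mixture.
    heads[i] = number of heads in m tosses of an unknown coin."""
    p = [0.3, 0.7]                       # initial bias estimates, theta_0
    pi = [0.5, 0.5]                      # mixing weights
    for _ in range(iters):
        # E-step: posterior probability that each observation came from each coin
        post = []
        for h in heads:
            lik = [pi[k] * (p[k] ** h) * ((1 - p[k]) ** (m - h)) for k in range(2)]
            z = sum(lik)
            post.append([l / z for l in lik])
        # M-step: re-estimate parameters from the expected counts
        for k in range(2):
            nk = sum(r[k] for r in post)
            pi[k] = nk / len(heads)
            p[k] = sum(r[k] * h for r, h in zip(post, heads)) / (nk * m)
    return p, pi

print(em_two_coins([9, 8, 1, 2, 8], m=10))
```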

36 Other topics

37 FSA and HMM
Two types of HMMs:
– State-emission and arc-emission HMMs
– They are equivalent
We can convert an HMM into a WFA.
Modeling: Markov assumption
Training:
– Supervised: counting
– Unsupervised: forward-backward algorithm
Decoding: Viterbi algorithm
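
A compact Viterbi sketch for the decoding step mentioned above; the two-state tagset and the probabilities in the usage example are invented purely for illustration.

```python
def viterbi(obs, states, start, trans, emit):
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]   # best score so far
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            best_prev = max(states, key=lambda r: V[t - 1][r] * trans[r][s])
            V[t][s] = V[t - 1][best_prev] * trans[best_prev][s] * emit[s][obs[t]]
            back[t][s] = best_prev
    # follow back-pointers from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy two-state model (hypothetical numbers).
states = ["N", "V"]
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}
print(viterbi(["fish", "sleep"], states, start, trans, emit))   # -> ['N', 'V']
```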

38 Bootstrap
[Diagram: the original sample is resampled into B bootstrap samples; applying ML to each yields f_1, f_2, …, f_B, which are combined into f.]

39 Bootstrap (cont)
A method of re-sampling: one original sample → B bootstrap samples.
It has a strong mathematical background.
It is a method for estimating standard errors, bias, and so on.
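
A minimal illustration of the resampling idea: estimate the standard error of a statistic (here the mean, as an arbitrary example) from B bootstrap samples; the data values are made up.

```python
import random

def bootstrap_se(data, statistic, B=1000):
    stats = []
    for _ in range(B):
        resample = [random.choice(data) for _ in data]   # one bootstrap sample
        stats.append(statistic(resample))
    mean = sum(stats) / B
    # sample standard deviation of the statistic across bootstrap samples
    return (sum((s - mean) ** 2 for s in stats) / (B - 1)) ** 0.5

data = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8]
print(bootstrap_se(data, lambda xs: sum(xs) / len(xs)))
```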

40 System combination
[Diagram: the same data is fed to several learners ML_1, ML_2, …, ML_B, whose outputs f_1, f_2, …, f_B are combined into f.]

41 System combination (cont)
Hybridization: combine substructures to produce a new one
– Voting
– Naïve Bayes
Switching: choose one of the f_i(x)
– Similarity switching
– Naïve Bayes
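
As a minimal illustration of combining system outputs by voting, this sketch takes a simple majority over whole outputs; the hybridization schemes above instead vote over substructures, which is more involved.

```python
from collections import Counter

def vote(outputs):
    """Return the output proposed by the most systems (ties broken arbitrarily)."""
    return Counter(outputs).most_common(1)[0][0]

# Three systems propose a label for the same token; the majority wins.
print(vote(["NP", "NP", "VP"]))   # -> "NP"
```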

42 Bagging
[Diagram: bootstrap + system combination — B bootstrap samples, a classifier f_1, f_2, …, f_B learned from each with ML, combined into f.]

43 Bagging (cont)
It is effective for unstable learning methods:
– Decision tree
– Regression tree
– Neural network
It does not help stable learning methods:
– K-nearest neighbors

44 Relations

45 WFSA and HMM; DL, DT, TBL; EM, EM for PM

46 WFSA and HMM
[Diagram: an HMM with added "Start" and "Finish" states]
– Add a "Start" state and a transition from "Start" to any state in the HMM.
– Add a "Finish" state and a transition from any state in the HMM to "Finish".

47 DL, CNF, DNF, DT, TBL
[Diagram relating k-CNF, k-DNF, k-DT, k-DL, and k-TBL]

48 The EM algorithm
[Diagram: the generalized EM algorithm contains the EM algorithm; special cases of EM include Gaussian mixtures and the EM algorithm for PM models (forward-backward, inside-outside, IBM models)]

49 Solving an NLP problem

50 Issues
Modeling: represent the problem as a formula and decompose the formula into a function of parameters
Training: estimate the model parameters
Decoding: find the best answer given the parameters
Other issues:
– Preprocessing
– Postprocessing
– Evaluation
– …

51 Modeling Generative vs. discriminative models Introducing hidden variables The order of decomposition

52 Modeling (cont) Approximation / assumptions Final formulae and types of parameters

53 Modeling (cont)
Using classifiers for non-classification problems:
– POS tagging
– Chunking
– Parsing

54 Training
Objective functions:
– Maximize likelihood: EM
– Minimize error rate: TBL
– Maximum entropy: MaxEnt
– …
Supervised, semi-supervised, unsupervised:
– Ex: maximizing likelihood (supervised: simple counting; unsupervised: EM)

55 Training (cont)
At each iteration:
– Choose one attribute / rule / weight / … at a time, and never change it later: DT, DL, TBL
– Update all the parameters at each iteration: EM
Choosing "untrained" parameters (e.g., thresholds): use development data.
– Ex: the minimal "gain" required to continue iterating

56 Decoding
Dynamic programming:
– CYK for PCFG
– Viterbi for HMM
Dynamic problems:
– Decode from left to right
– Features only look at the left context
– Keep the top-N hypotheses at each position (sketched below)
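
A sketch of left-to-right beam decoding as described above: extend each hypothesis with every possible label, score it with features that only look at the left context, and keep the top-N hypotheses at each position. The scoring function score(history, word, label) is an assumed interface (e.g., a log probability from a MaxEnt tagger).

```python
def beam_decode(words, labels, score, beam_size=5):
    beam = [([], 0.0)]                           # (label history, total score)
    for w in words:
        candidates = [(hist + [y], s + score(hist, w, y))
                      for hist, s in beam for y in labels]
        candidates.sort(key=lambda c: -c[1])
        beam = candidates[:beam_size]            # keep the top-N hypotheses
    return beam[0][0]                            # best full hypothesis
```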

57 Preprocessing Sentence segmentation Sentence alignment (for MT) Tokenization Morphing POS tagging …

58 Post-processing System combination Casing (MT) …

59 Evaluation
Use standard training/test data if possible.
Choose appropriate evaluation measures:
– WSD: for what applications?
– Word alignment: F-measure vs. AER; how does it affect the MT result?
– Parsing: F-measure vs. dependency link accuracy

60 Tricks

61 Algebra Probability Optimization Programming

62 Algebra
The order of sums: Σ_i Σ_j a_{ij} = Σ_j Σ_i a_{ij}
Pulling out constants: Σ_i c · a_i = c · Σ_i a_i

63 Algebra (cont)
The order of sums and products: Σ_{y_1} … Σ_{y_n} Π_i f(y_i) = Π_i Σ_{y_i} f(y_i)
The order of log and product / sum: log Π_i x_i = Σ_i log x_i

64 Probability
Introducing a new random variable: P(x) = Σ_y P(x, y)
The order of decomposition: P(x, y) = P(x) P(y | x) = P(y) P(x | y)

65 More general cases: P(x | z) = Σ_y P(x, y | z); P(x_1, …, x_n) = Π_i P(x_i | x_1, …, x_{i−1})

66 Probability (cont)
Source-channel model: ŷ = argmax_y P(y | x) = argmax_y P(y) P(x | y)
Bayes rule: P(y | x) = P(x | y) P(y) / P(x)

67 Probability (cont)
Normalization: Σ_y P(y | x) = 1
Jensen's inequality: log Σ_i p_i x_i ≥ Σ_i p_i log x_i (since log is concave and Σ_i p_i = 1)

68 Optimization
When there is no analytical solution, use an iterative approach.
If the optimal solution to g(x) is hard to find, look for the optimal solution to a (tight) lower bound of g(x).

69 Optimization (cont)
Using Lagrange multipliers:
– Constrained problem: maximize f(x) subject to the constraint g(x) = 0
– Unconstrained problem: maximize f(x) − λ g(x)
– Take first derivatives to find the stationary points.
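
A worked example (not from the slides) of the recipe above: maximizing entropy subject only to the normalization constraint, which yields the uniform distribution.

```latex
\begin{align*}
\text{maximize } & H(p) = -\sum_{i=1}^{n} p_i \log p_i
   \quad \text{subject to} \quad g(p) = \sum_{i=1}^{n} p_i - 1 = 0 \\
\Lambda(p, \lambda) &= -\sum_i p_i \log p_i - \lambda \Big( \sum_i p_i - 1 \Big) \\
\frac{\partial \Lambda}{\partial p_i} &= -\log p_i - 1 - \lambda = 0
   \;\Longrightarrow\; p_i = e^{-1-\lambda} \quad (\text{the same for every } i) \\
\sum_i p_i = 1 &\;\Longrightarrow\; p_i = \tfrac{1}{n}
\end{align*}
```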

70 Programming
Using/creating a good package:
– Tutorial, sample data, well-written code
– Multiple levels of code:
  – Core ML algorithm: e.g., TBL
  – Wrapper for a task: e.g., a POS tagger
  – Wrapper to deal with input, output, etc.

71 Programming (cont)
Good practice:
– Write notes and create wrappers (all the commands should be stored in the notes, or better yet in a wrapper script)
– Use standard directory structures: src/, include/, exec/, bin/, obj/, docs/, sample/, data/, result/
– Give meaningful filenames, at least to important code (e.g., build_trigram_tagger.pl, not aaa100.exec)
– Give meaningful function and variable names
– Don't use global variables

72 Final words We have covered a lot of topics: 5+4+3+4 It takes time to digest, but at least we understand the basic concepts. The next step: applying them to real applications.

