1 Machine Learning in Practice Lecture 21 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

2 Plan for the Day Announcements (Questions? No quiz and no new assignment today); Weka helpful hints; Clustering; Advanced Statistical Models; More on Optimization and Tuning

3 Weka Helpful Hints

4 Remember SMOreg vs SMO…

5 Setting the Exponent in SMO * Note that an exponent larger than 1.0 means you are using a non-linear kernel.
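
A minimal sketch of the same setting through the Weka Java API (the file name data.arff and the class being the last attribute are assumptions; the Explorer's object editor exposes the same options):

```java
import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.PolyKernel;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SmoKernelDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical dataset; the class is assumed to be the last attribute.
        Instances data = DataSource.read("data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        SMO smo = new SMO();
        PolyKernel kernel = new PolyKernel();
        kernel.setExponent(2.0);   // exponent > 1.0 means a non-linear (polynomial) kernel
        smo.setKernel(kernel);
        smo.buildClassifier(data);
        System.out.println(smo);
    }
}
```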

6 Clustering

7 What is clustering? Finding natural groupings of your data Not supervised! No class attribute. Usually only works well if you have a huge amount of data!

8 InfoMagnets: Interactive Text Clustering

9 What does clustering do? Finds natural breaks in your data If there are obvious clusters, you can do this with a small amount of data If you have lots of weak predictors, you need a huge amount of data to make it work

11 Clustering in Weka * You can pick which clustering algorithm you want to use and how many clusters you want.
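
As a rough sketch, the same choices (which clusterer, how many clusters) look like this through the Weka Java API; the file name and k = 3 are assumptions:

```java
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KMeansDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");  // assumed file; no class index is set here

        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(3);    // you pick the number of clusters
        km.buildClusterer(data);

        System.out.println(km);  // prints the centroids and cluster sizes
    }
}
```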

12 Clustering in Weka (screenshot callouts: “Click here”; “Select the class attribute”) * Clustering is unsupervised, so you want it to ignore your class attribute!

13 Clustering in Weka * You can evaluate the clustering in comparison with class attribute assignments
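
One way to do that comparison programmatically is the classes-to-clusters pattern sketched below; it assumes data.arff with the class as the last attribute, and it builds the clusterer on a copy of the data with the class removed:

```java
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ClassesToClusters {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");       // assumed file
        data.setClassIndex(data.numAttributes() - 1);

        // Build the clusterer on the data WITHOUT the class attribute.
        Remove remove = new Remove();
        remove.setAttributeIndices("" + (data.classIndex() + 1));
        remove.setInputFormat(data);
        Instances noClass = Filter.useFilter(data, remove);

        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(3);
        km.buildClusterer(noClass);

        // Compare the clusters against the class labels (classes-to-clusters evaluation).
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(km);
        eval.evaluateClusterer(data);
        System.out.println(eval.clusterResultsToString());
    }
}
```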

14 Adding a Cluster Feature

15 * You should set it explicitly to ignore the class attribute * Set the pulldown menu to No Class
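
A sketch of the same idea via the AddCluster filter in the Java API (the file name, k = 3, and the class being the last attribute are assumptions); the key point is telling the filter to ignore the class attribute:

```java
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.AddCluster;

public class AddClusterFeature {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // assumed file; class is the last attribute

        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(3);

        AddCluster addCluster = new AddCluster();
        addCluster.setClusterer(km);
        addCluster.setIgnoredAttributeIndices("last");    // keep the clusterer from seeing the class
        addCluster.setInputFormat(data);

        Instances withCluster = Filter.useFilter(data, addCluster);
        System.out.println("New attribute: " + withCluster.attribute(withCluster.numAttributes() - 1));
    }
}
```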

16 Why add cluster features? (figure-only slides: instances labeled Class 1 and Class 2 grouped into clusters)

18 Clustering with Weka K-means and FarthestFirst: disjoint flat clusters EM: statistical approach Cobweb: hierarchical clustering

19 K-Means You choose the number of clusters you want (you might need to play with this by looking at what kind of clusters you get out) K initial points are chosen randomly as cluster centroids All points are assigned to the centroid they are closest to Once the data is clustered, a new centroid is computed for each cluster as the mean of the points assigned to it

20 K-Means Then clustering occurs again using the new centroids This continues until no changes in clustering take place Clusters are flat and disjoint
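
To make those steps concrete, here is a small from-scratch sketch for 2-D points (the toy data and k = 2 are made up); in practice Weka's SimpleKMeans does this for you:

```java
import java.util.Arrays;

public class TinyKMeans {
    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {1, 1.5}, {8, 8}, {8.5, 9}, {9, 8}};  // toy data
        int k = 2;

        // 1. Choose k initial centroids (here simply the first k points; Weka picks them randomly).
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) centroids[i] = points[i].clone();

        int[] assign = new int[points.length];
        boolean changed = true;
        while (changed) {                          // 4. Repeat until no assignments change.
            changed = false;
            // 2. Assign every point to the closest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist(points[p], centroids[c]) < dist(points[p], centroids[best])) best = c;
                if (assign[p] != best) { assign[p] = best; changed = true; }
            }
            // 3. Recompute each centroid as the mean of the points assigned to it.
            for (int c = 0; c < k; c++) {
                double sx = 0, sy = 0; int n = 0;
                for (int p = 0; p < points.length; p++)
                    if (assign[p] == c) { sx += points[p][0]; sy += points[p][1]; n++; }
                if (n > 0) { centroids[c][0] = sx / n; centroids[c][1] = sy / n; }
            }
        }
        System.out.println("assignments: " + Arrays.toString(assign));
        System.out.println("centroids:   " + Arrays.deepToString(centroids));
    }

    // Squared Euclidean distance (enough for finding the nearest centroid).
    static double dist(double[] a, double[] b) {
        return (a[0] - b[0]) * (a[0] - b[0]) + (a[1] - b[1]) * (a[1] - b[1]);
    }
}
```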

21 K-Means

22-25 (figure-only slides: step-by-step illustration of k-means assigning points to the nearest centroid and re-computing the centroids)

26 EM: Expectation Maximization Does not base clustering on distance from a centroid Instead clusters based on the probability of cluster membership Overlapping clusters rather than disjoint clusters: every instance belongs to every cluster with some probability

27 EM: Expectation Maximization Two important kinds of probability distributions: each cluster has an associated distribution of attribute values for each attribute (based on the extent to which instances are in the cluster), and each instance has a certain probability of being in each cluster (based on how close its attribute values are to typical attribute values for the cluster)

28 Probabilities of Cluster Membership Initialized 65% B 35% A 25% B 75% A

29 Central Tendencies Computed Based on Cluster Membership A B 65% B 35% A 25% B 75% A

30 Cluster Membership Re-Assigned Probabilistically A B 75% B 25% A 35% B 65% A

31 Central tendencies Re-Assigned Based on Membership A B 75% B 25% A 35% B 65% A

32 Cluster Membership Reassigned A B 60% B 40% A 45% B 55% A

33 EM: Expectation Maximization Iterative like k-means – but guided by a different computation Considered more principled than k-means, but much more computationally expensive Like k-means, you pick the number of clusters you want
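
A minimal sketch of EM through the Weka API (the file name and k = 2 are assumptions; the data is assumed to have no class attribute, or to have it ignored as above):

```java
import weka.clusterers.EM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EmDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // assumed file

        EM em = new EM();
        em.setNumClusters(2);      // like k-means you pick k; with -1 EM picks it by cross-validation
        em.buildClusterer(data);

        // Every instance belongs to every cluster with some probability.
        double[] probs = em.distributionForInstance(data.instance(0));
        for (int c = 0; c < probs.length; c++)
            System.out.printf("cluster %d: %.3f%n", c, probs[c]);
    }
}
```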

34 Advanced Statistical Models

35 Quick View of Bayesian Networks Normally with Naïve Bayes you have simple conditional probabilities, one per attribute: P[Humidity = high | Play = yes] (figure: network over Windy, Play, Outlook, Humidity, Temperature)

36 Quick View of Bayesian Networks With Bayes Nets, there are interactions between attributes: P[Humidity = high | Play = yes, Temperature = hot] Similar likelihood computation for an instance: you will still have one conditional probability per attribute to multiply together, but they won’t all be simple, because Humidity is related jointly to temperature and play

37 Quick View of Bayesian Networks Learning algorithm needs to find the shape of the network Probabilities come from counts Two stages – similar idea to “kernel methods”
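
A hedged sketch comparing Naïve Bayes with a Bayes net in the Weka API (the dataset path, the K2 structure search, and the two-parent limit are assumptions rather than the lecture's exact setup):

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.bayes.net.search.local.K2;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BayesNetVsNaiveBayes {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff");  // assumed path to the weather data
        data.setClassIndex(data.numAttributes() - 1);

        NaiveBayes nb = new NaiveBayes();   // each attribute depends on the class only

        BayesNet bn = new BayesNet();       // stage 1: search for the shape of the network...
        K2 search = new K2();
        search.setMaxNrOfParents(2);        // e.g. Humidity may depend on Play AND Temperature
        bn.setSearchAlgorithm(search);      // ...stage 2: the probabilities come from counts

        for (Classifier clf : new Classifier[]{nb, bn}) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(clf, data, 10, new Random(1));
            System.out.printf("%s: %.1f%% correct%n", clf.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}
```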

38 Doing Optimization in Weka

39 Optimizing Parameter Settings (figure: each fold split into Train / Validation / Test sets) Iterate over settings; compare performance over the validation set and pick the optimal setting; then test on the test set. Use a modified form of cross-validation, or you can have a hold-out validation set you use for all folds. Still N folds, but each fold has less training data than with standard cross-validation.

40 Remember! Cross-validation is for estimating your performance If you want the model that achieves that estimated performance, train over the whole set Same principle for optimization: estimate your tuned performance using cross-validation with an inner loop for optimization; when you build the model over the whole set, use the settings that work best in cross-validation over the whole set

41 Optimization in Weka Divide your data into 10 train/test pairs: tune parameters using cross-validation on the training set (this is the inner loop), then use those optimized settings on the corresponding test set. Note that you may have a different set of parameter settings for each of the 10 train/test pairs. You can do the optimization in the Experimenter.

42 Train/Test Pairs * Use the StratifiedRemoveFolds filter
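
A sketch of generating the ten pairs with that filter via the API (the file name and the 10-fold setup are assumptions):

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.StratifiedRemoveFolds;

public class MakeTrainTestPairs {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // assumed file
        data.setClassIndex(data.numAttributes() - 1);

        for (int fold = 1; fold <= 10; fold++) {
            Instances test  = split(data, fold, false);   // keep only the selected fold  -> TestN
            Instances train = split(data, fold, true);    // keep everything else         -> TrainN
            System.out.printf("fold %d: %d train / %d test%n",
                    fold, train.numInstances(), test.numInstances());
            // Typically you would save these as TrainN.arff / TestN.arff for the Experimenter.
        }
    }

    static Instances split(Instances data, int fold, boolean invert) throws Exception {
        StratifiedRemoveFolds f = new StratifiedRemoveFolds();
        f.setNumFolds(10);
        f.setFold(fold);
        f.setInvertSelection(invert);   // true = output all folds EXCEPT the selected one
        f.setInputFormat(data);
        return Filter.useFilter(data, f);
    }
}
```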

43 Setting Up for Optimization * Prepare to save the results * Load in training sets for all folds * We’ll use cross-validation within the training folds to do the optimization

44 What are we optimizing? Let’s optimize the confidence factor. Let’s try .1, .25, .5, and .75
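
One way to run that sweep as an inner loop on a single training fold, sketched with J48's confidence factor (Train1.arff is an assumed name for one of the training folds):

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TuneConfidenceFactor {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("Train1.arff");   // assumed name of one training fold
        train.setClassIndex(train.numAttributes() - 1);

        double[] candidates = {0.1, 0.25, 0.5, 0.75};
        double bestC = candidates[0], bestAcc = -1;
        for (double c : candidates) {
            J48 tree = new J48();
            tree.setConfidenceFactor((float) c);
            // Inner loop: cross-validation WITHIN the training fold only.
            Evaluation eval = new Evaluation(train);
            eval.crossValidateModel(tree, train, 10, new Random(1));
            System.out.printf("C = %.2f -> %.1f%% correct%n", c, eval.pctCorrect());
            if (eval.pctCorrect() > bestAcc) { bestAcc = eval.pctCorrect(); bestC = c; }
        }
        System.out.println("Best confidence factor on Train1: " + bestC);
    }
}
```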

45 Add Each Algorithm to Experimenter Interface

46 Look at the Results * Note that optimal setting varies across folds.

47 Apply the optimized settings on each fold * Performance on Test1 using optimized settings from Train1
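
A sketch of that last step (the file names and the winning setting of 0.5 are assumptions; use whatever value the inner loop picked for this fold):

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ApplyTunedSetting {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("Train1.arff");   // assumed fold file names
        Instances test  = DataSource.read("Test1.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.5f);      // the setting that won the inner loop on Train1
        tree.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);      // performance on Test1 with Train1's tuned setting
        System.out.printf("Test1 accuracy: %.1f%%%n", eval.pctCorrect());
    }
}
```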

48 What if the optimization requires work by hand? Do you see a problem with the following? Do feature selection over the whole set to see which words are highly ranked; create user-defined features with subsets of these to see which ones look good; add those to your feature space and do the classification

49 What if the optimization requires work by hand? The problem is that this is just like doing feature selection over your whole data set You will over-estimate your performance So what’s a better way of doing that?

50 What if the optimization requires work by hand? You could set aside a small subset of data. Using that small subset, do the same process. Then use those user-defined features with the other part of the data.

51 Take Home Message Instance based learning and clustering both make use of similarity metrics Clustering can be used to help you understand your data or to add new features to your data Weka provides opportunities to tune all of its algorithms through the object editor You can use the Experimenter to tune the parameter settings when you are estimating your performance using cross-validation

