What you need to know to get started with writing code for machine learning.

1 What you need to know to get started with writing code for machine learning

2 Why use WEKA?
Writing code to read data and make it usable can be tedious.
Writing code to evaluate results can be tedious too.
A unifying framework is great for code sharing and teamwork.

3 Preparing data for WEKA: the .arff format
An .arff file consists of 2 parts:
Data description: enumerates the attributes and specifies their domains
    @attribute att_name attribute_type or domain
Actual data: starts with @data, in tab- or comma-separated format
Missing values are denoted by '?'
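To make the two-part layout concrete, here is a minimal sketch in plain Java (no WEKA dependency) that splits a tiny, made-up .arff text into its attribute declarations and its data rows. The sample relation, the attribute names, and the class ArffSketch are illustrative assumptions, not part of the slides or of WEKA; real files should be read with WEKA's own loaders. The sample also starts with an @relation line, which real ARFF headers carry and which the sketch simply skips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of WEKA: shows the header/data split only.
class ArffSketch {
    // Made-up sample file; '?' marks a missing value, the last row uses tabs.
    static final String SAMPLE = String.join("\n",
        "@relation weather",
        "@attribute outlook {sunny, overcast, rainy}",
        "@attribute temperature numeric",
        "@attribute play {yes, no}",
        "@data",
        "sunny,85,no",
        "overcast,?,yes",
        "rainy\t70\tyes");

    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank line or comment
            String lower = line.toLowerCase();
            if (inData) {
                // Values are separated by commas or tabs.
                String[] values = line.split("[,\t]");
                for (int i = 0; i < values.length; i++) values[i] = values[i].trim();
                result.rows.add(values);
            } else if (lower.startsWith("@attribute")) {
                // Second token is the attribute name; the rest is its type or domain.
                result.attributeNames.add(line.split("\\s+")[1]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
            // Other header lines (such as @relation) are ignored in this sketch.
        }
        return result;
    }

    static boolean isMissing(String value) {
        return value.equals("?");
    }

    public static void main(String[] args) {
        ArffSketch arff = parse(SAMPLE);
        System.out.println("attributes: " + arff.attributeNames);
        for (String[] row : arff.rows) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The sketch ignores quoting, sparse rows, and type checking, all of which a real ARFF reader has to handle.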

5 Algorithms supported by WEKA
Naïve Bayes
Decision Tree
Multilayer Perceptron
SMO (sequential minimal optimization, an optimized way to train a support vector machine)
Linear regression
…many, many others

6 Performing experiments
Data used to train the classifier should not be used to evaluate its performance. Why?
The data is split into a training set and a testing set.
To evaluate the classifier on the entire dataset, we use k-fold cross-validation:
    Split the data examples randomly into k bins.
    Leave one bin out; use the remaining k-1 bins for training and the left-out bin for testing.
    Repeat until all k bins have been used for testing.
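The k-fold procedure above can be sketched as index bookkeeping in plain Java. This is only an illustration under assumptions: the class KFoldSketch, its method names, and the fixed seed are hypothetical, and WEKA ships its own cross-validation support that you would normally use instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not WEKA code: index bookkeeping for k-fold cross-validation.
class KFoldSketch {
    // Randomly assigns example indices 0..n-1 to k bins of near-equal size.
    static List<List<Integer>> makeBins(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> bins = new ArrayList<>();
        for (int b = 0; b < k; b++) bins.add(new ArrayList<>());
        for (int i = 0; i < n; i++) bins.get(i % k).add(indices.get(i));
        return bins;
    }

    // Training indices for one fold: every bin except the left-out one.
    static List<Integer> trainingIndices(List<List<Integer>> bins, int testBin) {
        List<Integer> train = new ArrayList<>();
        for (int b = 0; b < bins.size(); b++) {
            if (b != testBin) train.addAll(bins.get(b));
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> bins = makeBins(10, 5, 42L);
        for (int fold = 0; fold < bins.size(); fold++) {
            List<Integer> test = bins.get(fold);
            List<Integer> train = trainingIndices(bins, fold);
            // A real experiment would train on `train` and evaluate on `test` here.
            System.out.println("fold " + fold + ": train=" + train + " test=" + test);
        }
    }
}
```

Every example lands in exactly one test bin, so after k rounds each example has been used for testing once and never in the same round it was trained on.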

7 Writing code for WEKA
The major classes you will most likely be using are in weka.core.
Attribute: class holding information about a data attribute
    Attribute att;
    att.numValues();
    att.name();
    att.enumerateValues();
Instances: enumerable container for the entire data set. Traversing the enumerator goes through every Instance.
    private Instances data;
    ……
    Enumeration enumInsts = data.enumerateInstances();
    while (enumInsts.hasMoreElements()) {
        Instance instance = (Instance) enumInsts.nextElement();
    }
Instances also has information about the attributes:
    data.enumerateAttributes()

8 WEKA’s classes, cont'd
Instance: a single data example. You can get the value of an attribute and check whether it is missing.
    private Instance instance;
    …
    instance.value(attribute);
    instance.isMissing(attribute);

9 More useful classes
Check out the weka.core.Utils class: simple, useful functions that would make your head hurt if you had to write them 100 times, such as max, min, and mean of an array, log2, sums, etc.
weka.estimators
DiscreteEstimator: estimator for discrete events
    DiscreteEstimator newEst = new DiscreteEstimator(10, true);
    // Create 50 random integers, first predicting the probability of the
    // value, then adding the value to the estimator
    Random r = new Random(seed);
    for (int i = 0; i < 50; i++) {
        int current = Math.abs(r.nextInt() % 10);
        System.out.println(newEst);
        System.out.println("Prediction for " + current + " = " + newEst.getProbability(current));
        newEst.addValue(current, 1);
    }
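To show what a count-based estimator of this kind computes, here is a small stand-in written in plain Java. It is not WEKA's DiscreteEstimator; the class CountEstimatorSketch is hypothetical, and the add-one starting counts are an assumption about what the Laplace option (the `true` argument above) is meant to do.

```java
import java.util.Arrays;

// Hypothetical stand-in, not WEKA's DiscreteEstimator: counts with add-one smoothing.
class CountEstimatorSketch {
    private final double[] counts;
    private double total;

    // Every symbol starts with a count of 1, so unseen values keep a nonzero probability.
    CountEstimatorSketch(int numSymbols) {
        counts = new double[numSymbols];
        Arrays.fill(counts, 1.0);
        total = numSymbols;
    }

    // Records one observation of `symbol` with the given weight.
    void addValue(int symbol, double weight) {
        counts[symbol] += weight;
        total += weight;
    }

    // Estimated probability: this symbol's count divided by the sum of all counts.
    double getProbability(int symbol) {
        return counts[symbol] / total;
    }

    public static void main(String[] args) {
        CountEstimatorSketch est = new CountEstimatorSketch(10);
        System.out.println("before any data: " + est.getProbability(3));
        est.addValue(3, 1);
        est.addValue(3, 1);
        System.out.println("after seeing 3 twice: " + est.getProbability(3));
    }
}
```

With 10 symbols, the estimate for symbol 3 starts at 1/10 and rises to 3/12 after two observations, which is the predict-then-update pattern the loop on the slide exercises.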

10 Feel free to use the original Naïve Bayes code to see:
    how to estimate prior probabilities from data
    how to compute posterior probabilities
    how to classify based on posterior probabilities
Javadoc is your best friend.
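The three steps named above (priors, posteriors, classification) can be sketched on a toy problem in plain Java. This is not WEKA's Naïve Bayes source: the class NaiveBayesSketch, the integer encoding of attribute values, the add-one smoothing, and the four training examples are all assumptions made for illustration.

```java
// Hypothetical sketch, not WEKA's NaiveBayes: categorical attributes, add-one smoothing.
class NaiveBayesSketch {
    private final int numClasses;
    private final int[] numValues;          // number of distinct values per attribute
    private final double[] classCounts;     // how often each class was seen
    private final double[][][] valueCounts; // indexed [class][attribute][value]
    private double total;

    NaiveBayesSketch(int numClasses, int[] numValues) {
        this.numClasses = numClasses;
        this.numValues = numValues.clone();
        classCounts = new double[numClasses];
        valueCounts = new double[numClasses][numValues.length][];
        for (int c = 0; c < numClasses; c++) {
            for (int a = 0; a < numValues.length; a++) {
                valueCounts[c][a] = new double[numValues[a]];
            }
        }
    }

    void addExample(int[] x, int label) {
        classCounts[label]++;
        total++;
        for (int a = 0; a < x.length; a++) valueCounts[label][a][x[a]]++;
    }

    // Prior P(c), estimated from class frequencies with add-one smoothing.
    double prior(int c) {
        return (classCounts[c] + 1) / (total + numClasses);
    }

    // Posterior P(c | x): prior times per-attribute likelihoods, then normalized.
    double[] posterior(int[] x) {
        double[] scores = new double[numClasses];
        double sum = 0;
        for (int c = 0; c < numClasses; c++) {
            double score = prior(c);
            for (int a = 0; a < x.length; a++) {
                score *= (valueCounts[c][a][x[a]] + 1) / (classCounts[c] + numValues[a]);
            }
            scores[c] = score;
            sum += score;
        }
        for (int c = 0; c < numClasses; c++) scores[c] /= sum;
        return scores;
    }

    // Classify by picking the class with the highest posterior.
    int classify(int[] x) {
        double[] p = posterior(x);
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (p[c] > p[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        // Two binary attributes, two classes; made-up training data.
        NaiveBayesSketch nb = new NaiveBayesSketch(2, new int[] {2, 2});
        nb.addExample(new int[] {0, 0}, 0);
        nb.addExample(new int[] {0, 1}, 0);
        nb.addExample(new int[] {1, 1}, 1);
        nb.addExample(new int[] {1, 0}, 1);
        double[] p = nb.posterior(new int[] {0, 0});
        System.out.println("P(class 0 | x) = " + p[0] + ", P(class 1 | x) = " + p[1]);
        System.out.println("predicted class: " + nb.classify(new int[] {0, 0}));
    }
}
```

Multiplying raw probabilities is fine for two attributes; with many attributes a real implementation would typically sum logarithms instead to avoid underflow.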

11 Lab 1 assignment objectives
Play around with running experiments in WEKA (k-fold cross-validation, running pre-written classifiers, etc.)
Gain experience with data processing
Get familiar with research work done in text classification
Implement your own classifier!
Experiment with your classifier

12 Document Classification
Based on a document, we want to predict whether or not it is about a given topic.
Document data is not in table format. How do we deal with that?
Naïve Bayes vs. other algorithms
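One common way to get document text into table form, offered here as an assumption since the slide leaves the question open, is a bag-of-words representation: each distinct word becomes an attribute and its count becomes the value. The class BagOfWordsSketch and its simple tokenization rule are hypothetical and deliberately crude.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: turn free text into word counts (a "bag of words").
class BagOfWordsSketch {
    static Map<String, Integer> wordCounts(String document) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word for stable output
        // Lowercase, then split on anything that is not an ASCII letter or digit.
        for (String token : document.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {cat=2, sat=1, slept=1, the=2}
        System.out.println(wordCounts("The cat sat. The cat slept!"));
    }
}
```

Each document then becomes one row whose columns are the words of a shared vocabulary, which is a shape a table-oriented classifier can consume.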

