A Short Introduction to Weka Natural Language Processing Thursday, September 25th.


1 A Short Introduction to Weka Natural Language Processing Thursday, September 25th

2 What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: ‘weka java’

3 weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &

4 .arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html
    % 1. Title: Iris Plants Database
    %
    @RELATION iris
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE sepalwidth NUMERIC
    @ATTRIBUTE petallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    …

5 .arff file format
@attribute attrName {numeric, string, nominal, date}
● numeric: a number
● nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
● string: arbitrary text
● date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
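
The attribute types above can be illustrated with a small ARFF writer. This is a minimal sketch, not part of weka: the helper names `arff_quote` and `to_arff` are hypothetical, and the quoting rule (single quotes around values containing spaces, commas, braces, or quotes) is a simplifying assumption about what a feature extractor would typically need.

```python
def arff_quote(value):
    """Return the value as text, single-quoted if it contains special characters."""
    text = str(value)
    if text == "" or any(ch in text for ch in " ,{}%'\"\t"):
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return text


def to_arff(relation, attributes, rows):
    """Render an ARFF document as a string.

    attributes: list of (name, kind) pairs, where kind is a type keyword such
                as "NUMERIC" or "STRING", or a list of nominal values.
    rows:       list of value lists, one per instance, in attribute order.
    """
    lines = ["@RELATION " + arff_quote(relation)]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: write the finite value set in braces.
            kind = "{" + ",".join(arff_quote(v) for v in kind) + "}"
        lines.append("@ATTRIBUTE " + arff_quote(name) + " " + kind)
    lines.append("@DATA")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(arff_quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Example: two attributes from the iris file shown earlier.
iris_text = to_arff(
    "iris",
    [("sepallength", "NUMERIC"),
     ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"])],
    [[5.1, "Iris-setosa"], [4.9, "Iris-setosa"]],
)
print(iris_text)
```

Building the whole file as a string keeps the sketch easy to test; a real extractor would probably write rows to disk as it goes.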

6 Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff

7 To Classify with weka GUI
1. Run weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

8 Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
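
To make the cross-validation advice concrete, here is a minimal sketch of how N-fold splitting works: every instance is held out for testing exactly once, and the classifier is trained on the remaining folds each time. The function name `k_fold_indices` is hypothetical, and this is plain (unstratified) splitting for illustration only, not a reproduction of weka's own procedure.

```python
import random


def k_fold_indices(n_instances, n_folds, seed=1):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    if not 2 <= n_folds <= n_instances:
        raise ValueError("need 2 <= n_folds <= n_instances")
    indices = list(range(n_instances))
    # Shuffle with a fixed seed so the experiment is repeatable.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out
                 for i in fold]
        yield train, test


# Example: 10 instances, 5 folds; each fold tests on 2 and trains on 8.
for train, test in k_fold_indices(10, 5):
    print(len(train), len(test))
```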

9 Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
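
The relationship between these measures can be sketched in a few lines: accuracy and the per-class F-measure are both derived from the confusion matrix. The function names are hypothetical and the matrix values are invented for illustration, not actual results. The sketch assumes rows are actual classes and columns are predicted classes, which matches the "classified as" layout weka prints.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix, k):
    """Return (precision, recall, f_measure) for class index k."""
    true_pos = matrix[k][k]
    actual = sum(matrix[k])                    # row sum: instances of class k
    predicted = sum(row[k] for row in matrix)  # column sum: labeled as class k
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Illustrative 3-class matrix (made-up counts).
example = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 4, 46],
]
print(accuracy(example))
print(per_class_scores(example, 1))
```

The F-measure here is the harmonic mean of precision and recall, so it drops sharply if either one is low, which is why it can be more informative than accuracy on skewed class distributions.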

10 Running weka from the Command Line
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff

11 Running weka from the Command Line (continued)
● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
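
When running many experiments, it can help to assemble these command lines in a script. The sketch below only builds the argument list, using the flags shown on the slides (-t, -T, -x, -i, -d, -l). The function `weka_command` and its parameter names are hypothetical, and the jar path is passed in rather than assumed.

```python
def weka_command(classifier, jar, train=None, test=None, folds=None,
                 ir_stats=False, save_model=None, load_model=None):
    """Build the argument list for a weka classifier run."""
    cmd = ["java", "-cp", jar, classifier]
    if train:
        cmd += ["-t", train]        # training data
    if test:
        cmd += ["-T", test]         # predefined test set
    if folds is not None:
        cmd += ["-x", str(folds)]   # number of cross-validation folds
    if ir_stats:
        cmd.append("-i")            # the -i flag from the slide's example
    if save_model:
        cmd += ["-d", save_model]   # write the trained model
    if load_model:
        cmd += ["-l", load_model]   # read a previously saved model
    return cmd


# Example: the 10-fold cross-validation run from the slide.
cmd = weka_command("weka.classifiers.bayes.NaiveBayes", "weka.jar",
                   train="trainingdata.arff", folds=10, ir_stats=True)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` to execute it. Note that a path such as `~cs4705/bin/weka.jar` is expanded by the shell, not by `subprocess`, so it would need `os.path.expanduser` first.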

12 Homework 2 Weka Workflow
[Workflow diagram. Stages: Preprocessing (you), Experimentation (you), Grading (us). Elements: S1, S2, … SN; Your Feature Extractor; .arff; Weka; best model; results; T1 … TN; Your Feature Extractor; Test.arff; Weka; results.]

13 Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.

