A Short Introduction to Weka Natural Language Processing Thursday, September 25th
What is weka? ● Java-based Machine Learning Tool ● Implements numerous classifiers ● 3 modes of operation – GUI – Command Line – Java API (not discussed here) ● Google: ‘weka java’
weka Homepage ● To run: – java -Xmx1024M -jar ~cs4705/bin/weka.jar &
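.arff file format
● Example: the beginning of iris.arff
% 1. Title: Iris Plants Database
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
…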
.arff file
● @attribute attrName {numeric, string, nominal, date}
– numeric: a number
– nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
– string: arbitrary text
– date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
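The file layout above can be produced by a very small script. The following is a minimal sketch in Python, not part of weka: the helper names `arff_header` and `to_arff` are hypothetical, and the sketch assumes values contain no spaces, commas, or quotes (real data may need quoting).

```python
def arff_header(relation, attributes):
    """Return the ARFF header lines for a list of (name, type) pairs.

    A type is either a string such as "numeric" or "string", or a list of
    allowed values, which is written out as a nominal attribute.
    """
    lines = ["@relation " + relation, ""]
    for name, attr_type in attributes:
        if isinstance(attr_type, (list, tuple)):
            attr_type = "{" + ",".join(attr_type) + "}"
        lines.append("@attribute {} {}".format(name, attr_type))
    lines.extend(["", "@data"])
    return lines


def to_arff(relation, attributes, rows):
    """Return the full ARFF text: header plus one comma-separated line per row."""
    lines = arff_header(relation, attributes)
    for row in rows:
        # Assumption: values need no quoting (no spaces, commas, or quotes).
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


attributes = [
    ("sepallength", "numeric"),
    ("sepalwidth", "numeric"),
    ("petallength", "numeric"),
    ("petalwidth", "numeric"),
    ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
]
rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
]
arff_text = to_arff("iris", attributes, rows)
print(arff_text)
```

Writing `arff_text` to a file with an `.arff` extension should give something the Explorer can open, provided the no-quoting assumption holds for your data.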
Example Arff Files ● ~cs4705/bin/weka /data/ ● iris.arff ● soybean.arff ● weather.arff
To Classify with weka GUI
1. Run weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
a. 'Save result buffer'
b. 'Save model'
Classify ● Some classifiers to start with. – NaiveBayes – JRip – J48 – SMO ● Find References by selecting a classifier ● Use Cross-Validation!
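To make the idea behind cross-validation concrete, here is a minimal sketch in Python of how N instances can be divided into k train/test splits. It is only an illustration and is not weka's own implementation: the function name `k_fold_indices` is hypothetical, and the sketch does no shuffling and no stratification by class.

```python
def k_fold_indices(n_instances, k):
    """Yield (train, test) index lists for k folds over range(n_instances)."""
    indices = list(range(n_instances))
    for fold in range(k):
        # Every k-th instance, starting at offset `fold`, is held out.
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each instance lands in exactly one test fold, so every instance is used for testing once and for training k-1 times.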
Analyzing Results ● Important tools for Homework 2 – Accuracy ● “Correctly classified instances” – F-measure – Confusion matrix – Save model – Visualization
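The relationship between these measures can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the matrix values are made up for the example (they are not real weka output), and the sketch assumes rows are actual classes and columns are predicted classes.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def f_measure(matrix, cls):
    """Harmonic mean of precision and recall for one class index."""
    true_pos = matrix[cls][cls]
    if true_pos == 0:
        return 0.0
    predicted = sum(row[cls] for row in matrix)  # column sum: predicted as cls
    actual = sum(matrix[cls])                    # row sum: actually cls
    precision = true_pos / predicted
    recall = true_pos / actual
    return 2 * precision * recall / (precision + recall)


# Made-up 3-class confusion matrix (rows = actual, columns = predicted).
confusion = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 2, 48],
]
print("accuracy:", accuracy(confusion))
print("F-measure, class 1:", f_measure(confusion, 1))
```

Accuracy summarizes the whole matrix in one number, while the F-measure is computed per class, which is why the two can disagree when classes are unbalanced.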
Running weka from the Command Line
● Running an N-fold cross validation experiment
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
● Saving the model
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
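When running many experiments, it can help to assemble these command lines from a script. Below is a minimal sketch in Python: the helper `weka_command` is hypothetical (it is not part of weka), it only covers the -t, -T, -x, -d and -l options shown above, and the jar path "weka.jar" is a placeholder to replace with the real location.

```python
import shlex


def weka_command(classifier, jar, train=None, test=None, folds=None,
                 save_model=None, load_model=None):
    """Build the argument list for one weka command-line run."""
    cmd = ["java", "-cp", jar, classifier]
    if train:
        cmd += ["-t", train]        # training data
    if test:
        cmd += ["-T", test]         # predefined test set
    if folds:
        cmd += ["-x", str(folds)]   # number of cross-validation folds
    if save_model:
        cmd += ["-d", save_model]   # write the trained model
    if load_model:
        cmd += ["-l", load_model]   # read a previously saved model
    return cmd


cv_cmd = weka_command(
    "weka.classifiers.bayes.NaiveBayes",
    "weka.jar",  # placeholder path
    train="trainingdata.arff",
    folds=10,
)
print(" ".join(shlex.quote(arg) for arg in cv_cmd))
# To actually run it (requires Java and the jar):
#   import subprocess; subprocess.run(cv_cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.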
Homework 2 Weka Workflow [Figure: workflow diagram with three stages: Preprocessing (you), Experimentation (you), Grading (us). Diagram labels: S1 S2 … SN, Your Feature Extractor, .arff, Weka, best model, results, T1 … TN, Your Feature Extractor, Test.arff, Weka, results]
Tips for Homework Success ● Start early ● Read instructions carefully ● Start simply ● Your system should always work – 80/20 Rule – Add features incrementally – This way, you always have something you can turn in.