1
A Short Introduction to Weka
Natural Language Processing
Thursday, September 25th
2
What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: ‘weka java’
3
weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &
4
.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

% 1. Title: Iris Plants Database
%
@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
…
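The layout above (a relation name, one declaration per attribute, then comma-separated data rows) can be generated from a script. The following is a minimal sketch, not part of weka: the function name `to_arff` and its argument shapes are assumptions made for illustration, and it does not quote values that contain spaces or commas.

```python
def to_arff(relation, attributes, rows):
    """Render a relation as ARFF text.

    attributes: list of (name, kind) pairs; kind is a type keyword such as
    "NUMERIC", or a list of strings for a nominal attribute.
    rows: list of value tuples, one per instance, in attribute order.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: list the allowed values inside braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(row)}")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# The iris header and the three data rows shown on the slide.
iris_attributes = [
    ("sepallength", "NUMERIC"),
    ("sepalwidth", "NUMERIC"),
    ("petallength", "NUMERIC"),
    ("petalwidth", "NUMERIC"),
    ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
]
iris_rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (4.7, 3.2, 1.3, 0.2, "Iris-setosa"),
]

arff_text = to_arff("iris", iris_attributes, iris_rows)
print(arff_text)
```

Writing `arff_text` to a file ending in `.arff` gives something that can be opened from the weka GUI or passed to the command line.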
5
.arff file format
@attribute attrName {numeric, string, nominal, date}
numeric: a number
nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
string: arbitrary text
date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
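The default date pattern is written in Java's date-format notation. If the feature extractor is written in Python, the equivalent `strftime`/`strptime` pattern is shown in the sketch below; the helper names and the sample timestamp are arbitrary choices for illustration.

```python
from datetime import datetime

# Python equivalent of the Java pattern yyyy-MM-dd'T'HH:mm:ss
ARFF_DEFAULT_DATE = "%Y-%m-%dT%H:%M:%S"


def format_arff_date(moment):
    """Format a datetime the way a default ARFF date attribute expects."""
    return moment.strftime(ARFF_DEFAULT_DATE)


def parse_arff_date(text):
    """Parse a date value written in the default ARFF date format."""
    return datetime.strptime(text, ARFF_DEFAULT_DATE)


# Arbitrary sample timestamp: 3 April 2001, 12:12:12.
stamp = format_arff_date(datetime(2001, 4, 3, 12, 12, 12))
print(stamp)  # 2001-04-03T12:12:12
```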
6
Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff
7
To Classify with weka GUI
1. Run weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'
8
Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
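To make the cross-validation advice concrete: the data is split into k folds, and each fold serves once as the test set while the remaining folds form the training set. The sketch below only illustrates that idea; it is not weka's own fold assignment, it does not stratify by class, and the function name and seed are assumptions.

```python
import random


def k_fold_indices(n_instances, k, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Every instance is tested exactly once, so the averaged score uses all of the data without ever testing on an instance the model was trained on.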
9
Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
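Accuracy and F-measure can both be read off the confusion matrix, which helps when checking weka's output by hand. The sketch below assumes rows are actual classes and columns are predicted classes; the 2x2 matrix is made-up illustration data, not a result from any experiment.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def f_measure(matrix, cls):
    """Harmonic mean of precision and recall for one class index."""
    true_positives = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum
    actual = sum(matrix[cls])                    # row sum
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up matrix: rows = actual class, columns = predicted class.
example = [
    [8, 2],
    [1, 9],
]
print(accuracy(example))      # 17 of 20 correct
print(f_measure(example, 0))  # precision 8/9, recall 8/10
```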
10
Running weka from the Command Line
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
11
● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
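When running many experiments, it can be convenient to assemble these command lines from a script. The sketch below is a hypothetical helper, not part of weka; it uses only the flags shown on these slides (`-t`, `-T`, `-x`, `-i`, `-d`, `-l`) and only builds the argument list, leaving execution to the caller.

```python
import os
import shlex

WEKA_JAR = "~cs4705/bin/weka.jar"  # path from the slides


def weka_command(classifier, *, train=None, test=None, folds=None,
                 ir_stats=False, save_model=None, load_model=None,
                 jar=WEKA_JAR):
    """Build the argument list for one weka command-line run."""
    # Expand "~user" here because no shell does it when a list is passed
    # to subprocess.run.
    cmd = ["java", "-cp", os.path.expanduser(jar), classifier]
    if load_model:
        cmd += ["-l", load_model]
    if train:
        cmd += ["-t", train]
    if test:
        cmd += ["-T", test]
    if folds is not None:
        cmd += ["-x", str(folds)]
    if ir_stats:
        cmd.append("-i")
    if save_model:
        cmd += ["-d", save_model]
    return cmd


# 10-fold cross-validation with NaiveBayes, as on the slide with N = 10.
cross_validation = weka_command(
    "weka.classifiers.bayes.NaiveBayes",
    train="trainingdata.arff", folds=10, ir_stats=True, jar="weka.jar",
)
print(shlex.join(cross_validation))
```

The returned list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` to collect weka's report for later comparison.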
12
Homework 2 Weka Workflow
[Diagram; labels only]
● Preprocessing (you): S1 S2 … SN, Your Feature Extractor, .arff
● Experimentation (you): Weka, best model, results
● Grading (us): T1 … TN, Your Feature Extractor, Test.arff, Weka, results
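The feature extractor appears twice in the workflow: once for your own data and once for the grading data. A sketch of one way to organise such an extractor follows. The three features, the class labels, and the function names are all hypothetical and are not the Homework 2 specification; the point is only that a single feature table drives both the ARFF header and every data row, so training and test files stay consistent.

```python
# Hypothetical features: each entry is (attribute name, function of the text).
FEATURES = [
    ("num_tokens", lambda text: len(text.split())),
    ("num_chars", lambda text: len(text)),
    ("has_question_mark", lambda text: int("?" in text)),
]


def arff_header(relation, labels):
    """Header lines declaring one numeric attribute per feature plus the class."""
    lines = [f"@RELATION {relation}"]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name, _ in FEATURES]
    lines.append("@ATTRIBUTE class {" + ",".join(labels) + "}")
    lines.append("@DATA")
    return lines


def extract(text, label):
    """One ARFF data row: feature values in FEATURES order, then the label."""
    values = [fn(text) for _, fn in FEATURES]
    return ",".join(str(value) for value in values + [label])


labels = ["question", "statement"]  # hypothetical class labels
rows = [
    extract("Is it raining?", "question"),
    extract("It is raining.", "statement"),
]
print("\n".join(arff_header("example", labels) + rows))
```

Adding a feature is then a one-line change to `FEATURES`, and the header and rows pick it up together.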
13
Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.