Slide 1: Tutorial for LightSIDE
Heejun Kim, June 4, 2018
Slide 2: The Workflow for Text Mining
The workflow: preparing data, extracting features, building a model, predicting labels, and error analysis.
Example prediction tasks: whether a movie review is positive or negative, whether a figure is a triangle or not, or (in my case) whether a piece of information is credible or not.
The text is represented as a bag of words.
LightSIDE can cover every step except the first one (preparing the data).
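As a quick illustration of the bag-of-words representation mentioned above: a document is reduced to its word counts, ignoring word order. The example review below is made up.

# Tiny bag-of-words illustration: a text becomes a table of word counts.
# The review text is an invented example.
from collections import Counter

review = "great movie great acting weak ending"
bag_of_words = Counter(review.lower().split())
print(bag_of_words)
# e.g., Counter({'great': 2, 'movie': 1, 'acting': 1, 'weak': 1, 'ending': 1})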
Slide 3: Installing LightSIDE
You need a JRE (1.8 preferred) or a JDK. JRE stands for Java Runtime Environment, essentially a virtual machine that lets you run Java-based programs.
Download the zip file linked as "Program: LightSIDE" from the course website and unzip it.
Launch it with LightSide.app on Mac, LightSide.bat on Windows, or run.sh on Linux.
(Note: show students where they can find the manual.)
Slide 4: Preparing Data (LightSIDE)
LightSIDE reads a CSV file (comma-delimited text file) with one column for the text and another column for the class label.
Additional pre-processed attributes (e.g., length) can be included in extra columns; they will be read using the column features extractor.
Encoding: UTF-8 is recommended.
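To make the expected format concrete, here is a minimal Python sketch that writes such a two-column, UTF-8 CSV; the file name (reviews.csv), the column names, and the example rows are illustrative assumptions, not part of the assignment data.

# Minimal sketch of a LightSIDE-ready CSV, written from Python.
# File name, column names, and rows below are made-up examples.
import csv

rows = [
    ("This movie was wonderful, I loved every minute.", "positive"),
    ("A dull, predictable plot and wooden acting.", "negative"),
]

with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "class"])  # one column for text, one for the class label
    # A pre-processed attribute such as length could go in an extra column.
    for text, label in rows:
        writer.writerow([text, label])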
Slide 5: Preparing Data (LightSIDE), continued
Slide 6: Open Files
Steps: open a file, select the file, and check its details.
For the simple workflow of the 2nd assignment, you will only need the first, third, and last tabs. So let me go over the core process first, take some questions, and explore more functions later.
Slide 7: Extract Features
Steps: select an extractor, configure the detailed options, and execute; then check the performance of the features and set a threshold.
For the second assignment you will only use "Basic Features"; for the 3rd assignment you may want to explore other feature extractors.
Use only Unigrams, and select the "Skip stopwords in N-Grams" option, except for question #5.
LightSIDE allows you to set a threshold on the minimum number of training-set instances that must contain a particular feature in order for that feature to make it into the feature representation. If we set t = 2, then only terms that appear in at least 2 training-set instances make it into the feature representation.
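Outside of LightSIDE, the same ideas (unigram features, skipping stopwords, and a minimum-frequency threshold) can be sketched with scikit-learn; the toy documents below and the choice of t = 2 are my own illustration, not LightSIDE's internals.

# Rough analogue of unigram extraction with a frequency threshold, using scikit-learn.
# The documents are made up; min_df=2 plays the role of the threshold t = 2.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "great movie great acting",
    "terrible movie",
    "great plot but terrible ending",
]

# ngram_range=(1, 1): unigrams only; stop_words="english": skip stopwords;
# min_df=2: keep only terms that appear in at least 2 training instances.
vectorizer = CountVectorizer(ngram_range=(1, 1), stop_words="english", min_df=2)
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # e.g., ['great' 'movie' 'terrible']
print(X.toarray())                         # the resulting feature table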
Slide 8: An example of a feature table
Slide 9: Building a Model + Predicting Labels
Select a machine learning algorithm (e.g., Naive Bayes, Logistic Regression).
Choose an evaluation method: independent training/test data, or n-fold cross-validation for your project.
For this assignment, use only the Naive Bayes algorithm.
Slide 10: Building a Model + Predicting Labels
Evaluation with independent training and test data: the data is split into training data and test data (more generally, into a training set, a validation set, and a testing set).
Slide 11: Building a Model + Predicting Labels
Cross-validation (e.g., 5-fold): each run uses a different fold as test data and the rest as training data; averaging across runs gives a more reliable accuracy estimate and helps reveal over-fitting.

Run    Accuracy
1st    0.78
2nd    0.76
3rd    0.77
4th    0.73
5th    0.79
avg    0.766
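To make the procedure concrete outside of LightSIDE, here is a small scikit-learn sketch of 5-fold cross-validation with Naive Bayes; the ten toy reviews and their labels are invented purely for illustration.

# 5-fold cross-validation with a bag-of-words Naive Bayes model (scikit-learn sketch).
# The texts and labels below are toy data, not the assignment data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = [
    "great movie", "loved it", "wonderful acting", "really enjoyable", "fantastic plot",
    "terrible movie", "hated it", "boring plot", "awful acting", "complete waste of time",
]
labels = ["pos"] * 5 + ["neg"] * 5

model = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(model, texts, labels, cv=5)  # one accuracy per fold

print(scores)         # per-fold accuracies (the 1st..5th runs in the table above)
print(scores.mean())  # their average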
Slide 12: Build Models
Steps: select a feature table, select the algorithm, configure the detailed options, select an evaluation option, and execute; then check the performance of the prediction.
Use only the Naive Bayes algorithm. Its options may be appropriate when working with numeric feature values, but they are generally unimportant here (in some cases these configurations do matter).
Sometimes you will use train.csv for both training and testing; other times you will train on train.csv and test on test.csv.
(Note: show the table in the assignment and explain what training-set accuracy and testing-set accuracy mean.)
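The train.csv / test.csv setup can be sketched outside LightSIDE as follows; the file names and the column names "text" and "class" are assumptions carried over from the earlier CSV example, so adjust them to your actual data.

# Training on train.csv and evaluating on test.csv (scikit-learn sketch, not LightSIDE).
# Assumes columns named "text" and "class"; both names are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv", encoding="utf-8")
test = pd.read_csv("test.csv", encoding="utf-8")

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train["text"], train["class"])

print("training-set accuracy:", model.score(train["text"], train["class"]))
print("testing-set accuracy: ", model.score(test["text"], test["class"]))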
Slide 13: Explore Results
Steps: select a model, select a cell of the matrix, select an exploration method, select a feature, click a case, and check the case.
Slide 14: Predict Labels
Steps: select a model, choose the file you want to make predictions for, check the results, and export the results.
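For completeness, here is a sketch of the same step outside LightSIDE: training a model, predicting labels for a new, unlabeled file, and exporting the results. The file names (unlabeled.csv, predictions.csv) and the column names are assumptions.

# Predicting labels for a new file and exporting the results (scikit-learn sketch).
# train.csv, unlabeled.csv, predictions.csv, "text", and "class" are assumed names.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv", encoding="utf-8")
new_data = pd.read_csv("unlabeled.csv", encoding="utf-8")

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train["text"], train["class"])

new_data["predicted_class"] = model.predict(new_data["text"])
new_data.to_csv("predictions.csv", index=False, encoding="utf-8")  # export the results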
Slide 15: Any questions?