Slide 1: Tutorial for LightSIDE
Heejun Kim, June 4, 2018
Slide 2: The Workflow for Text Mining
The workflow: preparing data, extracting features, building a model, predicting labels, and error analysis.
Example prediction tasks: whether a movie review is positive or negative, whether a figure is a triangle or not, or (in my case) whether a piece of information is credible or not.
The text is represented as a bag of words.
LightSIDE can cover every step except the first one (preparing the data).
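As a quick illustration of the bag-of-words representation mentioned above: a document is reduced to its word counts, ignoring word order. The example review below is made up.

# Tiny bag-of-words illustration: a text becomes a table of word counts.
# The review text is an invented example.
from collections import Counter

review = "great movie great acting weak ending"
bag_of_words = Counter(review.lower().split())
print(bag_of_words)
# e.g., Counter({'great': 2, 'movie': 1, 'acting': 1, 'weak': 1, 'ending': 1})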
Slide 3: Installing LightSIDE
You need a JRE (1.8 preferred) or a JDK. JRE stands for Java Runtime Environment, essentially a virtual machine that lets you run Java-based programs.
Download the zip file linked as "Program: LightSIDE" from the course website and unzip it.
Launch it with LightSide.app on Mac, LightSide.bat on Windows, or run.sh on Linux.
(Note: show students where they can find the manual.)
Slide 4: Preparing Data (LightSIDE)
LightSIDE reads a CSV file (comma-delimited text file) with one column for the text and another column for the class label.
Additional pre-processed attributes (e.g., length) can be included in extra columns; they will be read using the column features extractor.
Encoding: UTF-8 is recommended.
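To make the expected format concrete, here is a minimal Python sketch that writes such a two-column, UTF-8 CSV; the file name (reviews.csv), the column names, and the example rows are illustrative assumptions, not part of the assignment data.

# Minimal sketch of a LightSIDE-ready CSV, written from Python.
# File name, column names, and rows below are made-up examples.
import csv

rows = [
    ("This movie was wonderful, I loved every minute.", "positive"),
    ("A dull, predictable plot and wooden acting.", "negative"),
]

with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "class"])  # one column for text, one for the class label
    # A pre-processed attribute such as length could go in an extra column.
    for text, label in rows:
        writer.writerow([text, label])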
Slide 5: Preparing Data (LightSIDE), continued
Slide 6: Open Files
Steps: open a file, select the file, and check its details.
For the simple workflow of the 2nd assignment, you will only need the first, third, and last tabs. So let me go over the core process first, take some questions, and explore more functions later.
Slide 7: Extract Features
Steps: select an extractor, configure the detailed options, and execute; then check the performance of the features and set a threshold.
For the second assignment you will only use "Basic Features"; for the 3rd assignment you may want to explore other feature extractors.
Use only Unigrams, and select the "Skip stopwords in N-Grams" option, except for question #5.
LightSIDE allows you to set a threshold on the minimum number of training-set instances that must contain a particular feature in order for that feature to make it into the feature representation. If we set t = 2, then only terms that appear in at least 2 training-set instances make it into the feature representation.
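Outside of LightSIDE, the same ideas (unigram features, skipping stopwords, and a minimum-frequency threshold) can be sketched with scikit-learn; the toy documents below and the choice of t = 2 are my own illustration, not LightSIDE's internals.

# Rough analogue of unigram extraction with a frequency threshold, using scikit-learn.
# The documents are made up; min_df=2 plays the role of the threshold t = 2.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "great movie great acting",
    "terrible movie",
    "great plot but terrible ending",
]

# ngram_range=(1, 1): unigrams only; stop_words="english": skip stopwords;
# min_df=2: keep only terms that appear in at least 2 training instances.
vectorizer = CountVectorizer(ngram_range=(1, 1), stop_words="english", min_df=2)
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # e.g., ['great' 'movie' 'terrible']
print(X.toarray())                         # the resulting feature table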
Slide 8: An example of a feature table
Slide 9: Building a Model + Predicting Labels
Select a machine learning algorithm (e.g., Naive Bayes, Logistic Regression).
Choose an evaluation method: independent training/test data, or n-fold cross-validation for your project.
For this assignment, use only the Naive Bayes algorithm.
Slide 10: Building a Model + Predicting Labels
Evaluation with independent training and test data: the data is split into training data and test data (more generally, into a training set, a validation set, and a testing set).
Slide 11: Building a Model + Predicting Labels
Cross-validation (e.g., 5-fold): each run uses a different fold as test data and the rest as training data; averaging across runs gives a more reliable accuracy estimate and helps reveal over-fitting.

Run    Accuracy
1st    0.78
2nd    0.76
3rd    0.77
4th    0.73
5th    0.79
avg    0.766
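To make the procedure concrete outside of LightSIDE, here is a small scikit-learn sketch of 5-fold cross-validation with Naive Bayes; the ten toy reviews and their labels are invented purely for illustration.

# 5-fold cross-validation with a bag-of-words Naive Bayes model (scikit-learn sketch).
# The texts and labels below are toy data, not the assignment data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = [
    "great movie", "loved it", "wonderful acting", "really enjoyable", "fantastic plot",
    "terrible movie", "hated it", "boring plot", "awful acting", "complete waste of time",
]
labels = ["pos"] * 5 + ["neg"] * 5

model = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(model, texts, labels, cv=5)  # one accuracy per fold

print(scores)         # per-fold accuracies (the 1st..5th runs in the table above)
print(scores.mean())  # their average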
Slide 12: Build Models
Steps: select a feature table, select the algorithm, configure the detailed options, select an evaluation option, and execute; then check the performance of the prediction.
Use only the Naive Bayes algorithm. Its options may be appropriate when working with numeric feature values, but they are generally unimportant here (in some cases these configurations do matter).
Sometimes you will use train.csv for both training and testing; other times you will train on train.csv and test on test.csv.
(Note: show the table in the assignment and explain what training-set accuracy and testing-set accuracy mean.)
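The train.csv / test.csv setup can be sketched outside LightSIDE as follows; the file names and the column names "text" and "class" are assumptions carried over from the earlier CSV example, so adjust them to your actual data.

# Training on train.csv and evaluating on test.csv (scikit-learn sketch, not LightSIDE).
# Assumes columns named "text" and "class"; both names are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv", encoding="utf-8")
test = pd.read_csv("test.csv", encoding="utf-8")

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train["text"], train["class"])

print("training-set accuracy:", model.score(train["text"], train["class"]))
print("testing-set accuracy: ", model.score(test["text"], test["class"]))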
Slide 13: Explore Results
Steps: select a model, select a cell of the matrix, select an exploration method, select a feature, click a case, and check the case.
Slide 14: Predict Labels
Steps: select a model, choose the file you want to make predictions for, check the results, and export the results.
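For completeness, here is a sketch of the same step outside LightSIDE: training a model, predicting labels for a new, unlabeled file, and exporting the results. The file names (unlabeled.csv, predictions.csv) and the column names are assumptions.

# Predicting labels for a new file and exporting the results (scikit-learn sketch).
# train.csv, unlabeled.csv, predictions.csv, "text", and "class" are assumed names.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv", encoding="utf-8")
new_data = pd.read_csv("unlabeled.csv", encoding="utf-8")

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train["text"], train["class"])

new_data["predicted_class"] = model.predict(new_data["text"])
new_data.to_csv("predictions.csv", index=False, encoding="utf-8")  # export the results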
Slide 15: Any questions?