Tutorial for LightSIDE

Presentation transcript:

Tutorial for LightSIDE June 4, 2018 Heejun Kim

The Workflow for Text Mining
1. Preparing data
2. Extracting features
3. Building a model
4. Predicting labels
5. Error analysis
Example tasks: classifying movie reviews as positive vs. negative, or deciding whether a figure is a triangle or not. In my case, I am interested in predicting whether information is credible or not.
Representation: bag of words.
LightSIDE covers every step except the first.
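As a rough illustration of the bag-of-words representation named on this slide, the sketch below counts unigrams per document in plain Python. It is not LightSIDE's own extractor; the function names `tokenize` and `bag_of_words` and the two sample reviews are made up for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, matrix); each matrix row holds the unigram counts of one document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix

# Two invented movie reviews, only for illustration.
reviews = ["A great, great movie", "A boring movie"]
vocab, rows = bag_of_words(reviews)
print(vocab)
print(rows)
```

Word order is discarded: each review becomes a vector of counts over the shared vocabulary.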

Installing LightSIDE
You should have a JRE (1.8 preferred) or a JDK installed.
Download the zip file linked as “Program: LightSIDE” from the course website and unzip it.
Launch the program for your platform:
Mac: LightSide.app
Windows: LightSide.bat
Linux: run.sh
JRE stands for Java Runtime Environment; it provides the Java virtual machine and the libraries needed to run Java-based programs.
Introduce where students can find the manual.

Preparing Data (LightSIDE)
Use a CSV file (comma-delimited text file).
Use one column for the text and another column for the class.
Pre-processed additional attributes (e.g., length) can be included in additional columns; they are read by the column features extractor.
Encoding: UTF-8 is recommended.
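A minimal sketch of producing a file in this layout with Python's standard `csv` module. The column names `text`, `class`, and `length`, the helper `write_examples`, and the two sample rows are assumptions for illustration; the slide only requires one text column, one class column, and optional extra attributes.

```python
import csv
import io

def write_examples(handle, examples):
    """Write (text, label) pairs as comma-delimited rows with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["text", "class", "length"])  # hypothetical column names
    for text, label in examples:
        # "length" here is a word count, one possible pre-processed attribute.
        writer.writerow([text, label, len(text.split())])

examples = [
    ("Great acting, weak plot", "negative"),
    ("A wonderful film", "positive"),
]

# An in-memory buffer keeps the example self-contained. For a real file, use
# open(path, "w", newline="", encoding="utf-8") to get UTF-8 output.
buffer = io.StringIO()
write_examples(buffer, examples)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

The `csv` writer quotes the first review automatically because it contains a comma, which keeps the columns aligned when the file is read back.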

Preparing Data (LightSIDE)

Open Files
Open a file
Select a file
Check details
For the simple workflow of the 2nd assignment, you only need the first, third, and last tabs. Let me go over the core process first, then take questions and explore more functions later.

Extract Features
Select extractor
Configure detailed options
Execute
Check performance of features
Set threshold
For the 2nd assignment, you will only use “Basic Features”; for the 3rd assignment, you may want to explore other feature extractors. Use only Unigrams and select the “Skip stopwords in N-Grams” option, except for question #5.
LightSIDE lets you set a threshold on the minimum number of training set instances that must contain a particular feature for that feature to be included in the feature representation. If we set t=2, only terms that appear in at least 2 training set instances make it into the feature representation.
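The threshold rule can be sketched in a few lines of Python. This only illustrates the idea of counting how many training instances contain each term; it is not LightSIDE's code, and the names `document_frequency` and `select_features` and the toy documents are hypothetical.

```python
from collections import Counter

def document_frequency(tokenized_docs):
    """Count, for each term, the number of documents that contain it at least once."""
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # set() so repeats within one document count once
    return df

def select_features(tokenized_docs, threshold):
    """Keep only the terms that occur in at least `threshold` documents."""
    df = document_frequency(tokenized_docs)
    return sorted(term for term, n in df.items() if n >= threshold)

# Toy training set, invented for illustration.
docs = [["good", "movie"], ["bad", "movie", "movie"], ["good", "plot"]]
kept = select_features(docs, 2)
print(kept)
```

With t=2, "bad" and "plot" are dropped because each appears in only one instance, even though "movie" occurring twice inside a single document does not raise its count.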

An example of feature table

Building Model + Predicting Label
Select a machine learning algorithm (e.g., Naïve Bayes, Logistic Regression).
Evaluation method: independent training/test data or n-fold cross-validation.
For your project: only the Naïve Bayes algorithm.
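To give a feel for what a Naïve Bayes text classifier does, here is a small multinomial version with add-one smoothing, written from scratch. It is a teaching sketch under simplifying assumptions (words unseen in training are ignored) and is not the implementation LightSIDE uses; the function names and the four training examples are invented.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(tokenized_docs, labels):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(tokenized_docs, labels):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def predict(model, tokens):
    """Return the label with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # assumption: skip words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen class/word pairs.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train_docs = [["great", "fun"], ["great", "acting"], ["boring", "plot"], ["boring", "slow"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["great", "plot"]))
print(predict(model, ["boring"]))
```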

Building Model + Predicting Label
Training data + independent test data: the data is divided into training data and test data.
Related terms: training set, validation set, testing set.

Building Model + Predicting Label
Cross-validation (e.g., 5-fold): in each run, one fold serves as test data and the remaining folds serve as training data.
Run   Accuracy
1st   0.78
2nd   0.76
3rd   0.77
4th   0.73
5th   0.79
avg   0.766
Over-fitting
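The fold bookkeeping behind cross-validation can be sketched as follows. The helper `k_fold_splits` is hypothetical and uses contiguous folds without shuffling, which may differ from how LightSIDE assigns instances to folds; the accuracies are the five values shown on the slide.

```python
from statistics import mean

def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_items))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)

# Per-fold accuracies from the slide; the reported result is their mean.
fold_accuracies = [0.78, 0.76, 0.77, 0.73, 0.79]
average_accuracy = mean(fold_accuracies)
print(round(average_accuracy, 3))
```

Every instance is used for testing exactly once and for training in the other four runs, which is why the averaged accuracy is a steadier estimate than any single split.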

Build Models
Configure detailed options
Select algorithm
Select a feature table
Select evaluation option
Execute
Check performance of prediction
Use only the Naïve Bayes algorithm. The detailed options may be appropriate for working with numeric feature values but are generally unimportant; in some cases, these configurations are important.
Sometimes you will use train.csv for both training and testing; other times, you will use train.csv for training and test.csv for testing.
Show the table in the assignment and explain what training set accuracy and testing set accuracy are.

Explore Results
Select case
Select a model
Select a feature
Select matrix
Select exploration method
Check the case
Click a case

Predict Labels
Choose a file you want to make predictions for
Export results
Select a model
Check results

Any questions?