1.1: Introduction
1.2: Descriptions
1.2.1: White wine description
1.2.2: Breast Tissue description
1.3: Conclusion



In this phase we discuss the first step in data mining, preprocessing, on two datasets. The first is a CSV file about White Wine, and the other is an XLS file about Breast Tissue. We work in the RapidMiner program. For data cleaning, we plot the data to understand it and to find outliers. For data transformation, we remove correlated attributes (columns) and set roles to convert the target class from regular to label. For data reduction, we take a sample from the large dataset.

Methods:
1- Discretize process: We choose quality as the target class. It takes values from 0 to 10, representing white wine quality from bad to excellent. We discretize it into a new classification with four classes:
  - Bad: from -infinity to 3
  - Good: from 4 to 5
  - Very good: from 6 to 7
  - Excellent: from 8 to 10
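The binning above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the four ranges listed on the slide, not the RapidMiner process itself. The function name `discretize_quality` is hypothetical, and treating each upper bound as inclusive (so a non-integer score such as 3.5 falls into Good) is an assumption, since the slide only lists integer ranges.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first three classes (assumption: bounds are
# inclusive); any score above 7 falls into the last class.
UPPER_BOUNDS = [3, 5, 7]
LABELS = ["Bad", "Good", "Very good", "Excellent"]


def discretize_quality(score):
    """Map a numeric quality score to one of the four classes."""
    # bisect_left returns the index of the first bound that is >= score,
    # which is exactly the index of the class the score belongs to.
    return LABELS[bisect_left(UPPER_BOUNDS, score)]


# Example: relabel a column of quality scores.
qualities = [3, 5, 6, 8]
classes = [discretize_quality(q) for q in qualities]
```

Keeping the bounds in one list makes it easy to try a different split of the quality scale without touching the function.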

Figure: the model of the discretize process

Figure: the output of the discretize method

Figure: Sample process and Remove Correlated Attributes on the white wine dataset

Figure: result of the sample process and remove correlated attributes on the white wine dataset

Figure: Filter Examples process on the white wine dataset

Figure: sweet white wine based on Syria measurements
Figure: non-sweet white wine based on Syria measurements

Figure: outlier process on the Breast Tissue dataset

Figure: plot of the outlier method on the Breast Tissue dataset

Figure: the row of outlier data
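The slides do not say which outlier method or parameters were used, so the following is only a minimal sketch of one common distance-based approach: score each row by the distance to its k-th nearest neighbour and flag the rows with the largest scores. The function name `knn_outliers`, the values of `k` and `n_outliers`, and the toy data are all assumptions made for illustration.

```python
from math import dist


def knn_outliers(rows, k=2, n_outliers=1):
    """Return the indices of the n_outliers rows that lie farthest from
    their k-th nearest neighbour (Euclidean distance)."""
    scores = []
    for i, row in enumerate(rows):
        # Distances from this row to every other row, smallest first.
        distances = sorted(
            dist(row, other) for j, other in enumerate(rows) if j != i
        )
        scores.append((distances[k - 1], i))
    # Rows with the largest k-th neighbour distance are the most isolated.
    scores.sort(reverse=True)
    return sorted(index for _, index in scores[:n_outliers])


# Toy data: four points close together and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
flagged = knn_outliers(points, k=2, n_outliers=1)
```

On real measurements the attributes usually need to be normalized first, otherwise the attribute with the largest scale dominates the distance.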

Figure: remove correlated attributes from the Breast Tissue dataset

Figure: the remaining attributes after executing the remove correlation process on the Breast Tissue dataset
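As a rough illustration of what removing correlated attributes does, the sketch below keeps an attribute only if its absolute Pearson correlation with every attribute kept so far stays at or below a threshold. It is not the RapidMiner operator. The names `pearson` and `remove_correlated`, the 0.95 threshold, and the toy columns are assumptions, because the slides do not state the settings that were used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def remove_correlated(columns, threshold=0.95):
    """Return the names of the attributes to keep, in their original order.

    columns maps attribute name -> list of values. An attribute is dropped
    when it is strongly correlated with an attribute that was already kept.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is exactly 2 * "a", so it carries no extra information.
data = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
remaining = remove_correlated(data)
```

Because attributes are checked in order, which member of a correlated pair survives depends on the column order.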

1. The preprocessing phase is very important: it prepares your data for the next phases and lets you be confident that your data are correct.
2. You must import each dataset according to its file extension type (for example CSV or XLS).
3. When importing the attributes, you must choose the correct data type for each one so that you can work on it with more flexibility.
4. These methods may not suit other datasets, because each dataset has its own specific characteristics.
5. If a model contains a sample process, you can get different results every time you run it.
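Point 5 follows from sampling being random: two runs draw different rows unless the random generator is seeded. A minimal sketch in plain Python, assuming a hypothetical helper `sample_rows` and an arbitrary seed value:

```python
import random


def sample_rows(rows, size, seed=None):
    """Draw `size` rows without replacement.

    With seed=None every call may return a different sample; with a fixed
    seed the same rows are returned on every run.
    """
    rng = random.Random(seed)
    return rng.sample(rows, size)


rows = list(range(100))          # stand-in for the rows of a dataset
first = sample_rows(rows, 10, seed=42)
second = sample_rows(rows, 10, seed=42)   # same seed, same sample
```

Fixing the seed is what makes a preprocessing result reproducible when the process contains a sampling step.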