ITSC/University of Alabama in Huntsville ADaM version 4.0 (Eagle) Tutorial Information Technology and Systems Center University of Alabama in Huntsville.

Slides:



Advertisements
Similar presentations
Florida International University COP 4770 Introduction of Weka.
Advertisements

Weka & Rapid Miner Tutorial By Chibuike Muoh. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering.
WEKA (sumber: Machine Learning with WEKA). What is WEKA? Weka is a collection of machine learning algorithms for data mining tasks. Weka contains.
Neural Networks Chapter Feed-Forward Neural Networks.
Brief overview of ideas In this introductory lecture I will show short explanations of basic image processing methods In next lectures we will go into.
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
An Exercise in Machine Learning
 The Weka The Weka is an well known bird of New Zealand..  W(aikato) E(nvironment) for K(nowlegde) A(nalysis)  Developed by the University of Waikato.
Rapid Miner Session CIS 600 Analytical Data Mining,EECS, SU Three steps for use  Assign the dataset file first  Select functionality  Execute.
Dr. Donald Kraft (Chair)
Overview of the current architecture TRADUCTIONS Client system Codes in XML Translation in XML XML Import XML Export Domain Tables Codes Translations Graphical.
Topics Introduction Hardware and Software How Computers Store Data
EARTH SCIENCE MARKUP LANGUAGE “Define Once Use Anywhere” INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
ECSE 6610 Pattern Recognition Professor Qiang Ji Spring, 2011.
Web Services Overview Ashraf Memon. 2 Overview Service Oriented Architecture Web service overview Benefits of Web services Core technologies: XML, SOAP,
1 Peter Fox Data Science – ITEC/CSCI/ERTH Week 7, October 19, 2010 Data Mining.
WEKA - Explorer (sumber: WEKA Explorer user Guide for Version 3-5-5)
Appendix: The WEKA Data Mining Software
GA-Based Feature Selection and Parameter Optimization for Support Vector Machine Cheng-Lung Huang, Chieh-Jen Wang Expert Systems with Applications, Volume.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
EARTH SCIENCE MARKUP LANGUAGE Why do you need it? How can it help you? INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
Figure 1.1 Rules for the contact lens data.. Figure 1.2 Decision tree for the contact lens data.
1 Running Clustering Algorithm in Weka Presented by Rachsuda Jiamthapthaksin Computer Science Department University of Houston.
Constructing Data Mining Applications based on Web Services Composition Ali Shaikh Ali and Omer Rana
Machine Learning with Weka Cornelia Caragea Thanks to Eibe Frank for some of the slides.
W E K A Waikato Environment for Knowledge Analysis Branko Kavšek MPŠ Jožef StefanNovember 2005.
1 1 Slide Using Weka. 2 2 Slide Data Mining Using Weka n What’s Data Mining? We are overwhelmed with data We are overwhelmed with data Data mining is.
Mining Binary Constraints in Feature Models: A Classification-based Approach Yi Li.
ITSC/University of Alabama in Huntsville ADaM System Architecture Rahul Ramachandran, Sara Graves and Ken Keiser Mathematical Challenges in Scientific.
BOĞAZİÇİ UNIVERSITY DEPARTMENT OF MANAGEMENT INFORMATION SYSTEMS MATLAB AS A DATA MINING ENVIRONMENT.
EARTH SCIENCE MARKUP LANGUAGE Tutorial on how to write an ESML Description File (for ESML Schema v3.0) “Define Once Use Anywhere” INFORMATION TECHNOLOGY.
WEKA Machine Learning Toolbox. You can install Weka on your computer from
CSE/CIS 787 Analytical Data Mining, Dept. of EECS, SU Three steps for use  Assign the dataset file first  Assign the analysis type you want.
Weka Just do it Free and Open Source ML Suite Ian Witten & Eibe Frank University of Waikato New Zealand.
National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Ergo User Tutorial - Part 3 NCSA, UIUC.
Ensemble Methods.  “No free lunch theorem” Wolpert and Macready 1995.
An Exercise in Machine Learning
Data Mining and Decision Support
***Classification Model*** Hosam Al-Samarraie, PhD. CITM-USM.
Weka Tutorial. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering – association rule Created by.
Machine Learning (ML) with Weka Weka can classify data or approximate functions: choice of many algorithms.
WEKA's Knowledge Flow Interface Data Mining Knowledge Discovery in Databases ELIE TCHEIMEGNI Department of Computer Science Bowie State University, MD.
In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Data Mining CH6 Implementation: Real machine learning schemes(2) Reporter: H.C. Tsai.
WEKA: A Practical Machine Learning Tool WEKA : A Practical Machine Learning Tool.
Detecting Web Attacks Using Multi-Stage Log Analysis
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Machine Learning with Spark MLlib
What’s new in FUSION? Bob McGaughey
Topics Introduction Hardware and Software How Computers Store Data
DATA MINING © Prentice Hall.
Rule Induction for Classification Using
Dipartimento di Ingegneria «Enzo Ferrari»,
Basic machine learning background with Python scikit-learn
Waikato Environment for Knowledge Analysis
Advanced Analytics. Advanced Analytics What is Machine Learning?
Weka Package Weka package is open source data mining software written in Java. Weka can be applied to your dataset from the GUI, the command line or called.
Weka Free and Open Source ML Suite Ian Witten & Eibe Frank
Machine Learning with Weka
DataMining, Morgan Kaufmann, p Mining Lab. 김완섭 2004년 10월 27일
Topics Introduction Hardware and Software How Computers Store Data
Face Components detection
Tutorial for WEKA Heejun Kim June 19, 2018.
iSRD Spam Review Detection with Imbalanced Data Distributions
Opening Weka Select Weka from Start Menu Select Explorer Fall 2003
CSCI N317 Computation for Scientific Applications Unit Weka
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Machine Learning with WEKA
Data Mining CSCI 307, Spring 2019 Lecture 8
Presentation transcript:

ITSC/University of Alabama in Huntsville ADaM version 4.0 (Eagle) Tutorial Information Technology and Systems Center University of Alabama in Huntsville

ITSC/University of Alabama in Huntsville Tutorial Outline Overview of the Mining System –Architecture –Data Formats –Components Using the client: ADaM Plan Builder Demos –How to write a mining plan

ITSC/University of Alabama in Huntsville ADaM v4.0 Architecture Simple component based architecture Each operation is a stand alone executable Users can either use the PlanBuilder or write scripts using their favorite scripting language (Perl, Python, etc) Users can write custom programs using one or more of the operations Users can create webservices using these operations

ITSC/University of Alabama in Huntsville A1A1 A2A2 A3A3 AnAn A` WS DPWS DPWS DP WS DP WS DP ……… Virtual Repository of Operations E E E E E WS DP E Driver Program Web Service Interface ESML Description Interface(s) Exploration/Interactive Applications Production/Batch Custom Program A1A1 E A3A3 E A` E Versatile/Reusable Mining Component Architecture of ADaM v4.0 (Eagle) Distributed Access 3 rd Party ADaM PLAN BUILDER ADaM V4.0

ITSC/University of Alabama in Huntsville ADaM Data Formats There are two data formats that work with ADaM Components ARFF Format –An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes Binary Image Format –Used to write image files

ITSC/University of Alabama in Huntsville ARFF Data Format ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information. The Header of the ARFF file contains the name of the relation, a list of the attributes (the columns in the data), and their types. An example header on the standard IRIS dataset looks like sepallength sepalwidth petallength petalwidth class 5.1,3.5,1.4,0.2,Iris-setosa 4.9,3.0,1.4,0.2,Iris-setosa 4.7,3.2,1.3,0.2,Iris-setosa 4.6,3.1,1.5,0.2,Iris-setosa

ITSC/University of Alabama in Huntsville Binary Image Data Format –Contains a header with signature and size (X,Y,Z) followed by the image data –Sample code to write header: int header[4]; header[0] = 0xabcd; header[1] = mSize.x; header[2] = mSize.y; header[3] = mSize.z; if (fwrite (header, sizeof(int), 4, outfile) != 4) { fprintf (stderr, "Error: Could not write header to %s\n", filename); return(false); }

ITSC/University of Alabama in Huntsville ADaM Components Components arranged into FOUR groups : Image Processing (Binary Image format) –Contains typical image processing operations such as spatial filters Pattern Recognition (ARFF format) –Contains pattern recognition and mining operations for both supervised and unsupervised classification Optimization –Contains general purpose optimization operations such as genetic algorithms and stochastic hill climbing Translation –Contains utility operations to convert data from one format to another such as image to gif

ITSC/University of Alabama in Huntsville ADaM Mining Plan A sequence of selected operations The ADaM Plan Builder allows the user to select and sequence Mining Operations for a given problem One could use any scripting language to write a mining plan Opn 2 Opn 3 Opn1

ITSC/University of Alabama in Huntsville ADaM Plan Builder – Layout Plan Menu allows one to: Create a new plan or Load an existing plan Remove a newly-added operation from a plan Operation Menu contains the list of operations one can select

ITSC/University of Alabama in Huntsville ADaM Plan Builder – Layout Panel where Mining Plan can be viewed either as a text or a tree

ITSC/University of Alabama in Huntsville ADaM Plan Builder – Layout Panel where Mining Plan can be viewed either as text or a tree Description about the Operation can be viewed in this panel All the parameters needed for the Operation are described here Sample values for Operation’s Parameters are show in this panel

ITSC/University of Alabama in Huntsville ADaM Plan Builder – Layout Go Mine the data using the Mining Plan Allows user to select the operation and add it to the Mining Plan Utility function to create samples for training

ITSC/University of Alabama in Huntsville Demo! Training a classifier to identify cancerous breast cells using a Bayes Classifier Workflow: –Brief explanation on Bayes Classifier –Sampling the data (training and testing set) –Training the Bayes Classifier –Applying the Bayes Classifier –Interpretation of the Results

ITSC/University of Alabama in Huntsville Bayes Classifier STARTING POINT: BAYES THEOREM FOR CONDITIONAL PROBABILITY END POINT: BAYES THEOREM CLASSIFIER FOR SEGMENTATION TERM 1: PROBABILITY OF DATA POINT X BELONGING IN CLASS ( I ) TERM 2: PROBABILITY OCCURRENCE OF A CLASS BASED ON NUMBER OF CLASSES USED IN SEGMENTATION TERM 4: PROBABILITY THAT DATA POINT X BELONGS TO CLASS (I) TERM 3: NORMALIIZATION TERM TO KEEP VALUES BETWEEN 0 -1

ITSC/University of Alabama in Huntsville Data File Instances described by attributes and a class label (4 –cancerous, 2-non- Clump_Thickness Uniformity_of_Cell_Size Uniformity_of_Cell_Shape Marginal_Adhesion Single_Epithelial_Cell_Size Bare_Nuclei Bland_Chromatin Normal_Nucleoli Mitoses class {2,

ITSC/University of Alabama in Huntsville Demo!

Evaluating Results (Training Set) Confusion Matrix | 0 1 <--- Actual Class | | ^ | Classified As POD FAR CSI HSS Accuracy 324 of 341 ( Pct) Overall Accuracy based on Confusion Matrix Probability of Detection False Alarm Rate Skill Scores

ITSC/University of Alabama in Huntsville Evaluating Results (Test Set) Confusion Matrix | 0 1 <--- Actual Class | | ^ | Classified As POD FAR CSI HSS Accuracy 328 of 342 ( Pct) Overall Accuracy based on Confusion Matrix Probability of Detection False Alarm Rate Skill Scores