CSc288 Term Project: Data Mining to Predict the Voice-over-IP Phone Market, by Huaqin Xu


Agenda
- Abstract
- Introduction
- Methodology
- Result
- Conclusion
- Learning Experience
- References

Abstract
This project is based on VoIP survey data sets. Classifiers from the Weka Explorer are used as the data mining tool to build models that predict potential customers of VoIP phones and the most important features and services of two VoIP phone models.

Introduction
Background
- VoIP phones have a market opportunity given the wide use of Internet service.
- Two VoIP phone models: Basic and Deluxe
Data mining scope
- Customers
- Product features and services

Methodology
Data mining tools
- C4.5/C5.0, Cubist
- Weka
- Microsoft SQL Server
- SPSS
Chosen: Weka Explorer
Why? Free, easy to use, good interface, more choices...

Methodology Explorer Vs KnowledgeFlow

Methodology
Data sets: 94 instances in total

Methodology
Preprocessing
- Split the survey table
  - Customer: 17 attributes
  - Basic-model: 14 attributes
  - Deluxe-model: 10 attributes
- Handle missing data
  - Delete
  - Replace with "?"
- Convert the data format: SPSS to Excel to Weka
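
The missing-data and conversion steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual conversion: the attribute names, values, and the `to_arff` helper are assumptions, since the slides do not show the survey schema. It writes nominal data in Weka's ARFF text format and uses "?" for missing cells, which is how ARFF marks missing values.

```python
# Hypothetical attributes and rows; the real survey schema is not shown in the slides.
def to_arff(relation, attributes, rows):
    """Render rows of nominal data as ARFF text, writing '?' for missing cells."""
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        # Nominal attributes list their allowed values in braces.
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        cells = ["?" if cell in (None, "") else cell for cell in row]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


attributes = [
    ("gender", ["Male", "Female"]),
    ("know_voip", ["yes", "no"]),
    ("would_buy", ["yes", "no"]),
]
rows = [
    ["Male", "yes", "yes"],
    ["Female", None, "no"],  # missing answer becomes '?'
]
arff_text = to_arff("customer", attributes, rows)
print(arff_text)
```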

Methodology
Algorithm selection
- Classification
- Clustering
- Association
Chosen: NNge
Why?
- High accuracy rate
- Simple, clear rules

Algorithm         Correct instances (%)
NaiveBayes        63.82
DecisionStump     65.95
Id                (missing)
J                 (missing)
NBTree            79.78
ConjunctiveRule   69.14
DecisionTable     80.85
NNge              87.23
OneR              71.27
PART              72.34
Prism             88.29
Ridor             71.27
JRip              74.46
ZeroR             63.83
AdaBoostM         (missing)
BayesNet          60.63
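
As a rough aid for reading the accuracy table, the percentages can be converted back to instance counts, assuming every classifier was evaluated on all 94 instances. The sketch below uses only the values that are legible in the table; the names `accuracy`, `correct_count`, and `ranked` are illustrative. It ranks by accuracy alone, whereas the slide also weighs how simple and clear the resulting rules are.

```python
TOTAL_INSTANCES = 94  # data set size stated in the slides

# Percent of correctly classified instances, copied from the table.
accuracy = {
    "NaiveBayes": 63.82, "DecisionStump": 65.95, "NBTree": 79.78,
    "ConjunctiveRule": 69.14, "DecisionTable": 80.85, "NNge": 87.23,
    "OneR": 71.27, "PART": 72.34, "Prism": 88.29, "Ridor": 71.27,
    "JRip": 74.46, "ZeroR": 63.83, "BayesNet": 60.63,
}


def correct_count(percent, total=TOTAL_INSTANCES):
    """Convert a percentage back to a whole number of instances."""
    return round(percent * total / 100)


ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
for name, percent in ranked:
    correct = correct_count(percent)
    print(f"{name:16s} {percent:6.2f}%  {correct} correct, {TOTAL_INSTANCES - correct} wrong")
```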

Methodology
NNge classifier
- Nearest-neighbor-like algorithm using non-nested generalized exemplars
- A rule-based classifier
- Builds a sort of “hypergeometric” model (each generalized exemplar is a hyperrectangle in attribute space)
- Shows promise as an ML method that performs well on a wide range of data sets
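
To make the idea of generalized exemplars concrete, here is a heavily simplified sketch for nominal attributes only. It is not Weka's NNge implementation: it omits the non-nesting and overlap checks, attribute and exemplar weights, and numeric ranges. The class `Exemplar`, the functions `train` and `classify`, the sample data, and the label `Would_Not_Buy` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """A generalized exemplar: one set of allowed values per attribute."""
    label: str
    allowed: dict  # attribute name -> set of values covered so far

    def distance(self, instance):
        # Count the attributes whose value falls outside the exemplar.
        return sum(1 for attr, value in instance.items()
                   if value not in self.allowed.get(attr, set()))

    def generalize(self, instance):
        # Grow the exemplar just enough to cover the new instance.
        for attr, value in instance.items():
            self.allowed.setdefault(attr, set()).add(value)


def train(examples):
    exemplars = []
    for instance, label in examples:
        nearest = min(exemplars, key=lambda e: e.distance(instance), default=None)
        if nearest is not None and nearest.label == label:
            nearest.generalize(instance)
        else:
            # Wrong class (or nothing learned yet): start a new exemplar.
            exemplars.append(Exemplar(label, {a: {v} for a, v in instance.items()}))
    return exemplars


def classify(exemplars, instance):
    return min(exemplars, key=lambda e: e.distance(instance)).label


examples = [
    ({"phone": "yes", "fax": "no"}, "Would_Buy"),
    ({"phone": "yes", "fax": "yes"}, "Would_Buy"),
    ({"phone": "no", "fax": "no"}, "Would_Not_Buy"),
]
exemplars = train(examples)
print(len(exemplars), classify(exemplars, {"phone": "yes", "fax": "yes"}))
```

Each trained exemplar reads directly as a rule of the form "attribute in {values}", which is why the classifier's output looks like a rule set.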

Result

Result
Rules: one of the customer rules
class Would_Buy IF:
    cost in {10-20}
  ^ phone in {yes}
  ^ in {yes}
  ^ fax in {no}
  ^ chat in {yes,no}
  ^ other in {no}
  ^ service type in {Phone_cards_only}
  ^ price in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ voice_quality in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ service in {Somewhat_Dissatisfied}
  ^ convenience in {Somewhat_Satisfied}
  ^ promotion in {Somewhat_Dissatisfied}
  ^ Know VoIP in {yes,no}
  ^ marital status in {Single}
  ^ gender in {Male}
(11)
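
A rule of this shape is a conjunction of set-membership tests, one per attribute. The sketch below encodes a subset of the conditions from the rule above (the ones with legible attribute names) and checks a record against them. The `matches` helper, the underscore spelling of attribute names, and the sample record are illustrative assumptions. Note that a condition such as chat in {yes,no} covers every value seen for that attribute, so it places no real constraint on the customer.

```python
# Subset of the Would_Buy rule from the slide; attribute names are
# written with underscores here for use as dictionary keys.
would_buy_rule = {
    "cost": {"10-20"},
    "phone": {"yes"},
    "fax": {"no"},
    "chat": {"yes", "no"},
    "other": {"no"},
    "service_type": {"Phone_cards_only"},
    "marital_status": {"Single"},
    "gender": {"Male"},
}


def matches(rule, record):
    """True when every attribute in the rule takes one of its allowed values."""
    return all(record.get(attr) in allowed for attr, allowed in rule.items())


# Hypothetical survey record, not taken from the data set.
record = {
    "cost": "10-20", "phone": "yes", "fax": "no", "chat": "no",
    "other": "no", "service_type": "Phone_cards_only",
    "marital_status": "Single", "gender": "Male",
}
print(matches(would_buy_rule, record))
print(matches(would_buy_rule, {**record, "gender": "Female"}))
```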

Result
Statistics
- Class allocation
- Feature weights

Result
Basic-model and Deluxe-model
- Scheme: meta.AttributeSelectedClassifier
- Base scheme: rules.NNge
- Selected attributes: 3, 6, 8, 10, 11, 12 (6 in total)
- Why? To avoid overfitting
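
Attribute selection reduces each instance to a subset of its columns before the base classifier sees it. As a minimal sketch of that projection step only (not of how the attributes were chosen), the code below keeps the six selected attributes using the 1-based indices reported on the slide. The `project` helper and the placeholder row are assumptions; in practice the class attribute is kept alongside the selected ones.

```python
# 1-based attribute indices reported as selected on the slide.
SELECTED = [3, 6, 8, 10, 11, 12]


def project(row, selected=SELECTED):
    """Keep only the selected attributes of one instance (indices are 1-based)."""
    return [row[index - 1] for index in selected]


# Placeholder instance with 14 attribute values named a1..a14.
row = [f"a{i}" for i in range(1, 15)]
reduced = project(row)
print(reduced)
```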

Result
Evaluation: ten-fold cross-validation
- Summary: correctly classified instances > 85%
- Detailed accuracy by class: TP, FP, precision, recall, F-measure
- Confusion matrix: 12 of 94 instances misclassified
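
The evaluation measures listed above all derive from the confusion matrix. The sketch below shows the arithmetic on a hypothetical two-class matrix whose counts were chosen only so that they total 94 instances with 12 misclassified; the real per-class counts are not given in the slides. It also shows a plain ten-fold split of instance indices, which, unlike Weka's cross-validation, neither shuffles nor stratifies.

```python
# Hypothetical confusion matrix: rows are actual classes, columns are predictions.
confusion = [
    [50, 5],   # actual class 0: 50 predicted correctly, 5 predicted as class 1
    [7, 32],   # actual class 1: 7 predicted as class 0, 32 predicted correctly
]


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def class_metrics(matrix, cls):
    """Return precision, recall, and F-measure for one class."""
    tp = matrix[cls][cls]
    fp = sum(matrix[r][cls] for r in range(len(matrix)) if r != cls)
    fn = sum(matrix[cls][c] for c in range(len(matrix)) if c != cls)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def ten_folds(n_instances, k=10):
    """Assign instance indices to k folds round-robin (no shuffling)."""
    return [list(range(n_instances))[i::k] for i in range(k)]


folds = ten_folds(94)
print(f"accuracy = {accuracy(confusion):.4f}")
print("precision, recall, F =", class_metrics(confusion, 0))
print("fold sizes =", [len(fold) for fold in folds])
```

In each of the ten rounds, one fold serves as the test set and the other nine as the training set, so every instance is tested exactly once.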

Result

Conclusion
Limitations
- Small data sets
- Incomplete data source
Models
- High accuracy rate
- Help further market analysis
- Help product design

Learning Experience
Worked through a real data mining problem
Got to know classification algorithms better
- Numeric and nominal attributes
- Missing data
- Overfitting
Got to know evaluation methods better
- How to compare algorithms
- Evaluation factors

Learning Experience
Learned how to use Weka
- Future work: learn how to modify the source to perform better data mining
Learned from classmates

References
- Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann.
- Ian H. Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations”, Morgan Kaufmann.
- Machine Learning: Weka Home Page.
- David A. Aaker, V. Kumar and George S. Day, “Marketing Research”, eighth edition, Wiley, 2004.

Thank you