Haftu 201324487 Shamini 201424192 Thomas 201424190 Temesgen Seyoum 201425090.



 A wine rating is a score assigned by one or more wine critics.
 In most cases, a wine's rating is set by a single critic.
 Ratings are given on a scale such as:
◦ 0–10
◦ 0–5

 We would like to automate the wine rating system, replacing the wine critic's role with a data mining algorithm.
 The problem is a classification task: we will classify wine quality into classes 1 to 10.
 The problem is also an inference task, as we are interested in which attributes affect wine quality the most.

 The data is obtained from the following link:
◦
 Data set characteristics: Multivariate
 Attribute characteristics: Real
 Number of instances: 4,898
 Number of attributes: 12
◦ Fixed acidity
◦ Volatile acidity
◦ Citric acid
◦ Residual sugar
◦ Chlorides
◦ Free sulfur dioxide
◦ Total sulfur dioxide
◦ Density
◦ pH
◦ Sulphates
◦ Alcohol
◦ Quality

 It has been indicated that not all input variables may actually be relevant.
 As such, we will perform subset selection among the 11 input attributes.
 We will use k-fold cross-validation.
 We currently plan to use k = 10, though we may instead choose k iteratively to find an optimal value.
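The k-fold split above can be sketched in plain Python so the mechanics are explicit; `kfold_indices` is a helper name of our own, not from any library:

```python
# Minimal sketch of the planned 10-fold cross-validation split.
import random

def kfold_indices(n, k, seed=0):
    """Shuffle instance indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# 4,898 instances split into k = 10 folds; each fold serves once as the
# test set while the remaining nine folds are used for training.
folds = kfold_indices(4898, 10)
```

With 4,898 instances and k = 10, each fold holds 489 or 490 instances, and every instance appears in exactly one fold.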

 Once modeling is finished, we will judge our results using accuracy, specificity, and sensitivity.
 We will include a confusion matrix to show the effectiveness of our model.
 We will report the error rate for each iteration of the k-fold validation.
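The three metrics above can be sketched from confusion-matrix counts. Shown one-vs-rest for a single quality class; the labels here are invented for illustration, and `confusion_counts` is a helper name of our own:

```python
# Sketch of the planned evaluation metrics (illustrative data only).
def confusion_counts(actual, predicted, positive):
    """Return (TP, TN, FP, FN) treating `positive` as the class of interest."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

actual    = [6, 5, 6, 7, 5, 6]   # invented quality labels
predicted = [6, 5, 5, 7, 6, 6]   # invented model output
tp, tn, fp, fn = confusion_counts(actual, predicted, positive=6)
accuracy    = (tp + tn) / len(actual)
sensitivity = tp / (tp + fn)     # true-positive rate for class 6
specificity = tn / (tn + fp)     # true-negative rate for class 6
```

For the full 1–10 problem these counts would be computed per class and averaged, or replaced with a full 10×10 confusion matrix.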

Thank You!