Using Clustering to Make Prediction Intervals For Neural Networks


Claus Benjaminsen
ECE539 - Final Project, Fall 2005

What is a prediction interval?
- An interval within which the true target value is predicted to lie
- Prediction intervals are often specified by:
  - interval end values and an associated probability, or
  - a Gaussian distribution with a given mean and variance
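For the Gaussian form, a two-sided interval at a chosen confidence level is mean ± z · std, with z taken from the standard normal. A minimal sketch of that calculation (the function name and use of SciPy are illustrations, not from the slides):

```python
from scipy.stats import norm

def gaussian_prediction_interval(mean, std, confidence=0.95):
    """Two-sided prediction interval for a Gaussian N(mean, std^2)."""
    z = norm.ppf(0.5 + confidence / 2.0)  # ~1.96 for 95% confidence
    return mean - z * std, mean + z * std

# Example: 95% interval for a prediction with mean 1.0 and std 0.2
lo, hi = gaussian_prediction_interval(1.0, 0.2)  # (0.608, 1.392)
```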

Motivation
- Give the user a more informative output:
  - precision of the prediction (width of the interval)
  - max and min limits (end values of the interval)
  - certainty of the prediction (associated probability)
- Enable the user to make better decisions

Approach 1
- Use clustering to group the training feature vectors
- Associate with each cluster center the mean training error of all features within that cluster
- Estimate the prediction interval for all training features in a cluster by scaling the associated error (see the sketch below)
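A minimal sketch of this training-time step, assuming k-means as the clustering algorithm and absolute residuals as the error measure (the slides specify neither; all names here are my own):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_errors(X_train, residuals, n_clusters=8, seed=0):
    """Cluster the training features and record the mean absolute
    training error associated with each cluster center."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = km.fit_predict(X_train)
    cluster_errors = np.array([
        np.abs(residuals[labels == c]).mean() for c in range(n_clusters)
    ])
    return km, cluster_errors

# residuals = y_train - network predictions on X_train
# km, cluster_errors = fit_cluster_errors(X_train, residuals)
# Interval half-width for cluster c: scale * cluster_errors[c]
```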

Approach 2
- Given a new input, its membership to one of the cluster centers is determined
- The prediction interval for the new input is then the same as for the training data belonging to that cluster (see the sketch below)
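At prediction time this amounts to a nearest-center lookup. A sketch continuing the one above (km, cluster_errors, and scale are the hypothetical names introduced there):

```python
def predict_interval(km, cluster_errors, x_new, y_pred, scale=2.0):
    """Assign a new input to its nearest cluster center and return
    the prediction interval stored for that cluster."""
    c = km.predict(x_new.reshape(1, -1))[0]   # nearest-center membership
    half_width = scale * cluster_errors[c]
    return y_pred - half_width, y_pred + half_width

# lo, hi = predict_interval(km, cluster_errors, x_new, y_pred)
```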

Data set used
- Synthetic data set
- 1 input feature, 1 output target
- 256 samples, divided into:
  - 156 training samples
  - 100 testing samples
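The slides do not say how the synthetic data were generated; a stand-in with the same shape (1 input feature, 1 output target, 156/100 split) could look like:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(256, 1))                  # 1 input feature
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(256)   # 1 output target

X_train, y_train = X[:156], y[:156]                    # 156 training samples
X_test, y_test = X[156:], y[156:]                      # 100 testing samples
```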

Results
[Figure: plot of the test data along with the prediction intervals estimated by the clustering method]

Results 2
- The cost function is the mean squared distance from the edge of the prediction interval to the test target, computed over all targets that fall outside their prediction intervals
- The clustering method has the best performance of the three methods:

  cost_test_clus   0.001277
  cost_test_var    0.004856
  cost_test_basis  0.002414
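A sketch of the cost computation as described above (variable names are my own):

```python
import numpy as np

def interval_cost(y_true, lower, upper):
    """Mean squared distance from the nearest interval edge to the
    target, over targets that fall outside their prediction intervals."""
    below = y_true < lower
    above = y_true > upper
    d = np.where(below, lower - y_true, np.where(above, y_true - upper, 0.0))
    outside = below | above
    if not outside.any():
        return 0.0
    return np.mean(d[outside] ** 2)
```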

Discussion
- The model can become large when the input feature space is big and the number of training samples is high
- Problems arise when a new input doesn't "look" like any of the training inputs

Conclusion
- The clustering method can estimate prediction intervals and achieves very good performance on this data
- The data used is very well suited to the clustering method; it might not give equally good results on other types of datasets
- This will have to be tested in the future!