Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners
David Jensen and Jennifer Neville

Relational vs. Traditional Independent Data Sets

Simple Random Partitioning Example
► Divide the movies into two subsets, a training set and a test set, by repeatedly selecting a movie at random without replacement and adding it to one of the subsets.
► A movie may appear in only one subset.
► A movie may appear only once in a subset.
► For each movie, add the corresponding studio to the same subset.
► A studio may appear in both subsets.
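A minimal sketch of this procedure in Python (the movie and studio names and the 50/50 split are hypothetical, illustrative choices) shows how the same studio can end up linked to movies in both subsets:

```python
import random

# Hypothetical data: each movie is linked to exactly one studio.
movie_to_studio = {
    "Movie A": "Studio X", "Movie B": "Studio X",
    "Movie C": "Studio Y", "Movie D": "Studio Y",
}

def simple_random_partition(movie_to_studio, seed=0):
    """Randomly split movies (without replacement) into training and test sets,
    carrying each movie's studio along with it."""
    movies = list(movie_to_studio)
    random.Random(seed).shuffle(movies)
    half = len(movies) // 2
    train_movies, test_movies = movies[:half], movies[half:]
    train_studios = {movie_to_studio[m] for m in train_movies}
    test_studios = {movie_to_studio[m] for m in test_movies}
    return (train_movies, train_studios), (test_movies, test_studios)

(train_m, train_s), (test_m, test_s) = simple_random_partition(movie_to_studio)
# Studios linked to movies in both subsets create a dependency between them.
print("Studios in both sets:", train_s & test_s)
```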

Test Bias
► Simple random partitioning causes training and test set dependency (a studio appears in both sets).
[Diagram: a single studio linked to movies in both the training set and the test set]

Data Set
► Data set drawn from the Internet Movie Database (IMDb).
► Contains movies, actors, directors, producers, and studios.
► Selected movies released between 1996 and 2001: 1382 movies, along with their associated objects and links.
► Used various features to predict opening-weekend box office receipts.

Calculating Test Bias
► Discretized movie receipts, with a positive value indicating more than $2 million (prob(+) = .55).
► Added random attributes to studios.
► Built models using only the random attributes.
► Bias = accuracy of the random-attribute model – default accuracy of .55.
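A small synthetic simulation (not the authors' exact experiment) makes the effect visible; it assumes scikit-learn, perfectly autocorrelated labels, and one random attribute per studio, all illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 movies spread over 50 studios, with perfectly
# autocorrelated labels (all of a studio's movies share the same class).
n_studios, movies_per_studio = 50, 20
studio_of_movie = np.repeat(np.arange(n_studios), movies_per_studio)
y = (rng.random(n_studios) < 0.55)[studio_of_movie].astype(int)

# One completely random attribute per studio, copied onto each of its movies.
X = rng.normal(size=n_studios)[studio_of_movie].reshape(-1, 1)

# Simple random partitioning of movies: studios straddle the split freely.
perm = rng.permutation(len(y))
train, test = perm[: len(y) // 2], perm[len(y) // 2:]

model = DecisionTreeClassifier().fit(X[train], y[train])
accuracy = model.score(X[test], y[test])
default = max(y.mean(), 1 - y.mean())   # always-predict-the-majority baseline
print(f"bias = accuracy - default = {accuracy - default:.3f}")   # clearly positive
```

Because the model can memorize the random studio attribute and test movies come from the same studios, its apparent accuracy far exceeds the default, even though the attribute carries no real information.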

Concentrated Linkage

[Diagram: examples of linkage ranging from 0 to 1 — at low linkage each studio connects to a single movie; at high linkage a single studio connects to many movies]

Concentrated Linkage

Autocorrelation

[Diagram: examples of autocorrelation ranging from 0 to 1 — at high autocorrelation, movies linked to the same studio share the same class label; at low autocorrelation they do not]

Autocorrelation

High Linkage Causes Dependence
► Theorem: Given a simple random partitioning of a relational data set S with single linkage and C’ = 1, prob_ind(A, B) → 0 as L → 1.
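As an informal illustration of the theorem (a simulation sketch, not a proof; the data sizes and the reading of "independent" as "no studio spans both subsets" are assumptions here):

```python
import random

def prob_independent(n_movies=12, movies_per_studio=1, trials=20000, seed=0):
    """Estimate the probability that a simple random split of the movies
    leaves no studio with movies in both the training and the test set."""
    rng = random.Random(seed)
    n_studios = n_movies // movies_per_studio
    studio_of = [m % n_studios for m in range(n_movies)]
    hits = 0
    for _ in range(trials):
        movies = list(range(n_movies))
        rng.shuffle(movies)
        train_studios = {studio_of[m] for m in movies[: n_movies // 2]}
        test_studios = {studio_of[m] for m in movies[n_movies // 2:]}
        hits += not (train_studios & test_studios)
    return hits / trials

# As more movies share a studio (higher linkage), independence becomes rare.
for k in (1, 2, 3, 6):
    print(f"{k} movies per studio: prob_ind ~ {prob_independent(movies_per_studio=k):.4f}")
```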

Bias Results

Solution – Subgraph Sampling
► Assign movies randomly to subsets, as before.
► Commit a movie to a subset only if its studio either has not been placed in another subset or does not have high autocorrelation and linkage; otherwise discard the movie.
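A minimal sketch of this rule (the `risky_studios` set, the alternating assignment, and the function name are illustrative assumptions, not the authors' implementation):

```python
import random

def subgraph_sample(movie_to_studio, risky_studios=frozenset(), seed=0):
    """Split movies into training/test sets so that no 'risky' studio (one with
    high linkage and autocorrelation) ends up with movies in both subsets.
    Movies whose risky studio is already committed elsewhere are discarded."""
    rng = random.Random(seed)
    movies = list(movie_to_studio)
    rng.shuffle(movies)
    subsets = {"train": set(), "test": set()}
    studio_home = {}                     # risky studio -> subset it is committed to
    for i, movie in enumerate(movies):
        target = "train" if i % 2 == 0 else "test"
        studio = movie_to_studio[movie]
        if studio not in risky_studios:
            subsets[target].add(movie)   # safe studio: may appear in both subsets
        elif studio_home.setdefault(studio, target) == target:
            subsets[target].add(movie)   # risky studio committed to this subset
        # otherwise the studio is committed to the other subset: discard the movie
    return subsets["train"], subsets["test"]
```

With `risky_studios` empty this reduces to simple random partitioning; passing in the high-linkage, high-autocorrelation studios confines each of them to a single subset.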

Results

Conclusion
► Subgraph sampling, guided by measurements of linkage and autocorrelation, reduces bias and improves the accuracy of evaluation for relational learners.

Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning
David Jensen and Jennifer Neville

Feature Selection Bias in Relational Learning
► High values of linkage (L) and autocorrelation (C’) can:
  ▪ Reduce the effective sample size.
  ▪ Introduce additional variance, leading to feature selection bias.

Feature Selection
► A feature is a mapping between raw data and a low-level inference.
► Feature selection is the process of choosing among features (e.g., identifying the single best feature, or choosing all features that satisfy certain conditions).

Relational Feature Selection
► Relational features predict the value of an attribute on one type of object from the attributes of related objects.
► Relational features increase the predictive power of inference procedures.
► But they can bias the selection process and lead to incorrect estimates.
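For concreteness, a minimal sketch of one hypothetical relational feature in Python (the aggregation choice, the mean over sibling movies, is an assumption for illustration, not a feature from the paper):

```python
from collections import defaultdict

def studio_mean_receipts(movie_to_studio, receipts):
    """Hypothetical relational feature: for each movie, the mean opening
    receipts of the *other* movies released by the same studio."""
    movies_by_studio = defaultdict(list)
    for movie, studio in movie_to_studio.items():
        movies_by_studio[studio].append(movie)
    feature = {}
    for movie, studio in movie_to_studio.items():
        others = [receipts[m] for m in movies_by_studio[studio] if m != movie]
        feature[movie] = sum(others) / len(others) if others else 0.0
    return feature
```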

An Example: Bias in Relational Feature Selection

Effects of Linkage and Autocorrelation
► Linkage and autocorrelation cause relational feature selection bias through a two-step chain:
  ▪ They reduce the effective sample size of a data set, which increases the variance of estimated feature scores.
  ▪ Increased score variance for features formed from an object type increases the probability that one of those features will be selected as the best feature.

Decreased Effective Sample Size
► A special case: data sets that exhibit single linkage with C’ = 1 and L ≥ 0.
  ▪ The variance of scores estimated from relational features depends on |Y| rather than on |X|.
  ▪ For example, if receipts has C’ = 1, then the scores of relational features formed from studios depend on the number of studios rather than the number of movies.
  ▪ We gain no additional information as |X| increases.

Effective Sample Size (cont.)
► For a wider range of values of C’ and L, Jensen and Neville use simulation:
  ▪ Effective sample size drops monotonically as C’ and L increase.
  ▪ A decrease in effective sample size increases the variance of feature scores.
► Features with higher score variance => bias in favor of those features.

Effective Sample Size (cont.)

How Can Feature Selection Be Biased?
► Why do features with higher score variance lead to a bias?
► Features are usually formed by a local search over the possible parameters of the feature.
► This local search is usually done prior to feature selection, so only the best feature from each feature “family” is compared.

Feature Selection Bias
► Bias increases as the variance of the score distributions increases.
► Thus, the estimated scores of features formed from objects with high C’ and L will be more biased.
► For example, features formed from studios have the highest score variance, which allows them to exceed the scores of weakly useful features formed from other objects.
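A small synthetic illustration (not from the paper) of why picking the best of several noisily scored features favors the higher-variance family; the family sizes and noise levels below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two feature "families" whose true scores are identical (0.0), but the scores
# of the high-variance family are estimated with much more noise.
trials, family_size = 10_000, 10
best_low = rng.normal(0.0, 0.05, size=(trials, family_size)).max(axis=1)
best_high = rng.normal(0.0, 0.20, size=(trials, family_size)).max(axis=1)

# The best feature from the noisier family usually looks better, even though
# no feature in either family is genuinely more predictive.
print("high-variance family selected:", (best_high > best_low).mean())  # well above 0.5
print("mean best score, low vs. high:", best_low.mean(), best_high.mean())
```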

Effects of Linkage and Autocorrelation
[Diagram: high linkage and autocorrelation → decreased effective sample size → increased variance of estimated scores → bias increases as variance increases]

Estimating Score Variance
► Correcting for high variance requires accurate estimates of the variance of each feature’s score.
► Approach: bootstrap resampling.

Bootstrap Resampling
► A technique for estimating characteristics of the sampling distribution of a given parameter:
  ▪ Generate multiple samples (pseudosamples) by drawing, with replacement, from the original data.
  ▪ Each pseudosample has the same size as the original training set.
  ▪ Estimate the variance of a parameter by estimating the parameter on each pseudosample and then computing the variance of the resulting distribution of scores.

Bootstrap Resampling (cont.)
[Diagram: the original training set is resampled into several pseudosamples; the parameter is estimated on each, yielding a distribution of estimates]
► The variance for the original training set can be computed from the parameters estimated on the pseudosamples.
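A minimal sketch of bootstrap variance estimation for a feature score (the `corr_score` scoring function and the data below are hypothetical examples, not the paper's scoring measure):

```python
import numpy as np

def bootstrap_score_variance(x, y, score_fn, n_pseudosamples=200, seed=0):
    """Estimate the variance of a feature score by re-scoring the feature on
    pseudosamples drawn with replacement from the original training set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_pseudosamples):
        idx = rng.integers(0, n, size=n)   # pseudosample: same size, with replacement
        scores.append(score_fn(x[idx], y[idx]))
    return np.var(scores)

# Hypothetical score: absolute correlation between the feature and the label.
def corr_score(x, y):
    return abs(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
x = y + rng.normal(0.0, 1.0, size=200)   # a weakly informative feature
print("bootstrap variance of the score:", bootstrap_score_variance(x, y, corr_score))
```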

Using Resampled Estimates
► Resampling can be used to estimate the variance of the scores of particular features.
► How best to use these resampled estimates remains an open problem; for example, in feature selection, how should features whose scores have different variances be compared?
► A research topic!

Conclusion
► High linkage and autocorrelation can cause bias for relational learning algorithms.
► Research ideas:
  ▪ How to use the variance estimates of the various features to avoid feature selection bias.
  ▪ Avoiding feature selection bias by considering additional information, such as prior estimates of the true score.