Andreas Buja Werner Stuetzle *

Presentation transcript:

Bias and Variance of Bagging based on Subsampling with & without Replacement

Andreas Buja, Statistics Department, The Wharton School, University of Pennsylvania
Werner Stuetzle *, Statistics Department; Adjunct Professor, CSE, University of Washington

* Supported by NSF grant DMS-9803226. Research performed while on sabbatical at AT&T Labs – Research.

Research motivated by the Friedman & Hall paper "On Bagging and Nonlinear Estimation" (available on the Web) and a counter-example to one of F & H's claims due to Yoram Gatt.

11/21/2018

[Diagram: space of probability measures, showing F with T(F), Fn with T(Fn), and the resamples of Fn, whose average gives Tbag(Fn).]
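
The averaging step in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `bagged_statistic`, the resample count, and the fixed seed are all hypothetical choices, and a finite number of random resamples stands in for the exact average over all resamples.

```python
import random
from statistics import mean, median

def bagged_statistic(data, statistic, m, replace, n_resamples=200, seed=0):
    """Approximate Tbag(Fn): average `statistic` over resamples of size m.

    `replace=True` draws each resample with replacement,
    `replace=False` draws it without replacement (requires m <= len(data)).
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        if replace:
            resample = rng.choices(data, k=m)  # sampling with replacement
        else:
            resample = rng.sample(data, m)     # sampling without replacement
        values.append(statistic(resample))
    return mean(values)

data = [float(x) for x in range(1, 21)]
# Bagged median from half-size subsamples drawn without replacement.
bagged_wo = bagged_statistic(data, median, m=10, replace=False)
# Bagged median from full-size resamples drawn with replacement.
bagged_wi = bagged_statistic(data, median, m=20, replace=True)
```

With `replace=False` and `m` equal to the sample size, every resample is a permutation of the data, so bagging a permutation-invariant statistic such as the mean returns the unbagged value.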

With replacement: g = n/m
Without replacement: g = n/m - 1
Equivalence: n/m_wi = n/m_wo - 1
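
As a small worked example of the equivalence, the sketch below solves n/m_wi = n/m_wo - 1 for the with-replacement resample size. The function names are hypothetical, and the reading alpha = m/n is an assumption taken from the plot axis label "alpha / (1-alpha)", under which the same relation becomes alpha_wi = alpha_wo / (1 - alpha_wo).

```python
def equivalent_m_with(n, m_wo):
    """Resample size m_wi (with replacement) matching size m_wo (without).

    Solves n / m_wi = n / m_wo - 1 for m_wi. Hypothetical helper name.
    """
    if not 0 < m_wo < n:
        raise ValueError("need 0 < m_wo < n")
    return n / (n / m_wo - 1)

def equivalent_alpha_with(alpha_wo):
    """Same relation in terms of the resampling fraction alpha = m / n."""
    if not 0 < alpha_wo < 1:
        raise ValueError("need 0 < alpha_wo < 1")
    return alpha_wo / (1 - alpha_wo)

# Half-sampling without replacement (m_wo = n/2) corresponds to
# full-size resampling with replacement (m_wi = n).
m_wi = equivalent_m_with(800, 400)
alpha_wi = equivalent_alpha_with(0.5)
```

For n = 800, subsamples of size 400 drawn without replacement therefore pair with resamples of size 800 drawn with replacement.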

[Figure: Squared plug-in bias, scenario 2, n = 800; black ~ with replacement, red ~ without replacement. x-axis: alpha for sampling without replacement, alpha / (1 - alpha) for sampling with replacement. y-axis: squared plug-in bias.]

[Figure: Squared estimation bias, scenario 2, n = 800; black ~ with replacement, red ~ without replacement. x-axis: alpha for sampling without replacement, alpha / (1 - alpha) for sampling with replacement. y-axis: squared estimation bias.]

[Figure: Squared plug-in bias, scenario 3, n = 800; black ~ with replacement, red ~ without replacement. x-axis: alpha for sampling without replacement, alpha / (1 - alpha) for sampling with replacement. y-axis: squared plug-in bias.]

[Figure: Squared estimation bias, scenario 3, n = 800; black ~ with replacement, red ~ without replacement. x-axis: alpha for sampling without replacement, alpha / (1 - alpha) for sampling with replacement. y-axis: squared estimation bias.]
