ChiMerge Discretization

Slides:



Advertisements
Similar presentations
Statistics and Research methods Wiskunde voor HMI Bijeenkomst 5.
Advertisements

Estimating a Population Variance
Exponential Distribution. = mean interval between consequent events = rate = mean number of counts in the unit interval > 0 X = distance between events.
©Dawn Iacobucci, p.1 FYI: One-Way Chi-Square Eqn:df=I-1 Why/When? e.g., compare sample to population characteristics, or to previous study’s benchmark.
Chapter 11- Confidence Intervals for Univariate Data Math 22 Introductory Statistics.
ChiMerge Technique Example
1 The Chi squared table provides critical value for various right hand tail probabilities ‘A’. The form of the probabilities that appear in the Chi-Squared.
OMS 201 Review. Range The range of a data set is the difference between the largest and smallest data values. It is the simplest measure of dispersion.
More Discrete Probability Distributions
1 Inference About a Population Variance Sometimes we are interested in making inference about the variability of processes. Examples: –Investors use variance.
Population Proportion The fraction of values in a population which have a specific attribute p = Population proportion X = Number of items having the attribute.
1 (Student’s) T Distribution. 2 Z vs. T Many applications involve making conclusions about an unknown mean . Because a second unknown, , is present,
© 2002 Thomson / South-Western Slide 8-1 Chapter 8 Estimation with Single Samples.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 11 Inferences About Population Variances n Inference about a Population Variance n.
Measures of Variability Objective: Students should know what a variance and standard deviation are and for what type of data they typically used.
Chapter 9: Non-parametric Tests n Parametric vs Non-parametric n Chi-Square –1 way –2 way.
Advanced Higher Statistics Data Analysis and Modelling Hypothesis Testing Statistical Inference AH.
Statistics in Biology. Histogram Shows continuous data – Data within a particular range.
July, 2000Guang Jin Statistics in Applied Science and Technology Chapter 7 - Sampling Distribution of Means.
SampleFK ChiMerge Discretization Statistical approach to Data Discretization Applies the Chi.
§ 5.3 Normal Distributions: Finding Values. Probability and Normal Distributions If a random variable, x, is normally distributed, you can find the probability.
Elementary Statistics Discrete Probability Distributions.
4.3 More Discrete Probability Distributions NOTES Coach Bridges.
Data Mining Spring 2007 Noisy data Data Discretization using Entropy based and ChiMerge.
Section 6.4 Inferences for Variances. Chi-square probability densities.
Chi Square Test for Goodness of Fit Determining if our sample fits the way it should be.
Discrete Probability Distributions Chapter 4. § 4.3 More Discrete Probability Distributions.
Chi square and Hardy-Weinberg
WARM UP: Penny Sampling 1.) Take a look at the graphs that you made yesterday. What are some intuitive takeaways just from looking at the graphs?
Statistics and probability Dr. Khaled Ismael Almghari Phone No:
Statistics Unit Check your understanding…. Can you answer these? What does the standard deviation of a sample represent? What is the difference between.
Chi Square Chi square is employed to test the difference between an actual sample and another hypothetical or previously established distribution such.
Chapter 7 Estimation. Chapter 7 ESTIMATION What if it is impossible or impractical to use a large sample? Apply the Student ’ s t distribution.
Cross Tabulation with Chi Square
Statistical Inferences for Population Variances
Objectives (BPS chapter 23)
Review 1. Describing variables.
Point and interval estimations of parameters of the normally up-diffused sign. Concept of statistical evaluation.
Ch8.4 P-Values The P-value is the smallest level of significance at which H0 would be rejected when a specified test procedure is used on a given data.
Variance and Standard Deviation Confidence Intervals
Math 4030 – 10b Inferences Concerning Variances: Hypothesis Testing
Merging Merge. Keep track of smallest element in each sorted half.
Confidence Intervals and Hypothesis Tests for Variances for One Sample
AP Statistics: Chapter 7
Section 6-4 – Confidence Intervals for the Population Variance and Standard Deviation Estimating Population Parameters.
SA3202 Statistical Methods for Social Sciences
Since everything is a reflection of our minds,
Goodness-of-Fit Tests
Bivariate Testing (Chi Square)
Part IV Significantly Different Using Inferential Statistics
Construction Engineering 221
POINT ESTIMATOR OF PARAMETERS
Statistical Process Control
Chi2 (A.K.A X2).
Data Analysis Module: Chi Square
A G L O R H I M S T A Merging Merge.
Section 6-4 – Confidence Intervals for the Population Variance and Standard Deviation Estimating Population Parameters.
Chapter 7 Lecture 3 Section: 7.5.
Introduction to Probability Distributions
A G L O R H I M S T A Merging Merge.
A G L O R H I M S T A Merging Merge.
Introduction to Probability Distributions
Chapter 6 Confidence Intervals.
Statistical Inference for the Mean: t-test
Quadrat sampling & the Chi-squared test
Quadrat sampling & the Chi-squared test
Sampling Distributions Key to Statistical Inference
Chi Square Test of Homogeneity
Testing Claims about a Population Standard Deviation
Chi - square.
Presentation transcript:

ChiMerge Discretization Sample F K 1 2 3 7 4 8 5 9 6 11 23 37 39 10 45 46 12 59 Statistical approach to Data Discretization Applies the Chi Square method to determine the probability of similarity of data between two intervals.

ChiMerge Discretization Example Intervals Sample F K 1 {0,2} {2,5} 2 3 {5,7.5} Sort and order the attributes that you want to group (in this example attribute F). 3 7 1 4 8 1 {7.5,8.5} 5 9 1 {8.5,10} 6 11 2 {10,17} Start with having every unique value in the attribute be in its own interval. 7 23 2 {17,30} 8 37 1 {30,38} 9 39 2 {38,42} {42,45.5} 10 45 1 11 46 1 {45.5,52} {52,60} 12 59 1

ChiMerge Discretization Example Sample F K 1 Begin calculating the Chi Square test on every interval 2 3 3 7 1 4 8 1 Sample K=1 K=2 2 1 3 total 5 9 1 6 11 2 7 23 2 8 37 1 Sample K=1 K=2 3 1 4 total 2 9 39 2 10 45 1 11 46 1 12 59 1

ChiMerge Discretization Example Sample K=1 K=2 2 1 3 total E11 = (1/2)*1 = .05 E12 = (1/2)*1 = .05 E21 = (1/2)*1 = .05 E22 = (1/2)*1 = .05 X2 = (0-.5)2/.5 + (0-.5)2/.5 + (0-.5)2/.5 + (0-.5)2/.5 = 2 Sample K=1 K=2 3 1 4 total 2 E11 = (1/2)*2 = 1 E12 = (0/2)*2 = 0 E21 = (1/2)*2 = 1 E22 = (0/2)*2 = 0 X2 = (1-1)2/1+(0-0)2/0+ (1-1)2/1+(0-0)2/0 = 0 Threshold .1 with df=1 from Chi square distribution chart merge if X2 < 2.7024

ChiMerge Discretization Example Intervals Chi2 Sample F K 1 {0,2} {2,5} {5,7.5} {7.5,8.5} {8.5,10} {10,17} {17,30} {30,38} {38,42} {42,45.5} {45.5,52} {52,60} 2 Calculate all the Chi Square value for all intervals 2 3 2 1 7 3 8 4 9 5 2 Merge the intervals with the smallest Chi values 2 11 6 23 7 2 8 37 1 2 9 39 2 2 1 45 10 46 11 59 12

ChiMerge Discretization Example Intervals Chi2 Sample F K 1 {0,2} 2 {2,5} 2 3 4 3 7 1 4 8 1 {5,10} 5 9 1 5 Repeat 6 11 2 {10,30} 7 23 2 3 8 37 1 {30,38} 2 9 39 2 {38,42} 10 45 1 4 {42,60} 11 46 1 12 59 1

ChiMerge Discretization Example Intervals Chi2 Sample F K 1 {0,5} 2 3 1.875 3 7 1 4 8 1 {5,10} 5 9 1 5 Again 6 11 2 {10,30} 7 23 2 1.33 8 37 1 {30,42} 9 39 2 1.875 10 45 1 {42,60} 11 46 1 12 59 1

ChiMerge Discretization Example Sample F K Intervals Chi2 1 2 3 7 8 4 9 5 {0,5} 1.875 {5,10} 3.93 6 11 2 Until 7 23 2 {10,30} 8 37 1 9 39 2 3.93 10 45 1 {42,60} 11 46 1 12 59 1

ChiMerge Discretization Example Intervals Chi2 Sample F K 1 2 3 {0,10} 3 7 1 There are no more intervals that can satisfy the Chi Square test. 4 8 1 2.72 5 9 1 6 11 2 7 23 2 {10,30} 8 37 1 9 39 2 3.93 10 45 1 {42,60} 11 46 1 12 59 1