1
Statistics Review (It’s not so scary)
2
What are those things!
Statistics – range of techniques and procedures for analyzing data, interpreting data, displaying data, and making decisions (Lane, 2006)
3
Two Headed Monster
Statistics can be categorized as . . .
  Descriptive – used to describe a set of data
  Inferential – used to draw conclusions about a set of data
Inferential Statistics can be used to . . .
  Estimate a value (parameter) based on data
  Test a Hypothesis
4
I can’t describe it!
Descriptive Statistics (Parameters)
Common parameters . . .
  Mean (μ)
  Median
  Mode
  Variance (σ²)
  Standard Deviation (σ)
5
Secret Formulas
  Mean: μ = Σ(x) / n
  Variance: σ² = Σ(x − μ)² / (n − 1) (sample variance)
  Standard Deviation: σ = √(σ²)
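The three formulas on this slide can be checked by hand with a few lines of Python. This is a minimal sketch, not part of the original deck: the function names and the `scores` data are made up for illustration, and the variance uses the n − 1 divisor that the slide labels as the sample variance.

```python
import math

def mean(values):
    # Mean: sum of the values divided by how many there are (n).
    return sum(values) / len(values)

def sample_variance(values):
    # Sample variance: sum of squared deviations from the mean,
    # divided by n - 1 (not n).
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def standard_deviation(values):
    # Standard deviation: square root of the variance.
    return math.sqrt(sample_variance(values))

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:", mean(scores))
print("sample variance:", sample_variance(scores))
print("standard deviation:", standard_deviation(scores))
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n − 1 divisor, so they can be used to double-check the hand-written versions.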
6
Do Exercise #1
7
It’s very close!
Inferential Statistics
Estimating involves . . .
  Determining the probability that a value will occur
  Determining a value, given a probability
Estimation always involves a Confidence Interval
  Confidence Interval = range of values expected to contain the true value at a stated level of confidence, so the result is unlikely to be due to random chance
  95% Confidence is the generally accepted level for ruling out random chance (which leaves a 5% Error Rate)
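As a rough illustration of a 95% Confidence Interval, the sketch below estimates a mean from a small data set. It rests on assumptions the slides do not state: the interval is computed as mean ± z × (standard deviation / √n) using the Normal Distribution, the function name `mean_confidence_interval` is invented here, and the data are made up. For very small samples a t-distribution is usually preferred, so treat this only as a sketch.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    # Centre of the interval: the sample mean.
    center = mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = stdev(values) / math.sqrt(len(values))
    # z value that leaves (1 - confidence) / 2 in each tail,
    # about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return center - margin, center + margin

# Hypothetical data for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = mean_confidence_interval(scores)
print(f"95% confidence interval for the mean: {low:.2f} to {high:.2f}")
```

Asking for more confidence (for example 0.99) widens the interval, which matches the idea that a lower error rate requires a less precise estimate.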
8
For whom does the bell toll?
Inferential Statistics often uses a Normal Distribution
  A Normal Distribution is also called a Normal Curve or “Bell Curve”
  You can fit values to a Normal Distribution by converting them to “z scores” *
Normal Distributions exist in data because of Central Tendency
  Central Tendency is the tendency of data values to “cluster” around the Mean
The Probability that something will occur can be determined by looking at the associated area under a Normal Curve
* - The XLMiner Data Mining software does this with a checkbox (called “Normalizing” or “Standardizing”)
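Converting a column of values to z scores can be sketched in a few lines. This is an illustrative example, not a description of how XLMiner works internally: the function name `z_scores` and the data are made up, and the standard deviation uses the n − 1 divisor (an assumption; some tools divide by n instead).

```python
from statistics import mean, stdev

def z_scores(values):
    # Each z score says how many standard deviations a value
    # lies above (+) or below (-) the mean.
    mu = mean(values)
    sigma = stdev(values)  # n - 1 divisor; an assumption of this sketch
    return [(x - mu) / sigma for x in values]

# Hypothetical data for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = z_scores(scores)
print([round(z, 2) for z in standardized])
```

After standardizing, the values have a mean of 0 and a standard deviation of 1, which is what lets columns measured on different scales be compared.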
9
I’m Not Normal!
You can use a Normal Distribution/Curve to . . .
  Find the Probability* that a Value will occur
    How to: Convert the Value to a Z Score (Standard Score) using z = (x − μ) / σ, then look up the Probability in the z-score table
  Find an estimated Value when given a Probability*
    How to: Look up the Probability in the z-score table to get z, then compute x = μ + z × σ (z is negative for values below the Mean)
* - Probabilities are represented by the area under a Normal Curve
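Both lookups can be tried without a printed z-score table, since Python's standard library `statistics.NormalDist` provides the area under the Normal Curve (`cdf`) and its inverse (`inv_cdf`). The sketch below is illustrative only: the function names and the example numbers (μ = 100, σ = 15) are assumptions, and the probability returned is the area to the left of the value.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def probability_below(x, mu, sigma):
    # Step 1: convert the value to a z score.
    z = (x - mu) / sigma
    # Step 2: "look up" the area to the left of z under the Normal Curve.
    return STANDARD_NORMAL.cdf(z)

def value_at_probability(p, mu, sigma):
    # Step 1: "look up" the z score whose left-hand area equals p.
    z = STANDARD_NORMAL.inv_cdf(p)
    # Step 2: convert the z score back to the original scale.
    return mu + z * sigma

# Hypothetical example: values with mean 100 and standard deviation 15.
p = probability_below(130, mu=100, sigma=15)
print(f"P(value < 130) is about {p:.4f}")
print(f"Value at that probability: {value_at_probability(p, 100, 15):.1f}")
```

The second call reverses the first, so feeding the probability back in recovers the starting value of 130.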
10
Other Dismembered Parts
Variables – can be qualitative or quantitative *
  Qualitative values are called Categorical Variables (e.g., single, married, divorced)
  Quantitative values can be . . .
    Ordinal – higher numbers mean higher values (e.g., 1 to 5 Satisfaction Scale)
    Interval – same as ordinal but distances between values are equal (e.g., Age, Income, Credit Score)
    Ratio – same as interval but has “true” zero (rarely used) (e.g., Temperature in degrees Kelvin)
* - It is common to convert data types when creating Data Mining models
11
Strange Experiments
Variables are also grouped by . . .
  Independent – value that is manipulated or changes by record (x)
    Called Input Variable(s) or Predictor(s) in Data Mining
  Dependent – variable whose value “depends” on the manipulation of the Independent Variable (y)
    Also called Output Variable or Result in Data Mining
Input and Output variables are used for experiments, hypothesis testing, and creating Data Mining Models
12
Zombie Population
Population – entire set of objects, observations, or scores that have something in common (e.g., all males under 18 years of age); its variance divides by n
Sample – subset of a population; its variance divides by n − 1
Testing an entire population is usually impractical, so a sample is used to infer (i.e., draw conclusions) about a population
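The idea of using a sample to infer something about a population can be simulated. The sketch below is an assumption-laden illustration: the population is randomly generated (10,000 values around a mean of 50), the sample size of 100 and the seed are arbitrary, and `pvariance` / `variance` are the standard library functions that divide by n and n − 1 respectively.

```python
import random
from statistics import mean, pvariance, variance

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A sample is a subset of the population, drawn at random.
sample = rng.sample(population, 100)

print("population mean:", round(mean(population), 2))
print("sample mean:    ", round(mean(sample), 2))

# Population variance divides by n; sample variance divides by n - 1.
print("population variance (n):    ", round(pvariance(population), 2))
print("sample variance     (n - 1):", round(variance(sample), 2))
```

The sample mean will not match the population mean exactly, but it lands close to it while needing only 1% of the data.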
13
Do Exercise #2