1
Welcome!
2
Chapter 1 Data Analysis
3
Statistics – a rigorous system for finding answers to problems and the evidence to back them up. The science and art of collecting, analyzing, and drawing conclusions from data.
4 Major Parts
-Collecting Data – Define a good question, then run a comparative experiment or sample from a population.
-Descriptive Analysis – Describe your observations using statistical values (mean, median, standard deviation, etc.), tables, charts, and graphs; look for trends or relationships.
-Probability – Tells us how confident we can be in our results.
-Inference – Making decisions or predictions based on the data and probability.
4
Collecting Data: A refined problem statement and a good design for data collection are critical. Ask yourself the following questions before you begin collecting data.
-What am I trying to show (prove)?
-How many units or subjects shall I collect data from?
-How shall I select the units or individuals to be studied?
5
Describing Data
-Individual – an object described in a set of data. Individuals can be people, animals, or things.
-Variable – an attribute or characteristic that can take different values for different individuals.
-Categorical Variable – assigns labels that place each individual into a particular group, called a category.
-Quantitative Variable – takes number values that are quantities: counts or measurements.
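To make the vocabulary concrete, here is a minimal Python sketch. The student records and the `variable_kind` helper are hypothetical, invented for illustration. Each dict is one individual, each key is a variable, and the helper guesses whether a variable is categorical or quantitative from its values.

```python
# Hypothetical data set: each dict is one individual (here, a student).
students = [
    {"name": "Ana", "grade_level": "junior", "height_cm": 162.5, "siblings": 2},
    {"name": "Ben", "grade_level": "senior", "height_cm": 175.0, "siblings": 0},
    {"name": "Cam", "grade_level": "junior", "height_cm": 168.2, "siblings": 1},
]


def variable_kind(values):
    """Guess 'quantitative' if every value is a number, else 'categorical'.

    Rough heuristic only: numbers used as labels (zip codes, area codes)
    would be misread as quantitative and still need a human decision.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"


# "name" only identifies the individual, so it is left out of the variables.
variables = ["grade_level", "height_cm", "siblings"]
kinds = {var: variable_kind([s[var] for s in students]) for var in variables}
print(kinds)
```

The heuristic only looks at the type of the stored values, so it is a starting point for the questions on the next slides and does not replace reading the data description.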
6
Describing Data What are the individuals in this set of data?
What are the variables (attributes) of the individuals? Which variables are Categorical? Quantitative?
8
Variable Types
Categorical Variables (also called Qualitative) – as stated previously, these assign labels that place each individual into a particular group, called a category. A categorical variable can still be recorded as numbers when the numbers act only as labels, such as zip codes, area codes, or dates. Averaging such numbers is meaningless.
9
Variable Types
Quantitative (or numerical) Variables – as stated previously, these measure quantities (e.g. heights, distances, counts) and take numerical values for which it makes sense to take an average. They can be either:
-Continuous – numbers that can take on any value within some range (no gaps)
-Discrete – numbers that take only specific values, usually whole numbers (has gaps). For example, the number of home runs hit in a season: you can't have ½ of a home run.
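One rough way to tell the two quantitative types apart in recorded data is to check whether every value is a whole number. The sketch below assumes hypothetical home run counts and heights. It is only a heuristic, since a continuous measurement rounded to whole units would also pass the check.

```python
def looks_discrete(values):
    """Return True when every value is a whole number (typical of counts)."""
    return all(float(v).is_integer() for v in values)


home_runs = [12, 0, 31, 7]          # hypothetical counts: no half home runs
heights_cm = [162.5, 175.0, 168.2]  # hypothetical measurements in a range

print(looks_discrete(home_runs))
print(looks_discrete(heights_cm))
```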
10
Describing Data 4. Which variables are discrete and which are continuous?
12
Variable Types Categorical Variables can be either
-ordinal – if a natural ordering exists among the categories, e.g. socio-economic status by income (low, middle, high class) or number of cars sold per year (the years are ordered categories)
-nominal – if there is no natural order between the categories
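The practical difference shows up when sorting or graphing: an ordinal variable needs its order stated explicitly, because alphabetical order is usually wrong. A minimal sketch, assuming hypothetical survey responses and an order list named `STATUS_ORDER`:

```python
# Ordinal scale: the order of the categories is part of the data definition.
STATUS_ORDER = ["low", "middle", "high"]

responses = ["middle", "high", "low", "middle", "low"]  # hypothetical survey

# Alphabetical sorting ignores the real order ("high" sorts before "low").
alphabetical = sorted(responses)

# Sorting by position in STATUS_ORDER respects the ordinal scale.
by_rank = sorted(responses, key=STATUS_ORDER.index)

print(alphabetical)
print(by_rank)
```

For a nominal variable there is no such list to supply, so any bar order is acceptable.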
13
Variable Types
Bivariate data – two variables, Explanatory (Independent) and Response (Dependent), recorded for each individual. Used to look for a relationship (a correlation, or possibly cause and effect) between the two variables. Ex. distance vs. time or temperature vs. pressure. Scatter plots and line graphs are usually used with bivariate data.
Univariate data – one variable (one attribute or characteristic of an individual). Distribution graphs are used with univariate data.
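A relationship in bivariate data can also be summarized with a number. The sketch below computes the usual linear correlation coefficient r by hand for hypothetical distance vs. time measurements. The data and the `correlation` helper name are assumptions for illustration, and a value of r near 1 describes a strong linear association, not proof of cause and effect.

```python
from math import sqrt


def correlation(xs, ys):
    """Linear correlation coefficient r between paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


time_s = [1, 2, 3, 4, 5]                  # explanatory variable (hypothetical)
distance_m = [2.1, 3.9, 6.2, 7.8, 10.1]   # response variable (hypothetical)

r = correlation(time_s, distance_m)
print(round(r, 3))
```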
14
Analyzing Data For a categorical variable, what type of graph would be appropriate? Categorical variable “Preferred communication method”
15
Analyzing Data Categorical variable “Preferred communication method”
Bar graph of “Preferred communication method”
16
Displaying Categorical Data
To display the distribution of categorical data, make a bar graph. The bar heights can show Count, Proportion, or Percent. Use the Frequency (count) when comparing data sets that have close to an equal number of individuals.
17
Displaying Categorical Data
To display the distribution of categorical data, make a bar graph. The bar heights can show Proportion or Percent. Use Relative Frequency (proportion or percent) when comparing data sets that do not have close to an equal number of individuals.
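The difference between frequency and relative frequency is easy to see in a small table. This sketch assumes two hypothetical classes of different sizes answering the "preferred communication method" question. The counts differ a lot, but the proportions and percents are directly comparable.

```python
from collections import Counter


def frequency_table(responses):
    """Map each category to (count, proportion, percent)."""
    counts = Counter(responses)
    total = len(responses)
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }


# Hypothetical responses from a class of 4 and a class of 12.
class_a = ["text", "text", "call", "email"]
class_b = ["text"] * 6 + ["call"] * 3 + ["email"] * 3

table_a = frequency_table(class_a)
table_b = frequency_table(class_b)
print(table_a["text"])
print(table_b["text"])
```

Here "text" has a count of 2 in one class and 6 in the other, yet both are 50% of their class, which is why relative frequency is the fair comparison when group sizes differ.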
18
Displaying Categorical Data
To display the distribution of categorical data, make a bar graph or a pie chart.
19
Analyzing Data What type of graph would be appropriate for a quantitative variable? Quantitative variable “Number of languages spoken”
20
Analyzing Data Quantitative variable “Number of languages spoken”
Dot plot of “Number of languages spoken”
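A dot plot can be sketched in plain text. The example below uses hypothetical "number of languages spoken" responses and draws one `*` per individual. The rows run sideways here, whereas a drawn dot plot normally stacks the dots above a horizontal axis.

```python
from collections import Counter


def dot_plot(values):
    """Return a text dot plot: one row per value, one '*' per observation."""
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        rows.append(f"{value} | {'*' * counts[value]}".rstrip())
    return "\n".join(rows)


languages = [1, 1, 2, 1, 3, 2, 1, 2, 5]  # hypothetical survey responses
print(dot_plot(languages))
```

Values with no observations still get a row, so the gap at 4 stays visible in the plot.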
21
Types of distribution graphs:
Used with Continuous Data
-Histograms – used with larger sets of data (>20 values)
-Dot plots – used with smaller sets of data (<20 values)
-Stem and Leaf plots – used with smaller sets of data (<20 values)
-Box & whisker plots – used with larger sets of data (>20 values)
Used with Discrete Data
-Pie Graphs
-Bar Graphs
22
Distribution graphs – display the distribution of univariate data, meaning there is one characteristic recorded for each individual. These graphs tell us what values the variable takes and how often it takes these values (the frequency with which the data fall into each interval). Ex. liters a machine fills into each pop bottle in 3 seconds.
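Counting how often the data fall into each interval is the step behind a histogram. The sketch below assumes hypothetical fill volumes for the pop bottle example, recorded in milliliters so the interval arithmetic stays in whole numbers. The starting point and interval width are arbitrary choices.

```python
def histogram_counts(values, start, width):
    """Count values per interval [low, low + width), keyed by (low, high)."""
    counts = {}
    for v in values:
        index = (v - start) // width
        low = start + index * width
        interval = (low, low + width)
        counts[interval] = counts.get(interval, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical fill volumes in milliliters for ten bottles.
fills_ml = [1980, 2010, 2000, 1990, 2030, 2020, 2000, 1970, 2010, 2000]

bins = histogram_counts(fills_ml, start=1960, width=20)
for (low, high), count in bins.items():
    print(f"{low}-{high}: {'#' * count}")
```

Intervals with no observations are simply absent from the result in this sketch, so a real histogram would need to fill them in as bars of height zero.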
23
Describing Distributions – Descriptive Statistics
Measures of Shape, Center and Spread
Shape – Symmetric, Skewed, Bimodal, Multimodal, Bell, Gaps etc.
Center – Mean, Median, Mode
-Mean – arithmetic average of all data, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. A non-resistant measure of center: an outlier will pull the mean towards itself.
-Median – middle number in an ordered set of numbers (the average of the two middle numbers when the count is even). A resistant measure of center: not sensitive to outliers.
**If the distribution is symmetric, then mean ≈ median. (The converse is not guaranteed: mean = median does not by itself prove symmetry.)
-Mode – most common data point
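The resistant vs. non-resistant distinction can be checked numerically. This sketch uses Python's standard `statistics` module on a small hypothetical data set, then adds one extreme value to see which measures of center move.

```python
from statistics import mean, median, mode


def center_summary(values):
    """Return the three measures of center for a list of numbers."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


scores = [30, 32, 35, 35, 38]    # hypothetical data, roughly symmetric
with_outlier = scores + [200]    # same data plus one extreme value

before = center_summary(scores)
after = center_summary(with_outlier)
print(before)
print(after)
```

With these numbers the mean jumps from 34 to about 61.7 when the outlier is added, while the median and mode both stay at 35.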
24
Comparing the Mean and the Median
25
Measures of Spread and Variability
Range = Max – Min
Interquartile Range, IQR = Q3 – Q1, the spread of the middle 50% of the data (used with any distribution)
Variance – a measure of how the data vary, based on the squared deviations from the mean
Standard Deviation – the square root of the variance, roughly the typical distance the data lie above or below the mean (used with symmetric distributions)
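All four measures of spread can be computed in a few lines. The sketch below rests on two assumptions that the slides do not specify: quartiles are taken as the medians of the lower and upper halves of the ordered data (one common classroom convention; software often uses others), and the sample versions of variance and standard deviation (dividing by n − 1) are used. The data set is hypothetical.

```python
from statistics import median, stdev, variance


def spread_summary(values):
    """Range, IQR, sample variance and sample standard deviation."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "variance": variance(ordered),  # divides by n - 1
        "stdev": stdev(ordered),        # square root of the sample variance
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
spread = spread_summary(data)
print(spread)
```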
26
Outlier – an observation or point that lies outside the overall pattern of the data
Outlier Analysis
-For either Skewed or Symmetric Distributions, use the 1.5 × IQR rule: a point is an outlier if it lies below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR).
-For Symmetric Distributions only, a point is an outlier if it lies more than 3 standard deviations above or below the mean.
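Both outlier rules can be written as short checks. In this sketch the data are hypothetical, quartiles are again assumed to be the medians of the lower and upper halves, and the sample standard deviation is used. None of these choices come from the slides.

```python
from statistics import mean, median, stdev


def iqr_outliers(values):
    """Points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in ordered if v < low or v > high]


def sd_outliers(values):
    """Points more than 3 standard deviations from the mean."""
    center, s = mean(values), stdev(values)
    return [v for v in sorted(values) if abs(v - center) > 3 * s]


data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical, skewed by one large value
print(iqr_outliers(data))
print(sd_outliers(data))
```

On this skewed data set the 1.5 × IQR rule flags 30, but the 3 standard deviation rule flags nothing, because the extreme value inflates the standard deviation itself. That is one reason the second rule is reserved for symmetric distributions.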
29
How to Analyze Data
-Examine each variable by itself. Then study relationships among the variables.
-Start with graphs.
-Add numerical summaries.
30
Let's do #59 on page 49. Assign: #s 1.4.9,10, 11, 13, 21