How can you best represent statistical information and draw conclusions from it?


1 How can you best represent statistical information and draw conclusions from it?

2 What is statistics? Statistics is the branch of mathematics that is concerned with the collection, organization, display and interpretation of data.

3

4 S-1 Organizing Data EQ: How can data be shown in a table or in a graph, and how can you read such data? What is categorical data? When should you use a pie chart, and how is one made? How do you organize a frequency distribution?

5 Data types: categorical and numeric. Categorical: any non-numeric data; display it with frequency distributions, bar charts, or pie charts. Numeric: anything that can be measured and listed by number; display it with dotplots, stem-and-leaf plots, frequency distributions, or histograms.

6 Does this data mean anything to you, and can you answer questions about it in its current form? Example: leisure-time activities
WTAWGTWW CWTWATTW GWWCAWAW WWTWWT
Key: W = walking, T = weight training, C = cycling, G = gardening, A = aerobics

7 Displaying Categorical Data: How can you display and interpret categorical data? Categorical: anything that can’t be measured and listed by number. Displays: frequency distributions, bar charts, pie charts.

8 Frequency Distribution: displays all categories and a tally for each. Relative frequency: the fraction of the data that falls in a category (frequency ÷ total), written as a decimal.
Leisure-time activities: W T A W G T W W C W T W A T T W G W W C A W A W W W T W W T

Category          Frequency   Relative frequency
Walking              15          0.500
Weight training       7          0.233
Cycling               2          0.067
Gardening             2          0.067
Aerobics              4          0.133
Total                30          1.000
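As an illustration only, the tally-and-count step can be sketched in Python with the standard library's `collections.Counter`, using the leisure-time responses from slide 6. The names `RAW` and `LABELS` are hypothetical, chosen just for this sketch.

```python
from collections import Counter

# Leisure-time responses from slide 6 (spaces are ignored)
RAW = "WTAWGTWW CWTWATTW GWWCAWAW WWTWWT"
LABELS = {"W": "Walking", "T": "Weight training", "C": "Cycling",
          "G": "Gardening", "A": "Aerobics"}

responses = [ch for ch in RAW if ch != " "]
freq = Counter(responses)          # frequency (the tally count) per category
n = len(responses)                 # total number of observations

print(f"{'Category':<16}{'Freq':>5}{'Rel. freq':>11}")
for code, label in LABELS.items():
    rel = freq[code] / n           # relative frequency = frequency / total
    print(f"{label:<16}{freq[code]:>5}{rel:>11.3f}")
print(f"{'Total':<16}{n:>5}{sum(freq.values()) / n:>11.3f}")
```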

9 Bar Chart: graphs the frequency of categorical data. Bars DO NOT touch. Categories are on the x-axis; frequencies are on the y-axis. (Example categories: Walking, Weight training, Cycling, Gardening, Aerobics.)

10 Pie Charts (circle graphs): used when there are not too many categories (rule of thumb: 8 or fewer). The size of each “slice” is determined by its relative frequency: degrees in slice = relative frequency × 360.
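A minimal sketch of the slice rule, applied to the leisure-time frequencies from slide 8. Multiplying by 360 before dividing keeps these particular results exact in floating point; the dictionary name is just illustrative.

```python
# Frequencies from the leisure-time frequency distribution (slide 8)
freq = {"Walking": 15, "Weight training": 7, "Cycling": 2,
        "Gardening": 2, "Aerobics": 4}
n = sum(freq.values())

# Degrees in slice = relative frequency x 360 = frequency x 360 / n
degrees = {cat: f * 360 / n for cat, f in freq.items()}
for cat, deg in degrees.items():
    print(f"{cat:<16}{deg:6.1f} degrees")
print(f"{'Total':<16}{sum(degrees.values()):6.1f} degrees")
```

Walking takes half the circle (180 degrees), and the slices always total 360 degrees.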

11 Homework Worksheet 1

12 S-2 Displaying Numeric Data EQ: How do you construct and read stem-and-leaf plots, dotplots, frequency distributions, and histograms? Numeric: anything that can be measured and listed by number. Displays: dotplots, stem-and-leaf plots, frequency distributions, histograms.

13 Dotplots: a simple way to represent small amounts of data. Each piece of data has its own dot, and dots stack vertically above their position on the x-axis. Depending on the data set, you may lose the exact value of each piece.
Example data: 512, 615, 524, 632, 645, 575, 592, 716, 618, 521, 682, 675, 549, 523, 651 (axis marked 5, 6, 7, in hundreds)

14 Stem Plot: works for a small to moderate set of data. Stems go in a vertical column. Stems may be split into low and high (leaves 0-4 and 5-9). A comparative or double stemplot shows multiple data sets.
Data set 1: 51, 61, 52, 63, 64, 57, 59, 71, 61, 52, 68, 67, 54, 52, 65
Data set 2: 51, 61, 52, 73, 54, 57, 59, 71, 61, 52, 68, 67, 74, 52, 65

Set 1                    Set 2
5 | 1 2 2 2 4 7 9        5 | 1 2 2 2 4 7 9
6 | 1 1 3 4 5 7 8        6 | 1 1 5 7 8
7 | 1                    7 | 1 3 4
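A short Python sketch that builds the same stem plots for two-digit values (stem = tens digit, leaf = ones digit). The function name `stem_plot` is hypothetical, and the sketch does not handle split stems.

```python
from collections import defaultdict

def stem_plot(data):
    """Return stem-and-leaf rows for two-digit values, including any
    empty stems between the smallest and largest stem."""
    leaves = defaultdict(list)
    for value in sorted(data):            # sorting puts leaves in order
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{s} | {' '.join(str(leaf) for leaf in leaves[s])}".rstrip()
            for s in stems]

set1 = [51, 61, 52, 63, 64, 57, 59, 71, 61, 52, 68, 67, 54, 52, 65]
set2 = [51, 61, 52, 73, 54, 57, 59, 71, 61, 52, 68, 67, 74, 52, 65]

for row in stem_plot(set1):
    print(row)
print()
for row in stem_plot(set2):
    print(row)
```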

15 Histograms: a bar chart for numeric data. Center each rectangle over its value on the x-axis; the bars touch. A histogram can be drawn from either the frequency or the relative frequency distribution.

# of partners in local law firms   Frequency   Relative frequency
1                                       2            0.10
2                                       3            0.15
3                                       6            0.30
4                                       6            0.30
5                                       3            0.15
Total                                  20            1.00

16 Shapes of Histograms. Unimodal: has one peak. Bimodal: has two peaks. Multimodal: has more than two peaks.

17 Types of Unimodal Curves. Symmetric: normal or bell-shaped; heavy-tailed (longer tails than a normal curve, larger standard deviation); light-tailed (shorter tails than a normal curve, smaller standard deviation).

18 Skewed Curves (lower, or left, tail; upper, or right, tail). When the longer tail stretches to the right, toward unusually large values, the curve is skewed right. When the longer tail stretches to the left, toward unusually small values, the curve is skewed left. Skewness is judged by the tail, not by where the majority of the data lies.

19 Frequency Distributions: Continuous and Discrete Data. Discrete data: individual, separate values, usually counts taken from the integers or whole numbers. Continuous data: values that can fall anywhere in an interval, so they may include decimals.

20 Frequency Distributions: There are no natural breaks for continuous data, so we create our own. Example: the fuel efficiency of a particular car ranges from 25.3 to 29.8 mpg, and we decide to use an interval width of 0.5. Note: always start at an even increment lower than the lowest piece of data and go to an even increment higher than the highest piece of data.

Interval #   Low    High
1            25.0   25.5
2            25.5   26.0
3            26.0   26.5
4            26.5   27.0
5            27.0   27.5
6            27.5   28.0
7            28.0   28.5
8            28.5   29.0
9            29.0   29.5
10           29.5   30.0

In which interval would you place 27.5 mpg?
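A boundary value such as 27.5 mpg needs a convention. The sketch below assumes the common rule that each interval includes its low endpoint and excludes its high endpoint (low <= x < high). That convention, and the helper name `interval_number`, are assumptions for illustration, not taken from the slide.

```python
import math

LOW_START = 25.0    # lower bound of interval 1 (from the slide)
WIDTH = 0.5         # interval width (from the slide)
NUM_INTERVALS = 10

def interval_number(mpg: float) -> int:
    """1-based interval number, ASSUMING low <= x < high for each interval."""
    k = math.floor((mpg - LOW_START) / WIDTH) + 1
    if not 1 <= k <= NUM_INTERVALS:
        raise ValueError(f"{mpg} is outside 25.0 to 30.0")
    return k

for k in range(1, NUM_INTERVALS + 1):
    low = LOW_START + (k - 1) * WIDTH
    print(f"{k:2d}  {low:.1f}-{low + WIDTH:.1f}")

print(interval_number(27.5))   # the low <= x < high convention puts it in interval 6
```

Under the opposite convention (low < x <= high), 27.5 would fall in interval 5 instead. The important thing is to choose one rule and apply it consistently.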

21 Homework Numeric Data Worksheet 2

22 Density Graphs: When data are unevenly distributed, you may want to use unequal groups or intervals. This may only be done if you graph the density, where density = relative frequency ÷ interval width (high − low). Each bar's area then equals its relative frequency.

Interval   Low   High   Frequency   Relative frequency   Density
1            1     10        2          0.09091            0.01010
2           10     20        3          0.13636            0.01364
3           20     30        4          0.18182            0.01818
4           30     40        3          0.13636            0.01364
5           40     50        6          0.27273            0.02727
6           50    100        1          0.04545            0.00091
7          100    200        2          0.09091            0.00091
8          200   1000        1          0.04545            0.00006
Total                       22          1.00000
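A sketch of the density calculation. It assumes the usual textbook definition, density = relative frequency ÷ interval width. The (low, high, frequency) triples come from the density table, and the variable names are illustrative. Because each bar's area is density × width, the areas add up to 1.

```python
# (low, high, frequency) for each interval, from the density table
intervals = [(1, 10, 2), (10, 20, 3), (20, 30, 4), (30, 40, 3),
             (40, 50, 6), (50, 100, 1), (100, 200, 2), (200, 1000, 1)]
total = sum(f for _, _, f in intervals)

densities = []
for low, high, f in intervals:
    rel = f / total                  # relative frequency
    density = rel / (high - low)     # ASSUMED: density = rel. freq / width
    densities.append(density)
    print(f"{low:>4}-{high:<5}{f:>3}{rel:>9.5f}{density:>10.5f}")

# Bar areas (density x width) recover the relative frequencies
areas = [d * (high - low) for d, (low, high, _) in zip(densities, intervals)]
print(f"total area = {sum(areas):.5f}")
```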

23

24 S-3 Describing the Center of a Data Set EQ: What are the measures of central tendency and how can they be determined?

25 Center and Spread: two of the most critical descriptors of a data set. Graphical methods, such as those in the previous sections, give a general impression of both; numerical methods give precise values that can be compared in detail.

26 The three M’s. Mean: also known as the average. Median: also called the middle. Mode: the most frequent value.

27 The Mean: formula for the sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ = each piece of data (i indicates its position within the original data set), n = the number of pieces of data in the data set, and ∑ (the Greek letter Sigma) means to add what follows. Always use more accuracy (more decimal places) than any one piece of data has. µ is used for the population mean; Greek letters are always used for population values.

28 The Median: the middle value in a list of ordered values. The median has no symbol but is often abbreviated Med. If n is odd, the median is the exact middle number; if n is even, the median is the mean of the two middle numbers.
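The mean formula and the odd/even median rule above can be sketched directly in Python. The function names are illustrative. The sample data are the ten values from the S-5 example slide.

```python
def mean(data):
    # Sample mean: add all the values (the sigma) and divide by n
    return sum(data) / len(data)

def median(data):
    ordered = sorted(data)           # the median needs ordered values
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                   # n odd: the exact middle value
        return ordered[mid]
    # n even: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [20, 15, 12, 18, 17, 15, 17, 16, 18, 25]   # S-5 example data
print(mean(data), median(data))
```

Python's standard `statistics.mean` and `statistics.median` follow the same rules. The hand-written versions are shown here only to make each step visible.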

29 Comparison and Contrast of the Mean and Median: The median divides the data into two equal parts, with 50% of the data on either side of it. The mean is the balance point: if each value were a weight on a “data scale,” the fulcrum would have to sit at the mean for the scale to balance. For this reason, the mean is very sensitive to outliers.

30 Balancing the “data scale” (figure: normal/bell curve, skewed-left curve, and skewed-right curve, with the mean and median marked)

31 Trimmed Mean: makes the mean less susceptible to outliers. Order the data, remove the same number of pieces of data from each end, and recalculate the mean. (Trim percentage) × n = number of pieces to be removed from EACH end. A small to moderate trim is 5% to 25%.

32 Trimmed Mean Example: Find the 15% trimmed mean of: 3, 6, 8, 2, 9, 10, 7, 15, 4, 12, 20, 36, 15, 5, 3, 7, 10, 16, 17, 12
Order the numbers: 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 10, 10, 12, 12, 15, 15, 16, 17, 20, 36
20 items × 0.15 = 3, so remove 3 values from each end: 4, 5, 6, 7, 7, 8, 9, 10, 10, 12, 12, 15, 15, 16
15% trimmed mean = 136/14 ≈ 9.71
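A sketch of the trimming procedure, checked against the example above. Rounding pct × n to the nearest whole number is an assumption; the slide's example comes out to exactly 3. Note that Python's `round` sends exact halves to the nearest even number. For comparison, the code also prints the ordinary mean, which the outlier 36 pulls upward.

```python
def trimmed_mean(data, pct):
    """Drop pct * n values from EACH end of the ordered data, then average
    what is left. pct * n is rounded to a whole number (an assumption)."""
    ordered = sorted(data)
    k = round(pct * len(ordered))          # number removed from each end
    kept = ordered[k:len(ordered) - k]     # slice works even when k == 0
    return sum(kept) / len(kept)

data = [3, 6, 8, 2, 9, 10, 7, 15, 4, 12, 20, 36, 15, 5, 3, 7, 10, 16, 17, 12]
print(round(trimmed_mean(data, 0.15), 3))   # 15% trimmed mean
print(sum(data) / len(data))                # ordinary mean, for comparison
```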

33 Weighted Mean: similar to an arithmetic mean (the most common type of average), except that instead of each data point contributing equally to the final average, some data points contribute more than others.

34 Weighted Mean
Class         # of students   Class average
1st period         20              75
2nd period         35              79
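A minimal sketch of the weighted mean for the two class periods, using the number of students as the weights. The result, 4265/55 ≈ 77.55, is closer to 79 than the simple average of the two class averages (77), because the second period has more students.

```python
def weighted_mean(values, weights):
    # Each value counts in proportion to its weight
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

averages = [75, 79]    # class averages: 1st and 2nd period
students = [20, 35]    # number of students in each class (the weights)
print(round(weighted_mean(averages, students), 2))
```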

35 Homework worksheet 3

36

37 S-4 Spread What are the quartiles, percentiles, and box plots?

38 Range = High − Low

39 IQR (Interquartile Range): IQR = upper quartile (Q3) − lower quartile (Q1). Lower quartile: the median of the lower half of the ordered data. Upper quartile: the median of the upper half. If n is odd, the exact median is excluded from both halves when finding the quartiles. The IQR is used because it is resistant to outliers. There is no special name for the population IQR.
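A sketch of the median-of-halves quartile rule described on this slide, including the exclusion of the median when n is odd. Other software uses different quartile methods. Python's `statistics.quantiles` interpolates by default, for example, and can give slightly different values, so this sketch implements the slide's rule by hand. The function names are illustrative.

```python
def median_sorted(values):
    # Median of an already-sorted list
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves; when n is odd the
    overall median is left out of both halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + len(ordered) % 2:]   # skip the middle value if n is odd
    return median_sorted(lower), median_sorted(upper)

q1, q3 = quartiles([12, 15, 15, 16, 17, 17, 18, 18, 20, 25])
print(q1, q3, q3 - q1)    # IQR = Q3 - Q1
```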

40 Boxplot: can be used for many types of summaries. IQR = Q3 − Q1. Outlier: a value more than 1.5 IQR beyond the end of the box. Extreme outlier: a value more than 3 IQR beyond the end of the box. Q1, the median, and Q3 split the data into four parts of about 25% each.

41 Modified Boxplot: an outlier is marked with a closed circle, and an extreme outlier with an open circle.
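A sketch of the 1.5 IQR and 3 IQR rules. It uses Q1 = 15 and Q3 = 18, the quartiles given for the S-5 example data. The `classify` helper and its labels are illustrative.

```python
Q1, Q3 = 15, 18          # quartiles of the S-5 example data
IQR = Q3 - Q1

def classify(x):
    """Label a value by how far it lies beyond the end of the box."""
    dist = max(Q1 - x, x - Q3)   # positive only if x is outside the box
    if dist > 3 * IQR:
        return "extreme outlier"
    if dist > 1.5 * IQR:
        return "outlier"
    return "not an outlier"

for x in [12, 15, 15, 16, 17, 17, 18, 18, 20, 25]:
    print(x, classify(x))
```

Here 25 lies 7 above Q3. That is more than 1.5 × 3 = 4.5 but not more than 3 × 3 = 9, so it would be drawn as a closed circle on a modified boxplot.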

42 Percentages and percentiles:
Percentage = (the score ÷ total possible points) × 100
Percentile = (the position of the score within an ordered list ÷ the total number of items) × 100
EX: 10 students took a 90-point test. Ordered scores (positions 1 to 10): 60, 65, 68, 74, 75, 80, 81, 81, 84, 90. What are the percent and the percentile for a score of 81?
Percent: 81/90 × 100 = 90%
Percentile: 7/10 × 100 = 70th percentile
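A sketch of the two formulas, reproducing the test example. When a score appears more than once (81 is at positions 7 and 8), this sketch uses the first position, as the slide does. Multiplying before dividing keeps these results exact.

```python
def percent(score, possible):
    # Percentage = score / total possible points x 100
    return score * 100 / possible

def percentile(score, ordered_scores):
    # Position of the score in the ordered list (1-based); with ties, the
    # first position is used, matching the slide's example for 81
    position = ordered_scores.index(score) + 1
    return position * 100 / len(ordered_scores)

scores = [60, 65, 68, 74, 75, 80, 81, 81, 84, 90]   # already ordered
print(percent(81, 90), percentile(81, scores))
```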

43 EXAMPLE: Given the stem-and-leaf plot below, FIND: the median; the first quartile; the third quartile; the interquartile range; the mode; the percentile for .271; the value closest to the 60th percentile.
10 2 5 7
20 1 6
30 5 8 9 9
40 2 3 5 7 8
50 2
60 3 6

44 Homework worksheet 4

45 S-5 Measures of Variability How do the measures of variability help us to better understand what our data set might look like?

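46 S-5 Measures of Variability: Range = high − low. Deviation from the mean = $x_i - \bar{x}$: if positive, $x_i$ is larger than the mean; if negative, $x_i$ is smaller than the mean. The deviations always sum to 0, so the mean deviation is the average of their absolute values: $\frac{\sum |x_i - \bar{x}|}{n}$. Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$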

47 Sample Standard Deviation: $s = \sqrt{s^2}$, roughly the “average distance” the items fall from the mean. A small s or $s^2$ indicates low variability; a large s or $s^2$ indicates high variability.

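48 Population Variance (knowing all the data): $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, where N is the population size. Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$. Compute to the same accuracy as the population data.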

49 Uses of the IQR: For roughly normal data, the standard deviation can be approximated by SD ≈ IQR/1.35. If the actual SD is noticeably larger than IQR/1.35, the distribution has heavier or longer tails than the normal curve.

50 Example: 20, 15, 12, 18, 17, 15, 17, 16, 18, 25. Reorder: 12, 15, 15, 16, 17, 17, 18, 18, 20, 25. Find the range, IQR, and SD. Median = 17, Q1 = 15, Q3 = 18, so range = 25 − 12 = 13 and IQR = 18 − 15 = 3.

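51 continued: Find the mean deviation and the standard deviation by hand, with x̄ = 173/10 = 17.3.
i      x_i    x_i − x̄    (x_i − x̄)²
1      12     −5.3        28.09
2      15     −2.3         5.29
3      15     −2.3         5.29
4      16     −1.3         1.69
5      17     −0.3         0.09
6      17     −0.3         0.09
7      18      0.7         0.49
8      18      0.7         0.49
9      20      2.7         7.29
10     25      7.7        59.29
Totals 173     0.0       108.10
Mean deviation = Σ|x_i − x̄| / n = 23.6/10 = 2.36. Sample variance s² = 108.10/9 ≈ 12.01, so s ≈ 3.47.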

52 Given 12, 15, 15, 16, 17, 17, 18, 18, 20, 25, find the SD (a) by the IQR and (b) by calculator.
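A sketch that works through the S-5 example in Python: the mean, the mean (absolute) deviation, the sample variance and standard deviation, and the IQR/1.35 estimate. `statistics.stdev` is the standard library's sample standard deviation (divisor n − 1) and serves as the "calculator" check. The IQR of 3 is taken from the example slide.

```python
import math
import statistics

data = [12, 15, 15, 16, 17, 17, 18, 18, 20, 25]
n = len(data)
x_bar = sum(data) / n

deviations = [x - x_bar for x in data]               # x_i - x_bar
mean_abs_dev = sum(abs(d) for d in deviations) / n   # mean (absolute) deviation
sample_var = sum(d ** 2 for d in deviations) / (n - 1)
s = math.sqrt(sample_var)

iqr = 18 - 15                 # Q3 - Q1 from the example slide
sd_from_iqr = iqr / 1.35      # rough SD estimate for near-normal data

print(f"mean = {x_bar:.1f}, mean deviation = {mean_abs_dev:.2f}")
print(f"s^2 = {sample_var:.2f}, s = {s:.2f}, IQR/1.35 = {sd_from_iqr:.2f}")
print(f"statistics.stdev: {statistics.stdev(data):.2f}")
```

Here s ≈ 3.47 is well above IQR/1.35 ≈ 2.22. By the rule on slide 49, this points to a longer tail than a normal curve would have, driven mostly by the value 25.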

53 Homework worksheet 5

54 S-6 Translation and Scale What is the difference in the impact of translation and scale change on data? In class project:

55

56 Hints for review #1 How many intervals should be used for a set of data? The book recommends

57 Homework

58 TEST 1

59

60 S-7 Data Collection How do you know which method of data collection is most appropriate?

61 Random Samples What methods of data collection constitute collecting a random sample?

62 Sampling: Because time and money rarely permit researchers to collect the opinion of, or measure the effect on, every person in a population, they take samples. A sample should include all groups in the population so that accurate statements can be made about the entire population.

63 Simple Random Sample: Each object in the population has an equal chance of being selected for the sample, and each object in the sample is chosen independently of any other object in the sample. Independent: choosing one object has no bearing on the choice of the next. Independent example: all names are placed in a hat and 10 are drawn. Dependent example: two names are drawn, and each of those two people asks 4 others to participate with them.
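A sketch of the names-in-a-hat example using the standard library's `random.Random.sample`, which draws k distinct members with equal chances. The list of names and the helper function are hypothetical. The seed only makes the run repeatable.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k members without replacement, every member equally likely,
    like drawing k names from a hat."""
    rng = random.Random(seed)
    return rng.sample(population, k)

names = [f"Student {i}" for i in range(1, 31)]   # hypothetical population
chosen = simple_random_sample(names, 10, seed=1)
print(chosen)
```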

64 Bias: when one group is over-represented in the sample. Causes: the basis of selection; who responds; who asks the questions or how they are asked.

65 Stratified Sample The population is divided into groups and a specified number are chosen from each group
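A sketch of stratified sampling: the population is split into groups, and a set number is drawn at random from each group. The grade-level groups, their sizes, and the per-group counts are hypothetical. Here the counts are proportional to group size (40 : 30 : 20 gives 4 : 3 : 2).

```python
import random

def stratified_sample(groups, per_group, seed=None):
    """Choose a specified number of members at random from EACH group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_group[name])
            for name, members in groups.items()}

# Hypothetical population split into grade-level groups
groups = {
    "9th": [f"9-{i}" for i in range(1, 41)],
    "10th": [f"10-{i}" for i in range(1, 31)],
    "11th": [f"11-{i}" for i in range(1, 21)],
}
per_group = {"9th": 4, "10th": 3, "11th": 2}
sample = stratified_sample(groups, per_group, seed=7)
for name, chosen in sample.items():
    print(name, chosen)
```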

66 River Project

67

68 The Normal Distribution How does normally distributed data begin to relate statistics to probability?

69 The Normal Distribution: most of the data falls close to the average, and only a few pieces of data fall far from the mean. This configuration is often called a bell-shaped or normal curve. When data are normally distributed (the empirical rule):
68% of the data lies within one standard deviation of the mean.
95% of the data lies within two standard deviations (13.5% lies between one and two SD on each side).
99.7% of the data lies within three standard deviations (2.35% lies between two and three SD on each side).
0.15% of the data lies beyond three standard deviations on each side.
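The 68/95/99.7 figures are rounded areas under the standard normal curve. As a check, the sketch below computes them with `math.erf`, using the identity P(|Z| < k) = erf(k / √2). Half of the remainder lies beyond k on each side.

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal variable, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = prob_within(k)
    print(f"within {k} SD: {inside:.4f}   beyond on each side: {(1 - inside) / 2:.4f}")
```

The printed values (0.6827, 0.9545, 0.9973) round to the slide's percentages. For example, (0.9545 − 0.6827)/2 ≈ 0.136 is the "13.5%" band between one and two SD.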

70 Normal curves are symmetric about the mean. Some are narrow and some are wide; the width is determined by the standard deviation. The area under a normal curve represents all the data: 100%, or 1. The area under any section of the curve represents the percentage of the data, and therefore the probability that a given piece of data, falls within that region.

71 Normal distributions have a direct link to probability through z-scores. The z-score tells exactly how many standard deviations (whole and fractional) a particular piece of data falls from the mean. A negative z-score means the data is to the left of the mean; a positive z-score means it is to the right. The formula for z-scores is $z = \frac{x - \mu}{\sigma}$. The attached table gives the probability that a value has a z-score less than a given z, that is, that it falls to the left of that point on the normal curve.


73 Examples: Find the z-score for each of the following: a) 45 when µ = 50 and σ = 4

74 b) 56 when µ = 60 and σ = 10 c) between 20 and 60 when µ = 50 and σ = 10
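A sketch of the z-score examples. It assumes that the values given are the population mean µ and standard deviation σ. In place of the printed z-table, it uses `math.erf` to compute the area to the left of z, Φ(z) = ½(1 + erf(z/√2)). The function names `z_score` and `phi` are illustrative.

```python
import math

def z_score(x, mu, sigma):
    # How many standard deviations x lies from the mean (negative = left)
    return (x - mu) / sigma

def phi(z):
    """Area under the standard normal curve to the left of z (a z-table entry)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_a = z_score(45, 50, 4)                                  # problem a
z_b = z_score(56, 60, 10)                                 # problem b
z_lo, z_hi = z_score(20, 50, 10), z_score(60, 50, 10)     # problem c
p_between = phi(z_hi) - phi(z_lo)    # area between the two z-scores

print(z_a, round(phi(z_a), 4))
print(z_b, round(phi(z_b), 4))
print(z_lo, z_hi, round(p_between, 4))
```

For part c, the probability of falling between two values is the table area to the left of the larger z minus the area to the left of the smaller z. That gives about 0.8413 − 0.0013 ≈ 0.84.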

75 Homework z-scores

