1 By maintaining a good heart at every moment, every day is a good day. If we always have good thoughts, then any time, any thing or any location is auspicious.


2 Descriptive Statistics: types of data, tables and graphs, numerical summaries

3 What Are Data? Any set of data contains information about some group of individuals. The information is organized in variables. Individuals are the objects described by a set of data; they may be animals, people, or things. A variable is any characteristic of an individual. A variable can take different values for different individuals.

4 Population/Sample/Raw Data. A population is the collection of all individuals about which information is desired. A sample is a subset of a population. Raw data are information that has been collected but not yet processed.

5 Example: Serum Cholesterol Levels. The data set includes the gender, age, and serum cholesterol level of sampled U.S. adults. Who? What individuals do the data describe? What are the population, the sample, and the raw data of the study? What? How many variables do the data describe? What are they?

6 Types of Variables. A qualitative (categorical) variable places an individual into one of several groups or categories. A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. Q: Which variables are categorical? Which are quantitative?

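7 A variable is either categorical (qualitative) or numerical (quantitative). Q: Does "average" make sense? No: the variable is categorical. Yes: the variable is numerical. A categorical variable is either nominal or ordinal. Q: Is there any natural ordering among the categories? No: nominal. Yes: ordinal. A numerical variable is either discrete or continuous. Q: Can all possible values be listed? Yes: discrete. No: continuous.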
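The questions on this slide can be read as a small decision procedure. The sketch below is only an illustration of that reading: the function name `variable_type`, its parameters, and the example variables are hypothetical and do not come from the slides.

```python
def variable_type(average_makes_sense, has_natural_order=False, values_can_be_listed=False):
    """Classify a variable by answering the slide's three questions.

    All names here are illustrative assumptions, not part of the course material.
    """
    if not average_makes_sense:
        # Categorical (qualitative): split on whether the categories are ordered.
        return "ordinal" if has_natural_order else "nominal"
    # Numerical (quantitative): split on whether the possible values can be listed.
    return "discrete" if values_can_be_listed else "continuous"


# Hypothetical examples of how the questions might be answered.
examples = {
    "gender": variable_type(average_makes_sense=False, has_natural_order=False),
    "pain level (mild/moderate/severe)": variable_type(False, has_natural_order=True),
    "number of children": variable_type(True, values_can_be_listed=True),
    "serum cholesterol level": variable_type(True, values_can_be_listed=False),
}
for name, kind in examples.items():
    print(f"{name}: {kind}")
```

The answers to the questions still have to be supplied by a person who understands the variable; the code only records the branching.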

8 Two Basic Strategies to Explore Data. Begin by examining each variable by itself, then move on to study the relationships among the variables. Begin with a graph or graphs, then add numerical summaries of specific aspects of the data.

9 Summarizing Data. Goal: to study or estimate the distributions of variables. The distribution of a variable tells us what values or categories it takes and how often it takes them. Displaying distributions of (sample) data with graphs. Describing distributions of (sample) data with numbers.

10 Numerical Summaries for Categorical Variables. Frequency (count). Relative frequency (percentage). Frequency and relative frequency tables. E.g. Table 2.7.
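As a minimal sketch of a frequency and relative frequency table, the code below counts how often each category occurs and divides by the total. The data are made up for illustration (they are not Table 2.7), and the helper name `frequency_table` is hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, relative_frequency)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}


# Hypothetical data: gender recorded for eight sampled individuals.
gender = ["F", "M", "F", "F", "M", "F", "M", "F"]
table = frequency_table(gender)
for category, (count, rel) in sorted(table.items()):
    print(f"{category}: frequency={count}, relative frequency={rel:.1%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check on a hand-built table.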

11 Graphs for Categorical Variables. Pie charts: good for one variable; show Table 2.7 for ages 25-34. Bar charts: good for one or two variables; show Table 2.7.

12 Graphs for Quantitative Variables. One-way scatter plots (dotplots): good for small to medium datasets; e.g. percentage of GDP spent on health care. Histograms: good for medium to large datasets; e.g. percentage of GDP spent on health care. Boxplots: good for medium to large datasets; e.g. percentage of GDP spent on health care.


14 How to Make a Histogram. 1. Break the range of values of a variable into equal-width intervals. 2. Count the number of individuals in each interval. These counts are called frequencies, and the corresponding percentages are called relative frequencies. 3. Draw the histogram with the variable on the horizontal axis and the count (or percentage) on the vertical axis.
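The three steps above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `histogram_counts` and the data are hypothetical, the values are assumed not to be all equal, and the largest value is placed in the last interval (one common convention; other software may differ).

```python
def histogram_counts(values, num_bins):
    """Split the range of values into equal-width intervals and count each one."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins  # assumes high > low
    counts = [0] * num_bins
    for v in values:
        # The maximum value goes into the last interval instead of opening a new one.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    total = len(values)
    relative = [c / total for c in counts]
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts, relative


# Hypothetical measurements for ten individuals.
data = [4, 5, 6, 6, 7, 8, 8, 9, 10, 14]
edges, counts, relative = histogram_counts(data, num_bins=5)

# Text version of step 3: one row per interval, one star per individual.
for i, count in enumerate(counts):
    print(f"[{edges[i]:g}, {edges[i + 1]:g}): {'*' * count} ({relative[i]:.0%})")
```

The printed rows are a sideways histogram: the intervals play the role of the horizontal axis and the row of stars shows the count.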

15 What Do We See from the Graphs? Important features to look for: Overall pattern – Shape – Central tendency (the location around which the data tend to cluster) – Dispersion (how spread out the data are). Outliers: values that fall far outside the overall pattern (for quantitative variables only).

16 Overall Pattern: Shape. How many peaks (called modes) are there? A distribution with one major peak is called unimodal. Is it symmetric or skewed? – Symmetric if the large values are mirror images of the small values – Skewed to the right if the right tail (large values) is much longer than the left tail (small values) – Skewed to the left if the left tail (small values) is much longer than the right tail (large values)

17 Numerical Summaries for Quantitative Variables. To measure center: mean and median. To measure dispersion: interquartile range (IQR) and standard deviation (SD). Five-number summaries and boxplots. Outliers. Read the handout for numerical measures.
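The measures named on this slide can be computed with Python's standard `statistics` module, as in the sketch below. The data are hypothetical. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of `statistics.quantiles`, which may not match the handout's definition, and it uses the sample standard deviation (divisor n - 1).

```python
import statistics

# Hypothetical measurements for seven individuals.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)
median = statistics.median(data)

# Quartiles under the "inclusive" convention (an assumption; other rules exist).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Sample standard deviation (divides by n - 1).
sd = statistics.stdev(data)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(data), q1, median, q3, max(data))

print(f"mean={mean}, median={median}")
print(f"IQR={iqr}, SD={sd:.3f}")
print(f"five-number summary={five_number}")
```

The mean uses every value and is pulled toward extreme observations, while the median and IQR depend only on the ordering of the data, which is why the two pairs can tell different stories for skewed data.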

18 About Boxplots. The left (lower) edge of the box is Q1, the right (upper) edge of the box is Q3, and the line inside the box is the median, Q2. The whiskers extend from opposite sides of the box to the minimum and maximum values*. The box represents the middle 50% of the data. The boxplot can show whether the data are skewed or symmetric. It can also show the variability or dispersion in the data values. Boxplots are good for comparing different groups with respect to a given quantitative variable. *When outliers exist, they are first identified according to specific rules and graphed with special symbols such as *. The whiskers then extend not to the outliers but to the smallest and largest measurements that are not outliers.
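The slide does not say which rule identifies outliers. The sketch below assumes the widely used 1.5 × IQR rule (values more than 1.5 IQRs below Q1 or above Q3 are flagged), together with the "inclusive" quartile method of `statistics.quantiles`; both are assumptions, and the function name `boxplot_stats` and the data are hypothetical.

```python
import statistics


def boxplot_stats(values, k=1.5):
    """Compute box edges, whisker ends, and outliers for a boxplot.

    Assumes the k * IQR fence rule with k = 1.5, which the slides do not specify.
    """
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "box": (q1, q2, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical data with one unusually large value.
stats = boxplot_stats([2, 4, 4, 5, 7, 9, 30])
print(stats)
```

Here the value 30 lies above the upper fence (8 + 1.5 × 4 = 14), so it would be drawn as a separate symbol and the upper whisker would stop at 9.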


21 Graphs for Two Quantitative Variables. (Two-way) scatter plots: used to see the relationship between two quantitative variables; e.g. Figure 2.10.
