Objectives (IPS chapter 1.1)
Displaying distributions with graphs
  Labels/Variables
  Two types of variables
  Ways to chart categorical data
    Bar graphs
    Pie charts
  Ways to chart quantitative data
    Line graphs: time plots
    Scales matter
    Histograms
    Stemplots
    Stemplots versus histograms
    Interpreting histograms
Variables
In a study, we collect information (data) from individuals. Individuals can be people, animals, plants, or any object of interest.
A variable is a characteristic that varies among individuals in a population or in a sample (a subset of a population).
Examples: age, height, blood pressure, ethnicity, leaf length, first language.
The distribution of a variable tells us what values the variable takes and how often it takes each value.
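A distribution, as just defined, is a list of values paired with how often each occurs. The Python sketch below computes counts and relative frequencies (proportions) for a list of observations. The function name `distribution` and the sample of first languages are hypothetical and purely illustrative.

```python
from collections import Counter


def distribution(values):
    """Return {value: (count, relative frequency)} for a list of observations."""
    counts = Counter(values)  # how often each value occurs
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}


# Hypothetical sample: first language of 8 individuals (illustrative only).
sample = ["English", "Spanish", "English", "Mandarin",
          "English", "Spanish", "English", "French"]

for value, (count, rel_freq) in sorted(distribution(sample).items()):
    print(f"{value:<10} {count}  {rel_freq:.3f}")
```

The relative frequencies always sum to 1, which is why they are convenient for comparing samples of different sizes.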
Two types of variables
Variables can be either quantitative (numerical) or categorical.
Quantitative: something that can be counted or measured for each individual and then added, subtracted, averaged, etc. across individuals in the population.
Examples: how tall you are, your age, your blood cholesterol level, the number of credit cards you own.
Categorical: something that falls into one of several categories. What we can count is the number or proportion of individuals in each category.
Examples: your blood type (A, B, AB, O), your hair color, your ethnicity, whether or not you paid income tax last tax year.
Cases, Labels, Variables, and Values
Cases are the objects described by a set of data. Cases may be customers, companies, subjects in a study, or other objects.
A label is a special variable used in some data sets to distinguish different cases.
A variable is a characteristic of a case. Different cases can have different values for the variables.
Example: (1) The cases are the individual students. (2) The first three variables (student identification number, last name, first name) are labels. (3) Gender is a categorical variable. (4) Test 1 through Final are numerical variables.
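One way to make the case/label/variable distinction concrete is to model one case as one record. The sketch below assumes a hypothetical `StudentRecord` type and made-up values; only `test1` and `final` are shown, standing in for the "Test 1 through Final" columns, and any intermediate tests are omitted. Note that the student ID is stored as a string: even when it looks like a number, it is a label, not a quantitative variable.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Labels: identify the case; not analyzed as variables.
    student_id: str
    last_name: str
    first_name: str
    # Categorical variable.
    gender: str
    # Numerical variables (hypothetical subset of the test columns).
    test1: float
    final: float


# Each record is one case; all values here are hypothetical.
records = [
    StudentRecord("0001", "Doe", "Jane", "F", 88.0, 91.0),
    StudentRecord("0002", "Roe", "Rick", "M", 75.0, 80.0),
]

# A numerical variable can be averaged across cases; a label cannot.
mean_final = sum(r.final for r in records) / len(records)
print(mean_final)  # 85.5
```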
Example: How do you know whether a variable is categorical or quantitative?
Ask:
What are the n individuals/units in the sample (of size n)?
What is being recorded about those n individuals/units?
Is that a number (quantitative) or a statement (categorical)?

Label: each individual is assigned one label.
Categorical: each individual is assigned to one of several categories.
Quantitative: each individual is attributed a numerical value.

Individuals in sample   DIAGNOSIS       AGE AT DEATH
(label)                 (categorical)   (quantitative)
Patient A               Heart disease   56
Patient B               Stroke          70
Patient C                               75
Patient D               Lung cancer     60
Patient E                               80
Patient F               Accident        73
Patient G               Diabetes        69
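The questions above can be roughly automated. The Python sketch below transcribes the patient table (None marks the two diagnoses that are blank in the table) and applies a simple heuristic: a variable is treated as quantitative if every recorded value is a number. The helper `variable_type` is hypothetical, and the heuristic is only a first pass: numeric codes such as ZIP codes or ID numbers are numbers but are not quantitative, so judgment is still required.

```python
from collections import Counter
from numbers import Number

# Patient table; the keys are the labels, None marks a blank diagnosis.
patients = {
    "Patient A": ("Heart disease", 56),
    "Patient B": ("Stroke", 70),
    "Patient C": (None, 75),
    "Patient D": ("Lung cancer", 60),
    "Patient E": (None, 80),
    "Patient F": ("Accident", 73),
    "Patient G": ("Diabetes", 69),
}


def variable_type(values):
    """Heuristic: 'quantitative' if every recorded (non-None) value is a number."""
    recorded = [v for v in values if v is not None]
    if recorded and all(isinstance(v, Number) and not isinstance(v, bool)
                        for v in recorded):
        return "quantitative"
    return "categorical"


diagnosis = [d for d, _ in patients.values()]
age_at_death = [a for _, a in patients.values()]

print(variable_type(diagnosis))     # categorical
print(variable_type(age_at_death))  # quantitative

# Quantitative: arithmetic such as a mean is meaningful.
print(sum(age_at_death) / len(age_at_death))  # 69.0

# Categorical: count individuals per category (blank diagnoses excluded).
print(Counter(d for d in diagnosis if d is not None))
```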
Summary of variables
Definition: Quantitative data are discrete if the possible values are isolated points on the number line. Quantitative data are continuous if the set of possible values forms an entire interval on the number line.
Question: Are people's heights continuous? What about ages? What about family size?
(1) For categorical data: i) frequency/relative frequency distribution; ii) bar graph; iii) pie chart.
(2) For numerical data: i) frequency distribution; ii) stemplot (stem-and-leaf plot); iii) histogram.
(3) For bivariate numerical data: scatter plot.
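As an illustration of a stemplot for numerical data, the Python sketch below uses the ages at death from the patient table. It assumes non-negative integer data and uses the tens digit as the stem and the units digit as the leaf; other stem choices (for example, split stems) are common in practice but are not shown. The function name `stemplot` is hypothetical.

```python
from collections import defaultdict


def stemplot(data):
    """Return stemplot lines for non-negative integers (stem = tens, leaf = units)."""
    if not data:
        return []
    leaves = defaultdict(list)
    for x in sorted(data):          # sorting keeps leaves in increasing order
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem]))
    return lines


ages = [56, 70, 75, 60, 80, 73, 69]  # AGE AT DEATH column from the patient table
for line in stemplot(ages):
    print(line)
```

Read sideways, the stemplot has the same shape as a histogram with class width 10, but it also keeps every individual value.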