November 15
In Chapter 1: 1.1 What is Biostatistics? 1.2 Organization of Data 1.3 Types of Measurements 1.4 Data Quality
Biostatistics
Statistics is not merely a compilation of computational techniques.
Statistics
– is a way of learning from data
– is concerned with all elements of study design, data collection, and analysis of numerical data
– does require judgment
Biostatistics is statistics applied to biological and health problems.
Biostatisticians are:
Data detectives
– who uncover patterns and clues
– This involves exploratory data analysis (EDA) and descriptive statistics
Data judges
– who judge and confirm clues
– This involves statistical inference
Measurement
Measurement (defined): the assigning of numbers and codes according to prior-set rules (Stevens, 1946).
There are three broad types of measurements:
– Categorical
– Ordinal
– Quantitative
Measurement Scales
Categorical – classify observations into named categories
– e.g., HIV status classified as “positive” or “negative”
Ordinal – categories that can be put in rank order
– e.g., stage of cancer classified as stage I, stage II, stage III, stage IV
Quantitative – true numerical values that can be put on a number line
– e.g., age (years)
– e.g., serum cholesterol (mg/dL)
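The three measurement scales differ in which operations make sense on their values. The following is a minimal Python sketch of that idea, not part of the original slides. The class names `HIVStatus` and `CancerStage` and all data values are hypothetical, made up for illustration.

```python
from enum import Enum, IntEnum
from statistics import mean


class HIVStatus(Enum):
    """Categorical: named categories with no inherent order."""
    NEGATIVE = "negative"
    POSITIVE = "positive"


class CancerStage(IntEnum):
    """Ordinal: categories that can be ranked (spacing between ranks is not assumed equal)."""
    I = 1
    II = 2
    III = 3
    IV = 4


# Hypothetical values for three people (illustration only, not real data).
hiv = [HIVStatus.POSITIVE, HIVStatus.NEGATIVE, HIVStatus.NEGATIVE]
stages = [CancerStage.III, CancerStage.I, CancerStage.IV]
ages = [27.0, 41.0, 35.0]  # years

# Categorical: counting category membership is meaningful; ordering is not.
n_positive = sum(1 for status in hiv if status is HIVStatus.POSITIVE)

# Ordinal: ranking and comparison are meaningful.
ranked_stages = sorted(stages)
highest_stage = max(stages)

# Quantitative: arithmetic such as averaging is meaningful.
mean_age = mean(ages)

print("HIV positive count:", n_positive)
print("Stages in rank order:", [stage.name for stage in ranked_stages])
print("Highest stage:", highest_stage.name)
print("Mean age (years):", round(mean_age, 1))
```

Using a plain `Enum` for the categorical variable and an `IntEnum` for the ordinal one is a design choice in this sketch: the plain `Enum` refuses `<` comparisons, while the `IntEnum` allows them, which mirrors the distinction between the two scales.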
Illustrative Example: Weight Change and Heart Disease
This study sought to determine the effect of weight change on coronary heart disease (CHD) risk. It studied 115,818 women, 30 to 55 years of age and free of CHD, over 14 years.
Measurements included:
– Body mass index (BMI) at study entry
– BMI at age 18
– CHD case onset (yes or no)
Source: Willett et al., 1995
Illustrative Example (cont.)
Examples of Variables
Categorical:
– Smoker (current, former, no)
– CHD onset (yes or no)
– Family history of CHD (yes or no)
Ordinal:
– Non-smoker, light smoker, moderate smoker, heavy smoker
Quantitative:
– BMI (kg/m²)
– Age (years)
– Weight presently
– Weight at age 18
Variable, Value, Observation
Observation – the unit upon which measurements are made; can be an individual or an aggregate
Variable – the generic thing we measure
– e.g., AGE of a person
– e.g., HIV status of a person
Value – a realized measurement
– e.g., “27”
– e.g., “positive”
Data Collection Form
Var1 (ID): 1
Var2 (AGE): 27
Var3 (SEX): F
Var4 (HIV): Y
Var5 (KAPOSISARC): Y
Var6 (REPORTDATE): 4/25/89
Var7 (OPPORTUNIS): N
On this form, each questionnaire contains an observation. Each question corresponds to a variable.
U.S. Census Form
Data Table
Each row corresponds to an observation. Each column contains information on a variable. Each cell in the table contains a value.
AGE  SEX  HIV  ONSET      INFECT
24   M    Y    12-OCT-07  Y
14   M    N    30-MAY-05  Y
32   F    N    11-NOV-06  N
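The row, column, and cell roles can be made concrete with a short Python sketch that reads the three rows of this table. The comma-separated layout and the names `raw`, `observations`, `variables`, `ages`, and `first_hiv_value` are assumptions introduced here for illustration. They are not part of the original slides.

```python
import csv
import io

# The slide's data table, rewritten as comma-separated text (layout is an assumption).
raw = """AGE,SEX,HIV,ONSET,INFECT
24,M,Y,12-OCT-07,Y
14,M,N,30-MAY-05,Y
32,F,N,11-NOV-06,N
"""

# Each row becomes one dict, i.e. one observation.
observations = list(csv.DictReader(io.StringIO(raw)))

# Each column name is a variable.
variables = list(observations[0].keys())

# Reading down one column gives every value of a single variable.
ages = [int(obs["AGE"]) for obs in observations]

# A single cell is one value: one variable for one observation.
first_hiv_value = observations[0]["HIV"]

print(len(observations), "observations,", len(variables), "variables")
print("AGE values:", ages)
print("HIV value for the first observation:", first_hiv_value)
```

Storing each observation as a dict keyed by variable name keeps the distinction between the variable (`"AGE"`) and its realized value (`24`) visible in the code.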
Illustrative Example: Cigarette Consumption and Lung Cancer
The unit of observation in these data is the individual region, not the individual person.
cig1930 = per capita cigarette use in 1930
mortality = lung cancer mortality per 100,000 in 1950
Data Quality
An analysis is only as good as its data.
GIGO ≡ garbage in, garbage out
Does a variable measure what it purports to?
– Validity = freedom from systematic error
– Objectivity = seeing things as they are without making them conform to a worldview
Consider how the wording of a question can influence validity and objectivity.
Choose Your Ethos
BS is manipulative and has a predetermined outcome.
Science “bends over backwards” to consider alternatives.
Blackburn, S. (2005). Oxford Univ. Press
Frankfurt, H. G. (2005). Princeton University Press
Scientific Ethos “I cannot give any scientist of any age any better advice than this: The intensity of the conviction that a hypothesis is true has no bearing on whether it is true or not.” Peter Medawar