Lecture 1 Dustin Lueker
Statistical terminology
Descriptive methods
Probability and distribution functions
Estimation (confidence intervals)
Hypothesis testing
Inferential methods for two samples
Simple linear regression and correlation
STA 291 Summer 2008 Lecture 1
Research in all fields is becoming more quantitative
  ◦ Look at research journals
  ◦ Most graduates will need to be familiar with basic statistical methodology and terminology
Newspapers, advertising, surveys, etc.
  ◦ Many statements contain statistical arguments
Computers make complex statistical methods easier to use
Statistics are often used in an incorrect and misleading manner
Purposely misused
  ◦ Companies/people wanting to further their agenda
      Cooking the data
      Completely making up data
      Massaging the numbers
Incidentally misused
  ◦ Using inappropriate methods
      Vital to understand a method before using it
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
Applicable to a wide variety of academic disciplines
  ◦ Physical sciences
  ◦ Social sciences
  ◦ Humanities
Statistics are used for making informed decisions
  ◦ Business
  ◦ Government
Design
  ◦ Planning research studies
  ◦ How to best obtain the required data
  ◦ Ensuring that our data are representative of the entire population
Description
  ◦ Summarizing data
  ◦ Exploring patterns in the data
  ◦ Extract/condense information
Inference
  ◦ Make predictions based on the data
  ◦ ‘Infer’ from sample to population
  ◦ Summarize results
Population
  ◦ Total set of all subjects of interest
      Entire group of people, animals, products, etc. about which we want information
Elementary Unit
  ◦ Any individual member of the population
Sample
  ◦ Subset of the population from which the study actually collects information
  ◦ Used to draw conclusions about the whole population
Variable
  ◦ A characteristic of a unit that can vary among subjects in the population/sample
      Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
Parameter
  ◦ Numerical characteristic of the population
      Calculated using the whole population
Statistic
  ◦ Numerical characteristic of the sample
      Calculated using the sample
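The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the ages in `population` are made-up values, and the sample size of 4 and the seed are arbitrary choices, none of which come from the lecture.

```python
import random
from statistics import mean

# Hypothetical population: ages of 10 subjects (made-up values for illustration).
population = [19, 20, 20, 21, 22, 22, 23, 25, 27, 31]

# Parameter: a numerical characteristic calculated using the whole population.
population_mean = mean(population)

# Statistic: the same characteristic calculated using only a sample.
rng = random.Random(291)            # fixed seed so the sketch is repeatable
sample = rng.sample(population, 4)  # 4 distinct units drawn without replacement
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):    ", sample_mean)
```

The parameter is a fixed number, while the statistic changes from sample to sample; rerunning with a different seed would typically give a different sample mean.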
Why take a sample? Why not take a census? Why not measure all of the units in the population?
  ◦ Accuracy
      May not be able to find every unit in the population
  ◦ Time
      Speed of response from units
  ◦ Money
  ◦ Infinite Population
  ◦ Destructive Sampling or Testing
University Health Services at UK conducts a survey about alcohol abuse among students
  ◦ 200 of the students are sampled and asked to complete a questionnaire
  ◦ One question is “Have you regretted something you did while drinking?”
What is the population? Sample?
Descriptive Statistics
  ◦ Summarizing the information in a collection of data
Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population
The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  ◦ Are these numbers statistics or parameters?
The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  ◦ Is this an example of descriptive or inferential statistics?
Univariate data
  ◦ Consists of observations on a single attribute
Multivariate data
  ◦ Consists of observations on several attributes
      Special case: Bivariate Data
        Consists of observations on two attributes
Quantitative or Numerical
  ◦ Variables with numerical values associated with them
Qualitative or Categorical
  ◦ Variables without numerical values associated with them
Nominal
  ◦ Gender, nationality, hair color, state of residence
      Nominal variables have a scale of unordered categories
      It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
Ordinal
  ◦ Disease status, company rating, grade in STA 291
      Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
      One unit can have more of a certain property than does another unit
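Treating an ordinal variable in a quantitative manner can be sketched as follows. The slide gives only A = 4.0 and B = 3.0; the values for C, D, and E below follow the usual 4-point convention and are an assumption, as is the list of grades.

```python
from statistics import mean

# Grade points: A and B are from the slide; C, D, E are ASSUMED
# (usual 4-point convention) to fill in the "etc.".
grade_points = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}

# Hypothetical grades for one student (made-up data).
grades = ["A", "B", "B", "C", "A"]

# Ordinal use: categories can be ranked, so "highest grade" is meaningful.
best = max(grades, key=grade_points.get)

# Quantitative use: averaging assumes the gaps between grades are equal,
# which the ordinal scale itself does not guarantee.
gpa = mean(grade_points[g] for g in grades)

print("best grade:", best)
print("GPA:", gpa)
```

Ranking the grades uses only their order, whereas the average additionally assumes that the step from B to A is as large as the step from C to B.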
Quantitative
  ◦ Age, income, height
      Quantitative variables are measured numerically; that is, for each subject a number is observed
      The scale for quantitative variables is called an interval scale
A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
  ◦ Nominal (Qualitative): Requires assistance from staff?
      Yes
      No
  ◦ Ordinal (Qualitative): Plaque score
      No visible plaque
      Small amounts of plaque
      Moderate amounts of plaque
      Abundant plaque
  ◦ Interval (Quantitative): Number of teeth
A birth registry database collects the following information on newborns
  ◦ Birthweight: in grams
  ◦ Infant’s Condition: Excellent, Good, Fair, Poor
  ◦ Number of prenatal visits
  ◦ Ethnic background: African-American, Caucasian, Hispanic, Native American, Other
What are the appropriate scales?
  Quantitative (Interval)
  Qualitative (Ordinal, Nominal)
Statistical methods vary for quantitative and qualitative variables
Methods for quantitative data cannot be used to analyze qualitative data
Quantitative variables can be treated in a less quantitative manner
  ◦ Height: measured in cm/in
      Interval (Quantitative)
      Can be treated as Qualitative
        Ordinal: Short, Average, Tall
        Nominal: <60in, >72in, 60in-72in
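Treating a quantitative variable in a less quantitative manner amounts to binning it. The sketch below uses the 60 in and 72 in cutoffs that appear in the height example; pairing them with the labels Short/Average/Tall, the handling of the boundary values, the function name `height_category`, and the sample heights are all assumptions made for illustration.

```python
def height_category(height_in: float) -> str:
    """Map a height in inches to an ordered category (ASSUMED cutoffs)."""
    if height_in < 60:
        return "Short"
    if height_in <= 72:   # boundary values 60 and 72 count as "Average" here
        return "Average"
    return "Tall"

# Hypothetical measured heights in inches (made-up data).
heights_in = [58.5, 64.0, 70.2, 74.1]

# The detailed measurements are reduced to three ordered categories.
categories = [height_category(h) for h in heights_in]
print(categories)
```

The conversion only goes one way: the categories can always be derived from the measurements, but the measurements cannot be recovered from the categories.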
Try to measure variables in as much detail as possible
  ◦ Quantitative
      More detailed data can be analyzed in further depth
  ◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
A variable is discrete if it can take on a finite (or countably infinite) number of values
  ◦ Gender
  ◦ Nationality
  ◦ Hair color
  ◦ Disease status
  ◦ Grade in STA 291
  ◦ Favorite MLB team
Qualitative variables are discrete
Continuous variables can take an infinite continuum of possible real number values
  ◦ Time spent studying for STA 291 per day
      43 minutes
      2 minutes
      Can be subdivided into more accurate values
      Therefore continuous
Discrete or continuous?
  ◦ Number of children in a family
  ◦ Distance a car travels on a tank of gas
  ◦ % grade on an exam
Quantitative variables can be discrete or continuous
Age, income, height?
  ◦ Depends on the scale
      Age is potentially continuous, but usually measured in years (discrete)
Simple random sample (SRS)
  ◦ Each possible sample of size n has the same probability of being selected
  ◦ The sample size is usually denoted by n
Population of 4 students: Alf, Buford, Charlie, Dixie
Select an SRS of size n = 2 to ask them about their smoking habits
  ◦ 6 possible samples of size 2
      A,B
      A,C
      A,D
      B,C
      B,D
      C,D
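The six possible samples can be listed mechanically. A minimal sketch using Python's standard `itertools.combinations`, with the four student names from the example:

```python
from itertools import combinations

population = ["Alf", "Buford", "Charlie", "Dixie"]
n = 2  # sample size

# Every unordered selection of n distinct students is one possible sample.
samples = list(combinations(population, n))

for s in samples:
    print(s)
print("number of possible samples:", len(samples))
```

The count agrees with the binomial coefficient "4 choose 2", which equals 4!/(2! 2!) = 6.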
Each of the six possible samples has to have the same probability of being selected
  ◦ How could we do this?
      Roll a die
      Random number generator
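Both suggestions can be sketched with Python's standard `random` module. The die is simulated with `randint(1, 6)`, and the assignment of die faces to samples (face 1 to the first listed sample, and so on) is an arbitrary choice made for this illustration.

```python
import random
from itertools import combinations

population = ["Alf", "Buford", "Charlie", "Dixie"]
samples = list(combinations(population, 2))  # the six possible samples

rng = random.Random()  # unseeded, so each run may give a different result

# Option 1: "roll a die". Faces 1..6 are equally likely, so mapping each
# face to one sample gives every sample probability 1/6.
face = rng.randint(1, 6)
chosen_by_die = samples[face - 1]

# Option 2: let the random number generator draw 2 distinct students
# directly; every pair is equally likely to be drawn.
chosen_direct = tuple(rng.sample(population, 2))

print("die face:", face, "-> sample:", chosen_by_die)
print("direct draw:", chosen_direct)
```

Either way, every pair of students has the same chance of ending up as the sample, which is the defining property of a simple random sample.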
Convenience sample
  ◦ Selecting subjects that are easily accessible to you
Volunteer sample
  ◦ Selecting the first two subjects who volunteer to take the survey
What are the problems with these samples?
  ◦ Proper representation of the population
  ◦ Bias
Examples
  ◦ Mall interview
  ◦ Street corner interview