Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 1 Dustin Lueker.  Statistical terminology  Descriptive methods  Probability and distribution functions  Estimation (confidence intervals)

Similar presentations


Presentation on theme: "Lecture 1 Dustin Lueker.  Statistical terminology  Descriptive methods  Probability and distribution functions  Estimation (confidence intervals)"— Presentation transcript:

1 Lecture 1 Dustin Lueker

2  Statistical terminology  Descriptive methods  Probability and distribution functions  Estimation (confidence intervals)  Hypothesis testing  Inferential methods for two samples  Simple linear regression and correlation STA 291 Summer 2008 Lecture 1

3  Research in all fields is becoming more quantitative ◦ Look at research journals ◦ Most graduates will need to be familiar with basic statistical methodology and terminology  Newspapers, advertising, surveys, etc. ◦ Many statements contain statistical arguments  Computers make complex statistical methods easier to use STA 291 Summer 2008 Lecture 1

4  Many times statistics are used in an incorrect and misleading manner  Purposely misused ◦ Companies/people wanting to furthur their agenda  Cooking the data  Completely making up data  Massaging the numbers  Incidentally misused ◦ Using inappropriate methods  Vital to understand a method before using it STA 291 Summer 2008 Lecture 1

5  Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data  Applicable to a wide variety of academic disciplines ◦ Physical sciences ◦ Social sciences ◦ Humanities  Statistics are used for making informed decisions ◦ Business ◦ Government STA 291 Summer 2008 Lecture 1

6 Design Planning research studies How to best obtain the required data Assuring that our data is representational of the entire population Description Summarizing data Exploring patterns in the data Extract/condense information Inference Make predictions based on the data ‘Infer’ from sample to population Summarize results STA 291 Summer 2008 Lecture 1

7  Population ◦ Total set of all subjects of interest  Entire group of people, animals, products, etc. about which we want information  Elementary Unit ◦ Any individual member of the population  Sample ◦ Subset of the population from which the study actually collects information ◦ Used to draw conclusions about the whole population STA 291 Summer 2008 Lecture 1

8  Variable ◦ A characteristic of a unit that can vary among subjects in the population/sample  Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291  Parameter ◦ Numerical characteristic of the population  Calculated using the whole population  Statistic ◦ Numerical characteristic of the sample  Calculated using the sample STA 291 Summer 2008 Lecture 1

9  Why take a sample? Why not take a census? Why not measure all of the units in the population? ◦ Accuracy  May not be able to find every unit in the population ◦ Time  Speed of response from units ◦ Money ◦ Infinite Population ◦ Destructive Sampling or Testing STA 291 Summer 2008 Lecture 1

10  University Health Services at UK conducts a survey about alcohol abuse among students ◦ 200 of the students are sampled and asked to complete a questionnaire ◦ One question is “have you regretted something you did while drinking?”  What is the population? Sample? STA 291 Summer 2008 Lecture 1

11  Descriptive Statistics ◦ Summarizing the information in a collection of data  Inferential Statistics ◦ Using information from a sample to make conclusions/predictions about the population STA 291 Summer 2008 Lecture 1

12  The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married- couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)  It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level ◦ Are these numbers statistics or parameters?  The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5% ◦ Is this an example of descriptive or inferential statistics? STA 291 Summer 2008 Lecture 1

13  Univariate data ◦ Consists of observations on a single attribute  Multivariate data ◦ Consists of observations on several attributes  Special case  Bivariate Data  Consists of observations on two attributes STA 291 Summer 2008 Lecture 1

14  Quantitative or Numerical ◦ Variable with numerical values associated with them  Qualitative or Categorical ◦ Variables without numerical values associated with them STA 291 Summer 2008 Lecture 1

15  Nominal ◦ Gender, nationality, hair color, state of residence  Nominal variables have a scale of unordered categories  It does not make sense to say, for example, that green hair is greater/higher/better than orange hair  Ordinal ◦ Disease status, company rating, grade in STA 291  Ordinal variables have a scale of ordered categories, they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)  One unit can have more of a certain property than does another unit STA 291 Summer 2008 Lecture 1

16  Quantitative ◦ Age, income, height  Quantitative variables are measured numerically, that is, for each subject a number is observed  The scale for quantitative variables is called interval scale STA 291 Summer 2008 Lecture 1

17  A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following ◦ Nominal (Qualitative): Requires assistance from staff?  Yes  No ◦ Ordinal (Qualitative): Plaque score  No visible plaque  Small amounts of plaque  Moderate amounts of plaque  Abundant plaque ◦ Interval (Quantitative): Number of teeth STA 291 Summer 2008 Lecture 1

18  A birth registry database collects the following information on newborns ◦ Birthweight: in grams ◦ Infant’s Condition:  Excellent  Good  Fair  Poor ◦ Number of prenatal visits ◦ Ethnic background:  African-American  Caucasian  Hispanic  Native American  Other  What are the appropriate scales? Quantitative (Interval) Qualitative (Ordinal, Nominal) STA 291 Summer 2008 Lecture 1

19  Statistical methods vary for quantitative and qualitative variables  Methods for quantitative data cannot be used to analyze qualitative data  Quantitative variables can be treated in a less quantitative manner ◦ Height: measured in cm/in  Interval (Quantitative)  Can be treated at Qualitative  Ordinal:  Short  Average  Tall  Nominal:  72in  60in-72in STA 291 Summer 2008 Lecture 1

20  Try to measure variables as detailed as possible ◦ Quantitative  More detailed data can be analyzed in further depth ◦ Caution: Sometimes ordinal variables are treated at quantitative (ex: GPA) STA 291 Summer 2008 Lecture 1

21  A variable is discrete if it can take on a finite number of values ◦ Gender ◦ Nationality ◦ Hair color ◦ Disease status ◦ Grade in STA 291 ◦ Favorite MLB team  Qualitative variables are discrete STA 291 Summer 2008 Lecture 1

22  Continuous variables can take an infinite continuum of possible real number values ◦ Time spent studying for STA 291 per day  43 minutes  2 minutes  27.487 minutes  27.48682 minutes  Can be subdivided into more accurate values  Therefore continuous STA 291 Summer 2008 Lecture 1

23  Number of children in a family  Distance a car travels on a tank of gas  % grade on an exam STA 291 Summer 2008 Lecture 1

24  Quantitative variables can be discrete or continuous  Age, income, height? ◦ Depends on the scale  Age is potentially continuous, but usually measured in years (discrete) STA 291 Summer 2008 Lecture 1

25  Each possible sample has the same probability of being selected  The sample size is usually denoted by n STA 291 Summer 2008 Lecture 1

26  Population of 4 students: Alf, Buford, Charlie, Dixie  Select a SRS of size n = 2 to ask them about their smoking habits ◦ 6 possible samples of size 2  A,B  A,C  A,D  B,C  B,D  C,D STA 291 Summer 2008 Lecture 1

27  Each of the size possible samples has to have the same probability of being selected ◦ How could we do this?  Roll a die  Random number generator Random number generator STA 291 Summer 2008 Lecture 1

28  Convenience sample ◦ Selecting subjects that are easily accessible to you  Volunteer sample ◦ Selecting the first two subjects who volunteer to take the survey  What are the problems with these samples? ◦ Proper representation of the population ◦ Bias  Examples  Mall interview  Street corner interview STA 291 Summer 2008 Lecture 1


Download ppt "Lecture 1 Dustin Lueker.  Statistical terminology  Descriptive methods  Probability and distribution functions  Estimation (confidence intervals)"

Similar presentations


Ads by Google