The Distribution of Single Variables
Two Types of Variables
Continuous variables
– Equal intervals of measurement
– A known, meaningful zero point
Discrete variables
– Simply counts of attributes
– Generate category frequencies
For continuous variables:
– Equal intervals means that the distance between any two adjacent values is identical; e.g., the difference between 21 and 22 years of age is the same number of years as the difference between 33 and 34 years of age.
– A meaningful zero point means that a value of zero makes sense; e.g., a GPA of 0.00 means you have completed no coursework successfully.
For discrete variables, all we can do is count the number of observations that fall into each of the variable's categories; e.g., the number of males and females in this class.
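As a minimal sketch of what "counting observations per category" means, the Python snippet below tallies a hypothetical class roster with `collections.Counter`. The data are invented for illustration only.

```python
from collections import Counter

# Hypothetical gender observations for a small class (illustrative only)
observations = ["male", "female", "female", "male", "female"]

# A discrete variable is summarized by counting how many observations
# fall into each of its categories.
frequencies = Counter(observations)

for category, count in sorted(frequencies.items()):
    print(f"{category}: {count}")
```

The category frequencies always add up to the number of observations, which is what makes percentages of a discrete variable meaningful.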
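Using SAS to Produce Statistics for Single Variables

```sas
/* MYDATA points to the CITIES data set; LIBRARY is the libref SAS */
/* searches by default for a stored formats catalog.               */
libname mydata 'a:\';
libname library 'a:\';
/* 66 lines per page; no date or page numbers on the output */
options ps=66 nodate nonumber;

/* One-way frequency table for the discrete variable CITYSIZE */
proc freq data=mydata.cities;
   tables citysize;
   title1 'One-Way Frequency Distribution';
   title2;                  /* empty TITLE2 leaves a blank title line */
   title3 'PPD 404';
run;
```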
```
One-Way Frequency Distribution

PPD 404

SIZE OF CITY, DICHOTOMY

                                   Cumulative    Cumulative
CITYSIZE    Frequency    Percent    Frequency       Percent
Small
Large
```
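The columns of a one-way frequency table can be reproduced by hand. The Python sketch below is an illustration, not how PROC FREQ is implemented. It computes frequency, percent, cumulative frequency, and cumulative percent for hypothetical CITYSIZE values. The function name `one_way_table` and the data are assumptions made for this example.

```python
from collections import Counter

def one_way_table(values, order):
    """Return (category, frequency, percent, cumulative frequency,
    cumulative percent) rows, the same columns as a one-way frequency table.
    `order` fixes the order in which categories are accumulated."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for category in order:
        freq = counts.get(category, 0)
        cumulative += freq
        rows.append((category, freq, 100 * freq / n,
                     cumulative, 100 * cumulative / n))
    return rows

# Hypothetical CITYSIZE values for ten cities (not the course data set)
sizes = ["Small"] * 6 + ["Large"] * 4

print(f"{'CITYSIZE':<10}{'Frequency':>10}{'Percent':>9}{'Cum. Freq':>11}{'Cum. Pct':>10}")
for cat, freq, pct, cum_freq, cum_pct in one_way_table(sizes, ["Small", "Large"]):
    print(f"{cat:<10}{freq:>10}{pct:>9.1f}{cum_freq:>11}{cum_pct:>10.1f}")
```

When `order` lists every category, the last row's cumulative frequency equals the number of observations and its cumulative percent is 100.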
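```sas
libname mydata 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;

/* Character vertical bar chart of SPENDING. DISCRETE draws one bar */
/* per distinct value instead of grouping a numeric variable into   */
/* intervals.                                                       */
proc chart data=mydata.cities;
   vbar spending / discrete;
   title1 'Vertical Bar Chart';
run;
```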
[Vertical Bar Chart: Frequency (vertical axis) of POLICE EXPENDITURE, TRICHOTOMY, with one bar each for Low, Medium, and High]
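A character bar chart like the one PROC CHART prints is a stack of text rows, one per frequency level. The Python sketch below reconstructs that layout for hypothetical counts in three categories. It illustrates the idea only and does not reproduce SAS's exact spacing or axis rules. `vbar` is a hypothetical helper.

```python
def vbar(counts, labels, width=6):
    """Build a character vertical bar chart: one text line per frequency
    level, from the tallest bar down to 1, then a line of category labels."""
    lines = []
    for level in range(max(counts), 0, -1):
        # A bar is drawn at this level only if its frequency reaches it
        cells = ["*" * width if c >= level else " " * width for c in counts]
        lines.append(f"{level:>3} + " + " ".join(cells).rstrip())
    lines.append(" " * 6 + " ".join(label.center(width) for label in labels))
    return lines

# Hypothetical frequencies for a variable cut into three categories
chart = vbar([3, 5, 2], ["Low", "Medium", "High"])
print("\n".join(chart))
```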
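```sas
libname mydata 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;

/* Descriptive statistics for POPULAT.                              */
/* PLOT   adds a stem-and-leaf plot, a boxplot, and a normal        */
/*        probability plot;                                         */
/* NORMAL adds a test of the hypothesis that the data are normal.   */
proc univariate data=mydata.cities plot normal;
   var populat;
   title1 'Univariate and EDA Statistics';
run;
```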
```
Univariate and EDA Statistics
PPD 404

Univariate Procedure

Variable=POPULAT        NUMBER OF RESIDENTS, IN 1,000S

                         Moments

N               63       Sum Wgts           63
Mean                     Sum
Std Dev                  Variance
Skewness                 Kurtosis
USS                      CSS
CV                       Std Mean
T:Mean=0                 Pr>|T|
Num ^= 0        63       Num > 0            63
M(Sign)       31.5       Pr>=|M|
Sgn Rank      1008       Pr>=|S|
W:Normal                 Pr<W
```
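Most entries in the Moments table come from a few sums. The Python sketch below computes them for a hypothetical data list. It assumes the default divisor n - 1 for the variance and the bias-adjusted skewness and kurtosis formulas commonly documented for PROC UNIVARIATE. Treat those formulas as an assumption to check against the SAS documentation. `moments` is a hypothetical helper.

```python
import math
import statistics

def moments(x):
    """Sketch of the 'Moments' quantities, assuming divisor n - 1 and
    bias-adjusted skewness/kurtosis (assumed to match the SAS defaults)."""
    n = len(x)
    mean = statistics.fmean(x)
    s = statistics.stdev(x)                 # sample standard deviation, divisor n - 1
    z = [(xi - mean) / s for xi in x]       # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(zi ** 3 for zi in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(zi ** 4 for zi in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    std_mean = s / math.sqrt(n)             # standard error of the mean
    return {
        "N": n,
        "Mean": mean,
        "Sum": sum(x),
        "Std Dev": s,
        "Variance": s ** 2,
        "Skewness": skew,
        "Kurtosis": kurt,
        "USS": sum(xi ** 2 for xi in x),           # uncorrected sum of squares
        "CSS": sum((xi - mean) ** 2 for xi in x),  # corrected sum of squares
        "CV": 100 * s / mean,                      # coefficient of variation, percent
        "Std Mean": std_mean,
        "T:Mean=0": mean / std_mean,               # t statistic for H0: mean = 0
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
for name, value in moments(data).items():
    print(f"{name:<10}{value:>12.4f}")
```

USS and CSS are linked by CSS = USS - n * Mean², and Variance is CSS / (n - 1). Computing them side by side is a quick way to check hand calculations.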
```
Quantiles(Def=5)

100% Max      7896       99%
 75% Q3                  95%
 50% Med                 90%
 25% Q1                  10%         72
  0% Min        56        5%         60
                          1%         56

Range         7840
Q3-Q1          541
Mode            56

                  Extremes

Lowest    Obs        Highest    Obs
    56   ( 30)          1511   ( 56)
    56   ( 24)          1949   ( 55)
    58   ( 46)          2816   ( 54)
    60   ( 21)          3367   ( 53)
    65   ( 51)          7896   ( 52)
```
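The "Def=5" label names the percentile definition in use. The Python sketch below implements the "empirical distribution function with averaging" rule, which is assumed here to be definition 5. Write n·p/100 = j + g. If g = 0, the percentile is the average of the j-th and (j+1)-th ordered values. Otherwise it is the (j+1)-th ordered value. The helper names and the data are hypothetical. The mode tie rule, which picks the smallest of several equally frequent values, is also an assumption.

```python
from collections import Counter

def pctl_def5(sorted_x, p):
    """p-th percentile (integer p, 0-100) by the empirical distribution
    function with averaging: n*p/100 = j + g; average the j-th and
    (j+1)-th ordered values when g == 0, else take the (j+1)-th."""
    n = len(sorted_x)
    if p <= 0:
        return sorted_x[0]
    if p >= 100:
        return sorted_x[-1]
    j, remainder = divmod(n * p, 100)   # exact integer arithmetic; g = remainder / 100
    if remainder == 0:
        return (sorted_x[j - 1] + sorted_x[j]) / 2   # 1-based j is 0-based index j - 1
    return sorted_x[j]                               # (j+1)-th value is 0-based index j

def quantile_summary(values):
    x = sorted(values)
    counts = Counter(x)
    top = max(counts.values())
    q1, q3 = pctl_def5(x, 25), pctl_def5(x, 75)
    return {
        "Max": x[-1], "Q3": q3, "Med": pctl_def5(x, 50), "Q1": q1, "Min": x[0],
        "Range": x[-1] - x[0],
        "Q3-Q1": q3 - q1,
        # Smallest of the most frequent values (tie rule assumed)
        "Mode": min(v for v, c in counts.items() if c == top),
    }

print(quantile_summary([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical values
```

The POPULAT listing is consistent with this rule. With n = 63, the 5th percentile gives 63 × 5 / 100 = 3.15, so it is the 4th ordered value. That value is 60, the fourth entry in the Lowest column (56, 56, 58, 60). The 1st percentile gives 0.63, so it is the first ordered value, 56. The Range is 7896 - 56 = 7840.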
[Stem-and-leaf plot and boxplot of POPULAT; multiply Stem.Leaf by 10**+3]
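A stem-and-leaf display splits each value into a leading part (the stem) and one trailing digit (the leaf). The Python sketch below builds the stem and leaf columns only and omits SAS's "#" count column and boxplot. `stem_and_leaf` and its `leaf_unit` parameter are hypothetical. With `leaf_unit=100`, a value of 7896 becomes stem 7, leaf 8. Read as 7.8 and multiplied by 10**3, that gives 7800, the kind of scaling the "Multiply Stem.Leaf by 10**+3" note describes.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Text stem-and-leaf display for non-negative values. Each value is
    divided by `leaf_unit` and truncated; the last digit of the result is
    the leaf and the remaining digits are the stem."""
    stems = defaultdict(list)
    for v in values:
        stem, leaf = divmod(int(v // leaf_unit), 10)
        stems[stem].append(leaf)
    lines = []
    # Largest stem first, with empty stems kept so gaps stay visible
    for stem in range(max(stems), min(stems) - 1, -1):
        leaves = "".join(str(leaf) for leaf in sorted(stems.get(stem, [])))
        lines.append(f"{stem:>4} | {leaves}")
    return lines

print("\n".join(stem_and_leaf([12, 15, 21, 23, 23, 38])))  # hypothetical values
```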
[Normal Probability Plot of POPULAT, NUMBER OF RESIDENTS, IN 1,000S]
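A normal probability plot pairs each ordered observation with the value a standard normal distribution would lead us to expect at that rank. The Python sketch below computes those pairs with `statistics.NormalDist`. It uses Blom's plotting positions (i - 3/8) / (n + 1/4), a common choice that is an assumption here, not necessarily the rule SAS applies. `normal_plot_points` is a hypothetical helper.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """(expected normal quantile, observed value) pairs for a normal
    probability plot, using Blom's plotting positions (i - 3/8) / (n + 1/4)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(sorted(values), start=1)]

for z, x in normal_plot_points([3, 1, 2]):  # hypothetical values
    print(f"{z:8.3f}  {x}")
```

If the data are roughly normal, the points fall near a straight line. A long right tail (a few very large values) bends the upper end of the plot upward. POPULAT's range of 7840 against a Q3-Q1 of only 541 suggests that kind of tail.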
Exercise 2
Identify which of the following are variables and which are constants.
a. Size of cities
b. Denver
c. Gender status
d. Computer literacy
e. Male
f. College graduate
g. Political party preference
h. Grades on an examination
Answers
Identify which of the following are variables (V) and which are constants (C).
V  a. Size of cities
C  b. Denver
V  c. Gender status
V  d. Computer literacy
C  e. Male
C  f. College graduate
V  g. Political party preference
V  h. Grades on an examination
Identify which of the following are discrete variables and which are continuous variables.
a. Region of the country: North, South, etc.
b. TV viewing: number of hours per week
c. Agency size: number of full-time employees
d. Crime rate: serious crimes per 1,000 population
e. Your hometown: Pasadena, Newport Beach, etc.
f. Political conservatism: percent voting Republican
g. Contest results: first place, second place, etc.
h. Opinion: five-point scale from "Strongly Agree" to "Strongly Disagree"
Answers
Identify which of the following are discrete (D) variables and which are continuous (C) variables.
D       a. Region of the country: North, South, etc.
C       b. TV viewing: number of hours per week
C       c. Agency size: number of full-time employees
C       d. Crime rate: serious crimes per 1,000 population
D       e. Your hometown: Pasadena, Newport Beach, etc.
C       f. Political conservatism: percent voting Republican
C or D  g. Contest results: first place, second place, etc.
C or D  h. Opinion: five-point scale from "Strongly Agree" to "Strongly Disagree"