Published by Juniper Johns. Modified over 8 years ago.
1
The Distribution of Single Variables
2
Two Types of Variables
Continuous variables
– Equal intervals of measurement
– Known zero-point that is meaningful
Discrete variables
– Simply counts of attributes
– Generate category frequencies
3
For continuous variables,
Equal intervals means that the distance between any two adjacent values is identical
e.g., the difference between 21 and 22 years of age is identical in years to the difference between 33 and 34 years of age
Meaningful zero-point means that a value of zero has a real interpretation
e.g., a GPA of 0.00 means you have completed no coursework successfully
4
For discrete variables,
All we can do is count the number of observations that fall into each of its categories
e.g., the number of males and females in this class
5
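Using SAS to Produce Statistics for Single Variables

/* MYDATA points to the folder holding the data set; LIBRARY is where SAS looks for stored formats */
libname mydata 'a:\';
libname library 'a:\';
/* 66 lines per page; suppress the date and page number in the output */
options ps=66 nodate nonumber;
/* One-way frequency table of CITYSIZE */
proc freq data=mydata.cities;
   table citysize;
   title1 'One-Way Frequency Distribution';
   title2;
   title3 'PPD 404';
run;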
6
One-Way Frequency Distribution

PPD 404

SIZE OF CITY, DICHOTOMY

                                Cumulative  Cumulative
CITYSIZE   Frequency   Percent   Frequency     Percent
------------------------------------------------------
Small             45      71.4          45        71.4
Large             18      28.6          63       100.0
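The Percent and Cumulative columns in this table follow directly from the two category counts. As a cross-check outside SAS, here is a minimal Python sketch (Python is not used in the slides, and the function name `one_way_table` is hypothetical) that rebuilds the rows from the counts 45 and 18:

```python
def one_way_table(counts):
    """Build frequency, percent, and cumulative columns from category counts.

    `counts` is a list of (category, frequency) pairs in display order.
    """
    total = sum(freq for _, freq in counts)
    rows = []
    cum_freq = 0
    for category, freq in counts:
        cum_freq += freq  # running total of observations so far
        rows.append({
            "category": category,
            "frequency": freq,
            "percent": round(100 * freq / total, 1),
            "cum_frequency": cum_freq,
            "cum_percent": round(100 * cum_freq / total, 1),
        })
    return rows


# Counts taken from the CITYSIZE table above.
table = one_way_table([("Small", 45), ("Large", 18)])
for row in table:
    print(row)
```

Each percent is the category count divided by the total of 63 cities, and the last cumulative row must always reach the total count and 100 percent.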
7
libname mydata 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;
/* DISCRETE draws one bar for each distinct value of SPENDING */
proc chart data=mydata.cities;
   vbar spending / discrete;
   title1 'Vertical Bar Chart';
run;
8
Vertical Bar Chart

[Figure: vertical bar chart. Vertical axis: Frequency, 1 to 23. Horizontal axis: POLICE EXPENDITURE, TRICHOTOMY, with categories Low, Medium, and High. The tallest bar reaches 23; the other two reach 22 and 18.]
9
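libname mydata 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;
/* PLOT adds the stem-and-leaf plot, boxplot, and normal probability plot; NORMAL adds a test of normality */
proc univariate data=mydata.cities plot normal;
   var populat;
   title1 'Univariate and EDA Statistics';
run;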
10
Univariate and EDA Statistics

PPD 404

Univariate Procedure

Variable=POPULAT     NUMBER OF RESIDENTS, IN 1,000S

                  Moments

N                63    Sum Wgts         63
Mean       587.4127    Sum           37007
Std Dev    1114.554    Variance    1242231
Skewness   5.090201    Kurtosis   30.74326
USS        98756687    CSS        77018305
CV         189.7395    Std Mean   140.4206
T:Mean=0   4.183237    Pr>|T|       0.0001
Num ^= 0         63    Num > 0          63
M(Sign)        31.5    Pr>=|M|      0.0001
Sgn Rank       1008    Pr>=|S|      0.0001
W:Normal   0.468356    Pr<W         0.0001
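Several entries in the Moments table are simple functions of N, Sum, and Std Dev. The following Python sketch is not part of the original slides; it uses the standard textbook definitions and values copied from the table to recompute Mean, Variance, Std Mean, CV, and the t statistic. Small differences in the last digit can arise because Std Dev is entered as the rounded printed value.

```python
import math

# Values copied from the Moments table above.
n = 63
total = 37007        # Sum
std_dev = 1114.554   # Std Dev (rounded as printed)

mean = total / n                   # Mean = Sum / N
variance = std_dev ** 2            # Variance = (Std Dev) squared
std_mean = std_dev / math.sqrt(n)  # Std Mean = standard error of the mean
cv = 100 * std_dev / mean          # CV = Std Dev as a percentage of the mean
t_stat = mean / std_mean           # T:Mean=0 = mean divided by its standard error

print(f"Mean      {mean:.4f}")
print(f"Variance  {variance:.0f}")
print(f"Std Mean  {std_mean:.4f}")
print(f"CV        {cv:.4f}")
print(f"T:Mean=0  {t_stat:.6f}")
```

A CV of nearly 190 percent means the standard deviation is almost twice the mean, which agrees with the large positive skewness reported in the same table.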
11
Quantiles(Def=5)

100% Max    7896      99%    7896
 75% Q3      641      95%    1949
 50% Med     278      90%     906
 25% Q1      100      10%      72
  0% Min      56       5%      60
                       1%      56

Range       7840
Q3-Q1        541
Mode          56

Extremes

Lowest    Obs     Highest    Obs
    56(    30)       1511(    56)
    56(    24)       1949(    55)
    58(    46)       2816(    54)
    60(    21)       3367(    53)
    65(    51)       7896(    52)
12
Univariate and EDA Statistics

PPD 404

Stem Leaf                                      #  Boxplot
   7 9                                         1
   7
   6
   6
   5
   5
   4
   4
   3
   3 4                                         1     *
   2 8                                         1     *
   2
   1 59                                        2     0
   1 2                                         1     |
   0 555556666777778889                       18  +--+--+
   0 111111111111111111111111222222333344444  39  *-----*
     ----+----+----+----+----+----+----+----
Multiply Stem.Leaf by 10**+3
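The boxplot markers can be related to the quartiles reported in the Quantiles output. The sketch below is not from the slides. It assumes the common convention in which values more than 1.5 interquartile ranges above Q3 are plotted as 0 and values more than 3 interquartile ranges above Q3 are plotted as *. The function name `boxplot_symbol` is hypothetical.

```python
q1, q3 = 100, 641   # 25% Q1 and 75% Q3 from the Quantiles output
iqr = q3 - q1       # Q3-Q1, reported as 541

inner_fence = q3 + 1.5 * iqr  # above this: mild outlier, plotted as 0
outer_fence = q3 + 3.0 * iqr  # above this: extreme outlier, plotted as *


def boxplot_symbol(value):
    """Return the upper-tail marker under the assumed 1.5/3 IQR convention."""
    if value > outer_fence:
        return "*"
    if value > inner_fence:
        return "0"
    return "|"  # no outlier marker; the value falls within the whisker or box


# The five highest observations listed under Extremes.
for value in (1511, 1949, 2816, 3367, 7896):
    print(value, boxplot_symbol(value))
```

Under this assumption the fences fall at 1452.5 and 2264, so 1511 and 1949 are mild outliers while 2816, 3367, and 7896 are extreme ones, in line with the 0 and * markers shown beside the corresponding stems.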
13
Univariate and EDA Statistics

PPD 404

Univariate Procedure

Variable=POPULAT     NUMBER OF RESIDENTS, IN 1,000S

Normal Probability Plot

[Figure: normal probability plot of POPULAT. Vertical axis: 250 to 7750. Horizontal axis: -2 to +2. Asterisks mark the observed values and plus signs mark the normal reference line.]
14
Exercise 2
Identify which of the following are variables and which are constants.
a. Size of cities
b. Denver
c. Gender status
d. Computer literacy
e. Male
f. College graduate
g. Political party preference
h. Grades on an examination
15
Answers
Identify which of the following are variables and which are constants.
V  a. Size of cities
C  b. Denver
V  c. Gender status
V  d. Computer literacy
C  e. Male
C  f. College graduate
V  g. Political party preference
V  h. Grades on an examination
16
Identify which of the following are discrete variables and which are continuous variables.
a. Region of the country: North, South, etc.
b. TV viewing: number of hours per week
c. Agency size: number of full-time employees
d. Crime rate: serious crimes per 1,000 population
e. Your hometown: Pasadena, Newport Beach, etc.
f. Political conservatism: percent voting Republican
g. Contest results: first place, second place, etc.
h. Opinion: five-point scale from "Strongly Agree" to "Strongly Disagree"
17
Answers
Identify which of the following are discrete variables and which are continuous variables.
D  a. Region of the country: North, South, etc.
C  b. TV viewing: number of hours per week
C  c. Agency size: number of full-time employees
C  d. Crime rate: serious crimes per 1,000 population
D  e. Your hometown: Pasadena, Newport Beach, etc.
C  f. Political conservatism: percent voting Republican
C or D  g. Contest results: first place, second place, etc.
C or D  h. Opinion: five-point scale from "Strongly Agree" to "Strongly Disagree"