2-1 Data Summary and Display


3 2-1 Data Summary and Display

4 2-1 Data Summary and Display

5 2-1 Data Summary and Display
Population Mean
For a finite population with N measurements, the mean is
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
The sample mean is a reasonable estimate of the population mean.

6 2-1 Data Summary and Display
Sample Variance and Sample Standard Deviation

7 2-1 Data Summary and Display
The sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
The sample standard deviation is
$$s = \sqrt{s^2}$$
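The definitions of the sample variance and sample standard deviation can be checked numerically. The following is a minimal Python sketch (the slides themselves use SAS); the data values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical measurements, made up for illustration only.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(x)
x_bar = sum(x) / n                                  # sample mean
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # sample variance, divisor n - 1
s = math.sqrt(s2)                                   # sample standard deviation

# The standard library uses the same n - 1 divisor for sample statistics.
lib_s2 = statistics.variance(x)
lib_s = statistics.stdev(x)
```

Here the squared deviations sum to 32, so the sample variance is 32/7 with n = 8.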

8 2-1 Data Summary and Display
Computational formula for $s^2$
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
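The computational (shortcut) form needs only the sum of the values and the sum of their squares, and it should agree with the definition based on deviations from the sample mean. A small Python sketch with hypothetical data, offered as an illustration rather than as the deck's own method:

```python
import math

# Hypothetical measurements, made up for illustration only.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

sum_x = sum(x)                        # sum of the observations
sum_x2 = sum(xi * xi for xi in x)     # sum of the squared observations

# Shortcut form: no deviations from the mean are computed.
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definition: squared deviations from the sample mean, divisor n - 1.
x_bar = sum_x / n
s2_definition = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
```

In floating-point arithmetic the shortcut form can lose precision when the mean is large relative to the spread, which is one reason software often prefers the deviation form.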

9 2-1 Data Summary and Display
Population Variance
When the population is finite and consists of N values, we may define the population variance as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
The sample variance is a reasonable estimate of the population variance.
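The only difference between the population and sample variance formulas is the divisor (N versus n - 1) and the center used (the population mean versus the sample mean). A brief Python sketch, assuming the hypothetical list below is treated first as a whole population and then as a sample:

```python
import math
import statistics

# Hypothetical values, made up for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

N = len(values)
mu = sum(values) / N

# Treating the list as the entire population: divisor N.
sigma2 = sum((v - mu) ** 2 for v in values) / N

# Treating the same list as a sample: divisor n - 1.
s2 = statistics.variance(values)

# The standard library's population variance uses divisor N as well.
lib_sigma2 = statistics.pvariance(values)
```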

10 2-2 Stem-and-Leaf Diagram
Steps for Constructing a Stem-and-Leaf Diagram
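The slide's own steps are not preserved in this transcript. One common construction, sketched here in Python under that caveat, splits each value into a stem (the leading digits) and a leaf (the last digit), lists the stems in a column, and records each leaf beside its stem. The data and the function names `stem_and_leaf` and `render` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but the last digit) and leaf (last digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    lo, hi = min(groups), max(groups)
    # Keep empty stems so gaps in the data stay visible in the display.
    return {stem: groups.get(stem, []) for stem in range(lo, hi + 1)}

def render(display):
    """Format the display as text, one stem per row."""
    rows = []
    for stem, leaves in display.items():
        rows.append(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
    return "\n".join(rows)

# Hypothetical observations, made up for illustration only.
data = [105, 97, 110, 118, 121, 134, 110, 99]
display = stem_and_leaf(data)
text = render(display)
```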

11 2-2 Stem-and-Leaf Diagram

12 2-2 Stem-and-Leaf Diagram

13 2-2 Stem-and-Leaf Diagram

14 2-2 Stem-and-Leaf Diagram

15 2-2 Stem-and-Leaf Diagram
Percentiles, quartiles, and the median
Median = (40th + 41st)/2 = ( )/2 = 161.5
Q1: position (n+1)/4 = 20.25, between the 20th and 21st ordered observations; Q1 = ( )/2 = 144
Q2 = median
Q3: position 3(n+1)/4 = 60.75; Q3 = ( )/2 = 181
IQR = interquartile range = Q3 - Q1
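The position rule on this slide, (n+1)/4 for Q1 and 3(n+1)/4 for Q3, can be sketched in Python (the slides themselves use SAS). The data below are hypothetical and the function names are illustrative. Where the position is fractional, this sketch interpolates linearly between the two neighbouring ordered observations; taking their midpoint, as the slide does, is a simpler alternative that can give a slightly different value.

```python
import math
import statistics

def position(n, p):
    """1-based position of the p-th quantile under the (n + 1) * p rule."""
    return (n + 1) * p

def quantile(sorted_x, p):
    """Quantile by linear interpolation at position (n + 1) * p."""
    n = len(sorted_x)
    pos = position(n, p)
    if pos <= 1:
        return sorted_x[0]
    if pos >= n:
        return sorted_x[-1]
    lower = math.floor(pos)
    frac = pos - lower
    below, above = sorted_x[lower - 1], sorted_x[lower]
    return below + frac * (above - below)

# Hypothetical ordered data, made up for illustration only.
x = [10, 20, 30, 40, 50, 60, 70, 80]
q1, q2, q3 = (quantile(x, p) for p in (0.25, 0.50, 0.75))
iqr = q3 - q1

# statistics.quantiles with method="exclusive" follows the same position rule.
lib_q1, lib_q2, lib_q3 = statistics.quantiles(x, n=4, method="exclusive")
```

With n = 80, `position(80, 0.25)` and `position(80, 0.75)` reproduce the 20.25 and 60.75 shown on the slide.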

16 2-2 Stem-and-Leaf Diagram

17 2-3 Histograms A histogram is a more compact summary of data than a stem-and-leaf diagram. To construct a histogram for continuous data, we must divide the range of the data into intervals, which are usually called class intervals, cells, or bins. If possible, the bins should be of equal width to enhance the visual information in the histogram.
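Dividing the range into equal-width bins and counting the observations in each bin gives the frequencies a histogram displays. A minimal Python sketch follows; the data, the starting point 70, the bin width 20, and the function name `bin_counts` are all assumptions made for illustration.

```python
def bin_counts(values, low, width, n_bins):
    """Count values in n_bins equal-width bins [low + k*width, low + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - low) // width)
        if not 0 <= k < n_bins:
            raise ValueError(f"{v} falls outside the binned range")
        counts[k] += 1
    return counts

# Hypothetical observations, made up for illustration only.
data = [72, 85, 91, 98, 103, 104, 110, 117, 119, 125, 138]
counts = bin_counts(data, low=70, width=20, n_bins=4)

# Label each bin with its interval for a simple frequency table.
table = [(70 + 20 * k, 70 + 20 * (k + 1), c) for k, c in enumerate(counts)]
```

Each value lands in exactly one bin because the intervals are closed on the left and open on the right.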

18 2-3 Histograms

19 2-3 Histograms

20 2-3 Histograms

21 2-3 Histograms

22 2-3 Histograms An important variation of the histogram is the Pareto chart. This chart is widely used in quality and process improvement studies where the data usually represent different types of defects, failure modes, or other categories of interest to the analyst. The categories are ordered so that the category with the largest frequency is on the left, followed by the category with the second largest frequency, and so forth.
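The ordering that defines a Pareto chart is a sort by descending frequency, often shown together with a cumulative percentage. A short Python sketch, with hypothetical defect categories and counts that do not come from the slides:

```python
# Hypothetical defect counts, made up for illustration only.
defects = {"scratch": 12, "dent": 30, "misalignment": 5, "discoloration": 3}

# Largest frequency first, as on a Pareto chart read from left to right.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

total = sum(defects.values())
running = 0
pareto = []
for category, count in ordered:
    running += count
    # Cumulative percentage of all defects covered up to this category.
    pareto.append((category, count, 100 * running / total))
```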

23 2-3 Histograms

24 2-4 Box Plots The box plot is a graphical display that simultaneously describes several important features of a data set, such as center, spread, departure from symmetry, and identification of observations that lie unusually far from the bulk of the data.
[Figure labels: whisker, outlier, extreme outlier]

25 2-4 Box Plots

26 2-4 Box Plots
2nd quartile = median = 161.5
1st quartile = 143.5
3rd quartile = 181
IQR = Q3 – Q1 = 181 – 143.5 = 37.5
1.5 IQR = 56.25
Q3 + 1.5 IQR = 181 + 56.25 = 237.25
Q1 – 1.5 IQR = 143.5 – 56.25 = 87.25
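The fence arithmetic can be reproduced from the quartiles given on this slide (Q1 = 143.5, Q3 = 181). The Python sketch below assumes the common convention that points more than 1.5 IQR beyond the box edges are outliers and points more than 3 IQR beyond them are extreme outliers; the function names and the test points are hypothetical.

```python
def fences(q1, q3):
    """Return the IQR with inner (1.5 IQR) and outer (3 IQR) fences."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(x, q1, q3):
    """Label a point relative to the fences."""
    limits = fences(q1, q3)
    inner_lo, inner_hi = limits["inner"]
    outer_lo, outer_hi = limits["outer"]
    if x < outer_lo or x > outer_hi:
        return "extreme outlier"
    if x < inner_lo or x > inner_hi:
        return "outlier"
    return "within fences"

# Quartiles taken from the slide.
limits = fences(143.5, 181.0)
```

The whiskers extend to the most extreme observations that still lie inside the inner fences, so they generally stop short of the fence values themselves.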

27 2-4 Box Plots

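33 SAS code and output
/* The data lines that followed CARDS; are not preserved in this transcript. */
OPTIONS NODATE NOOVP NONUMBER;
DATA STRENGTH;
  /* Trailing @@ (assumed lost in extraction) reads several values per data line. */
  INPUT STRENGTH @@;
  CARDS;
;
/* Descriptive statistics, plots, normality tests, and a frequency table. */
PROC UNIVARIATE DATA=STRENGTH PLOT NORMAL FREQ;
  VAR STRENGTH;
  HISTOGRAM STRENGTH / VSCALE=COUNT;
  TITLE 'DESCRIPTIVE STATISTICS AND GRAPHS';
/*
PROC CHART DATA=STRENGTH;
  VBAR STRENGTH;
  VBAR STRENGTH / TYPE=PCT;
  HBAR STRENGTH / TYPE=CPCT DISCRETE;
  TITLE 'HISTOGRAM';
*/
RUN; QUIT;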

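34 DESCRIPTIVE STATISTICS AND GRAPHS
The UNIVARIATE Procedure
Variable: STRENGTH

Moments
N, Sum Weights, Mean, Sum Observations, Std Deviation, Variance, Skewness, Kurtosis, Uncorrected SS, Corrected SS, Coeff Variation, Std Error Mean

Basic Statistical Measures
Location: Mean, Median, Mode
Variability: Std Deviation, Variance, Range, Interquartile Range

Tests for Location: Mu0=0
Test           Statistic   p Value
Student's t    t           Pr > |t|    <.0001
Sign           M           Pr >= |M|   <.0001
Signed Rank    S           Pr >= |S|   <.0001

Tests for Normality
Test                  Statistic   p Value
Shapiro-Wilk          W           Pr < W
Kolmogorov-Smirnov    D           Pr > D      >0.1500
Cramer-von Mises      W-Sq        Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq        Pr > A-Sq   >0.2500

Quantiles (Definition 5)
Level: 100% Max, 99%, 95%, 90%, 75% Q3, 50% Median, 25% Q1, 10%, 5%, 1%, 0% Min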

35 DESCRIPTIVE STATISTICS AND GRAPHS
SAS code and output
The UNIVARIATE Procedure
Variable: STRENGTH

Extreme Observations
Lowest: Value, Obs
Highest: Value, Obs

Frequency Counts
Columns: Value, Count, Percents (Cell, Cum)

36 DESCRIPTIVE STATISTICS AND GRAPHS
SAS code and output
The UNIVARIATE Procedure
Variable: STRENGTH
[Plots: stem-and-leaf display (columns Stem, Leaf, #) with box plot; multiply Stem.Leaf by 10**+1. Normal probability plot.]

37 SAS code and output

38 2-5 Time Series Plots A time series or time sequence is a data set in which the observations are recorded in the order in which they occur. A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.). When measurements are plotted as a time series, we often see trends, cycles, or other broad features of the data.

39 2-5 Time Series Plots

40 2-5 Time Series Plots

41 2-5 Time Series Plots

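42 SAS code and output
/* The data lines that followed CARDS; are not preserved in this transcript. */
OPTIONS NODATE NOOVP NONUMBER LS=80;
DATA STRENGTH;
  /* Trailing @@ (assumed lost in extraction) reads several values per data line. */
  INPUT STRENGTH @@;
  /* N records the observation order and serves as the time index. */
  N=_N_;
  CARDS;
;
/* Join the points with a solid line and mark each observation with a dot. */
SYMBOL INTERPOL=JOIN VALUE=DOT HEIGHT=1 LINE=1;
PROC GPLOT DATA=STRENGTH;
  PLOT STRENGTH*N;
  TITLE 'TIME SERIES GRAPH FOR STRENGTH';
RUN; QUIT;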

43 SAS code and output

44 2-6 Multivariate Data The dot diagram, stem-and-leaf diagram, histogram, and box plot are descriptive displays for univariate data; that is, they convey descriptive information about a single variable. Many engineering problems involve collecting and analyzing multivariate data, or data on several different variables. In engineering studies involving multivariate data, often the objective is to determine the relationships among the variables or to build an empirical model.

45 2-6 Multivariate Data

46 2-6 Multivariate Data

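47 2-6 Multivariate Data Sample Correlation Coefficient
The sample correlation coefficient measures the strength of a linear relationship between two variables:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$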

48 2-6 Multivariate Data The linear relationship is strong when 0.8 ≤ |r| ≤ 1, weak when 0 ≤ |r| ≤ 0.5, and moderate otherwise.
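The sample correlation coefficient and the strength rule of thumb on this slide can be sketched in Python (the deck itself uses PROC CORR in SAS). The data are hypothetical, the function names are illustrative, and applying the thresholds to the absolute value of r, so that negative correlations are classified too, is an assumption of this sketch.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def strength(r):
    """Classify |r| with the slide's thresholds: strong >= 0.8, weak <= 0.5."""
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "strong"
    if magnitude <= 0.5:
        return "weak"
    return "moderate"

# Hypothetical paired observations, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
label = strength(r)
```

For these values the cross-product sum is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60), which falls in the moderate band.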

49 2-6 Multivariate Data

50 2-6 Multivariate Data

51 2-6 Multivariate Data

52 2-6 Multivariate Data

53 2-6 Multivariate Data

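54 SAS code and output
/* The data lines that followed CARDS; are not preserved in this transcript. */
OPTIONS NODATE NOOVP NONUMBER LS=80;
DATA SHAMPOO;
  INPUT FOAM SCENT COLOR RESIDUE REGION QUALITY;
  CARDS;
;
/* Pairwise Pearson correlations among all six variables. */
PROC CORR DATA=SHAMPOO;
  VAR FOAM SCENT COLOR RESIDUE REGION QUALITY;
  TITLE 'CORRELATIONS OF VARIABLES';
/* Scatter plot matrix of the same variables. */
PROC SGSCATTER DATA=SHAMPOO;
  MATRIX FOAM SCENT COLOR RESIDUE REGION QUALITY;
  TITLE 'MATRIX OF SCATTER PLOTS FOR THE SHAMPOO DATA';
/* Quality versus foam, with a separate plotting symbol for each region. */
PROC GPLOT DATA=SHAMPOO;
  PLOT QUALITY*FOAM=REGION;
  TITLE 'SCATTER PLOT OF SHAMPOO QUALITY VS. FOAM';
RUN; QUIT;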

55 SAS code and output
CORRELATIONS OF VARIABLES
The CORR Procedure
6 Variables: FOAM SCENT COLOR RESIDUE REGION QUALITY

Simple Statistics
Columns: Variable, N, Mean, Std Dev, Sum, Minimum, Maximum
Rows: FOAM, SCENT, COLOR, RESIDUE, REGION, QUALITY

Pearson Correlation Coefficients, N = 24
Prob > |r| under H0: Rho=0
Rows and columns: FOAM, SCENT, COLOR, RESIDUE, REGION, QUALITY

56 SAS code and output

57 SAS code and output
