Introduction - Session 1. Course: D0722 - Statistics and Its Applications. Year: 2010
Learning Outcomes: By the end of this session, students are expected to be able to: define scales of measurement, samples, populations, data, and data collection; and explain descriptive statistics.
Using Statistics (Two Categories). Descriptive Statistics: collect, organize, summarize, display, and analyze data. Inferential Statistics: predict and forecast values of population parameters, test hypotheses about values of population parameters, and make decisions.
Types of Data - Two Types. Qualitative (categorical or nominal): examples are color, gender, nationality. Quantitative (measurable or countable): examples are temperatures, salaries, number of points scored on a 100-point exam.
Scales of Measurement. Nominal scale: groups or classes (e.g., gender). Ordinal scale: order matters (e.g., ranks). Interval scale: difference or distance matters; has an arbitrary zero value (e.g., temperatures). Ratio scale: ratio matters; has a natural zero value (e.g., salaries).
Samples and Populations. A population consists of the set of all measurements in which the investigator is interested. A sample is a subset of the measurements selected from the population. A census is a complete enumeration of every item in a population.
Why Sample? A census of a population may be impossible, impractical, or too costly.
Index Numbers. An index number is a number that measures the relative change in a set of measurements over time. For example: the Dow Jones Industrial Average (DJIA), the Consumer Price Index (CPI), the New York Stock Exchange (NYSE) Index.
Index Numbers
Year   Price   Index (1984 base)   Index (1991 base)
1984   121     100.0                64.7
1985   121     100.0                64.7
1986   133     109.9                71.1
1987   146     120.7                78.1
1988   162     133.9                86.6
1989   164     135.5                87.7
1990   172     142.1                92.0
1991   187     154.5               100.0
1992   197     162.8               105.3
1993   224     185.1               119.8
1994   255     210.7               136.4
1995   247     204.1               132.1
1996   238     196.7               127.3
1997   222     183.5               118.7
[Figure: the three series plotted against Year: Original, Index (1984), Index (1991)]
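The index columns in this slide follow from dividing each year's price by the base-year price and multiplying by 100. A minimal sketch of that arithmetic, using the prices from the slide; the function name `index_numbers` is only an illustrative choice, not something from the course material:

```python
def index_numbers(prices, base_year):
    """Return {year: index}, scaled so that the base year's index is 100."""
    base_price = prices[base_year]
    return {year: round(100 * price / base_price, 1)
            for year, price in prices.items()}

# Prices by year, as listed on the slide.
prices = {
    1984: 121, 1985: 121, 1986: 133, 1987: 146, 1988: 162,
    1989: 164, 1990: 172, 1991: 187, 1992: 197, 1993: 224,
    1994: 255, 1995: 247, 1996: 238, 1997: 222,
}

index_1984 = index_numbers(prices, base_year=1984)
index_1991 = index_numbers(prices, base_year=1991)

for year in prices:
    print(year, prices[year], index_1984[year], index_1991[year])
```

Changing the base year rescales every index by the same factor, so the relative change between any two years is the same under either base.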
Summary Measures: Population Parameters and Sample Statistics. Measures of central tendency: median, mode, mean. Measures of variability: range, interquartile range, variance, standard deviation. Other summary measures: skewness, kurtosis.
Measures of Central Tendency or Location. Median: middle value when sorted in order of magnitude (the 50th percentile). Mode: most frequently occurring value. Mean: average.
Arithmetic Mean or Average. The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
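As a small illustration of the three measures of central tendency, the sketch below computes the mean by hand and uses Python's standard `statistics` module for the median and mode. The data values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical observations (not from the course material).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: sum of the observed values divided by the number of observations.
mean = sum(data) / len(data)

# Median: middle value of the sorted data; with an even count it is
# the average of the two middle values.
median = statistics.median(data)

# Mode: the most frequently occurring value.
mode = statistics.mode(data)

print(mean, median, mode)
```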
Percentiles and Quartiles Given any set of numerical observations, order them according to magnitude. The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
Quartiles – Special Percentiles Quartiles are the percentage points that break down the ordered data set into quarters. The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data. The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median. The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.
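The position rule (n + 1)P/100 usually lands between two ranked observations. One common way to handle this, and the one sketched here, is to interpolate linearly between the two neighbouring values. The function name `percentile`, the clamping of out-of-range positions, and the data are assumptions made for illustration.

```python
def percentile(values, p):
    """P-th percentile using the position rule (n + 1) * P / 100.

    Fractional positions are resolved by linear interpolation between
    the two neighbouring ranked observations (an assumed convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100
    # Assumption: positions outside 1..n are clamped to the ends.
    position = min(max(position, 1), n)
    rank = int(position)            # 1-based rank just below the position
    fraction = position - rank
    if rank == n:
        return ordered[-1]
    return ordered[rank - 1] + fraction * (ordered[rank] - ordered[rank - 1])

# Hypothetical data set with n = 9 observations.
data = [15, 20, 35, 40, 50, 55, 60, 70, 80]

q1 = percentile(data, 25)   # position 2.5: halfway between 20 and 35
q2 = percentile(data, 50)   # position 5: the median
q3 = percentile(data, 75)   # position 7.5: halfway between 60 and 70
print(q1, q2, q3)
```

Statistical packages use several slightly different percentile conventions, so results from other tools may differ a little from this rule.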
Measures of Variability or Dispersion. Range: difference between the maximum and minimum values. Interquartile range: difference between the third and first quartiles (Q3 - Q1). Variance: average* of the squared deviations from the mean. Standard deviation: square root of the variance. (*The definitions of population variance and sample variance differ slightly.)
Example - Range and Interquartile Range (Data is used from Example )
Rank   Sales   Sorted Sales
1      9       6    (Minimum)
2      6       9
3      12      10
4      10      12
5      13      13
6      15      14
7      16      14
8      14      15
9      14      16
10     16      16
11     17      16
12     16      17
13     24      17
14     21      18
15     22      18
16     18      19
17     19      20
18     18      21
19     20      22
20     17      24   (Maximum)
Range: Maximum - Minimum = 24 - 6 = 18
First Quartile: Q1 = 13 + (0.25)(1) = 13.25
Third Quartile: Q3 = 18 + (0.75)(1) = 18.75
Interquartile Range: Q3 - Q1 = 18.75 - 13.25 = 5.5
See slide # 19 for the template output.
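The figures in this example can be checked with a few lines of code. This is a minimal sketch that applies the (n + 1)P/100 position rule with linear interpolation to the 20 sales values from the slide; the helper `value_at` is a name introduced here for illustration and assumes the position falls strictly inside the ranked list.

```python
# Sales values in their original order, as listed on the slide.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

ordered = sorted(sales)
n = len(ordered)

def value_at(position):
    """Interpolate between the ranked values around a 1-based position.

    Assumes 1 <= position < n, which holds for the quartiles here.
    """
    rank = int(position)
    fraction = position - rank
    return ordered[rank - 1] + fraction * (ordered[rank] - ordered[rank - 1])

data_range = ordered[-1] - ordered[0]      # 24 - 6
q1 = value_at((n + 1) * 25 / 100)          # position 5.25
q3 = value_at((n + 1) * 75 / 100)          # position 15.75
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```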
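Variance and Standard Deviation
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$, with population standard deviation $\sigma = \sqrt{\sigma^2}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$, with sample standard deviation $s = \sqrt{s^2}$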
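The sketch below works through both variance definitions on a small hypothetical data set and compares the definitional sum of squared deviations with the computational shortcut. The results are cross-checked against `pvariance` and `variance` from Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical observations (not from the course material).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Definitional form: sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Shortcut form: sum of x^2 minus (sum of x)^2 / n.
shortcut = sum(x * x for x in data) - sum(data) ** 2 / n

population_variance = sum_sq_dev / n        # divide by N
sample_variance = sum_sq_dev / (n - 1)      # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(population_variance, sample_variance, population_sd, sample_sd)
```

The only difference between the two definitions is the divisor, so the sample variance is always somewhat larger than the population variance computed from the same values.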
Group Data and the Histogram. Dividing data into groups (classes or intervals). Groups should be: mutually exclusive (not overlapping; every observation is assigned to only one group); exhaustive (every observation is assigned to a group); equal-width if possible (the first or last group may be open-ended).
Frequency Distribution. A table with two columns listing: each and every group (class or interval) of values; and the associated frequency of each group, that is, the number of observations assigned to it. The sum of the frequencies is the number of observations (N for a population, n for a sample). The class midpoint is the middle value of a group. The relative frequency is the proportion of total observations in each class, so the sum of the relative frequencies is 1.
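A frequency distribution with equal-width classes can be built by counting how many observations fall in each interval. The sketch below assumes classes of the form "lower limit to less than upper limit"; the function name `frequency_table`, its parameters, and the data are all hypothetical.

```python
def frequency_table(data, start, width, n_classes):
    """Build rows of (lower, upper, midpoint, frequency, relative frequency).

    Classes are [lower, upper): mutually exclusive and, for data inside
    the covered span, exhaustive. Values outside the span raise an error.
    """
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[k] += 1
    total = len(data)
    rows = []
    for k, count in enumerate(counts):
        lower = start + k * width
        rows.append((lower, lower + width, lower + width / 2,
                     count, count / total))
    return rows

# Hypothetical observations grouped into three classes of width 5.
data = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16]
table = frequency_table(data, start=5, width=5, n_classes=3)

for lower, upper, midpoint, freq, rel in table:
    print(f"{lower} to less than {upper}: midpoint {midpoint}, "
          f"frequency {freq}, relative frequency {rel}")
```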
Cumulative Frequency Distribution
Spending Class ($) x      Cumulative Frequency F(x)   Cumulative Relative Frequency F(x)/n
0 to less than 100        30                          0.163
100 to less than 200      68                          0.370
200 to less than 300      118                         0.641
300 to less than 400      149                         0.810
400 to less than 500      171                         0.929
500 to less than 600      184                         1.000
The cumulative frequency of each group is the sum of the frequencies of that group and all preceding groups.
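The cumulative columns are running totals of the class frequencies. In the sketch below, the per-class frequencies are not stated on the slide; they are obtained by differencing the slide's cumulative frequencies (30, 68 - 30 = 38, and so on), and the running totals are then rebuilt with `itertools.accumulate`.

```python
from itertools import accumulate

classes = [
    "0 to less than 100", "100 to less than 200", "200 to less than 300",
    "300 to less than 400", "400 to less than 500", "500 to less than 600",
]
# Class frequencies derived by differencing the cumulative column.
frequencies = [30, 38, 50, 31, 22, 13]

# Cumulative frequency: this class plus all preceding classes.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Cumulative relative frequency: F(x) / n.
cumulative_relative = [round(c / n, 3) for c in cumulative]

for label, c, r in zip(classes, cumulative, cumulative_relative):
    print(f"{label:<22} {c:>4} {r:.3f}")
```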
Histogram. A histogram is a chart made of bars of different heights. The widths and locations of the bars correspond to the widths and locations of the data groupings; the heights of the bars correspond to the frequencies or relative frequencies of the data groupings.
Histogram Example Frequency Histogram
Skewness and Kurtosis. Skewness: a measure of the asymmetry of a frequency distribution (skewed to left, symmetric or unskewed, skewed to right). Kurtosis: a measure of the flatness or peakedness of a frequency distribution (platykurtic: relatively flat; mesokurtic: normal; leptokurtic: relatively peaked).
Skewness Skewed to left
Skewness Symmetric
Skewness Skewed to right
Kurtosis Platykurtic - flat distribution
Kurtosis Mesokurtic - not too flat and not too peaked
Kurtosis Leptokurtic - peaked distribution
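The slides describe skewness and kurtosis qualitatively and give no formulas. One common way to quantify them, assumed here, uses central moments: skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3, where m_k is the average k-th power of the deviations from the mean. Under that convention, negative skewness indicates a left skew and positive a right skew, while excess kurtosis below zero suggests a flatter (platykurtic) shape and above zero a more peaked (leptokurtic) one. The function name and data sets are hypothetical.

```python
def shape_measures(data):
    """Return (skewness, excess kurtosis) from central moments.

    Assumed definitions: skewness = m3 / m2**1.5 and
    excess kurtosis = m4 / m2**2 - 3, with m_k the k-th central moment.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical data sets illustrating each case.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
left_skewed = [-10, -2, -1, -1, -1]
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

for name, values in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed),
                     ("peaked", peaked)]:
    skew, kurt = shape_measures(values)
    print(f"{name}: skewness {skew:.3f}, excess kurtosis {kurt:.3f}")
```

Software packages often apply small-sample corrections to these measures, so their output can differ from the plain moment ratios used here.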
Methods of Displaying Data. Pie charts: categories represented as percentages of the total. Bar graphs: heights of rectangles represent group frequencies. Frequency polygons: height of the line represents frequency. Ogives: height of the line represents cumulative frequency. Time plots: represent values over time.
Pie Chart
Bar Chart. Fig. 1-11 Airline Operating Expenses and Revenues [Figure: bars for Average Revenues and Average Expenses by Airline: American, Continental, Delta, Northwest, Southwest, United, USAir]
Frequency Polygon and Ogive [Figures: Relative Frequency Polygon (Relative Frequency against Sales); Ogive (Cumulative Relative Frequency against Sales)]
Time Plot [Figure: time plot with months on the horizontal axis]
Exploratory Data Analysis - EDA. Techniques to determine relationships and trends, identify outliers and influential observations, and quickly describe or summarize data sets. Stem-and-leaf displays: a quick-and-dirty listing of all observations that conveys some of the same information as a histogram. Box plots: show the median, the lower and upper quartiles, and the maximum and minimum.
Example Stem-and-Leaf Display
1 122355567
2 0111222346777899
3 012457
4 11257
5 0236
6 02
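A stem-and-leaf display like this one can be produced by splitting each observation into a leading part (the stem) and a final digit (the leaf). The sketch below assumes two-digit whole numbers, with the tens digit as the stem and the units digit as the leaf; the function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines, one per stem, with leaves in sorted order.

    Assumes non-negative whole numbers; stem = tens, leaf = units.
    Stems with no observations are omitted in this simple version.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return [f"{stem} {''.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(stems.items())]

# Hypothetical observations.
data = [12, 41, 11, 35, 20, 13, 12, 21, 41]
lines = stem_and_leaf(data)
print("\n".join(lines))
```

Because every leaf is listed, the display keeps the individual values while its row lengths give the same shape information as a histogram turned on its side.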
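Box Plot: Elements of a Box Plot
Median; first quartile (Q1) and third quartile (Q3); interquartile range (IQR = Q3 - Q1)
Inner fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)
Outer fences: Q1 - 3(IQR) and Q3 + 3(IQR)
Whiskers: from the smallest data point not below the lower inner fence to the largest data point not exceeding the upper inner fence
Suspected outlier (*): a point between an inner fence and an outer fence
Outlier (o): a point beyond an outer fence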
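The fence positions are simple arithmetic on the quartiles. The sketch below computes the inner and outer fences and labels a point as inside the fences, a suspected outlier (between an inner and an outer fence), or an outlier (beyond an outer fence). The quartiles Q1 = 13.25 and Q3 = 18.75 come from the range and interquartile range example earlier in the deck; the function names and the test points are hypothetical.

```python
def fences(q1, q3):
    """Return ((lower inner, upper inner), (lower outer, upper outer))."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

def classify(x, q1, q3):
    """Label a data point relative to the box plot fences."""
    inner, outer = fences(q1, q3)
    if x < outer[0] or x > outer[1]:
        return "outlier"
    if x < inner[0] or x > inner[1]:
        return "suspected outlier"
    return "inside fences"

# Quartiles from the sales example (Q1 = 13.25, Q3 = 18.75).
q1, q3 = 13.25, 18.75
inner, outer = fences(q1, q3)
print("inner fences:", inner)
print("outer fences:", outer)

# Hypothetical points to classify.
for point in (20, 30, 40):
    print(point, classify(point, q1, q3))
```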
Example: Box Plot
Summary. Scales of measurement: nominal, ordinal, interval, ratio. Data presentation: frequency histogram. Index numbers. Descriptive statistics: measures of central tendency and dispersion.