Published by Lee Gaines. Modified over 9 years ago.
1
Gebze Technical University Department of Architecture
MAT120
Asst. Prof. Ferhat PAKDAMAR (Civil Engineer)
M Block - M106
Spring 2014/2015, Week 13
2
Subjects (entries in square brackets are from the Methods column)
Week 1 (11.02.2015): Introduction
Week 2 (18.02.2015): Set Theory and Fuzzy Logic [Term Paper]
Week 3: Real Numbers, Complex Numbers, Coordinate Systems
Week 4: Functions, Linear Equations
Week 5: Matrices
Week 6: Matrix Operations
Week 7: MIDTERM EXAM [MT]
Week 8: Limit. Derivatives, Basic Derivative Rules
Week 9: Term Paper Presentations [Deadline for TP]
Week 10: Integration by Parts
Week 11: Area and Volume Integrals
Week 12: Introduction to Numerical Analysis
Week 13: Introduction to Statistics
Week 14: Review
Week 15:
Week 16: FINAL EXAM [FINAL]
3
What is Statistics?
“Statistics is a way to get information from data”
Data → Statistics → Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.
Definitions: Oxford English Dictionary
4
Key Statistical Concepts…
Population
• A population is the group of all items of interest to a statistics practitioner.
• Frequently very large; sometimes infinite.
• E.g. all 5 million Florida voters, per Example 12.5.
Sample
• A sample is a set of data drawn from the population.
• Potentially very large, but less than the population.
• E.g. a sample of 765 voters exit polled on election day.
(Keller: Stats for Mgmt&Econ, 7th Ed. April 21, 2017)
5
Key Statistical Concepts…
Parameter: a descriptive measure of a population.
Statistic: a descriptive measure of a sample.
6
Key Statistical Concepts…
[Diagram: the Sample is a subset of the Population; the Population has a Parameter, the Sample has a Statistic.]
Populations have Parameters, Samples have Statistics.
7
Descriptive Statistics…
…are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
• Graphical Techniques
• Numerical Techniques
The actual method used depends on what information we would like to extract. Are we interested in…
• measure(s) of central location? and/or
• measure(s) of variability (dispersion)?
Descriptive Statistics helps to answer these questions…
8
Statistical Inference…
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Diagram: a Sample is drawn from the Population; Inference runs from the Sample's Statistic to the Population's Parameter.]
What can we infer about a Population’s Parameters based on a Sample’s Statistics?
9
Definitions…
A variable is some characteristic of a population or sample. E.g. student grades. Typically denoted with a capital letter: X, Y, Z…
The values of the variable are the range of possible values for a variable. E.g. student marks (0..100).
Data are the observed values of a variable. E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
10
Interval Data…
• Real numbers, e.g. heights, weights, prices, etc.
• Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.
11
Nominal Data…
• The values of nominal data are categories. E.g. responses to questions about marital status, coded as: Single = 1, Married = 2, Divorced = 3, Widowed = 4
• Because the numbers are arbitrary, arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!)
• Nominal data are also called qualitative or categorical.
12
Ordinal Data…
Ordinal data appear to be categorical in nature, but their values have an order, a ranking:
E.g. college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it is still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
13
Graphical & Tabular Techniques for Nominal Data…
The only allowable calculation on nominal data is to count the frequency of each value of the variable. We can summarize the data in a table that presents the categories and their counts called a frequency distribution. A relative frequency distribution lists the categories and the proportion with which each occurs.
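As a minimal sketch of the two summaries described above, the following Python snippet builds a frequency distribution and a relative frequency distribution. The list of coded responses is hypothetical (it is not data from the slides); the category codes follow the marital-status coding used earlier.

```python
from collections import Counter

# Hypothetical coded responses (1 = Single, 2 = Married, 3 = Divorced, 4 = Widowed).
responses = [1, 2, 2, 3, 2, 1, 4, 2, 1, 2]
labels = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Frequency distribution: count how often each category occurs.
frequency = Counter(labels[code] for code in responses)

# Relative frequency distribution: each count divided by the number of observations.
n = len(responses)
relative_frequency = {category: count / n for category, count in frequency.items()}

# Print one row per category: name, count, proportion.
for category in labels.values():
    print(f"{category:<9} {frequency[category]:>3} {relative_frequency[category]:.2f}")
```

Counting is the only calculation performed on the codes themselves; the proportions are computed from the counts, not from the code values.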
14
Nominal Data (Tabular Summary)
15
Nominal Data (Frequency)
Bar Charts are often used to display frequencies…
16
Nominal Data
It is all the same information (based on the same data), just a different presentation.
17
Graphical Techniques for Interval Data
There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical). The most important of these graphical methods is the histogram. The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
18
Building a Histogram…
1. Collect the data.
2. Create a frequency distribution for the data.
3. Draw the histogram.
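The steps above can be sketched in a few lines of Python. The data values, the class width of 10, and the starting point of 45 are all assumptions chosen for illustration; the slides do not specify them.

```python
# Step 1: hypothetical interval data (e.g. exam marks); not taken from the slides.
data = [48, 55, 67, 71, 74, 83, 93, 62, 78, 88]

# Assumed class limits: five classes of width 10 covering 45 up to 95.
lower, width, n_classes = 45, 10, 5

# Step 2: frequency distribution, counting observations in each class [start, start + width).
counts = [0] * n_classes
for x in data:
    index = int((x - lower) // width)
    counts[index] += 1

# Step 3: draw a text histogram, one row per class.
for i, count in enumerate(counts):
    start = lower + i * width
    print(f"{start:>3} to {start + width:<3} {'#' * count}")
```

A plotting library would normally replace the text rows, but the frequency distribution in step 2 is the part that determines the shape of the histogram.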
19
Scatter Diagram…
Example 2.9 A real estate agent wanted to know to what extent the selling price of a home is related to its size…
1. Collect the data
2. Determine the independent variable (X – house size) and the dependent variable (Y – selling price)
3. Use Excel to create a “scatter diagram”…
20
Scatter Diagram… It appears that in fact there is a relationship, that is, the greater the house size the greater the selling price…
21
Patterns of Scatter Diagrams…
Linearity and Direction are two concepts we are interested in:
• Positive Linear Relationship
• Negative Linear Relationship
• Weak or Non-Linear Relationship
22
Numerical Descriptive Techniques…
Measures of Central Location: Mean, Median, Mode
Measures of Variability: Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing: Percentiles, Quartiles
Measures of Linear Relationship: Covariance, Correlation, Least Squares Line
23
Measures of Central Location…
The arithmetic mean, a.k.a. average, shortened to mean, is the most popular & useful measure of central location. It is computed by simply adding up all the observations and dividing by the total number of observations:
$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$
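A short sketch of this calculation in Python, using the student marks from the Definitions slide as the observations. The variable names are illustrative only.

```python
from statistics import mean

# Student marks from the "Definitions" slide.
marks = [67, 74, 71, 83, 93, 55, 48]

# Mean = sum of the observations / number of observations.
manual_mean = sum(marks) / len(marks)

# The standard library computes the same quantity.
library_mean = mean(marks)

print(round(manual_mean, 2))
```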
24
Arithmetic Mean…
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
25
Statistics is a pattern language…
            Population   Sample
Size        N            n
Mean        $\mu$        $\bar{x}$
26
Measures of Variability…
Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value? For example, two sets of class grades are shown. The mean (=50) is the same in each case… But, the red class has greater variability than the blue class.
27
Statistics is a pattern language…
            Population   Sample
Size        N            n
Mean        $\mu$        $\bar{x}$
Variance    $\sigma^2$   $s^2$
28
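Variance…
The variance of a population is:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the population size.
The variance of a sample is:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean.
Note! The denominator is sample size (n) minus one!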
29
Application…
Example 4.7. The following sample consists of the number of jobs six randomly selected students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance.
What are we looking to calculate? Because the data are a sample, we want the sample mean $\bar{x}$ and the sample variance $s^2$, as opposed to the population mean $\mu$ or the population variance $\sigma^2$.
30
Sample Mean & Variance…
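Sample mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14$
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(17 - 14)^2 + \dots + (13 - 14)^2}{5} = \frac{166}{5} = 33.2$
Sample variance (shortcut method): $s^2 = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{5}\left[1342 - \frac{84^2}{6}\right] = 33.2$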
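As a check on Example 4.7, the sketch below computes the sample variance three ways in Python: from the definition, from the shortcut formula, and with the standard library. The variable names are illustrative, not from the slides.

```python
from statistics import variance

# Number of jobs applied for by six randomly selected students (Example 4.7).
jobs = [17, 15, 23, 7, 9, 13]
n = len(jobs)

# Sample mean.
x_bar = sum(jobs) / n

# Sample variance from the definition: squared deviations divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in jobs) / (n - 1)

# Shortcut method: uses only the sum of squares and the square of the sum.
s2_shortcut = (sum(x * x for x in jobs) - sum(jobs) ** 2 / n) / (n - 1)

# statistics.variance also divides by n - 1 (sample variance).
s2_library = variance(jobs)

print(x_bar, round(s2_definition, 1))
```

The shortcut form avoids computing each deviation from the mean, which is why it is convenient for hand calculation.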
31
Standard Deviation…
The standard deviation is simply the square root of the variance, thus:
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
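To illustrate the difference between the two formulas, this sketch applies both to the six values from Example 4.7. Treating those values as a whole population is an assumption made only to show the contrast; in the example they are a sample.

```python
from math import sqrt
from statistics import pstdev, stdev

jobs = [17, 15, 23, 7, 9, 13]

# Sample standard deviation: square root of the variance with denominator n - 1.
s = stdev(jobs)

# Population standard deviation: square root of the variance with denominator N.
# (Assumes, for illustration only, that these six values are the entire population.)
sigma = pstdev(jobs)

print(round(s, 2), round(sigma, 2))
```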
32
Methods of Collecting Data…
There are many methods used to collect or obtain data for statistical analysis. Three of the most popular methods are:
• Direct Observation
• Experiments, and
• Surveys.
33
Sampling… Recall that statistical inference permits us to draw conclusions about a population based on a sample. Sampling (i.e. selecting a sub-set of a whole population) is often done for reasons of cost (it’s less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical). In any case, the sampled population and the target population should be similar to one another.
34
Sampling Plans… A sampling plan is just a method or procedure for specifying how a sample will be taken from a population. We will focus our attention on these three methods: Simple Random Sampling, Stratified Random Sampling, and Cluster Sampling.
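A rough sketch of the first two plans in Python follows. The population of 1,000 voter IDs, the two regions used as strata, the sample size of 50, and the seed are all hypothetical; cluster sampling is not shown.

```python
import random

rng = random.Random(2015)  # fixed seed so that the sketch is repeatable

# Hypothetical population: 1,000 voter IDs, each tagged with one of two regions.
population = [(voter_id, "north" if voter_id < 600 else "south") for voter_id in range(1000)]

# Simple random sampling: every subset of size 50 is equally likely to be chosen.
simple_sample = rng.sample(population, 50)

# Stratified random sampling: split the population into strata, then draw a
# simple random sample from each stratum in proportion to its size.
strata = {}
for unit in population:
    strata.setdefault(unit[1], []).append(unit)

stratified_sample = []
for region, units in strata.items():
    share = round(50 * len(units) / len(population))
    stratified_sample.extend(rng.sample(units, share))

print(len(simple_sample), len(stratified_sample))
```

With proportional allocation the stratified sample always contains 30 units from the larger stratum and 20 from the smaller one, whereas the simple random sample only matches those proportions on average.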
35
The Normal Distribution…
The normal distribution is the most important of all probability distributions. The probability density function of a normal random variable is given by:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$
It looks like this: bell shaped, symmetrical around the mean …
(Show the videos here.)
[Image: Carl Friedrich Gauss]
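As an illustration, the standard form of the normal density, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}((x-\mu)/\sigma)^2}$, can be written directly in Python and compared with the standard library. The function name `normal_pdf` and the evaluation points are assumptions for this sketch.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# Evaluate the standard normal density at the mean and at two symmetric points.
peak = normal_pdf(0, 0, 1)
left = normal_pdf(-1.5, 0, 1)
right = normal_pdf(1.5, 0, 1)

# The standard library's NormalDist.pdf should agree with the hand-written formula.
library_peak = NormalDist(0, 1).pdf(0)

print(round(peak, 4), round(left, 4), round(right, 4))
```

The equal values at -1.5 and 1.5 reflect the symmetry about the mean, and the largest value occurs at the mean itself.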
36
The Normal Distribution…
The normal distribution is fully defined by two parameters: its standard deviation and mean.
Important things to note:
• The normal distribution is bell shaped and symmetrical about the mean.
• Unlike the range of the uniform distribution (a ≤ x ≤ b), normal distributions range from minus infinity to plus infinity.
37
Standard Normal Distribution…
A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution. As we shall see shortly, any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.
38
Calculating Normal Probabilities…
We can use the following function to convert any normal random variable to a standard normal random variable…
$Z = \frac{X - \mu}{\sigma}$
Some advice: always draw a picture!
39
Determining Normal Probabilities
When values do not fall directly on σ landmarks:
1. State the problem
2. Standardize the value(s) (z score)
3. Sketch, label, and shade the curve
4. Use Table B
40
Step 1: State the Problem
What percentage of concrete solidification times are less than 40 minutes?
Let X ≡ solidification time
We know from prior research: X ~ N(39, 2) minutes, i.e. μ = 39 and σ = 2
Pr(X ≤ 40) = ?
(Basic Biostat, Chapter 7, 4/21/2017)
41
Step 2: Standardize
Standard Normal variable ≡ “Z” ≡ a Normal random variable with μ = 0 and σ = 1, Z ~ N(0,1)
Use Table B to look up cumulative probabilities for Z.
For example, with X ~ N(39, 2), a value of 41 minutes standardizes to $Z = \frac{41 - 39}{2} = 1$.
42
Example: A Z variable of 1.96 has cumulative probability 0.9750.
43
Step 2 (cont.)
Turn value into z score: $z = \frac{x - \mu}{\sigma}$
For the 40-minute value: $z = \frac{40 - 39}{2} = 0.5$
z-score = no. of σ-units above (positive z) or below (negative z) distribution mean μ
44
Steps 3 & 4: Sketch & Table B
4. Use Table B to look up Pr(Z ≤ 0.5) = 0.6915, so about 69% of solidification times are less than 40 minutes.
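The standardize-then-look-up procedure for the concrete example can be sketched in Python, with the standard library's `NormalDist` playing the role of Table B. Reading N(39, 2) as mean 39 and standard deviation 2 is an assumption based on the z-score calculation in these slides.

```python
from statistics import NormalDist

# Step 1: state the problem. X ~ N(39, 2), find Pr(X <= 40).
mu, sigma = 39, 2
x = 40

# Step 2: standardize the value (z score).
z = (x - mu) / sigma

# Step 4: cumulative probability for Z, here computed instead of read from Table B.
p = NormalDist().cdf(z)

# The same probability obtained directly from the unstandardized distribution.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(p, 4))
```

Both routes give the same probability, which is the point of standardizing: one table for Z serves every normal distribution.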
45
Probabilities Between Points
a represents a lower boundary
b represents an upper boundary
Pr(a ≤ Z ≤ b) = Pr(Z ≤ b) − Pr(Z ≤ a)
46
Between Two Points
Pr(-2 ≤ Z ≤ 0.5) = Pr(Z ≤ 0.5) − Pr(Z ≤ -2)
.6687 = .6915 − .0228
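The subtraction rule Pr(a ≤ Z ≤ b) = Pr(Z ≤ b) − Pr(Z ≤ a) is easy to check numerically. In this sketch the helper name `prob_between` is hypothetical, and the cumulative probabilities are computed rather than read from Table B.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_between(a, b):
    """Pr(a <= Z <= b) as the difference of two cumulative probabilities."""
    return Z.cdf(b) - Z.cdf(a)


upper = Z.cdf(0.5)   # Pr(Z <= 0.5)
lower = Z.cdf(-2)    # Pr(Z <= -2)
p = prob_between(-2, 0.5)

print(round(upper, 4), round(lower, 4), round(p, 4))
```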
47
Values Corresponding to Normal Probabilities
1. State the problem
2. Find the Z-score corresponding to the percentile (Table B)
3. Sketch
4. Unstandardize: $x = \mu + z_p \sigma$
48
z percentiles
zp ≡ the Normal z variable with cumulative probability p
Use Table B to look up the value of zp:
• Look inside the table for the closest cumulative probability entry
• Trace the z score to row and column
49
e.g., What is the 97.5th percentile on the Standard Normal curve?
z.975 = 1.96
Notation: Let zp represent the z score with cumulative probability p, e.g., z.975 = 1.96
50
Step 1: State Problem
Question: Which solidification time is smaller than 97.5% of all concrete solidification times?
Let X represent the solidification time of concrete.
We know from prior research that X ~ N(39, 2).
A value that is smaller than .975 of solidification times has a cumulative probability of .025.
51
Step 2 (z percentile)
Less than 97.5% (right tail) = greater than 2.5% (left tail)
z lookup: z.025 = −1.96

z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
–1.9  .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
52
Unstandardize and sketch
Unstandardize: $x = \mu + z_p \sigma = 39 + (-1.96)(2) = 35.08 \approx 35$ minutes
The 2.5th percentile is 35 minutes
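Going from a probability back to a value can be sketched with the inverse cumulative distribution function, which stands in for searching inside Table B. As before, N(39, 2) is read as mean 39 and standard deviation 2.

```python
from statistics import NormalDist

mu, sigma = 39, 2
p = 0.025  # cumulative probability of the value we are looking for

# Step 2: z percentile, the z score whose cumulative probability is p.
z_p = NormalDist().inv_cdf(p)

# Step 4: unstandardize, x = mu + z_p * sigma.
x = mu + z_p * sigma

print(round(z_p, 2), round(x, 2))
```

The unrounded result is slightly above 35 minutes; the slide reports it rounded to the nearest minute.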
53
EXAMPLES Bus time! Deal or No Deal?
54
Calculating Normal Probabilities…
Example: The time required to build a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes: What is the probability that a computer is assembled in a time between 45 and 60 minutes? Algebraically speaking, what is P(45 < X < 60) ?
55
Calculating Normal Probabilities…
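P(45 < X < 60) ?
…mean of 50 minutes and a standard deviation of 10 minutes…
Standardizing each endpoint with $Z = \frac{X - \mu}{\sigma}$:
$P(45 < X < 60) = P\left(\frac{45 - 50}{10} < Z < \frac{60 - 50}{10}\right) = P(-.5 < Z < 1)$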
56
Calculating Normal Probabilities…
We can use the Normal distribution table in the appendix of the textbook to look up probabilities P(0 < Z < z).
We can break up P(–.5 < Z < 1) into:
P(–.5 < Z < 0) + P(0 < Z < 1)
The distribution is symmetric around zero, so we have:
P(–.5 < Z < 0) = P(0 < Z < .5)
Hence:
P(–.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)
57
Calculating Normal Probabilities…
How to use the Normal Distribution Table …
This table gives probabilities P(0 < Z < z).
First column = integer + first decimal
Top row = second decimal place
P(0 < Z < 0.5) = .1915
P(0 < Z < 1) = .3413
P(–.5 < Z < 1) = .1915 + .3413 = .5328
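The computer-assembly example can be reproduced in Python as a sketch. The helper `table` is hypothetical: it mimics a table of P(0 < Z < z) by subtracting 0.5 from the cumulative probability.

```python
from statistics import NormalDist

mu, sigma = 50, 10  # assembly time: mean 50 minutes, standard deviation 10 minutes

# Standardize both endpoints of P(45 < X < 60).
z_low = (45 - mu) / sigma
z_high = (60 - mu) / sigma

Z = NormalDist()


def table(z):
    """P(0 < Z < z) for z >= 0, the quantity listed in this style of normal table."""
    return Z.cdf(z) - 0.5


# By symmetry, P(-.5 < Z < 0) = P(0 < Z < .5), so the two table entries add up.
p = table(abs(z_low)) + table(z_high)

print(z_low, z_high, round(p, 4))
```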
58
Using the Normal Table …
P(0 < Z < 1.6) = .4452
What is P(Z > 1.6)?
P(Z > 1.6) = .5 – P(0 < Z < 1.6) = .5 – .4452 = .0548
59
Using the Normal Table …
What is P(Z < -2.23)?
By symmetry, P(Z < -2.23) = P(Z > 2.23), and the table gives P(0 < Z < 2.23) = .4871.
P(Z < -2.23) = P(Z > 2.23) = .5 – P(0 < Z < 2.23) = .5 – .4871 = .0129
60
Using the Normal Table …
What is P(Z < 1.52)?
P(Z < 0) = .5, and the table gives P(0 < Z < 1.52) = .4357.
P(Z < 1.52) = .5 + P(0 < Z < 1.52) = .5 + .4357 = .9357
61
Using the Normal Table …
What is P(0.9 < Z < 1.9)?
P(0.9 < Z < 1.9) = P(0 < Z < 1.9) – P(0 < Z < 0.9) = .4713 – .3159 = .1554
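The four table manipulations on these slides (upper tail, lower tail by symmetry, cumulative probability, and the area between two positive values) can be sketched together. The helper `half_table` is a hypothetical stand-in for a table of P(0 < Z < z); because it is not rounded to four decimals, the last digit of a result can differ slightly from a hand calculation with table entries.

```python
from statistics import NormalDist

Z = NormalDist()


def half_table(z):
    """P(0 < Z < z) for z >= 0, as listed in the normal table used on these slides."""
    return Z.cdf(z) - 0.5


# Upper tail: P(Z > 1.6) = .5 - P(0 < Z < 1.6)
upper_tail = 0.5 - half_table(1.6)

# Lower tail by symmetry: P(Z < -2.23) = P(Z > 2.23) = .5 - P(0 < Z < 2.23)
lower_tail = 0.5 - half_table(2.23)

# Cumulative probability: P(Z < 1.52) = .5 + P(0 < Z < 1.52)
cumulative = 0.5 + half_table(1.52)

# Between two positive values: P(0.9 < Z < 1.9) = P(0 < Z < 1.9) - P(0 < Z < 0.9)
between = half_table(1.9) - half_table(0.9)

for value in (upper_tail, lower_tail, cumulative, between):
    print(f"{value:.4f}")
```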
62
Finding Values of Z…
Other Z values are:
Z.05 = 1.645
Z.01 = 2.33
63
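Using the values of Z
Here z with subscript A is the value with area A to its right (upper tail), unlike the cumulative zp used in the percentile slides.
Because z.025 = 1.96 and −z.025 = −1.96, it follows that we can state
P(−1.96 < Z < 1.96) = .95
Similarly
P(−1.645 < Z < 1.645) = .90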
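A brief sketch of how these upper-tail values can be computed instead of looked up. The helper name `z_upper` is hypothetical; it converts an upper-tail area A into a cumulative probability 1 − A before inverting.

```python
from statistics import NormalDist

Z = NormalDist()


def z_upper(area):
    """Value of Z with the given area to its right (upper-tail notation)."""
    return Z.inv_cdf(1 - area)


z_025 = z_upper(0.025)
z_05 = z_upper(0.05)
z_01 = z_upper(0.01)

# Central probabilities between -z and +z.
central_95 = Z.cdf(z_025) - Z.cdf(-z_025)
central_90 = Z.cdf(z_05) - Z.cdf(-z_05)

print(round(z_025, 2), round(z_05, 3), round(z_01, 2))
```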
64
EXAMPLES Bus time! Deal or No Deal?
65
Have a nice week!