Probability and Statistics in Geology

Probability and Statistics in Geology
Probability and statistics are an important part of Earth science. They help us understand the details of a data sample and of the population it represents.
How rounded are these pebbles? Where did they come from? How likely is an earthquake here in Northridge?

Probability and Statistics in Geology
Topics: Statistics, Histograms, Probability, Error Analysis, Regression (bracketed topics on the slide: discuss next week)

Are Statistics Always Right? Can They Be Misleading?
Toss a coin 6 times: "Heads or tails?"
What is most unlikely? Six tails. What is more likely? 3 heads and 3 tails.
So is HTHTHT more likely than TTTTTT? No, both are equally unlikely!

Are Statistics Always Right? Can They Be Misleading?
The result "3 heads and 3 tails" is more likely only because there are many combinations in which it can occur (e.g. HHTHTT, or HTHTTH, or HHHTTT...).
Let's try it...
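The coin-toss argument can be checked by brute force. The following is a minimal Python sketch (Python is used here for illustration; the slides themselves use Matlab) that lists all sequences of 6 tosses, assuming a fair coin, and counts how many contain exactly 3 heads.

```python
from itertools import product
from math import comb

# Enumerate every possible sequence of 6 tosses of a fair coin.
sequences = ["".join(s) for s in product("HT", repeat=6)]
total = len(sequences)  # 2**6 equally likely sequences

# Probability of any ONE specific sequence, such as HTHTHT or TTTTTT.
p_single = 1 / total

# Sequences with exactly 3 heads (and therefore 3 tails).
three_heads = [s for s in sequences if s.count("H") == 3]
p_three_heads = len(three_heads) / total

print(total, len(three_heads), comb(6, 3))  # 64 20 20
print(p_single, p_three_heads)              # 0.015625 0.3125
```

Each specific sequence has probability 1/64, but 20 of the 64 sequences have 3 heads and 3 tails, so that outcome turns up about 31% of the time.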

What is a Statistic?
Is this a statistic? "In 1970, the oil refining capacity of Belgium was 32.6 million tonnes per year."
This is actually just a fact, not a statistic.

What is a Statistic?
Consider a pebbly beach. How could you determine the composition, mass, length, and shape of these particular pebbles? Would these sizes be the same on every beach?

What is a Statistic? - Specimen
Let's pick up a pebble and look at it: this is a specimen. This pebble could probably give us the composition, but would it be representative of all the pebbles? Is it typical? How could we improve on this specimen?

What is a Statistic? - Sample
We could pick up 100 pebbles: this is a sample from the beach. This should give you a much better idea of your beach rocks. Could we do any better?

What is a Statistic? - Population
Or we could sample ALL the pebbles on the beach! This is the population of all pebbles. Now measure the composition, size, and shape of each. Is this a realistic plan?

What is a Statistic? - Population
Specimen: One object
Sample: A subset of the objects
Population: All the objects
These terms are often misused in science and literature.

Faults in Southern California
Above is a map of faults found in southern California. If we study just the San Jacinto fault, what is this called statistically? If we study the system of the San Jacinto, Elsinore, and San Andreas faults, what is this called statistically?

So What is a Statistic?
Is the average mass of a pebble a statistic? This depends on whether the average is determined from a sample of pebbles or from the total population. If we take the average of the total population, it is considered a parameter and is simply a fact. The average of a sample, however, is a statistic.

So What is a Statistic?
In our example, a statistic is an attempt to estimate the average mass of all the pebbles by calculating the average mass of some of the pebbles. Statistics are generally based on a sample of the population.

Election Polls
Polling question: "Who did the best job in the debate?" Obama 54%, McCain 30%.
Estimates of voter intentions obtained before an election are statistics: they come from a sample of the population.

Election Polls
Electoral votes: Obama 365, McCain 162. Popular vote: Obama 66,882,230, McCain 58,343,671.
The final result of an election, however, is a parameter. The final result is a fact, a measure of the entire voting population.

Back to the Pebbly Beach: Average, Mean, and Median
[Table: Pebble # and Mass (g)]
The typical mass of pebbles on a particular beach can be described by the mean (the same as the average), $\bar{w}$:
$$\bar{w} = \frac{1}{N} \sum_{i=1}^{N} w_i$$
The mean is the total mass of the sample divided by the number of pebbles. What is the mean of these pebbles?

Back to the Pebbly Beach: Average, Mean, and Median
[Table: Pebble # and Mass (g)]
Another way of finding the typical mass of pebbles is to use the median value. Median means "middle": it is the mass of the middle pebble when all are lined up (ranked) from lightest to heaviest. With an odd number of pebbles there is exactly one middle pebble. In the above example, pebble #6 has a mass of 374 g, which is the median value of this pebble sample.

Back to the Pebbly Beach: Average, Mean, and Median
[Table: Pebble # and Mass (g)]
Will the median always be the same as the mean? With an even number of pebbles (say 100), you can average the masses of the 50th and 51st pebbles.
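The mean and median can be compared directly on a small sample. This is a minimal Python sketch; the 11 masses are made up for illustration (the slide's table is not available) and were chosen only so that the sixth-ranked pebble has a mass of 374 g.

```python
from statistics import mean, median

# Hypothetical masses in grams, already ranked from lightest to heaviest.
masses_g = [212, 260, 301, 333, 350, 374, 389, 402, 455, 480, 498]

# Mean: total mass of the sample divided by the number of pebbles.
mean_mass = sum(masses_g) / len(masses_g)

# Median with an odd number of pebbles: the middle (6th of 11) pebble.
median_odd = median(masses_g)

# Median with an even number of pebbles: average of the two middle ones.
ten_pebbles = masses_g[:-1]          # drop the heaviest pebble
median_even = median(ten_pebbles)    # (350 + 374) / 2

print(round(mean_mass, 1), median_odd, median_even)
```

For this sample the mean and the median differ, which shows that the two measures need not agree.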

Back to the Pebbly Beach - Dispersion
[Table: Pebble # and Mass (g)]
What about other aspects of the distribution of pebbles? How can we tell whether the pebbles are similar in size (i.e. well or poorly sorted)? We could give the total range of sizes, which is one measure of the dispersion. But how much does this tell us about all the pebbles in the sample?

Back to the Pebbly Beach - Dispersion
[Table: Pebble # and Mass (g)]
The heaviest and lightest pebbles may not be "typical". A better measure of how similar your pebbles are is the mean square deviation from the mean:
$$\sigma^2 = \overline{(w_i - \bar{w})^2}$$
This measures the deviation from the mean and is also known as the variance. The bar indicates the average over all pebbles. The standard deviation, $\sigma$, is the square root of this value.

Back to the Pebbly Beach - Dispersion
[Table: Pebble # and Mass (g)]
$$\sigma^2 = \overline{(w_i - \bar{w})^2}$$
Why do we square this difference? Some differences will be negative, and we just want the size of each pebble's deviation from the average value.
If $\sigma^2$ is small, the masses are similar and the pebbles are well sorted.
If $\sigma^2$ is large, the masses vary widely and the pebbles are poorly sorted.
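As a worked illustration of the variance formula, here is a minimal Python sketch comparing two made-up samples with the same mean. The masses are hypothetical, and the variance divides by N as on the slide (some packages divide by N - 1 instead).

```python
from math import sqrt

def variance(values):
    """Mean square deviation from the mean (divides by N)."""
    n = len(values)
    w_bar = sum(values) / n
    return sum((w - w_bar) ** 2 for w in values) / n

# Hypothetical pebble masses in grams; both samples have a mean of 350 g.
well_sorted = [300, 320, 350, 380, 400]
poorly_sorted = [100, 200, 350, 500, 600]

for name, sample in [("well sorted", well_sorted), ("poorly sorted", poorly_sorted)]:
    var = variance(sample)
    print(name, "variance:", var, "standard deviation:", round(sqrt(var), 1))
```

The two samples have the same mean, but the second has a far larger variance, which is what "poorly sorted" means here.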

Visualizing Distribution of Data
How can you graphically display the distribution of a large number of pebbles? Which sizes occur most often? Which are fairly rare?

Visualizing Distribution of Data: Histogram
A histogram displays the pebble mass count in bins (10 bins shown). We first count the number of occurrences (frequency) in each bin and list them in a table called the frequency distribution. Then we plot this frequency as a bar chart against mass.
[Figure: histogram of frequency against pebble mass (g). Table: frequency distribution, with columns Range (g) and Number]

Histograms in Matlab (or Octave)
To plot histograms in Matlab:
>> x = 200:25:500        % set bin range and increment, here 25
>> y = pebblefile(:,2)   % read column 2 of file of pebble masses
>> hist(y,x)             % plots histogram shown above for data (y) and bins (x)
[Figure: histogram of frequency against pebble mass (g). Table: Pebble # and Mass (g), with a count of all pebbles]
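The counting step behind a histogram can also be written out by hand. This is a minimal Python sketch, not the Matlab routine: it assumes bins defined by their lower edges (Matlab's hist(y,x) treats x as bin centers), and the masses and the helper name frequency_distribution are hypothetical.

```python
from collections import Counter

def frequency_distribution(values, start, stop, width):
    """Count values in bins [start, start+width), ... up to stop (exclusive)."""
    counts = Counter()
    for v in values:
        if start <= v < stop:
            lower_edge = start + width * int((v - start) // width)
            counts[lower_edge] += 1
    return [(lo, lo + width, counts[lo]) for lo in range(start, stop, width)]

# Hypothetical pebble masses in grams.
masses_g = [212, 260, 301, 333, 350, 374, 389, 402, 455, 480, 498]

table = frequency_distribution(masses_g, start=200, stop=500, width=50)
for lo, hi, n in table:
    # Print one row of the frequency distribution with a simple text bar.
    print(f"{lo}-{hi} g: {n} {'#' * n}")
```

The list of (range, count) rows is the frequency distribution; plotting the counts as bars against mass gives the histogram.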

Visualizing Distribution of Data
[Figure: marine seismic study, Weeraratne et al., 2007]
We're interested in earthquake paths, which come from every possible azimuth within 360° (the back azimuth). How can we graphically represent the distribution of cyclical data or direction?

Visualizing Distribution of Data: Rose Diagrams
A rose diagram is like a histogram plotted on a polar graph. The direction is represented by the angle around the plot, and the frequency is proportional to the distance from the center. Here frequency ranges from 0 to 6, and an angle of 30° is the most frequent, occurring 6 times. A list of fault dip angles could be plotted in this way.

Plotting Rose Diagrams in Matlab (or Octave)
To plot rose diagrams in Matlab:
>> dip = faultdipfile(:,1)      % reads first column of data input
>> dipradians = dip.*pi./180    % converts angles to radians
>> bins = 100                   % specify the number of bins
>> rose(dipradians,bins)        % plot the rose diagram
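The data preparation for a rose diagram amounts to converting angles to radians and counting them in angular sectors. Below is a minimal Python sketch of that counting step only (no plotting); the angles, the 30° sector width, and the helper name sector_counts are assumptions made for illustration.

```python
import math

def sector_counts(angles_deg, sector_width_deg=30):
    """Count angles in sectors [0, w), [w, 2w), ... around the full circle."""
    n_sectors = 360 // sector_width_deg
    counts = [0] * n_sectors
    for a in angles_deg:
        counts[int((a % 360) // sector_width_deg)] += 1
    return counts

# Hypothetical directions in degrees.
angles_deg = [10, 25, 35, 40, 45, 50, 55, 58, 95, 100, 200, 359]

# Same conversion as dip.*pi./180 in the Matlab example.
angles_rad = [math.radians(a) for a in angles_deg]

counts = sector_counts(angles_deg)
busiest = counts.index(max(counts))
print(counts)
print("most frequent sector starts at", busiest * 30, "degrees")
```

In a rose diagram each count would become the radial length of the wedge for its sector.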

Probability
What is probability? If I measure a large number of data points, how often do I obtain a particular result?
[Figure: histogram of frequency against pebble mass (g)]
For the pebble masses measured here, the most probable mass is 350 grams. This mass value occurs in 22 (frequency) out of 100 cases, or 22% of the time. Thus the estimated probability of picking up a pebble in this area with a mass of 350 grams is 22%.

Probability
We can then add another column to the data which shows the probability for each bin. You can now plot the probability as a histogram.
[Figure: histogram of probability against pebble mass (g). Table: frequency distribution and probability, with columns Range (g), Number, and Probability]
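Adding the probability column is a matter of dividing each bin count by the total count. A minimal Python sketch follows; the frequency table is made up, and only the 22 out of 100 pebbles at 350 g is taken from the slides.

```python
# Hypothetical frequency distribution: bin mass (g) -> number of pebbles.
frequencies = {200: 2, 250: 8, 300: 18, 350: 22, 400: 20, 450: 15, 500: 10, 550: 5}

total = sum(frequencies.values())  # 100 pebbles in this made-up sample

# Estimated probability for each bin = frequency / total count.
probabilities = {mass: count / total for mass, count in frequencies.items()}

# The most probable mass is the bin with the highest probability.
most_probable = max(probabilities, key=probabilities.get)

for mass, count in frequencies.items():
    print(f"{mass} g  number = {count:3d}  probability = {probabilities[mass]:.2f}")
print("most probable mass:", most_probable, "g")
```

The probabilities sum to 1, so the probability histogram has the same shape as the frequency histogram with a rescaled vertical axis.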

Probability: What is Normal?
You can compare your data distribution to theoretical estimates. The most commonly used distribution is the normal distribution, also known as the Gaussian distribution.
[Figure: histogram of probability against pebble mass (g). Table: frequency distribution and probability, with columns Range (g), Number, and Probability]

Gaussian Distribution
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\bar{x})^2 / 2\sigma^2}$$
The Gaussian distribution is written as above and describes the relative probability of obtaining the value $x$. Here $\sigma$ is the standard deviation and $\bar{x}$ is the average of all $x$.

Gaussian Distribution
[Figure: P(x) against x]
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\bar{x})^2 / 2\sigma^2}$$
This is a Gaussian distribution with mean $\bar{x} = 5.0$ and $\sigma = 2.0$. You are more likely to obtain a value between 4 and 6, where the graph is high, and less likely to obtain a value between 1 and 2 or between 9 and 10.
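The Gaussian formula can be evaluated numerically for the slide's values of mean 5.0 and σ = 2.0. This is a minimal Python sketch; the function name gaussian_pdf and the sample x values are chosen here for illustration.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """P(x) = exp(-(x - mean)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mean, sigma = 5.0, 2.0

# The curve is highest at the mean and falls off symmetrically on both sides.
for x in [1.5, 3.0, 5.0, 7.0, 9.5]:
    print(f"x = {x:4.1f}  P(x) = {gaussian_pdf(x, mean, sigma):.4f}")
```

Values near the mean give a high P(x), while values around 1.5 or 9.5 give a P(x) several times smaller.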

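Gaussian Distribution
[Figure: P(x) against x, with two shaded areas under the curve]
We can quantify this by looking at the area under the curve. The total area under the curve is 1.0. The area under the curve over one range of x (values not preserved) is shown in gray. This area is much smaller than the dark gray block over a second range (values not preserved).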

Gaussian Distribution
[Figure: P(x) against x, with the ±1.0σ and ±2.0σ ranges marked]
To quantify these "areas" we use established values for multiples of the standard deviation from the mean.
The area under the curve between 3 and 7 is about 0.68 and is termed ±1.0σ (this is known as the 68% confidence limit).
The area under the curve between 1 and 9 is about 0.95 and is termed ±2.0σ (this is known as the 95% confidence limit).
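The areas under the curve can be computed with the error function rather than looked up. This is a minimal Python sketch for the distribution with mean 5.0 and σ = 2.0; the function names are hypothetical and the standard library's math.erf does the work.

```python
import math

def gaussian_cdf(x, mean, sigma):
    """Area under the Gaussian curve from minus infinity up to x."""
    return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))

def area_between(a, b, mean, sigma):
    """Area under the Gaussian curve between a and b."""
    return gaussian_cdf(b, mean, sigma) - gaussian_cdf(a, mean, sigma)

mean, sigma = 5.0, 2.0
one_sigma = area_between(3, 7, mean, sigma)   # mean +/- 1 sigma
two_sigma = area_between(1, 9, mean, sigma)   # mean +/- 2 sigma

print(f"area between 3 and 7: {one_sigma:.4f}")
print(f"area between 1 and 9: {two_sigma:.4f}")
```

The two results round to 0.68 and 0.95, the values behind the 68% and 95% confidence limits.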

Linear Regression: How to Fit a Line to Scattered Data
[Figure: pebble diameter against distance from shore (m)]
Now that we've learned statistical analysis of a single variable, we can also consider statistical analysis of two related variables. We may be able to approximate this relationship by a straight line. How do we find this line? Which line is best?

Linear Regression: How to Fit a Line to Scattered Data
[Figure: pebble diameter against distance from shore (m), with a candidate line and the deviation Δy of one point marked]
The line drawn to the right is one possibility. How can we determine, in a quantitative way, whether this line is better than another? We can calculate the mean square deviation by looking at the distance of each point from the predicted line. The deviation of one point is shown by Δy and is measured in the "y direction" only.

Linear Regression: How to Fit a Line to Scattered Data
[Figure: pebble diameter against distance from shore (m), with the deviation Δy marked]
This gives you the deviation of one point from the line. To obtain the mean square deviation, we average over the Δy of all points, using the same equation for the standard deviation that we used before:
$$\sigma^2 = \overline{(\Delta y - \overline{\Delta y})^2}$$
The line with the smallest $\sigma$ will have the best fit to the data.
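Two candidate lines can be compared quantitatively as follows. This is a minimal Python sketch under stated assumptions: the data and both lines are made up, and the score is the plain mean of Δy², which equals the σ² above only when the deviations average to zero.

```python
def mean_square_deviation(xs, ys, slope, intercept):
    """Average of the squared vertical deviations (delta y) from a line."""
    deviations = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(d ** 2 for d in deviations) / len(deviations)

# Hypothetical measurements: distance from shore (m) and pebble diameter (cm).
distance_m = [0, 10, 20, 30, 40]
diameter_cm = [9.1, 7.9, 6.2, 5.1, 3.8]

# Two hypothetical candidate lines: diameter = slope * distance + intercept.
line_a = {"slope": -0.13, "intercept": 9.0}
line_b = {"slope": -0.10, "intercept": 8.0}

msd_a = mean_square_deviation(distance_m, diameter_cm, **line_a)
msd_b = mean_square_deviation(distance_m, diameter_cm, **line_b)

better = "A" if msd_a < msd_b else "B"
print(f"line A: {msd_a:.3f}  line B: {msd_b:.3f}  better fit: line {better}")
```

The line with the smaller score passes closer to the points overall; repeating the comparison over many candidate slopes and intercepts would lead toward the best-fitting line.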