Fundamentals of Data Analysis Lecture 3 Basics of statistics.

1 Fundamentals of Data Analysis Lecture 3 Basics of statistics

2 Program for today
- Basic terms and definitions
- Discrete distributions
- Continuous distributions
- Normal distribution

3 Topics for discussion
- What are the applications of statistics in modern physics?
- How important is it to draw conclusions based on statistical analysis?

4 What is statistics?
Definition of statistics:
1. A collection of quantitative data pertaining to a subject or group, for example blood pressure statistics.
2. The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.

5 What is statistics?
Two phases of statistics:
- Descriptive statistics: describes the characteristics of a product or process using information collected on it.
- Inferential (inductive) statistics: draws conclusions about unknown process parameters based on information contained in a sample; it uses probability.

6 Probability
When we cannot rely on the assumption that all sample points are equally likely, we have to determine the probability of an event experimentally. We perform a large number N of experiments and count how often each sample point is obtained. The ratio of the number of occurrences of a certain sample point to the total number of experiments is called the relative frequency.

7 Probability
The probability is then taken to be the relative frequency of occurrence of a sample point in this long series of repetitions of the experiment. This rests on the law of large numbers, which says that the relative frequency approaches the true (theoretical) probability of the outcome as the experiment is repeated over and over again.
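The relative-frequency idea can be illustrated with a short simulation. This is a sketch, not part of the original lecture: it assumes a fair six-sided die simulated with Python's `random` module, and the function name `relative_frequency` is hypothetical.

```python
import random

def relative_frequency(event, n_trials, seed=0):
    """Estimate P(E) as n(E)/N by repeating a simulated die roll N times."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    n_event = sum(1 for _ in range(n_trials) if event(rng.randint(1, 6)))
    return n_event / n_trials

# Event E: "the die shows a six"; its theoretical probability is 1/6 (about 0.1667)
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(lambda face: face == 6, n))
```

As N grows, the printed ratio tends to settle near 1/6, although any single run still fluctuates; for small N it can be far from the theoretical value.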

8 Probability
$$P(E) = \lim_{N \to \infty} \frac{n(E)}{N}$$
where n(E) is the number of times the event E took place out of a total of N experiments. From this definition we can see that the probability is a number between 0 and 1. When the probability is 1, we know that a particular outcome is certain.

9 Probability
For a discrete random variable, the definition of probability is intuitive:
$$P(x) = \lim_{N \to \infty} \frac{n(x)}{N}$$
where n(x) is the number of occurrences of the desired value of the random variable x (successes) in N samples.

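10 Probability
- For a continuous random variable, this definition requires identifying a small range of variation Δx (Δx → 0) for which the probability is determined:
$$P(x, x + \Delta x) = \lim_{N \to \infty} \frac{n(x, x + \Delta x)}{N}$$
where n(x, x + Δx) is the number of samples that fall between x and x + Δx.
- For a continuous random variable it is preferable to use the probability density function:
$$f(x) = \lim_{\Delta x \to 0} \frac{P(x, x + \Delta x)}{\Delta x}$$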
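The density definition can be approximated from data by counting samples in a small interval and dividing by N·Δx. The sketch below assumes standard normal samples generated with `random.Random.gauss`; the helper `estimate_density` is a hypothetical name used only for illustration.

```python
import random

def estimate_density(samples, x, dx):
    """Approximate f(x) as (number of samples in [x, x + dx)) / (N * dx)."""
    n_in_bin = sum(1 for s in samples if x <= s < x + dx)
    return n_in_bin / (len(samples) * dx)

rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

# For a standard normal variable the true density at 0 is 1/sqrt(2*pi), about 0.399
print(estimate_density(samples, 0.0, 0.1))
```

The choice of Δx is a trade-off: a very small interval contains few samples and gives a noisy estimate, while a large one averages the density over too wide a range.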

11 Histogram
The histogram is one of the most important graphical tools for exploring the shape of data distributions and a good way to visualize trends in population data. The more often a particular value occurs, the taller the corresponding bar in the histogram.

12 Histogram
Constructing a histogram:
Step 1: Find the range of the distribution (largest value minus smallest value).
Step 2: Choose the number of classes, 5 to 20.
Step 3: Determine the width of the classes, with one decimal place more than the data: class width = range / number of classes.
Step 4: Determine the class boundaries.
Step 5: Draw the frequency histogram.
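The five steps above can be sketched as follows. This is a simplified illustration, not the lecture's procedure verbatim: it does not round the class width to one extra decimal place, it treats classes as half-open intervals with the maximum value placed in the last class, and the names `build_histogram` and `draw_histogram` and the sample data are hypothetical.

```python
def build_histogram(data, num_classes):
    """Steps 1-4: range, class width, class boundaries and class frequencies."""
    lo, hi = min(data), max(data)
    data_range = hi - lo                      # Step 1: largest - smallest value
    if data_range == 0:
        raise ValueError("all values are equal; a histogram needs a non-zero range")
    width = data_range / num_classes          # Step 3: class width = range / number of classes
    edges = [lo + i * width for i in range(num_classes + 1)]  # Step 4: class boundaries
    freqs = [0] * num_classes
    for x in data:
        # Classes are [edge_i, edge_i+1); the maximum value goes into the last class.
        idx = min(int((x - lo) / width), num_classes - 1)
        freqs[idx] += 1
    return edges, freqs

def draw_histogram(edges, freqs):
    """Step 5: a text-mode frequency histogram, one row of '*' per class."""
    for left, right, f in zip(edges, edges[1:], freqs):
        print(f"{left:7.2f} - {right:7.2f} | {'*' * f}")

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
edges, freqs = build_histogram(data, 4)       # Step 2: choose the number of classes
draw_histogram(edges, freqs)
```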

13 Histogram
Number of groups or cells:
- fewer than 100 observations: 5 to 9 cells
- between 100 and 500 observations: 8 to 17 cells
- more than 500 observations: 15 to 20 cells
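The rule of thumb above can be written as a small lookup. The function name `suggested_cells` is hypothetical, and treating exactly 100 and exactly 500 observations as belonging to the middle band is an assumption, since the rule does not say which band the boundary values fall into.

```python
def suggested_cells(n_observations):
    """Return the (min, max) number of histogram cells for a given sample size."""
    if n_observations < 100:
        return (5, 9)
    if n_observations <= 500:   # assumption: 100 and 500 belong to the middle band
        return (8, 17)
    return (15, 20)

low, high = suggested_cells(320)
print(low, high)
```

For instance, the grouped table in the "Analysis of histogram" slides summarizes 320 observations in 9 classes, which lies within the suggested 8 to 17 cells.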

14 Analysis of histogram

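15 Calculating the mean (average) for ungrouped data:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
and for grouped data:
$$\bar{x} = \frac{\sum_{i=1}^{h} f_i m_i}{\sum_{i=1}^{h} f_i}$$
where $x_i$ are the individual observations, $n$ is their number, $f_i$ is the frequency and $m_i$ the midpoint of class $i$, and $h$ is the number of classes.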

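16 Analysis of histogram
Calculating the median for ungrouped data: sort the $n$ observations; the median is the middle value if $n$ is odd, or the average of the two middle values if $n$ is even. For grouped data:
$$M_d = L_m + \frac{n/2 - cf_m}{f_m}\, i$$
where $L_m$ is the lower boundary of the class containing the median, $cf_m$ is the cumulative frequency of all classes below it, $f_m$ is the frequency of the median class, and $i$ is the class width.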
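For ungrouped data, the mean and median described above can be computed directly. The sketch below uses small hypothetical data sets and cross-checks the hand-written median against Python's standard `statistics.median`.

```python
import statistics

def mean(values):
    """Arithmetic mean of ungrouped data: sum of values divided by their count."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

odd = [3, 7, 1, 9, 4]    # hypothetical data, n = 5
even = [3, 7, 1, 9]      # hypothetical data, n = 4
print(mean(odd), median(odd))   # 4.8 4
print(median(even))             # 5.0

# Cross-check against the standard library
assert median(odd) == statistics.median(odd)
assert median(even) == statistics.median(even)
```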

17 Analysis of histogram
Boundaries   Midpoint   Frequency   Computation (midpoint × frequency)
23.6-26.5    25.0          4          100
26.6-29.5    28.0         36         1008
29.6-32.5    31.0         51         1581
32.6-35.5    34.0         63         2142
35.6-38.5    37.0         58         2146
38.6-41.5    40.0         52         2080
41.6-44.5    43.0         34         1462
44.6-47.5    46.0         16          736
47.6-50.5    49.0          6          294
Total                    320        11549
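The grouped-data formulas can be applied to the table above. This is a sketch: the midpoints and frequencies are copied from the table, the class width of 3.0 is taken from the spacing of the midpoints, and the lower boundary of the median class is assumed to be midpoint minus half the width (35.5); using 35.55 instead would shift the median by 0.05. The function names are hypothetical.

```python
# Midpoints and frequencies copied from the grouped table
midpoints = [25.0, 28.0, 31.0, 34.0, 37.0, 40.0, 43.0, 46.0, 49.0]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]
CLASS_WIDTH = 3.0  # spacing between consecutive midpoints

def grouped_mean(m, f):
    """Mean of grouped data: sum(f_i * m_i) / sum(f_i)."""
    return sum(mi * fi for mi, fi in zip(m, f)) / sum(f)

def grouped_median(m, f, width):
    """Median of grouped data: L_m + (n/2 - cf_m) / f_m * width."""
    half = sum(f) / 2
    cum = 0  # cumulative frequency of the classes below the current one
    for mi, fi in zip(m, f):
        if cum + fi >= half:
            lower = mi - width / 2  # assumed lower boundary of the median class
            return lower + (half - cum) / fi * width
        cum += fi
    raise ValueError("empty frequency list")

print(sum(mi * fi for mi, fi in zip(midpoints, frequencies)))  # 11549.0
print(grouped_mean(midpoints, frequencies))                    # 36.090625
print(grouped_median(midpoints, frequencies, CLASS_WIDTH))     # about 35.81
```

The mean reproduces the table's totals (11549 / 320); the median falls in the 35.6-38.5 class, because the cumulative frequency first reaches n/2 = 160 there.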

18 Measures of dispersion
- Range
- Standard deviation
- Variance

19 Measures of dispersion
The range is the simplest and easiest to calculate of the measures of dispersion:
$$R = x_{\max} - x_{\min}$$

20 Measures of dispersion
Standard deviation of a sample:
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
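The range and the sample standard deviation can be computed as below. The sketch assumes the n - 1 (Bessel) divisor for the sample standard deviation, which is what Python's `statistics.stdev` also uses; the data set and function names are hypothetical.

```python
import math
import statistics

def data_range(values):
    """Range R = x_max - x_min."""
    return max(values) - min(values)

def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(data_range(data))   # 7
print(sample_std(data))   # about 2.138
assert math.isclose(sample_std(data), statistics.stdev(data))
```

With the divisor n instead of n - 1 the same data give exactly 2.0 (`statistics.pstdev`); the n - 1 form is the usual choice when the sample is used to estimate the population value.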

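21 Measures of dispersion
For a discrete random variable, the variance is defined as:
$$\sigma^2 = \sum_i (x_i - \mu)^2 P(x_i)$$
while for a continuous random variable it is:
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$$
where $\mu$ is the expected value of $x$.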
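Both variance definitions can be evaluated numerically. The examples are assumptions chosen for illustration: the number of heads in two fair coin tosses (exact arithmetic with `fractions.Fraction`) and a uniform density on [0, 1], whose integral is approximated with the midpoint rule rather than computed exactly.

```python
from fractions import Fraction

def discrete_variance(pmf):
    """sigma^2 = sum((x - mu)^2 * P(x)) for a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def continuous_variance(pdf, a, b, steps=100_000):
    """sigma^2 = integral of (x - mu)^2 f(x) dx over [a, b], via the midpoint rule."""
    h = (b - a) / steps
    xs = [a + (i + 0.5) * h for i in range(steps)]
    mu = sum(x * pdf(x) for x in xs) * h
    return sum((x - mu) ** 2 * pdf(x) for x in xs) * h

heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(discrete_variance(heads))                     # 1/2
print(continuous_variance(lambda x: 1.0, 0.0, 1.0)) # close to 1/12
```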

22 Parameters of a distribution
- A parameter is a characteristic of a population; in other words, it describes a population.
- A statistic is a characteristic of a sample, used to make inferences about population parameters, which are typically unknown; a statistic used in this way is called an estimator.

23 Parameters of a distribution
- Population: the set of all items that possess a characteristic of interest
- Sample: a subset of a population

24 Parameters of a distribution
Expected value (EV) of a discrete random variable:
$$E(x) = \sum_i x_i P(x_i)$$
and of a continuous random variable:
$$E(x) = \int_{-\infty}^{\infty} x f(x)\, dx$$
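A short sketch of both expected-value formulas, under assumed example distributions: a fair die (exact result with `fractions.Fraction`) and the density f(x) = 2x on [0, 1], integrated with the midpoint rule. The function names are hypothetical.

```python
from fractions import Fraction

def expected_value_discrete(pmf):
    """E(x) = sum(x * P(x)) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def expected_value_continuous(pdf, a, b, steps=100_000):
    """E(x) = integral of x f(x) dx over [a, b], approximated by the midpoint rule."""
    h = (b - a) / steps
    xs = (a + (i + 0.5) * h for i in range(steps))
    return sum(x * pdf(x) for x in xs) * h

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value_discrete(die))                          # 7/2
print(expected_value_continuous(lambda x: 2 * x, 0.0, 1.0))  # close to 2/3
```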

25 Random numbers

26 Normal distribution
Characteristics of the normal curve:
- It is symmetrical: half the cases lie on one side of the center and the other half on the other side.
- The distribution is single-peaked, not bimodal or multimodal.
- It is also known as the Gaussian distribution.

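28 Normal distribution
- Probability density function:
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
- Notation: N(μ, σ)
- N(0, 1), the standard normal distribution, is a normal distribution with a mean of 0 and a standard deviation of 1.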
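The normal density and its cumulative distribution can be sketched with Python's `math` module; the CDF is expressed through the error function `math.erf`. The function names are hypothetical, and the checks illustrate the symmetry of the curve and the standardization z = (x - μ)/σ that maps N(μ, σ) onto N(0, 1).

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_pdf(0.0))                       # peak of N(0, 1): 1/sqrt(2*pi), about 0.3989
print(normal_pdf(1.5) == normal_pdf(-1.5))   # True: symmetric about the mean
print(normal_cdf(1.0) - normal_cdf(-1.0))    # share within one sigma, about 0.6827
# Standardizing: x = 12 in N(10, 2) corresponds to z = 1 in N(0, 1)
print(normal_cdf(12.0, mu=10.0, sigma=2.0), normal_cdf(1.0))
```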

29 Normal distribution

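30 Exponential distribution
- Probability density function:
$$f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0, \qquad f(x) = 0 \text{ for } x < 0$$
where $\lambda > 0$ is the rate parameter.
- Cumulative distribution function:
$$F(x) = P(-\infty, x) = 1 - e^{-\lambda x} \quad \text{for } x \ge 0$$
and $F(x) = 0$ for $x < 0$.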
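A sketch of the exponential density and distribution function, assuming the usual rate parameterization λ (called `rate` in the code). It checks numerically that F(x) equals the area under f from 0 to x, using a midpoint-rule `integrate` helper that is hypothetical and introduced only for this illustration.

```python
import math

def exp_pdf(x, rate):
    """f(x) = rate * exp(-rate * x) for x >= 0, and 0 for x < 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_cdf(x, rate):
    """F(x) = P(X <= x) = 1 - exp(-rate * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

rate = 2.0
x = 1.0
# F(x) should match the area under the density from 0 to x
print(exp_cdf(x, rate), integrate(lambda t: exp_pdf(t, rate), 0.0, x))
```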

31 Thank you for your attention!

