CIS 2033 Based on textbook: A Modern Introduction to

Presentation transcript:

CIS 2033 Based on the textbook: A Modern Introduction to Probability and Statistics: Understanding Why and How, by F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. Temple University, Spring 2012. Slides by: Wanwisa Smith. Modified by: Dr. Longin Jan Latecki.

Chapter 15 Exploratory data analysis: graphical summaries. The set of observations is called a dataset. By exploring the dataset we can gain insight into what probability model suits the phenomenon. To graphically represent univariate datasets, consisting of repeated measurements of one particular quantity, we discuss the classical histogram, the more recently introduced kernel density estimates, and the empirical distribution function. To represent a bivariate dataset, which consists of repeated measurements of two quantities, we use the scatterplot.

15.2 Histograms: The term histogram appears to have been used first by Karl Pearson.

How to construct the histogram? Denote a generic (univariate) dataset of size n by $x_1, x_2, \ldots, x_n$. First we divide the range of the data into intervals. These intervals are called bins and are denoted by $B_1, B_2, \ldots, B_m$. The length of an interval $B_i$ is denoted by $|B_i|$ and is called the bin width. We want the area under the histogram on each bin $B_i$ to reflect the number of elements in $B_i$. Since the total area 1 under the histogram then corresponds to the total number of elements n in the dataset, the area under the histogram on a bin $B_i$ is equal to the proportion of elements in $B_i$: $\frac{\text{number of } x_j \text{ in } B_i}{n}$. The height of the histogram on bin $B_i$ must therefore be equal to $\frac{\text{number of } x_j \text{ in } B_i}{n\,|B_i|}$.
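The construction above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `histogram_heights`, the sample data, and the bin edges are made up, and bins are assumed to be left-open and right-closed.

```python
def histogram_heights(data, edges):
    """Height of the histogram on each bin (edges[i], edges[i+1]].

    height = (number of data points in the bin) / (n * bin width),
    so that the area on each bin equals the proportion of points in it.
    """
    n = len(data)
    heights = []
    for left, right in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in data if left < x <= right)
        heights.append(count / (n * (right - left)))
    return heights


# Made-up example data and bin edges (bins of unequal width on purpose).
data = [1.2, 1.9, 2.1, 2.4, 3.7, 4.0, 4.4, 4.8]
edges = [1.0, 2.0, 3.0, 5.0]

heights = histogram_heights(data, edges)
# Total area under the histogram: sum of height * width over all bins.
total_area = sum(h * (r - l) for h, l, r in zip(heights, edges[:-1], edges[1:]))
print(heights, total_area)
```

The last bin is twice as wide as the others and holds twice as many points, so all three heights coincide; dividing by the bin width is what keeps the total area equal to 1.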

Choice of the bin width: Consider a histogram with bins of equal width. In that case the bins are of the form $B_i = (r + (i-1)b,\; r + ib]$ for $i = 1, 2, \ldots, m$, where r is some reference point smaller than the minimum of the dataset and b denotes the bin width. Mathematical research has provided some guidelines for a data-based choice of b or m.
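One widely quoted guideline of this kind is the normal reference rule $b = 3.49\, s\, n^{-1/3}$, where s is the sample standard deviation. The sketch below assumes that rule; whether it is the guideline shown on the original slide is not confirmed by the transcript. The function names, the sample data, and the default choice of the reference point r are hypothetical.

```python
import math
import statistics


def normal_reference_bin_width(data):
    """Bin width b = 3.49 * s * n**(-1/3), with s the sample standard deviation."""
    n = len(data)
    s = statistics.stdev(data)
    return 3.49 * s * n ** (-1 / 3)


def equal_width_edges(data, b, r=None):
    """Edges r, r + b, ..., r + m*b of the bins (r + (i-1)b, r + ib]."""
    if r is None:
        # Hypothetical default: any reference point below the minimum works.
        r = min(data) - b / 2
    m = math.ceil((max(data) - r) / b)  # number of bins needed to cover the data
    return [r + i * b for i in range(m + 1)]


# Made-up example data.
data = [2.0, 3.5, 4.1, 5.0, 6.2, 7.7, 8.4, 9.9]
b = normal_reference_bin_width(data)
edges = equal_width_edges(data, b)
print(b, edges)
```

With only eight observations the rule gives a wide bin, which is expected: the bin width shrinks only at the rate $n^{-1/3}$ as the dataset grows.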

15.3 Kernel density estimates

A kernel K is a function $K: \mathbb{R} \to \mathbb{R}$. A kernel K typically satisfies the following conditions.
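As a sketch, assume the conditions are the usual ones: K is a probability density (nonnegative, integrating to 1) and K is symmetric around zero; some texts additionally require K(u) = 0 for |u| > 1. The Python below checks the first two numerically for three commonly used kernels. The choice of kernels, the helper `integrate`, and the integration limits are illustrative assumptions.

```python
import math


def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def triangular(u):
    return 1 - abs(u) if abs(u) <= 1 else 0.0


def normal(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


grid = [k / 100 for k in range(-300, 301)]
for name, K, lo, hi in [("Epanechnikov", epanechnikov, -1, 1),
                        ("triangular", triangular, -1, 1),
                        ("normal", normal, -8, 8)]:
    area = integrate(K, lo, hi)                       # should be close to 1
    nonnegative = all(K(u) >= 0 for u in grid)
    symmetric = all(abs(K(u) - K(-u)) < 1e-15 for u in grid)
    print(f"{name}: area={area:.6f} nonnegative={nonnegative} symmetric={symmetric}")
```

The normal kernel does not vanish outside [-1, 1], which is why that third requirement is listed only as optional here.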

Examples of Kernel Construction

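Scaling the kernel K: Scale the kernel K into the function $t \mapsto \frac{1}{h} K\!\left(\frac{t}{h}\right)$, where $h > 0$ is the bandwidth. Then put a scaled kernel around each element $x_i$ in the dataset: $t \mapsto \frac{1}{h} K\!\left(\frac{t - x_i}{h}\right)$.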
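Averaging the scaled kernels over the dataset gives the standard kernel density estimate $f_{n,h}(t) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{t - x_i}{h}\right)$. The following Python sketch implements that formula under stated assumptions: the Epanechnikov kernel, the dataset, and the bandwidth h = 0.5 are illustrative choices, not values taken from the slides.

```python
def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kernel_density_estimate(data, h, kernel=epanechnikov):
    """Return the function t -> (1 / (n h)) * sum_i K((t - x_i) / h)."""
    n = len(data)

    def f(t):
        return sum(kernel((t - x) / h) for x in data) / (n * h)

    return f


def integrate(f, a, b, steps=40_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    w = (b - a) / steps
    return w * sum(f(a + (k + 0.5) * w) for k in range(steps))


# Made-up dataset and bandwidth.
data = [1.0, 1.5, 3.0]
f = kernel_density_estimate(data, h=0.5)

value_at_1 = f(1.0)               # only the kernel centred at 1.0 contributes
total_area = integrate(f, 0.0, 4.0)
print(value_at_1, total_area)
```

Because each scaled kernel is itself a density, their average is again a density, so the area under the estimate stays 1 whatever bandwidth is chosen.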

[Figures: kernel density estimates where the bandwidth is too big and where the bandwidth is too small.]

The function g in blue is a mixture of two Gaussians. We draw 200 samples from it, which are shown as blue dots. We use the samples to generate the histogram (yellow) and its kernel density estimate f (red). The Matlab script is twoGaussKernelDensity1.m.
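The experiment can be approximated in Python as follows. This is not a port of twoGaussKernelDensity1.m, whose contents are not shown in the transcript: the mixture weights, means, standard deviations, the Gaussian kernel, the bandwidth, and the random seed are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(t, mu, sigma):
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical mixture parameters (the slide does not state them).
WEIGHTS = (0.5, 0.5)
MEANS = (0.0, 6.0)
SIGMAS = (1.0, 1.0)


def g(t):
    """True density: a mixture of two Gaussians."""
    return sum(w * normal_pdf(t, m, s) for w, m, s in zip(WEIGHTS, MEANS, SIGMAS))


def sample_mixture(n, rng):
    """Pick a component by its weight, then draw from that Gaussian."""
    samples = []
    for _ in range(n):
        k = 0 if rng.random() < WEIGHTS[0] else 1
        samples.append(rng.gauss(MEANS[k], SIGMAS[k]))
    return samples


def gaussian_kde(data, h):
    """Kernel density estimate with a Gaussian kernel and bandwidth h."""
    n = len(data)

    def f(t):
        # normal_pdf(t, x, h) equals (1/h) * K((t - x)/h) for the normal kernel K.
        return sum(normal_pdf(t, x, h) for x in data) / n

    return f


rng = random.Random(0)             # fixed seed so the run is reproducible
samples = sample_mixture(200, rng)
f = gaussian_kde(samples, h=0.5)   # assumed bandwidth

for t in (0.0, 3.0, 6.0):
    print(f"t={t:4.1f}  g(t)={g(t):.4f}  f(t)={f(t):.4f}")
```

Comparing the printed values of g and f at the two modes and at the valley between them gives a quick numerical counterpart to the blue and red curves in the figure.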

15.4 The empirical distribution function: Another way to graphically represent a dataset is to plot the data in a cumulative manner. This can be done by using the empirical cumulative distribution function $F_n(x) = \frac{\text{number of elements in the dataset} \le x}{n}$.
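A minimal Python sketch of the empirical distribution function, using the standard definition $F_n(x) = (\text{number of elements} \le x)/n$. The function name `ecdf` and the sample data are made up for illustration.

```python
from bisect import bisect_right


def ecdf(data):
    """Return F_n, where F_n(x) = (number of elements <= x) / n."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(xs, x) / n

    return F


# Made-up dataset with a repeated value.
F = ecdf([3, 1, 4, 1, 5])
for x in (0, 1, 3.5, 5):
    print(x, F(x))
```

F is a step function that jumps at each data value; the jump at 1 is twice as high as the others because 1 occurs twice in the dataset.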

Empirical distribution function Continued

15.5 Scatterplot: In some situations we might want to investigate the relationship between two or more variables. In the case of two variables x and y, the dataset consists of pairs of observations: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We call such a dataset a bivariate dataset, in contrast to a univariate dataset. The plot of the points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$ is called a scatterplot.