Lecture 2 Estimating the population mean

Returning the exercises Exercises are returned as a single document. Put your name and student number on every submission. If you do the exercises with pen and paper, scanning the pages is fine as long as the quality is sufficient.

Estimating the population mean Why spend time on estimating the mean? Highlights the difference between an object and an estimate of it. Helps to appreciate the Law of Large Numbers. Helps to appreciate the Central Limit Theorem.

Basic concepts Population Sample, random sampling Random variable Distribution Moments

Population The group or collection of all possible entities of interest (“students in the emetrics class of Otto”) In the developments here, think of a very large (infinitely large) population.

Sample A subset of the population (“all students in the $n$th row/column of the class”).

Random sampling Random sampling: each object in the population (“student”) has the same probability of being selected into the sample. Any two objects give no information about each other → independently distributed. Before being chosen, every object has the same distribution of possible values → identically distributed.

Random variable Numerical summary of a random outcome (“height of student”).

Distribution All the values that the variable, say $Y$, may take + the probability of each of those values. Example: coin tosses, lottery numbers, height of students in Otto’s class. Conditional distribution: the distribution of $Y$ conditional on another variable, say, $X$ (“height of students in Otto’s class, conditional on gender”).

Moments How to describe a distribution? First moment: the mean, $E(Y) = \mu_Y$. Conditional first (and higher) moments: $E(Y \mid X) = \mu_{Y \mid X}$. Higher moments: variance, skewness, kurtosis, ...

HS, today's paper, 4 January 2017

Estimating the mean of a population Estimator = function of a sample of data drawn randomly from the population. Estimate = numerical value of the estimator, given a particular sample.

Estimating the mean of a population Population mean: $\mu_Y = \frac{1}{N} \sum_{i=1}^{N} Y_i$. Sample mean: $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$, $n < N$. $\bar{Y}$ is a natural estimator of $\mu_Y$.
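The difference between the estimator (a function of the sample) and the estimate (one number from one sample) can be sketched in Python. This is a minimal illustration only: the population is simulated, and the names `population`, `sample_mean`, the seed, and the sizes 10,000 and 50 are assumptions made for the example, not part of the lecture.

```python
import random

random.seed(0)

# Hypothetical population of N heights (simulated, not the lecture's data).
N = 10_000
population = [random.gauss(174, 9) for _ in range(N)]
mu_Y = sum(population) / N  # population mean: (1/N) * sum of Y_i


def sample_mean(sample):
    """The estimator: a rule that maps any sample to a number."""
    return sum(sample) / len(sample)


n = 50
sample = random.sample(population, n)  # random sample without replacement, n < N
estimate = sample_mean(sample)         # the estimate: the value for this sample

print(f"population mean = {mu_Y:.2f}")
print(f"estimate from one sample of n = {n}: {estimate:.2f}")
```

Rerunning with a different seed changes `estimate` but not `sample_mean`, which is the point of the distinction.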

Estimating the mean of a population Two questions: 1. What are the properties of $\bar{Y}$? 2. Why use $\bar{Y}$ and not some other estimator?

Properties of $\bar{Y}$ $\bar{Y}$ is a random variable. Its properties are determined by the sampling distribution (“otantajakauma”). The individual observations which are used to calculate $\bar{Y}$ were chosen randomly.

Properties of $\bar{Y}$ → $\bar{Y}$ is random. Q: What happens if you take a different random sample? The distribution of $\bar{Y}$ over different samples of the same size ($n$) is called the sampling distribution.

Properties of $\bar{Y}$ Sampling distribution: all the values that $\bar{Y}$ can take given $n$ + the probability of each of these values. The mean and variance of $\bar{Y}$ are the mean and variance of its sampling distribution. The sampling distribution is very important.
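A sampling distribution can be approximated by simulation: draw many samples of the same size and record the sample mean of each. The sketch below assumes a hypothetical normal population with mean 174 and standard deviation 9; these numbers, the seed, and the helper `draw_sample_mean` are illustrative choices, not from the lecture.

```python
import random
import statistics

random.seed(1)


def draw_sample_mean(n):
    """Draw one i.i.d. sample of size n and return its sample mean."""
    sample = [random.gauss(174, 9) for _ in range(n)]
    return statistics.fmean(sample)


n = 4          # sample size
reps = 10_000  # number of different random samples

# Each entry is one realization of the sample mean.
means = [draw_sample_mean(n) for _ in range(reps)]

mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)

print(f"mean of the sample means:     {mean_of_means:.2f}")
print(f"variance of the sample means: {var_of_means:.2f}")
```

Under these assumptions the mean of the simulated sample means should be close to 174 and their variance close to 9² / 4 = 20.25.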

Properties of $\bar{Y}$ If $E(\bar{Y}) = \mu_Y$, then $\bar{Y}$ is an unbiased (harhaton) estimator of $\mu_Y$. (Note: the same definition applies to any estimator $\hat{\mu}_Y$.) If $\bar{Y}$ converges in probability to $\mu_Y$ as $n \to \infty$, then $\bar{Y}$ is a consistent (tarkentuva) estimator of $\mu_Y$. This is the case, due to the Law of Large Numbers (“suurten lukujen laki”), under certain conditions.

Law of Large Numbers: conditions $Y_i$ are independently and identically distributed. $E(Y_i) = \mu_Y$. No large outliers: $\mathrm{var}(Y_i) < \infty$.
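The Law of Large Numbers can be illustrated with simulated coin tosses, which satisfy the conditions above. This is a sketch under assumed settings (a fair coin, a fixed seed, and the hypothetical helper `sample_mean_of_coin_tosses`); it shows the typical behaviour, not a proof.

```python
import random

random.seed(2)


def sample_mean_of_coin_tosses(n, p=0.5):
    """Mean of n i.i.d. tosses coded 1 (heads) or 0 (tails); E(Y_i) = p."""
    tosses = [1 if random.random() < p else 0 for _ in range(n)]
    return sum(tosses) / n


# The gap between the sample mean and 0.5 tends to shrink as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    y_bar = sample_mean_of_coin_tosses(n)
    print(f"n = {n:>7}: sample mean = {y_bar:.4f}, |error| = {abs(y_bar - 0.5):.4f}")
```

The error need not fall at every step, since each line uses a fresh random sample, but large errors become increasingly unlikely as $n$ grows.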

Properties of $\bar{Y}$ How precise is $\bar{Y}$, and how does this depend on $n$? In other words, how large is the variance of $\bar{Y}$? Central Limit Theorem (“keskeinen raja-arvolause”).

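Central Limit Theorem Suppose the sample is random and i.i.d., with $E(Y_i) = \mu_Y$ and $\mathrm{var}(Y_i) = \sigma_Y^2$, $0 < \sigma_Y^2 < \infty$. Then, as $n \to \infty$, the distribution of $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$, where $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$, becomes arbitrarily well approximated by the standard normal distribution.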

Central Limit Theorem CLT is about the distribution of the estimator of the mean, not of the individual observations. CLT applies no matter what the underlying distribution is, as long as its variance is finite and nonzero. Examples: coin tosses (binary), age (only positive values / integers observed), …
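The claim that the CLT holds for non-normal data can be checked by simulation. The sketch below uses an exponential distribution with rate 1 (mean 1, standard deviation 1), which is strongly skewed; the distribution, seed, sample size, and the helper `standardized_mean` are assumptions chosen for illustration. If the standardized sample mean is roughly standard normal, about 95% of the simulated values should fall within ±1.96.

```python
import math
import random

random.seed(3)

mu, sigma = 1.0, 1.0  # mean and SD of the exponential(1) distribution
n, reps = 100, 4_000  # sample size and number of simulated samples


def standardized_mean(n):
    """(Y_bar - mu) / (sigma / sqrt(n)) for one i.i.d. exponential sample."""
    y_bar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (y_bar - mu) / (sigma / math.sqrt(n))


z = [standardized_mean(n) for _ in range(reps)]
share_within = sum(abs(v) <= 1.96 for v in z) / reps

print(f"share of standardized means within +/-1.96: {share_within:.3f}")
```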

Properties of $\bar{Y}$ Result: $E(\bar{Y}) = \mu_Y$, $\mathrm{var}(\bar{Y}) = \sigma_Y^2/n$.
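Both results follow in a few steps from the i.i.d. assumption. The derivation below is a standard sketch and is not taken from the slides; the variance line uses independence, so that all covariances between different observations are zero.

```latex
\begin{align*}
E(\bar{Y}) &= E\Big(\frac{1}{n}\sum_{i=1}^{n} Y_i\Big)
            = \frac{1}{n}\sum_{i=1}^{n} E(Y_i)
            = \frac{1}{n}\, n\, \mu_Y = \mu_Y, \\
\mathrm{var}(\bar{Y}) &= \frac{1}{n^2}\,\mathrm{var}\Big(\sum_{i=1}^{n} Y_i\Big)
            = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(Y_i)
            = \frac{1}{n^2}\, n\, \sigma_Y^2 = \frac{\sigma_Y^2}{n}.
\end{align*}
```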

Height of students in class ind/group 1 2 3 4 5 6 7 168 165 172 170 174 160 173 164 178 177 171 186 180 190 184 185 175

Data in a histogram

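Same with some statistics (groups 1 to 7 in columns, individuals in rows; AVG = average, SD = standard deviation)
Rows, with the cell values that appear in the transcript followed by the row AVG and SD:
168 165 172 170 174 160 | AVG 168.43 | SD 4.69
173 164 178 177 | AVG 170.29 | SD 6.55
171 186 180 | AVG 174.86 | SD 9.25
190 184 185 175 | AVG 182.29 | SD 8.06
Group AVG (groups 1 to 7): 174.75 171.75 172.25 173.75 178.00 174.50 172.75 | overall AVG 173.96
Group SD (groups 1 to 7): 10.24 9.00 10.21 9.32 5.89 11.12 11.24 | overall SD 8.80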

7 times sample of 4
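The seven group averages reported in the table are seven realizations of the sample mean with $n = 4$. The sketch below only re-uses the group averages and the overall SD from the table; the comparison with $\sigma/\sqrt{n}$ is an illustration and assumes i.i.d. sampling, which the classroom groups need not satisfy, so no exact agreement should be expected from seven draws.

```python
import math
import statistics

# Group averages and overall SD as reported in the lecture's table.
group_averages = [174.75, 171.75, 172.25, 173.75, 178.00, 174.50, 172.75]
overall_sd = 8.80
n = 4  # individuals per group

# With equal group sizes, the mean of the group averages is the overall average.
mean_of_group_averages = statistics.fmean(group_averages)

# Spread of the seven realized sample means.
sd_of_group_averages = statistics.stdev(group_averages)

# What sigma / sqrt(n) would suggest if the groups were i.i.d. random samples.
sd_suggested_by_theory = overall_sd / math.sqrt(n)

print(f"mean of group averages: {mean_of_group_averages:.2f}")
print(f"SD of group averages:   {sd_of_group_averages:.2f}")
print(f"overall SD / sqrt(n):   {sd_suggested_by_theory:.2f}")
```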

10 times sample of 4; 100 times sample of 4

$\bar{Y}$ as a least squares estimator $\bar{Y}$ minimizes the sum of squared residuals: $\min_m \sum_{i=1}^{n} (Y_i - m)^2$. Optimizing (see App. 3.2) yields $\hat{m} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}$.
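The optimization step can be written out with the first-order condition. This is a standard sketch of the argument rather than a reproduction of the appendix; the symbol $\hat{m}$ for the minimizer is notation introduced here.

```latex
\begin{align*}
\frac{d}{dm}\sum_{i=1}^{n}(Y_i - m)^2 &= -2\sum_{i=1}^{n}(Y_i - m) = 0 \\
\Rightarrow\quad \sum_{i=1}^{n} Y_i &= n\,m
\quad\Rightarrow\quad
\hat{m} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}.
\end{align*}
```

The second derivative is $2n > 0$, so the stationary point is a minimum.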

$\bar{Y}$ as a least squares estimator $\bar{Y}$ has smaller variance than all other linear unbiased estimators → $\bar{Y}$ is more efficient than other linear unbiased estimators → $\bar{Y}$ is BLUE (best linear unbiased estimator).

Choosing an objective / loss function Least squares. Absolute deviations. Min / max. The choice may depend on context: think of a basketball team; think of the number of incubators relative to need.
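Different loss functions pick out different summaries of the same data. The sketch below searches a grid of candidate values $m$ and compares three criteria on a small hypothetical data set. Reading "Min / max" as a worst-case criterion (minimize the largest absolute deviation) is an assumption made here for illustration; the slide may instead mean using the minimum or maximum itself. The data and function names are likewise invented.

```python
import statistics

heights = [160, 165, 168, 170, 172, 174, 190]  # hypothetical data


def sum_of_squares(m):
    return sum((y - m) ** 2 for y in heights)


def sum_of_absolute_deviations(m):
    return sum(abs(y - m) for y in heights)


def largest_absolute_deviation(m):
    return max(abs(y - m) for y in heights)


# Candidate values 155.00, 155.01, ..., 195.00.
candidates = [155 + k / 100 for k in range(4001)]

best_least_squares = min(candidates, key=sum_of_squares)
best_absolute = min(candidates, key=sum_of_absolute_deviations)
best_worst_case = min(candidates, key=largest_absolute_deviation)

print(f"least squares:       {best_least_squares:.2f} (mean   = {statistics.fmean(heights):.2f})")
print(f"absolute deviations: {best_absolute:.2f} (median = {statistics.median(heights):.2f})")
print(f"worst case:          {best_worst_case:.2f} (midrange = {(min(heights) + max(heights)) / 2:.2f})")
```

Up to the grid resolution, least squares recovers the mean, absolute deviations recover the median, and the worst-case criterion recovers the midpoint between the smallest and largest value.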

Comparing means Two means, $\bar{Y}_1$ and $\bar{Y}_2$ (height of male / female students). Are they (not) different? → Is $\bar{Y}_1 - \bar{Y}_2 = 0$? What else do you know? You have an estimate of the variances of the means.

Comparing means $\bar{Y}_1$ and $\bar{Y}_2$ are independently distributed → their difference is (approximately, by the CLT) normally distributed → the variance of $\bar{Y}_1 - \bar{Y}_2$ is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
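The comparison can be sketched numerically: estimate each variance from its sample, plug the estimates into the variance formula for the difference, and relate the difference to its standard error. The two samples below are hypothetical, and with only seven observations each the normal approximation is rough; the 1.96 threshold is the usual 5% two-sided normal critical value.

```python
import math
import statistics

# Hypothetical height samples (not the lecture's data).
female = [160, 164, 165, 168, 170, 171, 172]
male = [173, 175, 178, 180, 184, 185, 190]

diff = statistics.fmean(male) - statistics.fmean(female)

# Estimated variance of the difference: s1^2 / n1 + s2^2 / n2.
var_diff = (statistics.variance(male) / len(male)
            + statistics.variance(female) / len(female))
se_diff = math.sqrt(var_diff)

# Difference measured in standard errors, for the hypothesis of no difference.
t_stat = diff / se_diff

print(f"difference in means: {diff:.2f}")
print(f"standard error:      {se_diff:.2f}")
print(f"t-statistic:         {t_stat:.2f}")
print("reject equality at the 5% level" if abs(t_stat) > 1.96 else "cannot reject equality")
```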

Female height distribution; male height distribution