Statistics Chi-Square

Statistics Chi-Square https://www.123rf.com/photo_6622261_statistics-and-analysis-of-data-as-background.html

Hypothesis Tests So far we’ve tested hypotheses about means:
μ = value: use a z-test
μ1 = μ2: use a t-test
μ1 = μ2 = μ3: use an F-test (ANOVA)
? ? ?

Hypothesis Tests What about other types of data and other hypotheses?

Tests for Count Data Another type of data would be counts (frequencies) in categories:
Sibling Study   Brothers   Sisters   Total
Observed #      31         21        52

Tests for Count Data The hypothesis you would be testing would not be about mean values…

Tests for Count Data The question might be: are brothers and sisters equally likely? The alternate hypothesis is that they are not: Ha: #bro ≠ #sis

Tests for Count Data The hypothesis can be based on science or history or some other “educated guess”

Tests for Count Data A null hypothesis you might have about this data would be: H0: #bro = #sis

Tests for Count Data … We’ll use a test called “Chi-Square” (χ²)

Tests for Count Data A Chi-Square distribution is shaped like an F distribution: both are built from squared values, so both are never negative and are skewed to the right

Tests for Count Data A Chi-Square test needs the original data and some “hypothesized” data
Sibling Study    Brothers   Sisters   Total
Observed #       31         21        52
Hypothesized #   26         26        52

Tests for Count Data The “hypothesized” data are called “expected” values
Sibling Study   Brothers   Sisters   Total
Observed #      31         21        52
Expected #      26         26        52

Tests for Count Data The hypothesized values must add up to the original total count

Tests for Count Data They will come from the null (want-to-disprove) hypothesis H0: #bro = #sis

Tests for Count Data To calculate the Chi-Square, we use the formula: $\chi^2 = \sum \frac{(O-E)^2}{E}$, where O is an observed count and E is the matching expected count

Chi-Square PROJECT QUESTION Calculate O-E
Sibling Study   Brothers   Sisters   Total
Observed #      31         21        52
Expected #      26         26        52
O-E             5          -5

Chi-Square PROJECT QUESTION Square the O-E values
Sibling Study   Brothers   Sisters   Total
Observed #      31         21        52
Expected #      26         26        52
(O-E)²          25         25

Chi-Square PROJECT QUESTION Divide them by E
Sibling Study   Brothers   Sisters   Total
Observed #      31         21        52
Expected #      26         26        52
(O-E)²/E        25/26      25/26

Chi-Square PROJECT QUESTION Add them up: 25/26 + 25/26 = 1.923076923

Chi-Square PROJECT QUESTION OK, so our Chi-Square statistic is 1.923076923. We need a probability!
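
The hand calculation above can be checked with a few lines of Python. This is only a sketch of the same arithmetic; the function name `chi_square_statistic` is made up for illustration and does not come from the slides.

```python
# Observed counts from the Sibling Study table and the expected counts under H0.
observed = [31, 21]   # brothers, sisters
expected = [26, 26]   # an equal split of the 52 siblings

def chi_square_statistic(observed, expected):
    """Add up (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 9))  # 1.923076923
```

The total column is left out on purpose: only the category counts enter the sum.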

Tests for Count Data Open the spreadsheet

Tests for Count Data PROJECT QUESTION In the row called “P Chi-Sq”, move your cursor to the cell next to it

Tests for Count Data PROJECT QUESTION Go to: “Formulas” “More Functions” “Statistical” “CHISQ.TEST”

Tests for Count Data PROJECT QUESTION Excel calls observed data “actual” (nobody else does…)

Tests for Count Data PROJECT QUESTION Don’t include the total column

Tests for Count Data PROJECT QUESTION There’s your probability value! Do you reject H0?

Tests for Count Data PROJECT QUESTION Do you reject H0? Nope, the values are not different enough

Tests for Count Data PROJECT QUESTION Do you reject H0? We “fail to reject H0”
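
For readers without the spreadsheet, the probability can be sketched in Python. With two categories the test has 1 degree of freedom, and for that case the upper-tail probability has the closed form erfc(sqrt(χ²/2)), so the standard library is enough. The 0.05 cutoff is an assumption (the slides do not state an alpha at this point), and the function name is made up for illustration. The result should correspond to what CHISQ.TEST reports for the observed and expected rows.

```python
import math

def chi_square_p_value_df1(observed, expected):
    """p-value of a chi-square test with 1 degree of freedom (two categories)."""
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Closed form of the upper tail, valid only for 1 degree of freedom.
    return math.erfc(math.sqrt(chi_sq / 2))

p_value = chi_square_p_value_df1([31, 21], [26, 26])
alpha = 0.05  # assumed significance level, not stated on the slides

print(f"p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value comes out well above 0.05, which matches the slides' conclusion that we fail to reject H0.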

Tests for Count Data PROJECT QUESTION What could we do?

Tests for Count Data PROJECT QUESTION What could we do? Increase n

Tests for Count Data The t-test and ANOVA F-test were designed to be powerful (to reject a false H0 often) even with small sample sizes

Tests for Count Data A Chi-Square test is not very powerful. It only rejects the hypothesis when the data are very VERY different

Tests for Count Data This means it is a very conservative test – nobody is going to think you cheated if you use a Chi-square!

Tests for Count Data It also means we don’t have to set a level of practical significance…

Tests for Count Data How different do the data have to be to be “statistically significant” (that is, to allow you to reject the hypothesized data)? Add 1 to the “Brothers” and subtract 1 from the “Sisters”

Tests for Count Data PROJECT QUESTION 32 brothers, 20 sisters: 9.6% - not yet! Try again!

Tests for Count Data PROJECT QUESTION 33 brothers, 19 sisters: 5.2% - so close!!!! Try again!

Tests for Count Data PROJECT QUESTION 34 brothers, 18 sisters: 2.7% - finally !!!!

Tests for Count Data The data have to be:
Sibling Study   Brothers   Sisters   Total
Observed #      34         18        52
Expected #      26         26        52
before they are significantly different!
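
The "add 1, subtract 1, try again" exercise can be sketched as a loop. This assumes a 0.05 cutoff (the slides stop once the probability drops below 5%) and reuses the 1-degree-of-freedom closed form; the names are illustrative only. The printed probabilities should line up with the 9.6%, 5.2% and 2.7% on the slides.

```python
import math

def p_value_df1(observed, expected):
    """Chi-square p-value for two categories (1 degree of freedom)."""
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi_sq / 2))

expected = [26, 26]
alpha = 0.05  # assumed cutoff for "statistically significant"

steps = []
for shift in range(21 + 1):            # move one sibling at a time
    observed = [31 + shift, 21 - shift]
    p = p_value_df1(observed, expected)
    steps.append((observed[0], observed[1], p))
    if p < alpha:                       # stop at the first significant split
        break

for brothers, sisters, p in steps:
    print(f"{brothers} brothers, {sisters} sisters: p = {p:.1%}")
```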

Tests for Count Data What could you do to make it more likely that you would find a significant difference?

Tests for Count Data What could you do to make it more likely that you would find a significant difference? Use a bigger sample size “n”
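
A small sketch of why a bigger n helps. It makes a loud assumption: that a larger sample would show the same 31:21 proportions, which is hypothetical and not data from the study. Under that assumption the Chi-Square statistic grows in step with n, so the probability shrinks.

```python
import math

def p_value_df1(observed, expected):
    """Chi-square p-value for two categories (1 degree of freedom)."""
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi_sq / 2))

base_observed = [31, 21]  # the real sample; larger samples below are hypothetical

results = {}
for k in (1, 2, 3):
    observed = [k * count for count in base_observed]  # same proportions, k times the n
    n = sum(observed)
    expected = [n / 2, n / 2]
    results[n] = p_value_df1(observed, expected)
    print(f"n = {n}: observed = {observed}, p = {results[n]:.3f}")
```

With the original 52 siblings the split is not significant, while the same split in a sample three times as large would be.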

Questions?

Tests for Count Data Most Chi-Squared tests don’t have specific hypothesized values

Tests for Count Data Most Chi-Squared tests don’t have specific hypothesized values The expected values come from the table of observed data

Tests for Count Data
Ha: p1 ≠ p2
H0: p1 = p2

Tests for Count Data To compare two ways of removing plaque clogging arteries, Dr. Eric J. Topol and colleagues conducted a study (http://www.stat.wmich.edu/s216/xsq/xsq.html)

Tests for Count Data They randomly assigned 1,012 heart patients to have either directional coronary atherectomy or balloon angioplasty

Tests for Count Data Is there evidence of a significant difference between the two approaches in the proportion of deaths or heart attacks within 6 months of treatment?

Chi-Square PROJECT QUESTION What would be Dr. Topol’s alpha-level?

Chi-Square PROJECT QUESTION What would be his alternate hypothesis?

Chi-Square PROJECT QUESTION What would be his alternate hypothesis? Ha: p(death or heart attack for directional atherectomy) ≠ p(death or heart attack for balloon angioplasty)

Chi-Square PROJECT QUESTION What would be his null hypothesis?

Chi-Square PROJECT QUESTION What would be his null hypothesis? H0: p(death or heart attack for directional atherectomy) = p(death or heart attack for balloon angioplasty)

Chi-Square PROJECT QUESTION Here are Dr. Topol’s results:
                          Died or suffered a heart attack   Did not die or suffer a heart attack
Directional Atherectomy   44                                468
Balloon Angioplasty       23                                477

Chi-Square PROJECT QUESTION Do you think there is a practically significant difference?

Chi-Square PROJECT QUESTION How would you calculate the expected values?

Tests for Count Data PROJECT QUESTION First calculate row and column totals and the grand total:
                          Died or suffered a heart attack   Did not die or suffer a heart attack   Total
Directional Atherectomy   44                                468                                    512
Balloon Angioplasty       23                                477                                    500
Total                     67                                945                                    1012

Tests for Count Data PROJECT QUESTION Calculate expected values using: Exp = (RowTot)(ColTot)/GrandTot

Tests for Count Data PROJECT QUESTION Exp(DirAth/Died) = (RowTot)(ColTot)/GrandTot = (512)(67)/1012 ≈ 33.9

Tests for Count Data PROJECT QUESTION Fill in the other cells:

Tests for Count Data PROJECT QUESTION The expected values are:
                          Died or suffered a heart attack   Did not die or suffer a heart attack
Directional Atherectomy   33.9                              478.1
Balloon Angioplasty       33.1                              466.9
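
The rule Exp = (RowTot)(ColTot)/GrandTot can be sketched for any table of counts. The function name `expected_counts` is an illustrative choice, not something from the slides or from Excel.

```python
# Observed counts: rows are treatments, columns are outcomes.
observed = [
    [44, 468],  # directional atherectomy: died or heart attack, neither
    [23, 477],  # balloon angioplasty:     died or heart attack, neither
]

def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The expected values keep the same row and column totals as the observed table, which is a quick way to check the arithmetic.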

Tests for Count Data PROJECT QUESTION Now we need a Chi-Square:

Tests for Count Data PROJECT QUESTION Can Dr. Topol reject the null hypothesis?
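
One way to check the spreadsheet answer is the sketch below. It computes the plain Chi-Square statistic (no continuity correction, which is an assumption about what the slides intend) and uses the fact that a 2 by 2 table has (2-1)(2-1) = 1 degree of freedom, so the closed form erfc(sqrt(χ²/2)) applies. The 0.05 alpha is also an assumption; choosing the alpha-level is one of the project questions.

```python
import math

observed = [
    [44, 468],  # directional atherectomy
    [23, 477],  # balloon angioplasty
]

def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (o - e) ** 2 / e
    p_value = math.erfc(math.sqrt(chi_sq / 2))  # valid for 1 degree of freedom only
    return chi_sq, p_value

chi_sq, p_value = chi_square_2x2(observed)
alpha = 0.05  # assumed; the slides leave the alpha-level as a question
print(f"chi-square = {chi_sq:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```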

Chi-Square PROJECT QUESTION What could Dr. Topol do to make it more likely that he would find a significant difference?

Questions?

Tests for Count Data How would we tell which are different?

Tests for Count Data The hi-lo-close graph!

Tests for Count Data Or a 3D column chart

Tests for Count Data Note: Excel won’t handle an expected value of “0” – you must leave these out of your analysis

How Excel Should Be (but isn’t…) Antarctic Cyclones Enter the observed data in the blue cells in the table:
         40-49S   50-59S   60-79S
Fall     370      526      980
Winter   452      624      1200
Spring   273      513      995
Summer   422      1059     1751

Tests for Count Data PROJECT QUESTION Are there significant differences in cyclone count for different latitude and season categories?
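
The same expected-value rule extends to this larger table. The sketch below is not the spreadsheet from the slides; it assumes the usual degrees of freedom for a table, (rows - 1)(columns - 1) = 6 here, and uses a closed form for the upper tail that only works when the degrees of freedom are even. All function names are illustrative.

```python
import math

# Rows: Fall, Winter, Spring, Summer. Columns: 40-49S, 50-59S, 60-79S.
observed = [
    [370, 526, 980],
    [452, 624, 1200],
    [273, 513, 995],
    [422, 1059, 1751],
]

def chi_square_table(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = sum(
        (o - r * c / grand_total) ** 2 / (r * c / grand_total)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df

def upper_tail_even_df(chi_sq, df):
    """P(X > chi_sq) for a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive, even df")
    half = chi_sq / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

chi_sq, df = chi_square_table(observed)
p_value = upper_tail_even_df(chi_sq, df)
print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p_value:.3g}")
```

With counts this large, even modest differences between seasons and latitude bands produce a very small probability.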

Tests for Count Data PROJECT QUESTION

Questions?