Lecture 23: Cross validation

Presentation transcript:

Statistical Genomics Lecture 23: Cross validation. Zhiwu Zhang, Washington State University.

Outline: cross validation; K-fold validation; jackknife; re-sampling; two ways of calculating accuracy; bias and correction.

Models for GWAS & GS
- GLM and stepwise regression: y = PC + SNP + e; adding QTNs as covariates gives FarmCPU (-2LL) and BLINK (-2LL)
- MLM and MLMM: y = PC + u(Kinship) + SNP + e, plus QTNs
- BLUP/gBLUP: y = PC + u(Kinship) + e; with QTNs used to build a complementary kinship, SUPER
- MAS: y = PC + QTNs + e
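As a concrete illustration of the first model above (y = PC + SNP + e), here is a minimal Python/NumPy sketch, not from the lecture, that tests one SNP at a time with principal components as covariates. The simulated data and names (n_ind, glm_pvalue, the true effect at SNP 10) are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_ind, n_snp, n_pc = 200, 500, 3

# Simulated genotypes (0/1/2), principal components, and a phenotype
# with a true effect at SNP 10 (purely illustrative).
geno = rng.integers(0, 3, size=(n_ind, n_snp)).astype(float)
pc = rng.normal(size=(n_ind, n_pc))
y = pc @ rng.normal(size=n_pc) + 0.8 * geno[:, 10] + rng.normal(size=n_ind)

def glm_pvalue(y, pc, snp):
    """Fit y = intercept + PC + SNP + e by least squares; return the SNP p-value."""
    X = np.column_stack([np.ones(len(y)), pc, snp])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance matrix
    t = beta[-1] / np.sqrt(cov[-1, -1])        # t statistic of the SNP effect
    return 2 * stats.t.sf(abs(t), df)

pvals = np.array([glm_pvalue(y, pc, geno[:, j]) for j in range(n_snp)])
print("most significant SNP:", pvals.argmin(), "p =", pvals.min())
```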

Which method does not involve QTNs? A. CMLM, B. SUPER, C. MLMM, D. FarmCPU, E. BLINK. The answer is A (CMLM).

Which method does not involve kinship? A. CMLM, B. SUPER, C. MLMM, D. FarmCPU, E. BLINK. The answer is E (BLINK).

Which method uses QTNs to build kinship? A. MLM, B. CMLM, C. ECMLM, D. MLMM, E. FarmCPU, F. BLINK. The answer is E (FarmCPU).

Which model can be used for genomic selection (models as numbered on the "Models for GWAS & GS" slide)? A. 1, B. 2, C. 1 and 2, D. 3 and 4, E. 2 and 3, F. 1 and 4. The answer is C.

All the models can be used for GS if the term of ___ is removed: A. SNP, B. QTNs, C. U, D. Kinship, E. PC, F. Y. The answer is B.

Negative prediction accuracy: Massman JM, Gordillo A, Lorenzana RE, Bernardo R. Genomewide predictions from maize single-cross data. Theor Appl Genet. 2013 Jan;126(1):13-22.

Five-fold cross validation: each fold serves in turn as the inference (testing) set, with the remaining folds as the reference (training) set. (Figure by Yao Zhou.)
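A minimal sketch of five-fold cross validation in Python/NumPy, assuming simulated genotypes and a plain ridge regression on markers as a stand-in for gBLUP; the data, ridge_predict, and the penalty lam are illustrative assumptions, not the lecture's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 200, 1000, 5                       # individuals, markers, folds

# Simulated 0/1/2 genotypes and a phenotype with many small marker effects
geno = rng.integers(0, 3, size=(n, m)).astype(float)
y = geno @ rng.normal(scale=0.1, size=m) + rng.normal(size=n)

def ridge_predict(X_train, y_train, X_test, lam=100.0):
    """Ridge regression on markers: a simple stand-in for gBLUP."""
    mu = y_train.mean()
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(X_train.shape[1]),
                           X_train.T @ (y_train - mu))
    return mu + X_test @ beta

folds = np.array_split(rng.permutation(n), k)
pred = np.empty(n)
for fold in folds:                           # each fold is the inference (testing) set once
    ref = np.setdiff1d(np.arange(n), fold)   # the rest is the reference (training) set
    pred[fold] = ridge_predict(geno[ref], y[ref], geno[fold])

# Hold accuracy: one correlation over the pooled predictions
print("hold accuracy:", np.corrcoef(y, pred)[0, 1])
```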

Jackknife: repeat until every individual gets predicted, each individual serving in turn as the inference set.

Jackknife: the extreme case of K = N (leave-one-out cross validation), where N is the number of individuals and K is the number of folds
- The inference (testing) set contains only one individual
- It is not possible to calculate the correlation between observed and predicted values within an inference set
- Evaluation of accuracy must be held until every individual has received a prediction
- Resampling is not available
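A tiny Python/NumPy sketch of why the jackknife forces the "hold" style of evaluation; the numbers are made up for illustration.

```python
import numpy as np

# With the jackknife (K = N), each inference set holds a single individual,
# so a within-fold correlation is undefined (NumPy warns and returns nan).
obs_one, pred_one = np.array([1.7]), np.array([1.2])
print(np.corrcoef(obs_one, pred_one))        # nan: one pair has no correlation

# Accuracy therefore has to be held until every individual is predicted,
# then computed once over the pooled vectors (hold accuracy).
obs_all = np.array([1.7, 2.1, 0.9, 1.4, 2.6])
pred_all = np.array([1.2, 1.9, 1.1, 1.5, 2.2])
print(np.corrcoef(obs_all, pred_all)[0, 1])
```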

Re-sampling: stochastic validation
- Sample part of the population, e.g. 20%, as the inference (testing) set and leave the rest as the reference (training) set
- Instantly evaluate accuracy on the inference set
- Repeat multiple times
- Average accuracy across replicates
- Some individuals may never appear in the testing set
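A minimal Python/NumPy sketch of stochastic validation along these lines, with simulated data and a ridge-regression predictor as hypothetical stand-ins (the 20% test fraction and 30 replicates are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, n_rep, test_frac = 200, 500, 30, 0.2

geno = rng.integers(0, 3, size=(n, m)).astype(float)
y = geno @ rng.normal(scale=0.1, size=m) + rng.normal(size=n)

def ridge_predict(X_train, y_train, X_test, lam=100.0):
    mu = y_train.mean()
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(X_train.shape[1]),
                           X_train.T @ (y_train - mu))
    return mu + X_test @ beta

acc = []
for _ in range(n_rep):
    test = rng.choice(n, size=int(n * test_frac), replace=False)   # inference (testing)
    train = np.setdiff1d(np.arange(n), test)                       # reference (training)
    pred = ridge_predict(geno[train], y[train], geno[test])
    acc.append(np.corrcoef(y[test], pred)[0, 1])                   # instant accuracy

print("average instant accuracy over", n_rep, "replicates:", np.mean(acc))
```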

Two ways of calculating correlation: "hold" accuracy, where predictions from all folds are pooled and a single correlation with the observed values is computed; and "instant" accuracy, where a correlation is computed within each fold (or replicate) and then averaged.
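A Python/NumPy sketch contrasting the two ways on the same k-fold run; again the simulated data, ridge_predict, and k = 10 are illustrative assumptions rather than the lecture's settings.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 200, 500, 10

geno = rng.integers(0, 3, size=(n, m)).astype(float)
y = geno @ rng.normal(scale=0.1, size=m) + rng.normal(size=n)

def ridge_predict(X_train, y_train, X_test, lam=100.0):
    mu = y_train.mean()
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(X_train.shape[1]),
                           X_train.T @ (y_train - mu))
    return mu + X_test @ beta

folds = np.array_split(rng.permutation(n), k)
pred = np.empty(n)
instant = []
for fold in folds:
    ref = np.setdiff1d(np.arange(n), fold)
    pred[fold] = ridge_predict(geno[ref], y[ref], geno[fold])
    instant.append(np.corrcoef(y[fold], pred[fold])[0, 1])  # correlation within the fold

hold = np.corrcoef(y, pred)[0, 1]                           # correlation over pooled predictions
print("hold accuracy:   ", hold)
print("instant accuracy:", np.mean(instant))
```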

Artefactual negative hold accuracy

Hold bias relates to the number of folds

Problem of instant accuracy

Small sample size causes bias
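As a hedged illustration of this point (not the lecture's own demonstration), a short Python/NumPy simulation showing that sample correlations computed from small inference sets are noisy and, on average, pulled below the population value:

```python
import numpy as np

rng = np.random.default_rng(4)
true_r, n_rep = 0.5, 5000            # population correlation and number of replicates

def sample_corr(n):
    """Draw n pairs with population correlation true_r; return the sample correlation."""
    z = rng.multivariate_normal([0, 0], [[1, true_r], [true_r, 1]], size=n)
    return np.corrcoef(z[:, 0], z[:, 1])[0, 1]

for n in (5, 10, 20, 100):
    est = [sample_corr(n) for _ in range(n_rep)]
    print(f"fold size {n:3d}: mean estimated correlation {np.mean(est):.3f} "
          f"(population value {true_r})")
```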

Correction of instant accuracy

Highlight: GS by GWAS; over-fitting; cross validation; K-fold validation; jackknife; re-sampling; two ways of calculating accuracy; bias and correction.