Permutation Analysis Benjamin Neale, Michael Neale, Manuel Ferreira.

Slides:



Advertisements
Similar presentations
Review bootstrap and permutation
Advertisements

Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
ACE estimates under different sample sizes Benjamin Neale Michael Neale Boulder 2006.
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Modeling and Simulation Monte carlo simulation 1 Arwa Ibrahim Ahmed Princess Nora University.
Neuroinformatics 1: review of statistics Kenneth D. Harris UCL, 28/1/15.
Multiple Testing, Permutation, False Discovery Benjamin Neale Pak Sham Shaun Purcell.
PSY 307 – Statistics for the Behavioral Sciences
Mutual Information Mathematical Biology Seminar
DEPENDENT SAMPLES t-TEST What is the Purpose?What Are the Assumptions?How Does it Work?
Chapter 14 Simulation. Monte Carlo Process Statistical Analysis of Simulation Results Verification of the Simulation Model Computer Simulation with Excel.
Significance Tests P-values and Q-values. Outline Statistical significance in multiple testing Statistical significance in multiple testing Empirical.
Power and Sample Size Adapted from: Boulder 2004 Benjamin Neale Shaun Purcell I HAVE THE POWER!!!
Chapter 14 Inferential Data Analysis
+ AP Statistics: Chapter 11 Pages Rohan Parikh Azhar Kassam Period 2.
Statistical Power Calculations Boulder, 2007 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
EVALUATION David Kauchak CS 451 – Fall Admin Assignment 3 - change constructor to take zero parameters - instead, in the train method, call getFeatureIndices()
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 6 – Multiple comparisons, non-normality, outliers Marshall.
Statistical Computing
Jesse Gillis 1 and Paul Pavlidis 2 1. Department of Psychiatry and Centre for High-Throughput Biology University of British Columbia, Vancouver, BC Canada.
MS 305 Recitation 11 Output Analysis I
B AD 6243: Applied Univariate Statistics Hypothesis Testing and the T-test Professor Laku Chidambaram Price College of Business University of Oklahoma.
Experimental Design: One-Way Correlated Samples Design
1 Experimental Design. 2  Single Factor - One treatment with several levels.  Multiple Factors - More than one treatment with several levels each. 
LT 4.2 Designing Experiments Thanks to James Jaszczak, American Nicaraguan School.
1 A Robust Framework for Detecting Structural Variations February 6, 2008 Seunghak Lee 1, Elango Cheran 1, and Michael Brudno 1 1 University of Toronto,
Biostatistics, statistical software VII. Non-parametric tests: Wilcoxon’s signed rank test, Mann-Whitney U-test, Kruskal- Wallis test, Spearman’ rank correlation.
1 CS 391L: Machine Learning: Experimental Evaluation Raymond J. Mooney University of Texas at Austin.
AP STATISTICS LESSON SIMULATING EXPERIMENTS.
Inference and Inferential Statistics Methods of Educational Research EDU 660.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
Comp. Genomics Recitation 3 The statistics of database searching.
Simulation is the process of studying the behavior of a real system by using a model that replicates the behavior of the system under different scenarios.
Conducting A Study Designing Sample Designing Experiments Simulating Experiments Designing Sample Designing Experiments Simulating Experiments.
Aim: How can we assign experimental units to treatments in a way that is fair to all of the treatments?
Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.
Type 1 Error and Power Calculation for Association Analysis Pak Sham & Shaun Purcell Advanced Workshop Boulder, CO, 2005.
Lesson 15 - R Chapter 15 Review. Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review.
Chapter 16 Data Analysis: Testing for Associations.
Chapter 15 – Analysis of Variance Math 22 Introductory Statistics.
Limits to Statistical Theory Bootstrap analysis ESM April 2006.
M ONTE C ARLO SIMULATION Modeling and Simulation CS
Tutorial #10 by Ma’ayan Fishelson. Classical Method of Linkage Analysis The classical method was parametric linkage analysis  the Lod-score method. This.
Chapter Twelve The Two-Sample t-Test. Copyright © Houghton Mifflin Company. All rights reserved.Chapter is the mean of the first sample is the.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Association between genotype and phenotype
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Experimental Statistics - week 3
Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.
The Broad Institute of MIT and Harvard Differential Analysis.
From Wikipedia: “Parametric statistics is a branch of statistics that assumes (that) data come from a type of probability distribution and makes inferences.
Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
A simple method to localise pleiotropic QTL using univariate linkage analyses of correlated traits Manuel Ferreira Peter Visscher Nick Martin David Duffy.
Rerandomization to Improve Covariate Balance in Randomized Experiments Kari Lock Harvard Statistics Advisor: Don Rubin 4/28/11.
1 Chapter 11 Understanding Randomness. 2 Why Random? What is it about chance outcomes being random that makes random selection seem fair? Two things:
BIOL 582 Lecture Set 2 Inferential Statistics, Hypotheses, and Resampling.
Differences Among Groups
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
Chapter 9 Sampling Distributions 9.1 Sampling Distributions.
QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.
Chapter 4 Selected Nonparemetric Techniques: PARAMETRIC VS. NONPARAMETRIC.
Lecture 17: Model-Free Linkage Analysis Date: 10/17/02  IBD and IBS  IBD and linkage  Fully Informative Sib Pair Analysis  Sib Pair Analysis with Missing.
8.1 Normal Approximations
Genome Wide Association Studies using SNP
Regression-based linkage analysis
Linkage in Selected Samples
Power and Sample Size I HAVE THE POWER!!! Boulder 2006 Benjamin Neale.
ACE estimates under different sample sizes
Part /11/22 Vincent Guillemot
Fig. 2. Illustration of the procedure for finding spatiotemporal regions correspoding to significant differences between conditions using nonparametric.
Presentation transcript:

Permutation Analysis Benjamin Neale, Michael Neale, Manuel Ferreira

Who came up with permutation? Hint: it’s a statistical tool R. A. Fisher Proposed as validation for Student’s t-test in 1935 in Fisher’s The Design of Experiments

Basic Principle 1.Under the null, all data comes from the same distribution 2.We calculate our statistic, such as mean difference 3.We then shuffle the data with respect to group and recalculate the statistic (mean difference) 4.Repeat step 3 multiple times 5.Find out where our statistic lies in comparison to the null distribution

Real Example Case-Control data, and we want to find out if there is a mean difference casecontrol Mean Mean difference.541

Permutation One casecontrol Mean Mean difference =.329

Simulation example I simulated 70 data points from a single distribution—35 cases and 35 controls Mean difference of -.21 I then permuted randomly assigning case or control status Empirical significance=#hits/#permutations

Distribution of mean differences from permutations -.21

Distribution of mean differences from permutations

Empirical Significance #hits is any permuted dataset that had a mean difference >.21 or <-.21 #permutations is the trials permuted datasets we generate Result(#hits/#permutations) = 2024/5000 =.4048 T test results =.3672

General principles of permutation Disrupt the relationship being tested –Mean difference between group: switch groups –Test for linkage in siblings: randomly reassign the ibd sharing –If matched control then within pair permute –Shaun will describe tests for association

General advantages Does not rely on distributional assumptions Corrects for hidden selection Corrects for hidden correlation

How would we do QTL permutation in Mx? 1.We analyze our real data and record χ 2 2.For sibpairs we shuffle the ibd probabilities for each sibpair 3.We reanalyze the data and record the new χ 2 4.We generate a distribution of χ 2 for the permuted sets 5.Place our statistic on the distribution 6.Repeat for all locations in genome

Some caveats Computational time can be a challenge Determining what to maintain and what to permute Variable pedigrees also pose some difficulties Need sufficient data for combinations Unnecessary when no bias, but no cost to doing it

Some Exercises What would we permute to determine if the MZ correlation is equal to the DZ correlation in a univariate analysis? What would we permute to determine if QTL dominance is significant in linkage analysis