Fitness effects of HIV mutations Lucy Crooks Theoretical Biology, ETH Zurich.

Drug resistance in HIV
- Combination therapy (3 drugs) is most effective if resistance mutations have negative fitness effects under some of the drugs.
- How mutations interact to affect fitness (epistasis) also matters: through recombination, negative epistasis can accelerate evolution.
- Epistasis is also relevant for theories of the evolution of sexual reproduction.
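
Epistasis is often quantified on the log-fitness scale, as the deviation of the double mutant from the product of the single-mutant effects. Under that convention, negative values mean the double mutant is less fit than expected. The minimal sketch below assumes this definition (it is not stated on the slides) and uses made-up fitness values.

```python
import math


def epistasis(w_wt: float, w_a: float, w_b: float, w_ab: float) -> float:
    """Log-scale epistasis: departure of the double mutant's fitness from the
    multiplicative expectation w_a * w_b / w_wt (assumed convention)."""
    return math.log(w_ab) - math.log(w_a) - math.log(w_b) + math.log(w_wt)


# Hypothetical fitness values relative to wild type = 1.0, not data from the study.
eps = epistasis(w_wt=1.0, w_a=0.8, w_b=0.9, w_ab=0.6)
print(round(eps, 3))  # -0.182: the double mutant is less fit than expected
```

A value of zero means the two mutations act independently (multiplicatively).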

Aims
- Estimate the fitness effects of HIV-1 mutations.
- Estimate the interactions (epistasis) between these effects.
- Data: 17,000 sequences, each with fitness measured in 16 treatments (drug-free plus 15 drug treatments).
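
Both aims can be written as one regression model. The slides state only that a GLM with variance proportional to the mean is used. The log link below is an assumption, made so that the coefficients act multiplicatively on fitness.

```latex
\log \mu_k = \beta_0 + \sum_{i=1}^{M} \beta_i\, x_{ki}
           + \sum_{(i,j) \in \mathcal{P}} \gamma_{ij}\, x_{ki}\, x_{kj},
\qquad \mathrm{Var}(w_k) = \phi\, \mu_k
```

Here w_k is the fitness measured for sequence k in one treatment and mu_k is its expected value. x_ki is 1 if sequence k carries mutation i and 0 otherwise, M is the number of mutations, P is the set of candidate mutation pairs, and phi is a dispersion parameter. The beta_i are the main fitness effects. Under the assumed log link, gamma_ij < 0 means negative epistasis: the double mutant is less fit than the product of the single-mutant effects predicts.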

Data
- 400 sequence positions
- 1,800 mutations
- 180,000 pairwise interactions
- Complex mutational patterns
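
Some quick arithmetic on these numbers helps show why the fitting has to be staged. All possible pairs of 1,800 mutations number about 1.6 million, so the 180,000 interaction terms are only a subset of them. A single model containing every term would also have far more coefficients than observations. The count below assumes at most one fitness value per sequence and treatment, giving at most 17,000 observations per treatment.

```python
from math import comb

n_sequences = 17_000      # at most this many observations per treatment (assumed)
n_mutations = 1_800
n_interactions = 180_000  # pairwise terms considered (from the "Data" slide)

all_pairs = comb(n_mutations, 2)                  # every possible pair of mutations
n_parameters = 1 + n_mutations + n_interactions   # intercept + main effects + interactions

print(all_pairs)                     # 1619100
print(n_parameters)                  # 181801
print(n_parameters > n_sequences)    # True
```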

My approach
- Randomly split the interaction terms into subsets.
- Fit a series of models, each with main effects for all mutations plus one subset of interactions.
- Remove terms with high p-values (t-test of the coefficient).
- Repeat until few enough interactions remain to fit into one model.
- GLM with variance ∝ mean
- p-value cut-off for removal = 0.4
- Significance tested by change in deviance (0.05 level)
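
The screening loop on this slide can be sketched as follows. This is not the author's code. `fit_pvalues` is a hypothetical callback that stands in for the GLM fit (all main effects plus one subset of interactions) and returns a t-test p-value for each interaction term in that subset. The deviance check on the slide is left out.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical callback: fits a GLM with ALL main effects plus the given
# interaction terms and returns {term: p_value} for those interaction terms.
FitFn = Callable[[Sequence[str]], Dict[str, float]]


def screen_interactions(
    terms: List[str],
    fit_pvalues: FitFn,
    subset_size: int,
    max_terms: int,
    cutoff: float = 0.4,
    seed: int = 0,
) -> List[str]:
    """Randomly partition the interaction terms, drop those with p > cutoff
    in their subset model, and repeat until at most max_terms remain."""
    rng = random.Random(seed)
    remaining = list(terms)
    while len(remaining) > max_terms:
        rng.shuffle(remaining)  # new random partition each round
        survivors: List[str] = []
        for start in range(0, len(remaining), subset_size):
            subset = remaining[start:start + subset_size]
            pvals = fit_pvalues(subset)  # one model (one job) per subset
            survivors.extend(t for t in subset if pvals[t] <= cutoff)
        if len(survivors) == len(remaining):
            break  # a full round removed nothing; stop to avoid looping forever
        remaining = survivors
    return remaining
```

Because each subset model is fitted independently, the inner loop maps naturally onto separate jobs.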

Approach (2)
- Fit the remaining terms into one model.
- Sequentially remove sets of terms with the highest p-values.
- Repeat until only significant terms remain (0.05 level).
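
The second stage is a backward elimination that removes terms in sets. The minimal sketch below again uses a hypothetical `fit_pvalues` callback that refits the combined model. It decides removals from per-term p-values. The slides also mention a change-in-deviance test, which could replace that criterion and is not shown here. The set size of 10 is an arbitrary illustrative default.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical callback: refits the single combined model (all main effects
# plus the given interaction terms) and returns {term: p_value}.
FitFn = Callable[[Sequence[str]], Dict[str, float]]


def backward_eliminate(
    terms: List[str],
    fit_pvalues: FitFn,
    set_size: int = 10,
    alpha: float = 0.05,
) -> List[str]:
    """Repeatedly drop up to set_size of the least significant terms
    (only those with p > alpha) and refit, until all remaining have p <= alpha."""
    remaining = list(terms)
    while remaining:
        pvals = fit_pvalues(remaining)
        # Non-significant terms, least significant first.
        candidates = sorted(
            (t for t in remaining if pvals[t] > alpha),
            key=lambda t: pvals[t],
            reverse=True,
        )
        if not candidates:
            break  # every remaining term is significant
        drop = set(candidates[:set_size])
        remaining = [t for t in remaining if t not in drop]
    return remaining
```

Removing several terms per refit trades some precision for far fewer model fits. This is presumably why sets of terms, rather than single terms, are removed.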

Technical details
- Each model is run as a separate job.
- Fitting is done in R, with the model matrix generated in Perl.
- Method: iteratively reweighted least squares (IRLS) using QR decomposition (calls the Fortran routine dqrls).
- 1 processor, exclusive node use.
- CPU time: 5 hours.
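
The fitting method can be illustrated with a small pure-Python sketch of IRLS, where each weighted least-squares step is solved by QR decomposition. Several details are assumptions for illustration and are not taken from the slides: the log link, the starting values (mu = y + 0.1, as in R's poisson family) and the modified Gram-Schmidt QR. R's routine also handles rank-deficient columns, which this sketch does not.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def qr_solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Least-squares solution of A x = b via a thin QR factorisation
    (modified Gram-Schmidt). Assumes A has full column rank."""
    n, p = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n)] for j in range(p)]  # becomes Q, column by column
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        R[j][j] = math.sqrt(sum(v * v for v in cols[j]))
        cols[j] = [v / R[j][j] for v in cols[j]]
        for k in range(j + 1, p):
            R[j][k] = sum(q * v for q, v in zip(cols[j], cols[k]))
            cols[k] = [v - R[j][k] * q for q, v in zip(cols[j], cols[k])]
    qtb = [sum(q * bi for q, bi in zip(cols[j], b)) for j in range(p)]  # Q^T b
    x = [0.0] * p
    for j in reversed(range(p)):  # back-substitution: R x = Q^T b
        x[j] = (qtb[j] - sum(R[j][k] * x[k] for k in range(j + 1, p))) / R[j][j]
    return x


def irls_log_link(X: Matrix, y: Sequence[float],
                  tol: float = 1e-10, max_iter: int = 50) -> List[float]:
    """Fit log(mu) = X beta with Var(y) proportional to mu by IRLS."""
    mu = [yi + 0.1 for yi in y]  # starting values (assumed, as in R's poisson family)
    eta = [math.log(m) for m in mu]
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        # Log link with variance ∝ mu: working weight w = mu and
        # working response z = eta + (y - mu) / mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        sw = [math.sqrt(m) for m in mu]
        Xw = [[s * xij for xij in row] for s, row in zip(sw, X)]
        zw = [s * zi for s, zi in zip(sw, z)]
        new_beta = qr_solve(Xw, zw)  # weighted least-squares step via QR
        eta = [sum(b * xij for b, xij in zip(new_beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        converged = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

For example, `irls_log_link([[1.0, x] for x in range(4)], [math.exp(0.5 + 0.3 * x) for x in range(4)])` recovers coefficients close to `[0.5, 0.3]`. With tens of thousands of rows and thousands of columns, a dense solve like this is only illustrative.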

Preliminary results
- Fitness effects of mutations in the absence of drugs

Preliminary results (2)
- Epistasis in the absence of drugs

Outlook
- Simplify the model.
- Test the robustness of the subset approach.
- Repeat the analysis for the 15 drug treatments.
- Find funding!