Single nucleotide polymorphisms and applications Usman Roshan BNFO 601.



SNPs
– DNA sequence variations that occur when a single nucleotide is altered
– Must be present in at least 1% of the population to be a SNP
– Occur every 100 to 300 bases along the 3 billion-base human genome
– Many have no effect on cell function, but some could affect disease risk and drug response

Toy example

SNPs on the chromosome

Bi-allelic SNPs
– Most SNPs have one of two nucleotides at a given position
– For example:
  – A/G denotes the varying nucleotide as either A or G. We call each of these an allele
  – Most SNPs have two alleles (bi-allelic)

SNP genotype
– We inherit two copies of each chromosome (one from each parent)
– For a given SNP the genotype defines the type of alleles we carry
– Example: for the SNP A/G one's genotype may be
  – AA if both copies of the chromosome have A
  – GG if both copies of the chromosome have G
  – AG or GA if one copy has A and the other has G
  – The first two cases are called homozygous and the latter two heterozygous
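The homozygous/heterozygous distinction above can be sketched in a few lines of Python. The function name `zygosity` and the two-letter string encoding of a genotype are assumptions made for illustration, not part of the slides.

```python
def zygosity(genotype: str) -> str:
    """Classify a two-letter genotype such as 'AA' or 'AG' (hypothetical helper)."""
    if len(genotype) != 2:
        raise ValueError("expected exactly two alleles, e.g. 'AG'")
    first, second = genotype.upper()
    # Same allele on both chromosome copies means homozygous.
    return "homozygous" if first == second else "heterozygous"


# The four genotypes listed for the A/G SNP.
for g in ["AA", "GG", "AG", "GA"]:
    print(g, zygosity(g))
```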

SNP genotyping

Real SNPs
– SNP consortium: snp.cshl.org
– SNPedia:

Application of SNPs: association with disease
– Experimental design to detect cancer-associated SNPs:
  – Pick random humans with and without cancer (say breast cancer)
  – Perform SNP genotyping
  – Look for associated SNPs
  – Also called a genome-wide association study

Case-control example
– Study of 100 people:
  – Case: 50 subjects with cancer
  – Control: 50 subjects without cancer
– Count the number of alleles and form a contingency table

            #Allele1  #Allele2
Case            10        90
Control          2        98
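One way to quantify the association in this table is an odds ratio together with a chi-square statistic. The slide itself only asks for the contingency table, so the choice of these two measures is an assumption here, and the function names `odds_ratio` and `chi_square` are hypothetical. A minimal sketch in plain Python:

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if allele and case/control status were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: case, control. Columns: #Allele1, #Allele2 (counts from the slide).
table = [[10, 90], [2, 98]]
print("odds ratio:", odds_ratio(table))
print("chi-square:", chi_square(table))
```

For a 2x2 table the statistic has one degree of freedom, so it would be compared against the corresponding chi-square distribution (3.84 is the usual cutoff at the 0.05 level).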

Effect of population structure on genome-wide association studies
– Suppose our sample is drawn from a population of two groups, I and II
– Assume that group I mostly carries the first allele and group II mostly carries the second allele
– Further assume that most case subjects belong to group I and most control subjects belong to group II
– This leads to a false association between the first allele and the disease, even though the allele only reflects group membership

Effect of population structure on genome-wide association studies
– We can correct this effect if case and control are equally sampled from all sub-populations
– To do this we need to know the population structure
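The confounding described above can be illustrated with made-up numbers. Everything in this sketch is an assumption chosen for illustration: the allele frequencies (0.8 and 0.2), the group sizes, and the helper names. Within each group, cases and controls have the same allele frequency, so there is no real association; the pooled table still shows one.

```python
def allele_counts(people_per_group, allele1_freq_per_group):
    """Expected (allele1, allele2) counts; each person carries two alleles."""
    a1 = a2 = 0
    for people, freq in zip(people_per_group, allele1_freq_per_group):
        alleles = 2 * people
        n1 = round(alleles * freq)
        a1 += n1
        a2 += alleles - n1
    return a1, a2


def odds_ratio(case, control):
    """Odds ratio of the 2x2 table with rows case and control."""
    return (case[0] * control[1]) / (case[1] * control[0])


freqs = [0.8, 0.2]      # assumed allele-1 frequency in group I and group II
cases = [40, 10]        # most cases come from group I
controls = [10, 40]     # most controls come from group II

# Pooled analysis ignores group membership.
pooled = odds_ratio(allele_counts(cases, freqs), allele_counts(controls, freqs))

# Stratified analysis looks at each group separately.
stratified = [
    odds_ratio(allele_counts([cases[g]], [freqs[g]]),
               allele_counts([controls[g]], [freqs[g]]))
    for g in range(2)
]

print("pooled odds ratio:", pooled)
print("odds ratio within each group:", stratified)
```

The pooled odds ratio is well above 1 while the odds ratio within each group is exactly 1, which is the false association the slide warns about.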

Population structure prediction
– Treated as an unsupervised learning problem (i.e. clustering)

Clustering
Suppose we want to cluster n vectors in $\mathbb{R}^d$ into two groups. Define $C_1$ and $C_2$ as the two groups. Our objective is to find $C_1$ and $C_2$ that minimize

$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2$$

where $m_i$ is the mean of class $C_i$
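As a small sketch, the clustering objective (the sum of squared distances from each vector to the mean of its group) can be computed as follows. Representing each group as a list of coordinate lists, and the helper names `mean`, `sq_dist` and `objective`, are assumptions made for illustration.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def sq_dist(x, m):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def objective(clusters):
    """Sum over clusters of squared distances to the cluster mean."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty group contributes nothing
        m = mean(cluster)
        total += sum(sq_dist(x, m) for x in cluster)
    return total


C1 = [[0.0, 0.0], [2.0, 0.0]]
C2 = [[10.0, 10.0]]
print(objective([C1, C2]))
```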

K-means algorithm for two clusters
Input: n vectors $x_1, \dots, x_n$ in $\mathbb{R}^d$
Algorithm:
1. Initialize: assign each $x_j$ to $C_1$ or $C_2$ with equal probability and compute the means $m_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j$
2. Recompute clusters: assign $x_j$ to $C_1$ if $\|x_j - m_1\| < \|x_j - m_2\|$, otherwise assign it to $C_2$
3. Recompute means $m_1$ and $m_2$
4. Compute objective
5. Compute objective of new clustering. If the difference is smaller than a chosen threshold then stop, otherwise go to step 2.
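The steps above can be sketched in plain Python. This is a minimal illustration rather than the course's reference implementation: the function name `two_means`, the parameters `tol`, `seed` and `max_iter`, the redraw of an initial assignment that leaves a group empty, and keeping the old mean when a group becomes empty are all assumptions.

```python
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def sq_dist(x, m):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def objective(points, labels, means):
    """Sum of squared distances from each point to its cluster mean."""
    return sum(sq_dist(p, means[l]) for p, l in zip(points, labels))


def two_means(points, tol=1e-9, seed=0, max_iter=100):
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    # Step 1: random assignment with equal probability (redraw if a group is empty).
    labels = [rng.randrange(2) for _ in points]
    while len(set(labels)) < 2:
        labels = [rng.randrange(2) for _ in points]
    means = [mean([p for p, l in zip(points, labels) if l == c]) for c in (0, 1)]
    history = [objective(points, labels, means)]
    for _ in range(max_iter):
        # Step 2: assign each point to the closer mean (ties go to the second group).
        labels = [0 if sq_dist(p, means[0]) < sq_dist(p, means[1]) else 1
                  for p in points]
        # Step 3: recompute means; keep the old mean if a group became empty.
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                means[c] = mean(members)
        # Steps 4 and 5: compare the new objective with the previous one.
        history.append(objective(points, labels, means))
        if history[-2] - history[-1] < tol:
            break
    return labels, means, history


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, means, history = two_means(points)
print(labels, means, history)
```

Comparing squared distances gives the same assignment as comparing distances, which avoids a square root. The returned `history` list makes it easy to check that the objective never increases from one iteration to the next.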

K-means
– Is it guaranteed to find the clustering that optimizes the objective?
– It is only guaranteed to converge to a local optimum
– We can prove that the objective never increases in subsequent iterations

Proof sketch of convergence of k-means
– Justification of first inequality: by assigning $x_j$ to the closest mean the objective decreases or stays the same
– Justification of second inequality: for a given cluster its mean minimizes squared error loss
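The two inequalities are not shown in the transcript. A plausible reconstruction, assuming $C_i'$ denotes the clusters after reassignment and $m_i'$ their recomputed means (both symbols are introduced here for illustration), is:

```latex
\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2
\;\ge\;
\sum_{i=1}^{2} \sum_{x_j \in C_i'} \|x_j - m_i\|^2
\;\ge\;
\sum_{i=1}^{2} \sum_{x_j \in C_i'} \|x_j - m_i'\|^2
```

The left-hand side is the objective before an iteration and the right-hand side is the objective after it. Since the objective is bounded below by zero and never increases, the sequence of objective values converges.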