Genotype Calling Jackson Pang Digvijay Singh Electrical Engineering, UCLA.

Outline
Introduction
  – Single Nucleotide Polymorphism
  – SNP Genotyping
  – SNP Microarrays
The Problem
Our Solutions
Results
Summary

Introduction: Single Nucleotide Polymorphism (SNP)
A variation in the DNA sequence
Mutation of a single base pair
  – Represents over 80% of human genetic variation
Has far-reaching effects
  – Disease Risk
  – Reaction to Chemicals
  – Response to Vaccines
  – Personalized Medicine

Introduction: SNP Genotyping
Also termed Genotype Calling
  – Detection of SNPs
  – The SNPs are genotyped or “called”
Many methods exist
  – Sequencing
  – Enzyme-based: digest parts of the genome
  – Hybridization: use of probe data
Picture from: Olivier M. The Invader Assay for SNP Genotyping, 2005.

Introduction: SNP Microarrays
Hybridization-based
  – Mix control and case population DNA samples in certain ratios
SNP Microarrays
  – Lots of probes on a chip
  – Simultaneous inspection of thousands of SNPs
  – We use the Affymetrix 100k chip’s data from the HapMap website

The Problem
Introduction
The Problem
  – Genotype Calling from Affymetrix Data
  – Clustering
Our Solutions
Results
Summary

The Problem: Genotype Calling from Affymetrix Data
SNP has alleles A and B
Affymetrix Microarray’s probe data
  – Abundance measure for alleles A and B
Two chromosomes
  – Each individual is “called” as AA, AB or BB
Facilitates association
  – Calculates allele frequencies in cases and controls for target SNPs
Workflow: DNA sample for individuals -> SNP microarray (Affymetrix 100k chip) -> Abundance measures of alleles A and B for each individual -> Genotype call for each SNP for every individual
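
The allele-frequency step mentioned above can be illustrated with a short sketch. It is written in Python purely for illustration (the slides state that the authors' own code was in R); the function name `allele_frequencies` and the example call lists are hypothetical and do not come from the presentation. Each individual carries two chromosomes, so an AA call contributes two A alleles and an AB call contributes one.

```python
from collections import Counter


def allele_frequencies(calls):
    """Return (freq_A, freq_B) for one SNP from calls 'AA', 'AB' or 'BB'.

    Hypothetical helper for illustration; any other value is ignored.
    """
    counts = Counter(calls)
    n_called = counts["AA"] + counts["AB"] + counts["BB"]
    if n_called == 0:
        raise ValueError("no genotype calls to count")
    total_alleles = 2 * n_called  # two chromosomes per individual
    freq_a = (2 * counts["AA"] + counts["AB"]) / total_alleles
    return freq_a, 1.0 - freq_a


# Made-up calls for a single SNP in a case group and a control group.
cases = ["AA", "AB", "AB", "BB"]
controls = ["AA", "AA", "AB", "AA"]
case_freq = allele_frequencies(cases)
control_freq = allele_frequencies(controls)
```

A difference in allele frequency between the two groups is what an association analysis would then examine for the target SNP.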

The Problem: Clustering
Processed probe data from the Affymetrix 100k Microarray for certain SNPs
The x and y axes represent abundance measures for alleles A and B
The data is for 270 individuals from a population

Our Solutions
Introduction
The Problem
Our Solutions
  – Sector-sweep Clustering
  – K-Means Clustering
Results
Summary

Our Solutions: Sector-Sweep Clustering
Basic intuition: all AB genotypes lie close to the 45-degree line, i.e. they have similar abundance values for A and B
The algorithm uses a sector (slice) that sweeps from the 45-degree line to 0 degrees
The sweeping sector determines an upper and a lower slope threshold
It takes 2 parameters:
  – The width of the slice (5 degrees)
  – The density drop (0.75)
Outputs
  – Upper and lower slope thresholds
Points above the upper threshold are classified as B (homozygous BB)
Points below the lower threshold are classified as A (homozygous AA)
All other points are classified as AB
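
The slide gives only the outline of the sweep, so the following Python sketch fills in the details with assumptions that should be read as such: thresholds are expressed as angles rather than slopes (slope = tan(angle)), the x axis is taken as allele A abundance and the y axis as allele B abundance, the upper threshold is found by a mirrored sweep from 45 toward 90 degrees, and "density drop 0.75" is interpreted as "the slice count has fallen by at least 75% relative to the densest slice seen so far". The authors' R implementation may differ on every one of these points; the function names are hypothetical.

```python
import math


def angle_deg(a, b):
    """Angle of the point (A abundance, B abundance) above the x axis."""
    return math.degrees(math.atan2(b, a))


def sweep(angles, direction, width=5.0, drop=0.75):
    """Sweep a slice away from 45 degrees and return the threshold angle.

    direction = -1 sweeps toward 0 degrees, +1 toward 90 degrees (assumed).
    The sweep stops at the first slice whose point count is at most
    (1 - drop) times the largest count seen so far (assumed stopping rule).
    """
    limit = 0.0 if direction < 0 else 90.0
    edge = 45.0
    peak = 0
    while (edge - limit) * direction < 0:
        lo, hi = sorted((edge, edge + direction * width))
        count = sum(1 for t in angles if lo <= t < hi)
        if peak > 0 and count <= (1.0 - drop) * peak:
            return edge
        peak = max(peak, count)
        edge += direction * width
    return limit


def call_genotypes(points, width=5.0, drop=0.75):
    """Return (lower, upper, calls) for a list of (A, B) abundance pairs."""
    angles = [angle_deg(a, b) for a, b in points]
    lower = sweep(angles, -1, width, drop)
    upper = sweep(angles, +1, width, drop)
    calls = []
    for t in angles:
        if t > upper:
            calls.append("BB")
        elif t < lower:
            calls.append("AA")
        else:
            calls.append("AB")
    return lower, upper, calls


# Made-up abundance pairs: three near the diagonal, two A-heavy, two B-heavy.
points = [(100, 100), (100, 95), (95, 100),
          (200, 20), (180, 30),
          (20, 200), (30, 180)]
lower, upper, calls = call_genotypes(points)
```

With these toy points the heterozygous cluster sits within one slice on each side of the diagonal, so the thresholds land one slice width away from 45 degrees. A real data set with a sparse or off-diagonal AB cluster would need a more careful stopping rule than this sketch uses.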

Our Solutions: K-means Clustering
Basic intuition: partition a set of points into K groups such that the sum of squared distances from the points to their assigned cluster centers is minimized
Takes in the point vector and 3 parameters
  – The point vector
  – Number of centers, number of starting sets, number of iterations
Outputs
  – An array of coordinates for the centers
  – Membership array
  – Within-cluster sum of squares (withinss)
Classification of centers as genotypes
  1. By the relative slope ratios of the centers
     Slope ratio of cluster 1: center1(y) / center1(x)
     Highest = B; lowest = A; middle = H (AB)
  2. By dividing the quadrant into 3 equal sections and classifying each cluster by the location of its center
     Performs worse than slope ratios
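
A minimal sketch of K-means followed by the relative-slope classifier is given below. It is a plain Lloyd's-algorithm implementation in Python written for illustration, not the authors' R code. The parameter names `n_starts` and `n_iter` stand in for the "starting sets" and "iterations" on the slide, and the labels AA/AB/BB are used in place of A/H/B; all function names and the sample points are hypothetical.

```python
import random


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k=3, n_starts=10, n_iter=20, seed=0):
    """Return (centers, membership, within-cluster sum of squares).

    Runs Lloyd's algorithm from n_starts random initializations and keeps
    the run with the smallest within-cluster sum of squares.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        centers = rng.sample(points, k)
        for _ in range(n_iter):
            member = assign(points, centers)
            updated = []
            for j in range(k):
                group = [p for p, m in zip(points, member) if m == j]
                if group:
                    updated.append((sum(p[0] for p in group) / len(group),
                                    sum(p[1] for p in group) / len(group)))
                else:
                    updated.append(centers[j])  # keep an empty cluster's center
            if updated == centers:
                break
            centers = updated
        member = assign(points, centers)
        wss = sum(sq_dist(p, centers[m]) for p, m in zip(points, member))
        if best is None or wss < best[2]:
            best = (centers, member, wss)
    return best


def classify_by_slope(centers):
    """Label three centers by slope ratio y / x: lowest AA, middle AB, highest BB."""
    slopes = [cy / cx if cx > 0 else float("inf") for cx, cy in centers]
    order = sorted(range(len(centers)), key=lambda j: slopes[j])
    labels = [None] * len(centers)
    for name, j in zip(("AA", "AB", "BB"), order):
        labels[j] = name
    return labels


# Made-up (A abundance, B abundance) pairs forming three separated groups.
points = [(200, 20), (190, 25), (210, 15),
          (100, 100), (95, 105), (105, 98),
          (20, 200), (25, 190), (15, 210)]
centers, member, wss = kmeans(points, k=3, n_starts=50)
labels = classify_by_slope(centers)
calls = [labels[m] for m in member]
```

Multiple starting sets matter here because a single random initialization can settle on a poor partition, for example one that merges two genotype clusters; keeping the run with the lowest within-cluster sum of squares guards against that.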

Results
Introduction
The Problem
Our Solutions
Results
  – Implementation Details
  – Sector-Sweep
  – K-means
Summary

Results: Implementation Details
Data
  – Affymetrix 100k chip data from HapMap
  – Data for 270 individuals
Coding Paradigm
  – All coding done in R
  – Oligo and Bioconductor packages for data processing
Goodness of Solution
  – Comparison to HapMap genotyped data
Workflow: Affymetrix 100k data from HapMap website -> Process and amplify data using Oligo and Bioconductor -> R code for genotyping algorithms -> Compare genotyping results with HapMap
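
The slides do not say how the comparison to HapMap was scored. One plausible measure, shown here only as an assumption, is the concordance rate: the fraction of calls that agree with the reference genotype. The sketch is in Python for illustration (the authors worked in R); the function name `concordance` and the `"NN"` code for a missing genotype are hypothetical choices, not details from the presentation.

```python
def concordance(calls, reference, missing="NN"):
    """Fraction of genotype calls that match the reference calls.

    Positions where either side is marked missing are skipped
    (the "NN" missing code is an assumption for this sketch).
    """
    if len(calls) != len(reference):
        raise ValueError("call lists must have the same length")
    compared = 0
    matched = 0
    for called, ref in zip(calls, reference):
        if called == missing or ref == missing:
            continue
        compared += 1
        if called == ref:
            matched += 1
    if compared == 0:
        raise ValueError("no comparable genotype calls")
    return matched / compared


# Made-up example: three of the four comparable calls agree.
our_calls = ["AA", "AB", "BB", "AB", "NN"]
hapmap_calls = ["AA", "AB", "BB", "BB", "AA"]
rate = concordance(our_calls, hapmap_calls)
```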

Results: Sector-Sweep Performance

Results: Genotype Calling Based on K-means with Relative Slope Classifier

Results: K-means Performance over All HapMap SNPs (Relative Slope Classifier)

Summary
SNPs
  – Contribute to disease and reaction to drugs
  – Account for over 80% of human genetic variation
Genotype Calling
  – Genotyping or “calling” of SNPs
  – Microarray technology being employed (Affymetrix chip)
A Clustering Problem
  – Need to cluster microarray data to identify SNP alleles
Solutions and Results
  – Sector-sweep Clustering
  – K-means Clustering
  – Comparison with HapMap genotyped data

Questions? Thanks for your patience!