Analyzing DNA using Microarray and Next Generation Sequencing (1)


Analyzing DNA using Microarray and Next Generation Sequencing (1)
Background
SNP Array: Basic design; Applications: CNV, LOH, GWAS
Deep sequencing: Alignment and Assembly; Applications: structural changes, GWAS

The chromosome

SNP. Variations in DNA sequence. Single Nucleotide Polymorphism (SNP) --- a single-letter change in the DNA. Common SNPs occur every few hundred bases. Each form is called an “allele”. Almost all SNPs have only two alleles. Allele frequencies often differ between ethnic groups. [Figure: Dna-SNP.svg]

Correlations between SNPs. Why measure the SNP alleles? [Figure: crossing_over.gif] DNA changes in two ways during evolution: (1) point mutation, which creates SNPs; (2) recombination, which happens in large segments. As a result, alleles of adjacent SNPs are highly dependent. Haplotype: a group of alleles linked closely enough to be inherited mostly as a unit.
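The dependence between adjacent SNPs is often quantified with the linkage disequilibrium statistic r². The following is a minimal sketch, assuming phased haplotypes are already available as pairs of 0/1 allele codes; the function name and the input layout are illustrative and not taken from the slides.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per chromosome, where a and b
    are 0/1 allele codes at the first and second SNP (assumed input format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


linked = [(1, 1), (1, 1), (0, 0), (0, 0)]        # alleles always travel together
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]   # every combination equally common
print(ld_r_squared(linked), ld_r_squared(independent))
```

An r² near 1 means one SNP is a good proxy for the other, which is why a genotyped marker can stand in for a nearby variant that was not measured.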

Why SNP? Figure 1: This diagram shows two ancestral chromosomes being scrambled through recombination over many generations to yield different descendant chromosomes. If a genetic variant marked by the A on the ancestral chromosome increases the risk of a particular disease, the two individuals in the current generation who inherit that part of the ancestral chromosome will be at increased risk. Adjacent to the variant marked by the A are many SNPs that can be used to identify the location of the variant.

Why SNP? Nature Genetics 26, (2000) Figure 1. Schematic model of trait aetiology. The phenotype under study, Ph, is influenced by diverse genetic, environmental and cultural factors (with interactions indicated in simplified form). Genetic factors may include many loci of small or large effect, GPi, and polygenic background. Marker genotypes, Gx, are near to (and hopefully correlated with) the genetic factor, GP, that affects the phenotype. Genetic epidemiology tries to correlate Gx with Ph to localize GP. Above the diagram, the horizontal lines represent different copies of a chromosome; vertical hash marks show marker loci in and around the gene, GP, affecting the trait. The red Pi are the chromosomal locations of aetiologically relevant variants, relative to Ph. Figure labels: SNPs; the gene deciding the phenotype.

SNP array. The SNP array. [Figure source: Affymetrix.com]

SNP array. The SNP array. [Figure source: Affymetrix.com] 40 probes per SNP (20 for the forward strand and 20 for the reverse strand). PM/MM (perfect match/mismatch) strategy. Data summary (generating AA/AB/BB calls) omitted here.

SNP array. Genotype calls: association analysis, linkage analysis, loss of heterozygosity. Signal strength: copy number aberration.

CNA --- Background. Copy Number Aberration (CNA): a form of chromosomal aberration; a deviation from the regular 2 copies for some segments of the chromosomes; one of the key characteristics of cancer. CNA in cancer: reduces the copy number of tumor-suppressor genes; increases the copy number of oncogenes; possibly related to metastasis.

CNA --- the statistician’s task. High-density arrays allow us to identify “focused CNA”: copy number change in small DNA segments. Given the high per-probeset noise, how can we achieve both high sensitivity and high specificity?

CNA --- maximizing sensitivity/specificity. Two approaches that complement each other: (1) Reducing noise at the single-probeset level, based on dose-response (Huang et al., 2006) or on sequence properties (Nannya et al., 2005). (2) Segmentation methods: smoothing; Hidden Markov Model-based methods; Circular Binary Segmentation; and others.

HMM data segmentation. Fridlyand et al. Journal of Multivariate Analysis, June 2004, V. 90, pp [Figure: states Amplified, Normal, Deleted]
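To make the idea of HMM segmentation concrete, here is a minimal Viterbi sketch over the three states named on the slide. It is not the model of Fridlyand et al.; the Gaussian emission means, the shared standard deviation, and the "stay" probability are all assumed values chosen for illustration, and the input is taken to be per-probeset log2 copy-number ratios.

```python
import math

STATES = ("deleted", "normal", "amplified")
# Assumed log2-ratio means for 1, 2 and 3 copies relative to 2 copies.
MEANS = {"deleted": -1.0, "normal": 0.0, "amplified": 0.58}
SIGMA = 0.3    # assumed per-probeset noise (standard deviation)
STAY = 0.98    # assumed probability of remaining in the same state


def log_emission(x, state):
    """Log density of a Gaussian centred on the state's mean."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log_ratios):
    """Most likely state sequence for a list of log2 ratios."""
    if not log_ratios:
        return []
    score = {s: math.log(1 / len(STATES)) + log_emission(log_ratios[0], s)
             for s in STATES}
    back = []  # back[t][cur] = best predecessor of cur at step t + 1
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(x, cur))
        back.append(pointers)
        score = new_score
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


segment = [0.0] * 5 + [0.58] * 8 + [0.0] * 5   # an 8-probeset gain
outlier = [0.0] * 5 + [0.58] + [0.0] * 5       # a single noisy probeset
print(viterbi(segment))
print(viterbi(outlier))
```

The high "stay" probability is the design choice that trades sensitivity for specificity: a run of elevated probesets is called amplified, while an isolated high value is absorbed into the surrounding normal segment.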

Forward-backward fragment assembling

Some examples. Top: model cell line, 3-copy segment in chromosome 9. Bottom: cancer sample.

LOH. Loss of Heterozygosity (LOH) happens in segments of DNA. [Figure: Keith W. Brown and Karim T.A. Malik, 2001, Expert Reviews in Molecular Medicine]

LOH. On a SNP array, LOH will yield homozygous calls (AA or BB, rather than AB) for a number of consecutive SNPs. [Figure: Discov Med Jul;12(62):]
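The pattern described here, a stretch of consecutive SNPs without any AB call, can be scanned for directly. This is a minimal sketch that assumes genotype calls are given in chromosomal order as the strings "AA", "AB" and "BB"; the function name and the `min_run` threshold are hypothetical, and a real analysis would also compare against a matched normal sample, since long homozygous runs can occur by chance.

```python
def find_loh_runs(calls, min_run=5):
    """Return inclusive (start, end) index pairs of homozygous runs.

    calls: genotype calls in chromosomal order ("AA", "AB" or "BB").
    min_run: minimum number of consecutive homozygous calls to report
             (an assumed, tunable threshold).
    """
    runs = []
    start = None
    for i, call in enumerate(calls):
        if call in ("AA", "BB"):
            if start is None:
                start = i              # a new homozygous run begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None               # a heterozygous call ends the run
    if start is not None and len(calls) - start >= min_run:
        runs.append((start, len(calls) - 1))
    return runs


calls = ["AB", "AA", "BB", "AA", "AA", "BB", "AA", "AB", "AA", "AB"]
print(find_loh_runs(calls, min_run=5))
```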

GWAS [Image © Pasieka, Science Photo Library]

GWAS

GWAS. Example: “Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer”, Nature Genetics 41, (2009).
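At its simplest, a GWAS repeats one association test per SNP across the genome. The sketch below shows a basic allelic chi-square test (1 degree of freedom) for a single SNP, comparing allele counts in cases and controls. It is an illustration under assumed inputs, not the analysis used in the cited study, and the counts in the usage line are made up.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Chi-square test on a 2x2 table of allele counts.

    Returns (chi2, p_value). Rows are cases/controls, columns are
    the A and B alleles.
    """
    observed = [[case_a, case_b], [control_a, control_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_a + control_a, case_b + control_b]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = allelic_chi_square(case_a=60, case_b=40, control_a=40, control_b=60)
print(chi2, p)
```

Because the test is repeated for a very large number of SNPs, the per-SNP p-value has to clear a much stricter threshold than 0.05 to be considered genome-wide significant.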

DNA sequencing

Background

Alignment and Assembly.
When a reference genome is available --- Alignment. Can rely on the existing reference genome as a blueprint. Align the short reads onto the reference genome. Need a few-fold coverage to cover most regions.
Sequencing a whole new genome --- Assembly. Overlaps are required to construct the genome. The reads are short, so ~30-fold coverage is needed. At 3 Gb of data per run, a new genome of about human size (~3 Gb) needs 30 runs.
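The run count on this slide follows from the usual coverage relation: coverage = (total sequenced bases) / (genome size). A short sketch of the arithmetic, with hypothetical function names and a human-sized genome taken as roughly 3 Gb:

```python
def fold_coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def runs_needed(genome_size, target_coverage, bases_per_run):
    """Number of sequencing runs to reach the target coverage (rounded up)."""
    total_bases = genome_size * target_coverage
    return (total_bases + bases_per_run - 1) // bases_per_run


HUMAN_SIZE = 3_000_000_000      # ~3 Gb, approximate
BASES_PER_RUN = 3_000_000_000   # "3G data per run" from the slide

print(runs_needed(HUMAN_SIZE, 30, BASES_PER_RUN))      # de novo assembly at ~30x
print(fold_coverage(900_000_000, 100, HUMAN_SIZE))     # 900M reads of 100 bp
```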

Hash table-based alignment. Similar to BLAST in principle. (1) Find potential locations. (2) Perform local alignment at those locations.
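The two steps can be sketched as follows. This is a simplified illustration, not any particular aligner: step (1) uses a k-mer hash table of the reference to propose candidate positions, and step (2) is reduced to an ungapped mismatch count in place of a full local alignment. The function names, `k`, and `max_mismatches` are assumptions made for the example.

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table from each k-mer to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) for each acceptable placement of the read."""
    # Step 1: every k-mer of the read that hits the index proposes a start.
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Step 2: verify each candidate (ungapped comparison as a stand-in
    # for local alignment).
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTTGCATGCCGATTAGGCA"
index = build_index(reference, k=4)
print(align_read("GCATTCCGAT", reference, index, k=4))  # one base differs from the reference
```

The index is built once and reused for every read, which is what makes this approach practical for millions of short reads.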

Alignment and Assembly From read to graph:

Alignment and Assembly

de Bruijn graph assembly. Red: read error.

Alignment and Assembly de Bruijn graph assembly

Alignment and Assembly de Bruijn graph assembly
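Since the figures for these slides are not in the transcript, here is a minimal sketch of de Bruijn graph construction from reads. Each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The `min_count` filter is an assumed, simple way to drop k-mers that appear too rarely, which is how read errors typically show up; real assemblers use more careful error correction and graph simplification.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmer_counts, min_count=1):
    """Edges from (k-1)-mer prefix to (k-1)-mer suffix for kept k-mers."""
    graph = defaultdict(set)
    for kmer, count in kmer_counts.items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unique_path(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig, node, visited = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:      # stop on a cycle
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Overlapping reads from a toy sequence, each seen twice, plus one
# read ("GGCATG") containing a sequencing error.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"] * 2 + ["GGCATG"]
counts = count_kmers(reads, k=4)
print(walk_unique_path(build_graph(counts, min_count=2), "ATG"))
print(walk_unique_path(build_graph(counts, min_count=1), "ATG"))
```

With the erroneous k-mers kept, the walk stops early at the branch they create; filtering them out lets the contig extend through the whole toy sequence.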

Whole genome/exome/transcriptome sequencing

Genomics. Whole genome sequencing detects all variants (SNP alleles, rare variants, mutations). These could be associated with disease: rare variants (burden testing by collapsing by gene); de novo mutations (need family tree); rare Mendelian disorders; structural variants in cancer.

Structural changes. Identification of translocations from discordant paired-end reads. Cancer Genetics 206 (2014) 432-440.
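A read pair is discordant for a translocation when its two ends map to different chromosomes. One simple way to turn such pairs into candidates is to bin both ends and require several supporting pairs per bin pair. This is a sketch under assumed inputs: the tuple layout, `bin_size`, and `min_support` are hypothetical, and the coordinates are made up.

```python
from collections import Counter


def translocation_candidates(pairs, bin_size=1000, min_support=3):
    """Group inter-chromosomal read pairs by binned end positions.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) for mapped read pairs.
    Returns {((chromA, binA), (chromB, binB)): supporting_pair_count}.
    """
    support = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 == chrom2:
            continue                       # same chromosome: not a translocation signal
        end_a = (chrom1, pos1 // bin_size)
        end_b = (chrom2, pos2 // bin_size)
        key = tuple(sorted((end_a, end_b)))  # order-independent key
        support[key] += 1
    return {key: n for key, n in support.items() if n >= min_support}


pairs = [
    ("chr9", 5100, "chr22", 8200),
    ("chr22", 8500, "chr9", 5450),   # same junction, ends reported in the other order
    ("chr9", 5900, "chr22", 8800),
    ("chr1", 500, "chr2", 700),      # a lone discordant pair, likely a mapping artefact
    ("chr1", 1000, "chr1", 1300),    # concordant pair
]
print(translocation_candidates(pairs))
```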

Structural changes. CNV by depth of coverage. Cancer Genetics 206 (2014) 432-440.

Structural changes. Cancer Genetics 206 (2014) 432-440.
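The idea behind depth-of-coverage CNV calling is that read depth scales with copy number. A minimal sketch, assuming per-base depths for one sample and taking the median window as the 2-copy baseline; the function name, window size, and depth values are illustrative, and real methods also correct for GC content and mappability.

```python
from statistics import median


def copy_number_by_depth(depths, window, ploidy=2):
    """Estimate copy number per non-overlapping window from read depth.

    depths: per-base read depth along one chromosome.
    The median window depth is assumed to represent `ploidy` copies.
    """
    window_means = [
        sum(depths[i:i + window]) / window
        for i in range(0, len(depths) - window + 1, window)
    ]
    baseline = median(window_means)
    if baseline == 0:
        raise ValueError("median window depth is zero")
    return [ploidy * m / baseline for m in window_means]


# Toy depth profile: mostly 30x, one window at 45x (gain), one at 15x (loss).
depths = [30] * 40 + [45] * 10 + [30] * 30 + [15] * 10 + [30] * 10
print(copy_number_by_depth(depths, window=10))
```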

Genotype calling

Medical Genomics. Example: extreme-case sequencing to find rare variants associated with a disease. Nature Reviews Genetics 11, 415.

GWAS