
1 Analyzing DNA using Microarray and Next Generation Sequencing (1). Background. SNP array: basic design; applications: CNV, LOH, GWAS. Deep sequencing: alignment and assembly; applications: structural changes, GWAS.

2 The chromosome

3 SNP: variations in DNA sequence. A Single Nucleotide Polymorphism (SNP) is a single-letter change in the DNA. Common SNPs occur every few hundred bases. Each form is called an “allele”. Almost all SNPs have only two alleles. Allele frequencies often differ between ethnic groups. http://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Dna-SNP.svg/180px-Dna-SNP.svg.png
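As a small illustration of allele frequencies, the sketch below counts alleles from diploid genotype calls written in the array-style A/B notation. The function name `allele_frequencies` and the example calls are hypothetical; comparing the resulting frequencies between two groups is how differences between populations would show up.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid calls such as 'AA', 'AB', 'BB'."""
    counts = Counter()
    for g in genotypes:
        counts.update(g)  # each two-letter call contributes two alleles
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


calls = ["AA", "AB", "AB", "BB", "AA", "AA"]  # hypothetical calls for one SNP
freqs = allele_frequencies(calls)
print(freqs)  # A: 8 of 12 alleles, B: 4 of 12 alleles
```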

4 Correlations between SNPs. Why measure the SNP alleles? http://www.evolutionpages.com/images/crossing_over.gif DNA changes in two ways during evolution: point mutation → SNPs; recombination, which happens in large segments → alleles of adjacent SNPs are highly dependent. Haplotype: a group of alleles linked closely enough to be inherited mostly as a unit.
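The dependence between alleles of adjacent SNPs (linkage disequilibrium) is commonly summarized by D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). A minimal sketch, assuming phased two-SNP haplotypes are already available as two-character strings (a hypothetical input format; the function name is also illustrative):

```python
def linkage_disequilibrium(haplotypes, a1, b1):
    """Return (D, r2) for two biallelic SNPs.

    haplotypes: 2-character strings; character 0 is the allele at SNP 1 and
    character 1 the allele at SNP 2. a1, b1: the alleles being tracked.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == a1 for h in haplotypes) / n
    p_b = sum(h[1] == b1 for h in haplotypes) / n
    p_ab = sum(h[0] == a1 and h[1] == b1 for h in haplotypes) / n
    d = p_ab - p_a * p_b                      # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # squared allele correlation
    return d, r2


# Allele A always travels with allele B: inherited as one unit.
linked = ["AB", "AB", "ab", "ab"]
print(linkage_disequilibrium(linked, "A", "B"))    # (0.25, 1.0)
# All four combinations equally common: no dependence.
unlinked = ["AB", "Ab", "aB", "ab"]
print(linkage_disequilibrium(unlinked, "A", "B"))  # (0.0, 0.0)
```

An r² near 1 means one SNP's allele almost predicts the other's, which is why a measured SNP can stand in for a nearby untyped variant.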

5 Why SNP? http://www.hapmap.org/originhaplotype.html.en Figure 1: This diagram shows two ancestral chromosomes being scrambled through recombination over many generations to yield different descendant chromosomes. If a genetic variant marked by the A on the ancestral chromosome increases the risk of a particular disease, the two individuals in the current generation who inherit that part of the ancestral chromosome will be at increased risk. Adjacent to the variant marked by the A are many SNPs that can be used to identify the location of the variant.

6 Why SNP? Nature Genetics 26, 151 - 157 (2000) Figure 1. Schematic model of trait aetiology. The phenotype under study, Ph, is influenced by diverse genetic, environmental and cultural factors (with interactions indicated in simplified form). Genetic factors may include many loci of small or large effect, G_Pi, and polygenic background. Marker genotypes, G_x, are near to (and hopefully correlated with) a genetic factor, G_P, that affects the phenotype. Genetic epidemiology tries to correlate G_x with Ph to localize G_P. Above the diagram, the horizontal lines represent different copies of a chromosome; vertical hash marks show marker loci in and around the gene, G_P, affecting the trait. The red P_i are the chromosomal locations of aetiologically relevant variants, relative to Ph. Figure labels: SNPs; the gene determining the phenotype.

7 SNP array The SNP array Affymetrix.com

8 SNP array. The SNP array (Affymetrix.com): 40 probes per SNP (20 for the forward strand and 20 for the reverse strand), using a PM/MM (perfect match/mismatch) probe-pair strategy. Data summarization (generating AA/AB/BB calls) is omitted here.
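To make the AA/AB/BB idea concrete, here is a toy caller that assumes the probe-level PM/MM intensities have already been summarized into one A-allele and one B-allele signal per SNP. It uses a normalized contrast with a fixed, arbitrary cut-off; it is not Affymetrix's algorithm, and production callers typically cluster contrasts across many samples instead of using fixed thresholds. The function name and threshold are hypothetical.

```python
def call_genotype(intensity_a, intensity_b, threshold=0.5):
    """Toy genotype call from summarized A- and B-allele signals.

    contrast = (A - B) / (A + B) lies in [-1, 1]: near +1 means almost only
    A signal, near -1 almost only B signal, near 0 both alleles present.
    The threshold of 0.5 is an illustrative assumption.
    """
    total = intensity_a + intensity_b
    if total <= 0:
        return "NoCall"
    contrast = (intensity_a - intensity_b) / total
    if contrast > threshold:
        return "AA"
    if contrast < -threshold:
        return "BB"
    return "AB"


for a, b in [(900, 60), (480, 510), (40, 870)]:  # hypothetical signals
    print(a, b, call_genotype(a, b))             # AA, AB, BB
```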

9 SNP array. Genotype calls: association analysis, linkage analysis, loss of heterozygosity. Signal strength: copy number aberration.

10 CNA --- Background. Copy Number Aberration (CNA) is a form of chromosomal aberration: deviation from the regular 2 copies for some segments of the chromosomes. It is one of the key characteristics of cancer. In cancer, CNA can reduce the copy number of tumor-suppressor genes and increase the copy number of oncogenes, and it is possibly related to metastasis.

11 CNA --- the statistician’s task. High-density arrays allow us to identify “focused CNA”: copy number changes in small DNA segments. Given the high per-probeset noise, how can we achieve both high sensitivity AND high specificity?

12 CNA --- maximizing sensitivity/specificity. Two approaches that complement each other: (1) reducing noise at the single-probeset level, based on dose-response (Huang et al., 2006) or on sequence properties (Nannya et al., 2005); (2) segmentation methods: smoothing, Hidden Markov Model-based methods, Circular Binary Segmentation, etc.
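The segmentation idea can be sketched with plain (non-circular) binary segmentation: find the single split of a run of log2 ratios that most reduces the residual sum of squares, keep it if the reduction beats a penalty, and recurse on both halves. Circular Binary Segmentation differs: it treats the sequence as circular so that it can cut out an internal segment in one step, and it judges significance by permutation. This sketch uses a fixed, assumed penalty instead. All names and values are illustrative.

```python
def best_split(x, min_size):
    """Return (i, gain): the split x[:i] | x[i:] that most reduces the residual
    sum of squares, considering only splits leaving >= min_size points per side."""
    n = len(x)
    total = sum(x)
    best_i, best_gain = None, 0.0
    left = sum(x[:min_size - 1])
    for i in range(min_size, n - min_size + 1):
        left += x[i - 1]  # left now equals sum(x[:i])
        right = total - left
        gain = left ** 2 / i + right ** 2 / (n - i) - total ** 2 / n
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain


def binary_segmentation(x, penalty, min_size=2, start=0):
    """Recursively split log2 ratios into (start, end, mean) segments, end exclusive.
    A split is kept only if it reduces the sum of squares by more than penalty."""
    if len(x) >= 2 * min_size:
        i, gain = best_split(x, min_size)
    else:
        i, gain = None, 0.0
    if i is None or gain <= penalty:
        return [(start, start + len(x), sum(x) / len(x))]
    return (binary_segmentation(x[:i], penalty, min_size, start)
            + binary_segmentation(x[i:], penalty, min_size, start + i))


# Toy, noise-free log2 ratios: normal (0), a gained segment (1), normal again.
ratios = [0.0] * 5 + [1.0] * 4 + [0.0] * 3
print(binary_segmentation(ratios, penalty=0.1))
# [(0, 5, 0.0), (5, 9, 1.0), (9, 12, 0.0)]
```

With real noisy data the penalty controls the sensitivity/specificity trade-off discussed above: a larger penalty suppresses spurious breakpoints but can miss small focused CNAs.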

13 HMM data segmentation. Fridlyand et al., Journal of Multivariate Analysis, June 2004, Vol. 90, pp. 132-153. States: Amplified, Normal, Deleted.
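A generic three-state HMM (Deleted / Normal / Amplified) with Gaussian emissions can be decoded with the Viterbi algorithm to label each probe. This is a minimal sketch with assumed, hand-set parameters (state means at log2(1/2), log2(2/2), log2(3/2), a shared noise SD, and a "stay" probability). It is not the fitted model of Fridlyand et al., where parameters are estimated from the data.

```python
import math

STATES = ["Deleted", "Normal", "Amplified"]
# Assumed emission means on the log2-ratio scale: log2(1/2), log2(2/2), log2(3/2).
MEANS = [-1.0, 0.0, math.log2(1.5)]
SD = 0.25    # assumed noise standard deviation, shared by all states
STAY = 0.9   # assumed probability of staying in the same state between probes


def log_emission(x, mean):
    """Gaussian log-density of observing log2 ratio x in a state with this mean."""
    return -0.5 * ((x - mean) / SD) ** 2 - math.log(SD * math.sqrt(2 * math.pi))


def log_transition(i, j):
    """Stay with probability STAY; otherwise jump to either other state equally."""
    return math.log(STAY if i == j else (1 - STAY) / 2)


def viterbi(xs):
    """Most likely state sequence for the observations xs (log-space Viterbi)."""
    k = len(STATES)
    score = [math.log(1 / k) + log_emission(xs[0], MEANS[s]) for s in range(k)]
    back = []  # back[t][s]: best previous state when in state s at step t + 1
    for x in xs[1:]:
        ptr, new = [], []
        for s in range(k):
            prev = max(range(k), key=lambda r: score[r] + log_transition(r, s))
            ptr.append(prev)
            new.append(score[prev] + log_transition(prev, s) + log_emission(x, MEANS[s]))
        back.append(ptr)
        score = new
    # Trace the best path backwards from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


ratios = [0.0, 0.1, -0.1, 0.0,       # normal
          0.6, 0.5, 0.65, 0.55,      # gained
          0.0, -0.05, 0.05, 0.0,     # normal
          -1.0, -0.9, -1.1, -1.0]    # lost
print(viterbi(ratios))
```

The transition penalty is what gives the HMM its smoothing effect: a single outlying probe is usually not worth two state switches, while a run of consistently shifted probes is.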

14 Forward-backward fragment assembling

15 Some examples. Top: model cell line, 3-copy segment in chromosome 9. Bottom: cancer sample.

16 Keith W. Brown and Karim T.A. Malik, 2001, Expert Reviews in Molecular Medicine. LOH: Loss of Heterozygosity (LOH) happens in segments of DNA.

17 Discov Med. 2011 Jul;12(62):25-32. LOH: on a SNP array, LOH yields homozygous calls (AA or BB, rather than AB) for a number of consecutive SNPs.
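The signature on this slide, a stretch of consecutive homozygous calls, can be scanned for directly. A minimal sketch, assuming calls are already ordered by chromosome position, that "NoCall" breaks a run, and that a run needs at least `min_run` SNPs (an arbitrary illustrative cut-off). In practice tumor calls are typically compared against a matched normal sample, because long homozygous runs also occur in normal germline DNA.

```python
def homozygous_runs(calls, min_run=5):
    """Return (start, end) index ranges (end exclusive) of runs of at least
    min_run consecutive homozygous calls ('AA' or 'BB')."""
    runs, start = [], None
    for i, call in enumerate(calls + ["END"]):  # sentinel closes a trailing run
        if call in ("AA", "BB"):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical calls for consecutive SNPs along one chromosome arm.
calls = ["AB", "AA", "AB", "AA", "BB", "BB", "AA", "BB", "AA", "AB", "BB"]
print(homozygous_runs(calls, min_run=5))  # [(3, 9)]
```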

18 GWAS © Pasieka, Science Photo Library, http://www.mpg.de/10680/Modern_psychiatry

19 GWAS

20 Nature Genetics 41, 986 - 990 (2009) GWAS Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer
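At its core, a GWAS repeats a simple association test at every SNP. One common choice is the allelic chi-square test on a 2×2 table of allele counts (cases/controls × allele A/B). The sketch below assumes that test; the allele counts are hypothetical, and the 1-df p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)). Because hundreds of thousands of SNPs are tested, a genome-wide threshold of about 5 × 10⁻⁸ is conventionally used.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Allelic association test for one SNP.

    case_counts, control_counts: (count of allele A, count of allele B),
    each individual contributing two alleles.
    Returns (chi-square statistic with 1 df, p-value).
    """
    table = [list(case_counts), list(control_counts)]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected under no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))      # upper tail of chi-square, 1 df
    return chi2, p


cases = (600, 400)     # hypothetical allele counts (A, B) in 500 cases
controls = (500, 500)  # hypothetical allele counts (A, B) in 500 controls
chi2, p = allelic_chi_square(cases, controls)
print(round(chi2, 2), p)
```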

21 DNA sequencing

22 Background

23

24

25

26 When a reference genome is available --- Alignment: we can rely on the existing reference genome as a blueprint and align the short reads onto it. A few-fold coverage is needed to cover most regions. Sequencing a whole new genome? --- Assembly: overlaps between reads are required to construct the genome. Because the reads are short, ~30-fold coverage is needed. If a run yields 3 Gb of data, about 30 runs are needed for a new genome similar in size to the human genome. Alignment and Assembly
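The coverage arithmetic on this slide can be checked in a few lines. The sketch assumes a ~3 Gb genome and 3 Gb of sequence per run, and uses the standard Lander-Waterman (Poisson) approximation that, at c-fold mean coverage, a given base receives no reads with probability e^(-c). The function names are illustrative.

```python
import math


def runs_needed(genome_size_bp, fold_coverage, yield_per_run_bp):
    """Number of sequencing runs needed to reach the requested mean coverage."""
    total_bp = genome_size_bp * fold_coverage
    return math.ceil(total_bp / yield_per_run_bp)


def fraction_uncovered(fold_coverage):
    """Poisson approximation: probability that a base is covered by zero reads."""
    return math.exp(-fold_coverage)


print(runs_needed(3e9, 30, 3e9))  # 30 runs for 30-fold de novo coverage
for c in (1, 3, 5, 30):
    print(c, fraction_uncovered(c))
```

At 3-fold coverage only about 5% of bases are expected to be missed, which is why a few-fold coverage suffices to place reads on most of a reference. Assembly needs far more because reads must overlap each other, not just land on the genome.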

27 Hash table-based alignment, similar to BLAST in principle: (1) find potential locations; (2) perform local alignment.
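The two steps can be sketched as seed-and-verify: (1) index every k-mer of the reference in a hash table and look up the read's k-mers to collect candidate start positions; (2) check each candidate. For brevity, step (2) here just counts mismatches in an ungapped window. That is a simplified stand-in for the local alignment (e.g. Smith-Waterman, which also handles indels) that real aligners perform. The reference, reads, `k`, and `max_mismatches` are hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align(read, reference, index, k, max_mismatches=2):
    """(1) Seed: read k-mers found in the index give candidate start positions.
    (2) Verify: count mismatches over the whole read at each candidate."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTACGTTAGCCGATAGGCTTACGATCGATCGGATC"
k = 5
index = build_index(reference, k)
print(align("GCCGATAGGC", reference, index, k))  # [(10, 0)]  exact match
print(align("GCCGTTAGGC", reference, index, k))  # [(10, 1)]  one mismatch
```

Even with a mismatch, at least one k-mer of the read usually matches exactly, which is what lets the hash lookup find the right location quickly.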

28 Alignment and Assembly From read to graph:

29 Alignment and Assembly

30 de Bruijn graph assembly Red: read error.

31 Alignment and Assembly de Bruijn graph assembly

32 Alignment and Assembly de Bruijn graph assembly
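A de Bruijn graph can be built by cutting reads into k-mers. Each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and unambiguous paths are walked to spell out contigs. Read errors create rare k-mers that branch off the true path. This sketch assumes they can be removed with a simple count cut-off (`min_count`, an illustrative choice). The genome, reads, and starting node are hypothetical; a real assembler would also find start nodes itself and resolve repeats.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn(reads, k, min_count=2):
    """Nodes are (k-1)-mers; each distinct k-mer seen at least min_count times
    adds one edge prefix -> suffix. Rarer k-mers are dropped as likely read errors."""
    graph = defaultdict(list)
    for kmer, n in kmer_counts(reads, k).items():
        if n >= min_count:
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk(graph, start):
    """Extend a contig from start while there is exactly one outgoing edge."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop rather than loop forever around a cycle
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Error-free reads from the hypothetical genome ACGTTGCAGTCA, plus one read
# with a sequencing error (T -> A at its fifth base).
reads = ["ACGTTGC", "GTTGCAG", "TGCAGTCA", "ACGTTGCA", "TTGCAGTC",
         "CGTTGCAGTCA", "ACGTAGCAG"]
print(walk(de_bruijn(reads, k=4), "ACG"))               # ACGTTGCAGTCA
print(walk(de_bruijn(reads, k=4, min_count=1), "ACG"))  # ACGT: the error adds a branch
```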

33 Whole genome/exome/transcriptome sequencing

34 Genomics. Whole genome sequencing can detect all types of variants (SNP alleles, rare variants, mutations). Variants that could be associated with disease: rare variants (burden testing, collapsing by gene); de novo mutations (need a family tree); rare Mendelian disorders; structural variants in cancer.
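Collapsing by gene can be sketched as follows. Each person is marked as a carrier of a gene if they carry at least one qualifying rare variant in it. Carrier counts in cases and controls are then compared with a one-sided Fisher exact test (hypergeometric upper tail). The input format, gene names, and sample IDs are hypothetical. Fisher's test is only one of several burden statistics in use.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.
    Rows: cases / controls; columns: carriers / non-carriers."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom


def burden_test(variants, case_ids, control_ids):
    """variants: iterable of (gene, sample_id) pairs for qualifying rare variants.
    Several variants in one person and gene count as a single carrier."""
    carriers = {}
    for gene, sample in variants:
        carriers.setdefault(gene, set()).add(sample)
    results = {}
    for gene, samples in carriers.items():
        a = len(samples & case_ids)     # carriers among cases
        c = len(samples & control_ids)  # carriers among controls
        b, d = len(case_ids) - a, len(control_ids) - c
        results[gene] = fisher_one_sided(a, b, c, d)
    return results


cases = {f"case{i}" for i in range(10)}
controls = {f"ctrl{i}" for i in range(10)}
variants = [("GENE1", f"case{i}") for i in range(5)]
variants += [("GENE1", "case0")]  # a second variant in the same person counts once
variants += [("GENE2", "case7"), ("GENE2", "ctrl3")]
print(burden_test(variants, cases, controls))
```

Collapsing increases power because each individual rare variant is too infrequent to test on its own, while the gene-level carrier count can still differ between groups.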

35 Identification of translocations from discordant paired-end reads. Cancer Genetics 206 (2014) 432-440. Structural changes
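The discordant-pair idea can be sketched in two steps. First, classify each read pair by where its two ends map. Ends on different chromosomes suggest a translocation, and ends much farther apart than the library insert size suggest a deletion. Second, keep only junctions supported by several pairs. The insert-size mean/SD, bin size, support threshold, and coordinates below are illustrative assumptions. Real callers also use read orientation and mapping quality, which this sketch ignores.

```python
from collections import Counter
from typing import NamedTuple


class ReadPair(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify(pair, mean_insert=400, sd_insert=50, n_sd=4):
    """Label a read pair by how its two ends map (parameters are assumed values)."""
    if pair.chrom1 != pair.chrom2:
        return "inter-chromosomal"  # candidate translocation
    if abs(pair.pos2 - pair.pos1) > mean_insert + n_sd * sd_insert:
        return "too far apart"      # candidate deletion
    return "concordant"


def translocation_clusters(pairs, bin_size=10_000, min_support=3):
    """Count inter-chromosomal pairs per (coarse position bin, bin) junction and
    keep junctions supported by at least min_support pairs."""
    support = Counter()
    for p in pairs:
        if classify(p) == "inter-chromosomal":
            # sort the two ends so both listing orders land on the same key
            ends = sorted([(p.chrom1, p.pos1 // bin_size), (p.chrom2, p.pos2 // bin_size)])
            support[tuple(ends)] += 1
    return {key: n for key, n in support.items() if n >= min_support}


pairs = [
    ReadPair("chr9", 1_000_100, "chr22", 2_000_200),
    ReadPair("chr9", 1_000_900, "chr22", 2_000_800),
    ReadPair("chr22", 2_003_100, "chr9", 1_001_500),  # same junction, ends swapped
    ReadPair("chr1", 5_000_000, "chr5", 7_000_000),   # isolated discordant pair
    ReadPair("chr3", 10_000, "chr3", 10_380),         # normal pair
    ReadPair("chr3", 50_000, "chr3", 55_000),         # ends too far apart
]
print(translocation_clusters(pairs))  # {(('chr22', 200), ('chr9', 100)): 3}
```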

36 CNV by depth of coverage. Cancer Genetics 206 (2014) 432-440. Structural changes
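Depth-of-coverage CNV detection can be sketched by counting aligned reads in fixed-width bins and scaling each bin by the median bin count. This assumes most of the genome is at the normal two copies, so copy number ≈ 2 × count / median. The bin counts below are hypothetical. Real pipelines also correct for GC content and mappability, and for tumor purity in cancer samples, none of which is modeled here. The scaled values would then be segmented much like array log2 ratios.

```python
import statistics


def bin_counts(read_starts, chrom_length, bin_size):
    """Count aligned read start positions falling into fixed-width bins."""
    counts = [0] * (chrom_length // bin_size + 1)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def copy_number_estimates(counts, ploidy=2):
    """Scale each bin by the median bin count, assuming most bins are at the
    normal ploidy: copy number ~ ploidy * count / median."""
    median = statistics.median(counts)
    return [ploidy * c / median for c in counts]


print(bin_counts([5, 15, 17, 25], chrom_length=30, bin_size=10))  # [1, 2, 1, 0]

# Hypothetical per-bin read counts along one chromosome.
counts = [100, 98, 103, 150, 148, 152, 101, 50, 49, 99]
print([round(cn) for cn in copy_number_estimates(counts)])
# [2, 2, 2, 3, 3, 3, 2, 1, 1, 2]
```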

37 Cancer Genetics 206 (2014) 432-440. Structural changes

38 http://www.geneious.com/features/sequence-analysis-annotation-prediction Genotype calling
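A minimal sketch of genotype calling from aligned reads, assuming a simple binomial model. At a site with reference/alternate read counts, a diploid genotype with 0, 1 or 2 alternate copies predicts an alternate-read probability of (copies/2)(1 - e) + (1 - copies/2)e, where e is an assumed per-base error rate. The genotype with the highest likelihood is reported. Priors, base qualities, and mapping qualities used by real callers are omitted. The binomial coefficient is dropped because it is the same for every genotype.

```python
import math


def genotype_likelihoods(ref_reads, alt_reads, error=0.01):
    """Log-likelihood of each diploid genotype (RR, RA, AA) given read counts."""
    result = {}
    for name, alt_copies in (("RR", 0), ("RA", 1), ("AA", 2)):
        frac = alt_copies / 2
        p_alt = frac * (1 - error) + (1 - frac) * error  # P(read shows alt base)
        result[name] = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
    return result


def call(ref_reads, alt_reads, error=0.01):
    """Maximum-likelihood genotype."""
    ll = genotype_likelihoods(ref_reads, alt_reads, error)
    return max(ll, key=ll.get)


print(call(20, 0), call(11, 9), call(1, 19))  # RR RA AA
```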

39 Medical Genomics Nature Reviews Genetics 11, 415 Example: Extreme-case sequencing to find rare variants associated with a disease.

40 GWAS

41

