
Genomics and High Throughput Sequencing Technologies: Applications. Jim Noonan, Department of Genetics.


1 Genomics and High Throughput Sequencing Technologies: Applications. Jim Noonan, Department of Genetics

2 Outline
- Personal genome sequencing
  - Rationale: understanding human disease
  - Variant discovery and interpretation
  - Genome reduction strategies (exome sequencing)
- Functional analysis of biological systems using sequencing
  - Transcriptome analysis: RNA-seq
  - Regulatory element discovery: ChIP-seq
  - Chromatin state profiling and the ‘histone code’
  - Large-scale efforts: ENCODE and the NIH Epigenome Roadmap

3 Whole genome sequencing: 1000 Genomes

4 Nature 467:1061 (2010)

5 The genetic architecture of human disease. State, Neuron 68:254 (2010)

6 Challenge: interpreting genetic variation. Cooper and Shendure, Nat Rev Genet 12:628 (2011)

7 Tools for identifying rare damaging mutations: protein-sequence-based and DNA-sequence-based

8 All humans have rare damaging mutations (variants that damage the protein and affect conserved sites). Cooper and Shendure, Nat Rev Genet 12:628 (2011)

9 Genome reduction: exome sequencing. Bamshad et al., Nat Rev Genet 12:745 (2011)

10 Finding disease-causing rare variants by exome sequencing
- De novo mutation
- Likely to have functional effect
- Recurrence in independent affected individuals
- Absence in controls
- Screen unrelated trios for recurrence
- Reveal critical pathways in disease
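The filtering criteria on this slide can be sketched in Python. This is a minimal illustration under stated assumptions. `Variant`, `de_novo`, and `candidate_genes` are hypothetical names. Variants are assumed to be already called and annotated, and the `damaging` flag stands in for the protein- and DNA-sequence-based predictions from slide 7. Recurrence is a plain count of families. Published studies such as Sanders et al. assess recurrence statistically against expected mutation rates, which this sketch does not attempt.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for one called variant."""
    gene: str
    site: str       # e.g. "chr1:100:A>G" (illustrative format only)
    damaging: bool  # predicted functional effect, from upstream annotation

def de_novo(child, mother, father):
    """Variants present in the child but absent from both parents."""
    parental_sites = {v.site for v in mother} | {v.site for v in father}
    return {v for v in child if v.site not in parental_sites}

def candidate_genes(trios, control_sites, min_families=2):
    """Genes carrying damaging de novo variants, absent from controls,
    in at least `min_families` unrelated trios."""
    families_per_gene = Counter()
    for child, mother, father in trios:
        hit_genes = {
            v.gene
            for v in de_novo(child, mother, father)
            if v.damaging and v.site not in control_sites
        }
        families_per_gene.update(hit_genes)  # each gene counted once per family
    return {g for g, n in families_per_gene.items() if n >= min_families}

# Toy example: three unrelated trios, no control observations.
a = Variant("GENE_A", "chr1:100:A>G", True)
b = Variant("GENE_A", "chr1:250:C>T", True)
c = Variant("GENE_B", "chr3:900:G>A", True)
trios = [({a, c}, set(), set()), ({b}, set(), set()), ({c}, {c}, set())]
print(candidate_genes(trios, control_sites=set()))  # {'GENE_A'}
```

GENE_B is counted in only one family because the third child inherited `c` from the mother, so it is not de novo there.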

11 Sanders et al., Nature 485:237 (2012)

12 Outline
- Personal genome sequencing
  - Rationale: understanding human disease
  - Variant discovery and interpretation
  - Genome reduction strategies (exome sequencing)
  - Challenges to de novo genome assembly using short reads
- Functional analysis of biological systems using sequencing
  - Transcriptome analysis: RNA-seq
  - Regulatory element discovery: ChIP-seq
  - Chromatin state profiling and the ‘histone code’
  - Large-scale efforts: ENCODE and the NIH Epigenome Roadmap

13 mRNA-seq workflow. Martin and Wang, Nat Rev Genet 12:671 (2011); Wang et al., Nat Rev Genet 10:57 (2009)

14 Gene expression profiling by massively parallel RNA sequencing (RNA-seq)

15 Mapping RNA-seq reads and quantifying transcripts

16 Quantifying gene expression by RNA-seq
- Use existing gene annotation: align to genome plus annotated splice junctions
  - Depends on high-quality gene annotation
  - Which annotation to use: RefSeq, GENCODE, UCSC?
  - Isoform quantification?
  - Identifying novel transcripts?
- Reference-guided alignment: align to genome sequence
  - Infer splice events from reads
  - Allows transcriptome analyses of genomes with poor gene annotation
- De novo transcript assembly: assemble transcripts directly from reads
  - Allows transcriptome analyses of species without reference genomes
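Reference-guided aligners (for example TopHat or STAR) report spliced alignments in SAM/BAM using the CIGAR `N` (skipped region) operation. The sketch below shows one way splice events can be inferred from those reads. It assumes alignments have already been parsed into (POS, CIGAR) pairs on one chromosome. It ignores strand and the read-level filters (anchor length, mismatches) that real tools apply. `junctions` is a hypothetical helper name.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference

def junctions(pos, cigar):
    """Introns implied by the 'N' operations of one alignment.

    pos: 1-based leftmost reference position (the SAM POS field).
    Returns (first, last) intron coordinates, 1-based and inclusive.
    """
    ref = pos
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return introns

# Count how many reads support each junction (toy alignments).
alignments = [(100, "50M1000N50M"), (120, "30M1000N70M"), (1, "5S10M200N10M")]
support = Counter(j for pos, cigar in alignments for j in junctions(pos, cigar))
print(support[(150, 1149)])  # 2
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, which is why only `M`, `D`, `N`, `=`, and `X` advance `ref`.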

17 Normalization methods: reads per kilobase of feature length per million mapped reads (RPKM)
- RNA-seq reads mapped to reference
- What is a “feature?”
- What about genomes with poor gene annotation?
- What about species with no sequenced genome?
- For a detailed comparison of normalization methods, see Bullard et al., BMC Bioinformatics 11:94 (2010).
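The RPKM arithmetic is short enough to state as code. It divides the reads on a feature by the feature length in kilobases and by the number of mapped reads in millions, which is equivalent to reads × 10^9 / (length in bp × total mapped reads). The function name and the numbers below are illustrative assumptions. The "feature" is whatever unit was counted, such as a gene, a composite model, or an isoform.

```python
def rpkm(feature_reads, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads.

    Equivalent to feature_reads * 1e9 / (feature_length_bp * total_mapped_reads).
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return feature_reads / (kilobases * millions_mapped)

# Same read count, different feature lengths (hypothetical numbers):
print(rpkm(500, 2_000, 10_000_000))  # 25.0
print(rpkm(500, 4_000, 10_000_000))  # 12.5
```

Doubling the feature length halves the RPKM for the same count. This is why the definition of a "feature" (composite model versus individual isoform) directly changes the values.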

18 What depth of sequencing is required to characterize a transcriptome? Wang et al., Nat Rev Genet 10:57 (2009)

19 Considerations
- Gene length: long genes are detected before short genes
- Expression level: highly expressed genes are detected before lowly expressed genes
- Complexity of the transcriptome: tissues with many cell types require more sequencing
- Feature type: composite gene models, common isoforms, rare isoforms
- Detection vs. quantification: obtaining confident expression level estimates (e.g., “stable” RPKMs) requires greater coverage
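A simple sampling model shows why length and expression level control detection, and why quantification needs more depth than detection. Assume reads are drawn independently in proportion to length × expression. Under that assumption, the read count for one gene is roughly Poisson with mean λ. The names `expected_reads`, `prob_at_least`, and `total_weight` (that product summed over all genes) are hypothetical, and the numbers are toy values. Real RNA-seq counts are overdispersed relative to Poisson, so this is only a lower bound on the difficulty.

```python
import math

def expected_reads(depth, length_bp, expression, total_weight):
    """Mean reads from one gene if reads are drawn in proportion to
    length x expression (simplifying assumption); total_weight is that
    product summed over all genes."""
    return depth * length_bp * expression / total_weight

def prob_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam); adequate for the small k used here."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

depth, total_weight = 1_000_000, 1e9           # hypothetical library and transcriptome
long_gene = expected_reads(depth, 10_000, 1.0, total_weight)   # mean 10 reads
short_gene = expected_reads(depth, 500, 1.0, total_weight)     # mean 0.5 reads

print(round(prob_at_least(1, short_gene), 3))  # 0.393: often missed entirely
print(round(prob_at_least(1, long_gene), 4))   # detected almost always
print(round(prob_at_least(10, long_gene), 2))  # but >= 10 reads only about half the time
```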

20 Pervasive alternative splicing in humans. Wang et al., Nature 456:470 (2008)

21 Composite gene model approach
- Map reads to genome
- Map remaining reads to known splice junctions
- Requires good gene models
- Isoforms are ignored
- Which annotation to use: RefSeq, GENCODE, UCSC?
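One common way to build a composite gene model is to take the union of the exons from all annotated isoforms of a gene. The sketch below assumes 1-based inclusive coordinates for a single gene on one strand. `composite_model` and `model_length` are hypothetical names. As the slide notes, isoform identity is discarded. The total exonic length can then serve as the feature length in an RPKM calculation.

```python
def composite_model(isoforms):
    """Union of the exons of all isoforms, as sorted non-overlapping intervals.

    isoforms: list of isoforms, each a list of (start, end) exons in
    1-based inclusive coordinates for a single gene.
    """
    exons = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1] + 1:   # overlaps or abuts previous
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]

def model_length(intervals):
    """Total exonic length in bp (usable as the feature length for RPKM)."""
    return sum(end - start + 1 for start, end in intervals)

isoform_1 = [(100, 200), (300, 400)]
isoform_2 = [(100, 250), (500, 600)]   # longer first exon, extra last exon
model = composite_model([isoform_1, isoform_2])
print(model)                # [(100, 250), (300, 400), (500, 600)]
print(model_length(model))  # 353
```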

22 Strategies for transcript assembly. Garber et al., Nat Methods 8:469 (2011)
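Many genome-independent (de novo) transcript assemblers, Trinity among them, represent reads as a de Bruijn graph of k-mers and read contigs off paths through that graph. The toy sketch below builds such a graph and extends a contig only while the path is unambiguous. It deliberately omits error correction, reverse complements, and the branching produced by alternative isoforms, all of which real assemblers must handle. `de_bruijn` and `extend` are hypothetical names.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Edges between overlapping (k-1)-mers, one edge per distinct k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def extend(graph, start):
    """Follow the graph from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (successor,) = graph[node]
        if successor in seen:   # stop rather than loop around a cycle
            break
        contig += successor[-1]
        seen.add(successor)
        node = successor
    return contig

reads = ["ATGGCGT", "GGCGTGA", "CGTGAAC"]    # toy overlapping reads
print(extend(de_bruijn(reads, k=4), "ATG"))  # ATGGCGTGAAC
```

A node with two successors (for example, where two isoforms diverge) stops the walk. Resolving such branches is where real assemblers spend most of their effort.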

23 ChIP-seq targets: general transcription machinery, transcription factors, modifications to histone tails, methylated DNA

24 Rationale: identifying regulatory elements in genomes. Noonan and McCallion, Annu Rev Genomics Hum Genet 11:1 (2010)

25 ChIP-seq peak calling
- ChIP-seq is an enrichment method
- Requires a statistical framework for determining the significance of enrichment
- ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
- Input = sonicated chromatin collected prior to immunoprecipitation
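A minimal statistical framework for enrichment is a Poisson test per window. The expected ChIP count comes from the input control after scaling the input for library size. This is a simplified sketch, not any published caller. MACS, for example, also builds on a Poisson model but estimates a local (dynamic) background and shifts reads by fragment size. The fixed windows, the `alpha` threshold, the `min_lambda` floor, and the function names are all assumptions here, and no multiple-testing correction is applied.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summed over the upper tail."""
    if k <= 0:
        return 1.0
    # P(X = k) computed in log space so large counts do not overflow.
    term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > total * 1e-17:   # stop once terms no longer change the sum
        total += term
        i += 1
        term *= lam / i
    return total

def enriched_windows(chip, control, chip_total, control_total,
                     alpha=1e-5, min_lambda=1.0):
    """Indices of windows whose ChIP count is improbable given the input.

    chip, control: read counts in the same fixed-size windows.
    The input is scaled by library size; min_lambda guards against
    windows where the input has few or no reads.
    """
    scale = chip_total / control_total
    hits = []
    for i, (c, n) in enumerate(zip(chip, control)):
        lam = max(n * scale, min_lambda)   # expected ChIP reads if not enriched
        if poisson_sf(c, lam) < alpha:
            hits.append(i)
    return hits

chip = [3, 4, 40, 5]
control = [2, 3, 3, 4]
print(enriched_windows(chip, control, 1_000_000, 1_000_000))  # [2]
```

Summing the upper tail directly, rather than computing 1 minus the CDF, keeps very small p-values from collapsing to zero through cancellation.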

26 There are many ChIP-seq peak calling methods. Wilbanks and Facciotti, PLoS ONE 5:e11471 (2010)

27 The histone code. Zhou et al., Nat Rev Genet 12:7 (2011)

28 Mapping and analysis of chromatin state dynamics in nine human cell types. Ernst et al., Nature 473:43 (2011)
Cell types:
- H1 ESC
- K562 (erythrocytic leukemia)
- GM12878 (B-lymphoblastoid)
- HepG2 (hepatocellular carcinoma)
- HUVEC (umbilical vein endothelium)
- HSMM (skeletal muscle myoblasts)
- NHLF (lung fibroblast)
- NHEK (epidermal keratinocytes)
- HMEC (mammary epithelium)
Marks:
- H3K4me3 (promoter/enhancer)
- H3K4me2 (promoter/enhancer)
- H3K4me1 (enhancer)
- H3K9ac (promoter/enhancer)
- H3K27ac (promoter/enhancer)
- H3K36me3 (transcribed regions)
- H4K20me1 (transcribed regions)
- H3K27me3 (Polycomb repression)
- CTCF
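Ernst et al. learned chromatin states with a multivariate hidden Markov model (ChromHMM) from marks binarized in 200-bp bins. The sketch below does not learn anything. It applies hand-written, hypothetical rules derived from the mark annotations on this slide to show how combinations of marks, rather than single marks, distinguish states. `coarse_state` and the label strings are illustrative names only.

```python
# Hypothetical rules based on the mark annotations listed on the slide.
ENHANCER_MARKS = {"H3K4me1", "H3K4me2", "H3K9ac", "H3K27ac"}
TRANSCRIPTION_MARKS = {"H3K36me3", "H4K20me1"}

def coarse_state(marks):
    """Rough label for one genomic bin from the set of marks called present."""
    marks = set(marks)
    if "H3K4me3" in marks:
        # H3K4me3 together with Polycomb-associated H3K27me3 suggests a poised promoter.
        return "poised promoter-like" if "H3K27me3" in marks else "promoter-like"
    if marks & ENHANCER_MARKS:   # active marks without H3K4me3
        return "enhancer-like"
    if marks & TRANSCRIPTION_MARKS:
        return "transcribed"
    if "H3K27me3" in marks:
        return "Polycomb-repressed"
    if "CTCF" in marks:
        return "CTCF-bound"
    return "no signal"

bins = [
    {"H3K4me3", "H3K27ac", "H3K9ac"},
    {"H3K4me1", "H3K27ac"},
    {"H3K36me3", "H4K20me1"},
    {"H3K27me3"},
    {"H3K4me3", "H3K27me3"},
    set(),
]
print([coarse_state(b) for b in bins])
# ['promoter-like', 'enhancer-like', 'transcribed', 'Polycomb-repressed',
#  'poised promoter-like', 'no signal']
```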

29 Mapping and analysis of chromatin state dynamics in nine human cell types. Ernst et al., Nature 473:43 (2011)

30 Chromatin state dynamics at WLS. Ernst et al., Nature 473:43 (2011)

31 Functions associated with putative promoter and enhancer states (annotation based on nearest TSS)
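The nearest-TSS annotation on this slide can be sketched with a binary search over sorted TSS coordinates. This is the simplest version, under stated assumptions: a single chromosome, one coordinate per region (for example, its midpoint), strand ignored, and toy TSS data. Dedicated tools use more elaborate gene-region association rules. `assign_nearest_tss` is a hypothetical name.

```python
from bisect import bisect_left

def assign_nearest_tss(positions, tss):
    """Assign each region position to the gene with the closest TSS.

    tss: list of (coordinate, gene) pairs on one chromosome, sorted by
    coordinate and non-empty. Returns (gene, position - TSS) pairs; strand
    is ignored, and ties go to the smaller-coordinate TSS.
    """
    coords = [c for c, _ in tss]
    assigned = []
    for pos in positions:
        i = bisect_left(coords, pos)
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(coords)]
        best = min(neighbours, key=lambda j: abs(coords[j] - pos))
        assigned.append((tss[best][1], pos - coords[best]))
    return assigned

tss = [(1_000, "GENE_A"), (5_000, "GENE_B"), (9_000, "GENE_C")]
print(assign_nearest_tss([1_200, 4_000, 9_500, 100], tss))
# [('GENE_A', 200), ('GENE_B', -1000), ('GENE_C', 500), ('GENE_A', -900)]
```

Only the TSSs immediately before and after the insertion point can be nearest, so each lookup costs O(log n) once the coordinates are sorted.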

32 ChIP-seq: enhancer identification in vivo. Visel et al., Nature 457:854 (2009)
- p300 = enhancer-associated factor
- p300 binding is ~90% predictive of enhancer activity

33 Systematic experimental annotation of regulatory functions. Myers, PLoS Biol 9:e1001046 (2011)

34 The ENCODE Project: http://genome.ucsc.edu/ENCODE/

35 The NIH Roadmap Epigenomics Project: http://www.roadmapepigenomics.org/

36 ENCODE cell lines. Myers, PLoS Biol 9:e1001046 (2011)

37 ENCODE Project data access: http://genome.ucsc.edu/ENCODE/

38 Genome Browser interface and data types
- Genome viewer
- Categories of data: displayed as tracks
  - Discrete intervals (genes) or continuous (transcription)
- Hyperlinks and pulldown tabs for individual tracks
  - Go to track description page
  - Hide or show data in genome viewer
- Some tracks include multiple datasets (‘subtracks’)
  - Go to track description page to select

39 ENCODE Transcription track: display options, subtracks

40 Conclusions
- Personal genomics is becoming a reality
  - Genome sequencing will be a routine diagnostic tool
  - $5,000 to sequence a single genome, comparable to the current cost of clinical resequencing of single genes
  - Your genome will be sequenced
  - Long-read sequencing will solve de novo assembly issues
  - Data analysis and interpretation
- RNA-seq and ChIP-seq
  - Identifying genes and annotating regulatory function within and among genomes
  - Computational issues: data normalization, peak calling, differential expression and binding
  - Large-scale studies revealing regulatory architecture of human and model genomes

