Genomics and High Throughput Sequencing Technologies: Applications Jim Noonan Department of Genetics.



Outline
- Personal genome sequencing
  - Rationale: understanding human disease
  - Variant discovery and interpretation
  - Genome reduction strategies (exome sequencing)
- Functional analysis of biological systems using sequencing
  - Transcriptome analysis: RNA-seq
  - Regulatory element discovery: ChIP-seq
  - Chromatin state profiling and the ‘histone code’
  - Large-scale efforts: ENCODE and the NIH Epigenome Roadmap

Whole genome sequencing: 1000 Genomes

Nature 467:1061 (2010)

The genetic architecture of human disease State, MW. Neuron 68:254 (2010)

Challenge: Interpreting genetic variation. Cooper and Shendure, Nat Rev Genet 12:628 (2011)

Tools for identifying rare damaging mutations: protein-sequence based and DNA-sequence based

All humans have rare damaging mutations (figure labels: damages protein; conserved). Cooper and Shendure, Nat Rev Genet 12:628 (2011)

Genome reduction: Exome sequencing Bamshad et al. Nat Rev Genet 12:745 (2011)

Finding disease-causing rare variants by exome sequencing
- De novo mutation
- Likely to have functional effect
- Recurrence in independent affected individuals
- Absence in controls
- Reveal critical pathways in disease
- Screen unrelated trios for recurrence
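The filtering logic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited study: it assumes each individual's variants are already called and represented as hypothetical `(gene, position, alt_allele)` tuples, and the function names and the `min_trios` threshold are made up for the example.

```python
from collections import Counter

def de_novo_variants(child, mother, father, controls=frozenset()):
    """Variants present in the child but absent from both parents and from controls."""
    return set(child) - set(mother) - set(father) - set(controls)

def recurrent_genes(trios, controls=frozenset(), min_trios=2):
    """Genes carrying de novo variants in at least `min_trios` unrelated trios."""
    hits = Counter()
    for child, mother, father in trios:
        # Count each gene at most once per trio, so that several variants
        # in one family cannot masquerade as recurrence.
        genes = {gene for gene, _pos, _alt in
                 de_novo_variants(child, mother, father, controls)}
        hits.update(genes)
    return {gene: n for gene, n in hits.items() if n >= min_trios}

# Toy data with placeholder gene names (assumptions, not real calls).
trios = [
    ({("GENE_A", 100, "T"), ("GENE_B", 5, "G")}, {("GENE_B", 5, "G")}, set()),
    ({("GENE_A", 250, "C")}, set(), set()),
    ({("GENE_C", 7, "A")}, set(), {("GENE_C", 7, "A")}),
]
print(recurrent_genes(trios))
```

Real pipelines also need genotype-quality filters and a statistical model of the expected de novo rate per gene, which this sketch leaves out.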

Sanders et al., Nature 485:237 (2012)

Outline
- Personal genome sequencing
  - Rationale: understanding human disease
  - Variant discovery and interpretation
  - Genome reduction strategies (exome sequencing)
  - Challenges to de novo genome assembly using short reads
- Functional analysis of biological systems using sequencing
  - Transcriptome analysis: RNA-seq
  - Regulatory element discovery: ChIP-seq
  - Chromatin state profiling and the ‘histone code’
  - Large-scale efforts: ENCODE and the NIH Epigenome Roadmap

mRNA-seq workflow Martin and Wang Nat Rev Genet 12:671 (2011) Wang et al. Nat Rev Genet 10:57 (2009)

Gene expression profiling by massively parallel RNA sequencing (RNA-seq)

Mapping RNA-seq reads and quantifying transcripts

Quantifying gene expression by RNA-seq
- Use existing gene annotation:
  - Align to genome plus annotated splices
  - Depends on high-quality gene annotation
  - Which annotation to use: RefSeq, GENCODE, UCSC?
  - Isoform quantification?
  - Identifying novel transcripts?
- Reference-guided alignments:
  - Align to genome sequence
  - Infer splice events from reads
  - Allows transcriptome analyses of genomes with poor gene annotation
- De novo transcript assembly:
  - Assemble transcripts directly from reads
  - Allows transcriptome analyses of species without reference genomes

Normalization methods
- Reads per kilobase of feature length per million mapped reads (RPKM)
- RNA-seq reads mapped to reference
- What is a “feature”?
- What about genomes with poor genome annotation?
- What about species with no sequenced genome?
For a detailed comparison of normalization methods, see Bullard et al. BMC Bioinformatics 11:94 (2010).
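RPKM as defined on this slide reduces to one line of arithmetic: RPKM = 10^9 x C / (N x L), where C is the number of reads mapped to the feature, N the total number of mapped reads, and L the feature length in base pairs. The sketch below is only an illustration; the function and parameter names are assumptions, and what counts as a "feature" (exon, composite gene model, isoform) is left to the caller.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) is the same as multiplying by 1e9.
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# 500 reads on a 2 kb feature in a library of 20 million mapped reads.
print(rpkm(500, 2_000, 20_000_000))  # 12.5
```

Because the value depends on the feature length that is supplied, the same read count gives different RPKMs under different annotations, which is one reason the choice of annotation matters.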

What depth of sequencing is required to characterize a transcriptome? Wang et al. Nat Rev Genet 10:57 (2009)

Considerations
- Gene length: long genes are detected before short genes
- Expression level: high expressors are detected before low expressors
- Complexity of the transcriptome: tissues with many cell types require more sequencing
- Feature type: composite gene models, common isoforms, rare isoforms
- Detection vs. quantification: obtaining confident expression level estimates (e.g., “stable” RPKMs) requires greater coverage
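The gene length and expression level effects above can be illustrated with a simple sampling model. The sketch assumes read counts per transcript follow a Poisson distribution whose mean is implied by the RPKM definition. That is a simplification which ignores mappability and library biases, and the function names are hypothetical.

```python
import math

def expected_reads(rpkm, length_bp, total_mapped_reads):
    """Expected read count for a transcript, inverted from the RPKM definition."""
    return rpkm * (length_bp / 1e3) * (total_mapped_reads / 1e6)

def detection_probability(rpkm, length_bp, total_mapped_reads, min_reads=1):
    """P(at least `min_reads` reads) under an assumed Poisson sampling model."""
    lam = expected_reads(rpkm, length_bp, total_mapped_reads)
    # Sum the Poisson probabilities of observing fewer than min_reads reads.
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                for i in range(min_reads))
    return 1.0 - below

# Same expression level and depth, different transcript lengths.
for length in (500, 5_000):
    print(length, detection_probability(1.0, length, 2_000_000, min_reads=5))
```

Raising `min_reads` is a crude stand-in for the distinction between detecting a transcript and quantifying it: a stable estimate needs many more reads than a single observation does.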

Pervasive alternative splicing in humans Wang et al. Nature 456:470 (2008)

Composite gene model approach
- Map reads to genome
- Map remaining reads to known splice junctions
- Requires good gene models
- Isoforms are ignored
- Which annotation to use: RefSeq, GENCODE, UCSC?
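One common way to build a composite gene model is to take the union of the exons of all annotated isoforms. The slide does not say which construction is meant, so the sketch below is an assumption; it also assumes half-open `(start, end)` exon coordinates on a single strand.

```python
def composite_gene_model(isoforms):
    """Merge the exons of all isoforms into a sorted list of non-overlapping intervals."""
    exons = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlapping or directly adjacent exon: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def model_length(merged):
    """Total length in bp of the composite model, usable as the RPKM feature length."""
    return sum(end - start for start, end in merged)

# Two hypothetical isoforms of one gene.
isoforms = [
    [(100, 200), (300, 400)],
    [(150, 250), (300, 350), (500, 600)],
]
model = composite_gene_model(isoforms)
print(model, model_length(model))
```

Collapsing isoforms this way makes explicit why "isoforms are ignored": every read that falls in the union counts toward the gene, regardless of which transcript produced it.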

Strategies for transcript assembly Garber et al. Nat Methods 8:469 (2011)

ChIP-seq
- General transcription machinery
- Transcription factors
- Modifications to histone tails
- Methylated DNA

Rationale: identifying regulatory elements in genomes. Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)

ChIP-seq peak calling
- ChIP-seq is an enrichment method
- Requires a statistical framework for determining the significance of enrichment
- ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
- Input = sonicated chromatin collected prior to immunoprecipitation
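As one illustration of a statistical framework for enrichment, the sketch below scores a single genomic window with a Poisson test against a depth-scaled input count. It is a generic toy model, not the algorithm of any published peak caller; the function names, the pseudocount, and the windowing are assumptions.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    # Each term is evaluated in log space to avoid overflow and underflow.
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
              for i in range(k))
    # Values below about 1e-15 are not resolved by this simple subtraction.
    return max(0.0, 1.0 - cdf)

def window_enrichment(chip_count, input_count, chip_total, input_total,
                      pseudocount=1.0):
    """Return (fold enrichment, p-value) for one window of ChIP vs. input reads."""
    # Scale the input to the ChIP library size; the pseudocount keeps lam positive.
    lam = (input_count + pseudocount) * (chip_total / input_total)
    return chip_count / lam, poisson_sf(chip_count, lam)

fold, p_value = window_enrichment(chip_count=50, input_count=4,
                                  chip_total=1_000_000, input_total=1_000_000)
print(fold, p_value)
```

A genome-wide analysis would apply this to every window and then correct for multiple testing, which the sketch omits.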

There are many ChIP-seq peak calling methods Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)

The histone code. Zhou et al. Nat Rev Genet 12:7 (2011)

Mapping and analysis of chromatin state dynamics in nine human cell types. Ernst et al., Nature 473:43 (2011)
Cell types:
- H1 ESC
- K562 (erythroleukemia)
- GM12878 (B-lymphoblastoid)
- HepG2 (hepatocellular carcinoma)
- HUVEC (umbilical vein endothelium)
- HSMM (skeletal muscle myoblasts)
- NHLF (lung fibroblast)
- NHEK (epidermal keratinocytes)
- HMEC (mammary epithelium)
Marks:
- H3K4me3 (promoter/enhancer)
- H3K4me2 (promoter/enhancer)
- H3K4me1 (enhancer)
- H3K9ac (promoter/enhancer)
- H3K27ac (promoter/enhancer)
- H3K36me3 (transcribed regions)
- H4K20me1 (transcribed regions)
- H3K27me3 (Polycomb repression)
- CTCF

Mapping and analysis of chromatin state dynamics in nine human cell types Ernst et al., Nature 473:43 (2011)

Chromatin state dynamics at WLS Ernst et al., Nature 473:43 (2011)

Functions associated with putative promoter and enhancer states (annotation based on nearest TSS)
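Assigning a regulatory element to its nearest transcription start site (TSS) can be sketched with a binary search. The example assumes a single chromosome, a non-empty list of TSS coordinates sorted in ascending order, and a parallel list of gene names; all names and coordinates are hypothetical.

```python
import bisect

def nearest_tss(position, tss_positions, tss_genes):
    """Return (gene, distance) for the TSS closest to `position`."""
    i = bisect.bisect_left(tss_positions, position)
    # Only the TSS just before and just after the position can be the nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - position))
    return tss_genes[best], abs(tss_positions[best] - position)

tss_positions = [1_000, 5_000, 9_000]
tss_genes = ["geneA", "geneB", "geneC"]
print(nearest_tss(4_200, tss_positions, tss_genes))  # ('geneB', 800)
```

Nearest-TSS assignment is a heuristic: enhancers can act over long distances and skip intervening genes, so the annotation should be read as a first approximation.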

ChIP-seq: enhancer identification in vivo
- p300 = enhancer-associated factor
- p300 binding = ~90% predictive of enhancer activity
Visel et al. Nature 457:854 (2009)

Systematic experimental annotation of regulatory functions. Myers, PLoS Biol 9:e (2011)

The ENCODE Project

The NIH Roadmap Epigenomics Project

ENCODE cell lines. Myers, PLoS Biol 9:e (2011)

ENCODE Project data access

Genome Browser interface and data types
- Genome Viewer
- Categories of data: displayed as tracks
  - Discrete intervals (genes) or continuous (transcription)
- Hyperlinks and pulldown tabs for individual tracks
  - Go to track description page
  - Hide or show data in genome viewer
- Some tracks include multiple datasets (‘subtracks’)
  - Go to track description page to select

ENCODE Transcription track: display options and subtracks

Conclusions
- Personal genomics is becoming a reality
  - Genome sequencing will be a routine diagnostic tool
  - $5,000 to sequence a single genome: the current cost of clinical resequencing of single genes
  - Your genome will be sequenced
  - Long-read sequencing will solve de novo assembly issues
- Data analysis and interpretation
  - RNA-seq and ChIP-seq
  - Identifying genes and annotating regulatory function within and among genomes
  - Computational issues: data normalization, peak calling, differential expression and binding
  - Large-scale studies revealing regulatory architecture of human & model genomes