Functional Genomics with Next-Generation Sequencing


Functional Genomics with Next-Generation Sequencing
Jen Taylor, Bioinformatics Team, CSIRO Plant Industry

Capacity and Resolution
Next generation sequencing: increasing capacity leads to increased resolution (Eric Lander, Broad Institute).
CSIRO. INI Meeting July 2010 - Tutorial - Applications

How a Genome Works?
Parts. Description. Comparisons. Function? Interconnectedness? Comparisons: population-level; between genomes.

Application domains
Reference genome. No reference genome: Partially sequenced or UNsequenced (“PUN genomes”).

Impact of a Reference Genome
Sequence data; assembly; contigs; genome alignment; read density; characterisation.

Applications of Next Generation Sequencing
Profiling of variation: genetic variation; transcript variation; epigenetic variation; metagenomic variation.
Discovery: novel genomes; novel genes; novel transcripts; small / long non-coding RNA.
Today:
RNA Sequencing (RNASeq): coding and non-coding transcript profiling; dynamic and context dependent.
Epigenomics: genome-wide protein-DNA interactions and DNA modifications; heritable and reversible regulation of gene expression.

RNASeq
Qualitative: transcript diversity. Quantitative: transcript abundance.
Impact of NGS: observation of transcript complexity; transcript discovery; small / long non-coding RNA.
Analytical challenges: transcript complexity; compositional properties.

RNASeq
Sample: total RNA, polyA RNA, small RNA. Library construction. Sequencing. Base calling & QC.
Analysis with a reference: mapping to genome. Analysis for PUN genomes: assembly to contigs.
Outputs: digital “counts”, reported as reads per kilobase per million (RPKM); transcript structure; secondary structure; targets or products.
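
The RPKM normalisation named on this slide can be sketched as follows. This is a minimal illustration rather than the speaker's own code: the function name `rpkm` and the example numbers are assumptions chosen for clarity. RPKM divides a transcript's read count by its length in kilobases and by the total number of mapped reads in millions.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical values: 500 reads on a 2 kb transcript in a 10-million-read library.
example = rpkm(read_count=500, transcript_length_bp=2_000,
               total_mapped_reads=10_000_000)
print(example)  # 25.0
```

Normalising by length and by library size makes counts comparable between transcripts and between runs, but it does not remove the compositional effects discussed later in the talk.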

RNASeq – Transcript Complexity
Mapping: reads with multiple locations (conserved domains? sequencing error?); reads spanning exons (gapped alignments?).
ERANGE pipeline: Mortazavi et al., Nature Methods, Vol. 5, No. 7, July 2008.

RNASeq – Compositional properties
Depth of sequence: sequence count ≈ transcript abundance. The majority of the data can be dominated by a small number of highly abundant transcripts. The ability to observe transcripts of lower abundance depends on sequencing depth.
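
The dependence on depth can be made concrete with a simple sampling model. The sketch below assumes that every read is drawn independently and that a transcript contributes a fixed fraction of all reads; both are simplifying assumptions, and the function name `detection_probability` is hypothetical. Under that model the chance of seeing at least one read from a transcript is 1 - (1 - p)^N.

```python
def detection_probability(transcript_fraction, total_reads):
    """Probability of sampling at least one read from a transcript that
    makes up `transcript_fraction` of the library, given `total_reads`
    independent reads (simplified model)."""
    if not 0.0 <= transcript_fraction <= 1.0:
        raise ValueError("transcript_fraction must lie between 0 and 1")
    return 1.0 - (1.0 - transcript_fraction) ** total_reads


# A transcript at one part per million of the library (hypothetical value):
for depth in (100_000, 1_000_000, 10_000_000):
    print(depth, round(detection_probability(1e-6, depth), 4))
```

In this model a one-in-a-million transcript is detected in roughly two runs out of three at one million reads, and almost always at ten million reads.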

RNASeq – Compositional properties
Sequence counts are a composition of a fixed number of total sequence reads. They are therefore sum-constrained and not independent. Large variations in component numbers and sizes can produce artefacts.
Figure panels: True; Reads; RPKM.
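
A small numerical sketch of the sum constraint, using invented abundances for four hypothetical genes and an idealised sequencer that allocates a fixed number of reads in exact proportion to abundance. Only g1 truly changes between the two samples, yet every other gene appears to drop.

```python
def expected_read_counts(true_abundance, total_reads):
    """Allocate a fixed read budget in proportion to true abundance."""
    total = sum(true_abundance.values())
    return {gene: total_reads * amount / total
            for gene, amount in true_abundance.items()}


# Hypothetical true abundances: only g1 changes (7-fold up) in sample B.
sample_a = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
sample_b = {"g1": 700, "g2": 100, "g3": 100, "g4": 100}

reads_a = expected_read_counts(sample_a, 1_000_000)
reads_b = expected_read_counts(sample_b, 1_000_000)

apparent_fold_change = {g: reads_b[g] / reads_a[g] for g in sample_a}
print(apparent_fold_change)
```

With the same read budget, g1 appears 2.8-fold up rather than 7-fold, and the unchanged genes appear 2.5-fold down. Dividing by transcript length and library size, as RPKM does, does not undo this effect because the total is still fixed.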

RNASeq - Correspondence
Good correspondence with: expression arrays; tiling arrays; qRT-PCR.
Range of up to 5 orders of magnitude. Better detection of low abundance transcripts.
Greater power to detect: transcript sequence polymorphism; novel trans-splicing; paralogous genes; individual cell type expression.

Reference Genome - RNASeq

Reference Genome - RNASeq
Human exome. Number of exons targeted: ~180,000 (CCDS database), plus 700+ miRNA (Sanger v13) and 300+ ncRNA.

Epigenome
Protein-DNA interactions [ChIPSeq]: nucleosome positioning; histone modification; transcription factor interactions.
Methylation [MethylSeq].
Impact of NextGen: whole genome profiling; resolution.
Analytical challenges: systematic bias; unambiguous mapping; robust event calling.
Image: ClearScience

ChIPSeq
MNase; linker; digest; remove nucleosomes; sequence & align.

ChIPSeq
MNase; digest; remove nucleosomes; sequence & align.

ChIPSeq methods
CisGenome, ERANGE, FindPeaks, F-Seq, GLITR, MACS, PeakSeq, QuEST (Pepke et al., 2009).

MethylSeq using bisulfite conversion
Unmethylated cytosine: bisulfite conversion gives uracil, which PCR amplifies as thymine.
5-methylcytosine: unaffected by bisulfite conversion, so PCR amplifies it as cytosine.

Limited publications from BS-Seq
Mammals: methylation occurs predominantly at CpG sites. Several publications in human; one publication in mouse.
Plants: methylation occurs at CG, CHG, and CHH sites (H = A, C, or T). Two publications in Arabidopsis.
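
The three sequence contexts can be told apart by looking at the two bases that follow a cytosine on the same strand. The sketch below is illustrative only; the function name `cytosine_context` is hypothetical, and H is taken as the IUPAC code for A, C, or T. The example sequence is the Watson strand used on the later mapping slides.

```python
H_BASES = set("ACT")  # IUPAC code H: any base except G


def cytosine_context(seq, i):
    """Classify the cytosine at index i (5'->3' on this strand) as
    'CG', 'CHG' or 'CHH'; return None if the context runs off the end
    of the sequence or contains an ambiguous base."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    first = seq[i + 1:i + 2]
    second = seq[i + 2:i + 3]
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


watson = "ACGTTCTCCAGTC"
contexts = {i: cytosine_context(watson, i)
            for i, base in enumerate(watson) if base == "C"}
print(contexts)  # {1: 'CG', 5: 'CHH', 7: 'CHH', 8: 'CHG', 12: None}
```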

Problems of mapping BS-seq reads: reduced sequence complexity
Legend: Cm = methylated C; C = un-methylated C.
Watson:               >> A Cm G T T C T C C A G T C >>
Bisulfite conversion: >> A Cm G T T T T T T A G T T >>
                      >> A C  G T T T T T T A G T T >>
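
The conversion shown on this slide can be reproduced in silico. This is a minimal sketch, assuming methylated positions are supplied as 0-based indices; the function name `bisulfite_convert` is hypothetical. Every unmethylated C is written as T (the state after conversion and PCR), while methylated C is left as C.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the strand as read after bisulfite conversion and PCR:
    unmethylated C becomes T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


watson = "ACGTTCTCCAGTC"
print(bisulfite_convert(watson, methylated_positions={1}))  # ACGTTTTTTAGTT
print(bisulfite_convert(watson))                            # ATGTTTTTTAGTT
```

Most of the converted strand is now built from three letters instead of four, which is why converted reads match more places in a genome than unconverted reads would.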

Problems of mapping BS-seq reads: increased search space
After bisulfite conversion the two strands are no longer complementary, so PCR produces four distinct strands.
Watson >> A Cm G T T C T C C A G T C >>
Crick  << T G Cm A A G A G G T C A G <<
Bisulfite conversion:
BSW  >> ACmGTTTTTTAGTT >>
BSC  << TGCmAAGAGGTTAG <<
PCR:
BSW  >> ACmGTTTTTTAGTT >>
BSWR << TG CAAAAAATCAA <<
BSCR >> ACG TTCTCCAAGA >>
BSC  << TGCmAAGAGGTTAG <<
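
The four strands can be derived computationally. The sketch below keeps the slide's strand names (BSW, BSC, BSWR, BSCR) but everything else is an illustrative assumption: the helper names are hypothetical, all sequences are written 5'->3', and methylated positions are 0-based indices on each strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (5'->3' in, 5'->3' out)."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Unmethylated C becomes T; methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def four_strands(watson, watson_methylated=frozenset(),
                 crick_methylated=frozenset()):
    """Return the four strands present after bisulfite conversion and PCR."""
    crick = reverse_complement(watson)
    bsw = bisulfite_convert(watson, watson_methylated)
    bsc = bisulfite_convert(crick, crick_methylated)
    return {
        "BSW": bsw,
        "BSWR": reverse_complement(bsw),
        "BSC": bsc,
        "BSCR": reverse_complement(bsc),
    }


# Watson strand from the slide; the CG cytosine is methylated on both strands
# (index 1 on Watson, index 10 on Crick when Crick is written 5'->3').
strands = four_strands("ACGTTCTCCAGTC", {1}, {10})
for name, seq in strands.items():
    print(name, seq)
```

Reversing the BSC string gives the 3'->5' layout used on the slide. Because all four strands differ, a bisulfite-aware mapper has to search about twice the sequence space of a conventional one.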

ELAND
Mapping reads to genome sequences: reads are mapped to two converted genome sequences; reads mapping to multiple positions in the converted genomes are cross-matched; mapping results are combined to generate methylation information.
ELAND only allows 2 mismatches.
Lister et al., Cell (2008).
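
The idea of mapping against two converted genomes can be sketched with exact string matching. This is not ELAND and not the pipeline of Lister et al.; it is a toy illustration under stated assumptions: exact matches only, reads originating from the BSW or BSC strands only, and hypothetical function names. Reads and both genome strands are reduced to a three-letter alphabet (C written as T) for the search, and the original read is then compared with the original reference to call methylation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def c_to_t(seq):
    """Fully convert a sequence in silico (every C written as T)."""
    return seq.replace("C", "T")


def map_read(read, genome):
    """Exact-match a read against both converted genome strands.
    Returns (strand, start) pairs; '-' coordinates refer to the
    reverse-complemented genome."""
    converted_read = c_to_t(read)
    references = (("+", c_to_t(genome)),
                  ("-", c_to_t(reverse_complement(genome))))
    hits = []
    for strand, ref in references:
        start = ref.find(converted_read)
        while start != -1:
            hits.append((strand, start))
            start = ref.find(converted_read, start + 1)
    return hits


def call_methylation(read, genome, hit):
    """For each reference C under the read: True if the read kept C
    (methylated), False if it shows T (unmethylated)."""
    strand, pos = hit
    ref = genome if strand == "+" else reverse_complement(genome)
    return {pos + offset: read_base == "C"
            for offset, (read_base, ref_base)
            in enumerate(zip(read, ref[pos:pos + len(read)]))
            if ref_base == "C"}


genome = "ACGTTCTCCAGTC"
read = "ACGTTTTTTAGTT"            # BSW read with the CG cytosine methylated
hits = map_read(read, genome)
if len(hits) == 1:                 # keep uniquely mapped reads only
    print(hits[0], call_methylation(read, genome, hits[0]))
```

A real mapper would also tolerate mismatches and handle reads from the PCR-derived complementary strands, which this sketch leaves out.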

BSMAP
Based on a hash table seeding algorithm. Xi and Li, BMC Bioinformatics (2009).

Re-mapping of Lister’s data using BSMAP
Raw reads: 144,704,372

Method   Uniquely mapped reads   Unique and nonclonal reads   Unique and nonclonal reads (%)
Eland    55,805,931              39,113,599                   27.03%
BSMAP    67,975,425              48,498,687                   35.52%

Data: Lister et al., Cell (2008).
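
The percentage column can be checked directly from the counts, assuming it is the number of unique and nonclonal reads divided by the number of raw reads. The variable and function names below are illustrative.

```python
RAW_READS = 144_704_372
UNIQUE_NONCLONAL = {"Eland": 39_113_599, "BSMAP": 48_498_687}


def percent_of_raw(count, raw_reads=RAW_READS):
    """Share of raw reads, as a percentage."""
    return 100.0 * count / raw_reads


for method, count in UNIQUE_NONCLONAL.items():
    print(f"{method}: {percent_of_raw(count):.2f}%")
```

Under that assumption the Eland row reproduces 27.03%, while the BSMAP counts give about 33.5% rather than the 35.52% shown, so one of the BSMAP figures may contain a transcription error. Which one cannot be determined from the slide alone.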

Methylation pattern throughout chromosomes
Figure: methylation level per 50 kb by position along Arabidopsis chromosome 3, plotted for the Watson and Crick strands in the CG, CHG, and CHH contexts.

Partially / Unsequenced Genomes
Options for dealing with partial or unsequenced genomes: wait for or generate the genome sequence; ‘borrow’ a reference genome from a phylogenetic neighbour; take a deep breath and ‘do denovo’.
Diagram labels: DNA or RNA sequence data; denovo genome; denovo transcriptome; partial assembly; partial sequence database; gene annotation; genetic variation; transcript variation; non-coding RNA.

Plant Genomes – Haploid Size
Human, Arabidopsis, rice, potato, sugarcane, cotton, barley, wheat. Diameter is proportional to haploid genome size.

Plant Genomes – Total Size
Human, cotton, barley, sugarcane, wheat.

Denovo RNA Seq
Why transcriptome? Large genomes with high repeat content are difficult to assemble. Transcriptomes are more constant in size and are enriched for functional content.
Aims: transcript discovery; small / long non-coding RNA profiling.
Analytical challenges: assembly (ABySS, Velvet, Euler-SR); comparisons between non-discrete, overlapping transcripts; annotation; ploidy.

Summary – Impacts and Challenges
RNASeq: increased resolution; increased power for transcript complexity and variation; analytical challenges (transcript complexity, compositional bias); large gains in small and long non-coding RNA profiling.
Epigenomics (ChIPSeq and MethylSeq): genome-wide with resolution; robust event calling is challenging.
Denovo transcriptomics: attractive option for large, repeat-rich genomes.

Acknowledgements
CSIRO PI Bioinformatics Team: Andrew Spriggs, Stuart Stephen, Emily Ying, Jose Robles, Michael James.
CSIRO Biostatistics: David Lovell.