Canadian Bioinformatics Workshops

www.bioinformatics.ca

Beyond genome sequencing
Asim Siddiqui
Bioinformatics Workshop: Next Generation Sequencing

Questions about the genome
Obtaining a genome sequence is one step towards understanding biological processes. Questions that follow from the genome are: What is transcribed? Where do proteins bind? What is methylated? In other words, how does it work?

Central dogma of molecular biology

The Transcriptome
The transcriptome is the entire set of RNA transcripts in a cell, tissue or organ. It is cell-type specific and time dependent, i.e. it is a function of cell state. The transcriptome can help us understand how cells differentiate and respond to changes in their environment.

Transcriptome complexity
Transcripts may be modified, spliced, edited or degraded. The transcriptome is therefore substantially more complex than the genome, and it varies over time.

Historic measurements
Northern blots, RT-PCR, FRET. Each of these assays must be targeted to a specific locus.

ESTs
ESTs (expressed sequence tags) were the first genome-wide scan for transcriptional elements. Different library types exist: proportional, normalized and subtractive. ESTs can be sequenced from the 5’ or 3’ end.

“Hello Mr Chips”
Microarray chips were introduced in the 1990s. A chip is essentially a parallel Northern blot. Probes are placed on slides. RNA is converted to cDNA, labelled with a fluorescent dye and hybridized. Fluorescence is then measured. Chips have been highly successful for several reasons:
analysis is simplified;
they are useful when there is no genome sequence;
the signal is linear across a 500-fold variation;
standardization has aided their use in medical diagnostics, e.g. MammaPrint.

Chips: pros and cons
Advantages: chips do not require a genome sequence; they are highly characterised, with many software packages available; and one Affymetrix chip is FDA approved.
Disadvantages: measurements are limited to what’s on the array; it is hard to distinguish isoforms when chips are used for expression; and chips can’t detect balanced translocations or inversions when used for resequencing.

SAGE (serial analysis of gene expression)

SAGE
Advantages: a digital count for each transcript, and novel transcript discovery.
Disadvantages: alternative transcripts may share a tag; a tag may map to multiple genomic locations; SAGE doesn’t work well if the genome is unknown; and it is expensive.
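The digital count and the shared-tag problem on this slide can be sketched in Python. This is a minimal illustration, not a SAGE pipeline. It assumes tags have already been extracted from the sequenced concatemers. The lookup `tag_to_transcripts`, the function name and all tag sequences are hypothetical.

```python
from collections import Counter


def count_sage_tags(observed_tags, tag_to_transcripts):
    """Tally a digital count per transcript from observed SAGE tags.

    observed_tags: tag sequences extracted from sequenced concatemers.
    tag_to_transcripts: hypothetical lookup from tag to the transcript(s)
    that contain it, e.g. built from a reference sequence.
    """
    tag_counts = Counter(observed_tags)
    transcript_counts = Counter()  # tags that identify exactly one transcript
    ambiguous = Counter()          # tags shared by several transcripts or loci
    unknown = Counter()            # tags absent from the lookup (possibly novel)
    for tag, n in tag_counts.items():
        hits = tag_to_transcripts.get(tag, [])
        if len(hits) == 1:
            transcript_counts[hits[0]] += n
        elif hits:
            ambiguous[tag] += n
        else:
            unknown[tag] += n
    return transcript_counts, ambiguous, unknown


# Usage with made-up tags and transcript names.
lookup = {"AAGCTTGACC": ["txA"], "TTGGCCAAGT": ["txB", "txC"]}
observed = ["AAGCTTGACC", "AAGCTTGACC", "TTGGCCAAGT", "CCCCGGGGAA"]
counts, shared, novel = count_sage_tags(observed, lookup)
```

A tag with a single hit adds to that transcript's count. Tags shared by several transcripts are set aside rather than split between them. That is one simple, assumed policy among several possible ones.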

“Goodbye Mr Chips”
Large-scale EST and SAGE libraries are expensive with Sanger sequencing. Next-generation sequencing has dropped the cost by a factor of 100. Papers have demonstrated large numbers of alternatively spliced and novel transcripts. Chips are established, especially in the diagnostic market, but... their days are numbered.

mRNA-seq
Basic workflow: align reads (sometimes to the transcriptome first and then to the genome) and tally transcript counts; then align tags to spliced transcripts and add those to the transcript counts.
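The two-pass tally in this workflow might look like the following Python sketch. It assumes alignment has already been done elsewhere. The results are reduced to hypothetical dictionaries that map each uniquely aligned read to a transcript (first pass) or to a splice junction (second pass). The junction naming and the `junction_to_transcript` lookup are illustrative only.

```python
from collections import Counter


def tally_transcript_counts(transcript_hits, junction_hits, junction_to_transcript):
    """Two-pass tally of reads per transcript.

    transcript_hits: read id -> transcript id for reads aligned in the first pass.
    junction_hits: read id -> junction id for reads aligned to spliced junctions.
    junction_to_transcript: hypothetical lookup from junction id to transcript id.
    """
    counts = Counter(transcript_hits.values())  # first pass: direct alignments
    for read_id, junction in junction_hits.items():
        if read_id in transcript_hits:
            continue  # never count the same read twice
        counts[junction_to_transcript[junction]] += 1  # second pass: junction reads
    return counts


first_pass = {"r1": "txA", "r2": "txA", "r3": "txB"}
second_pass = {"r4": "txA:e1-e2", "r5": "txB:e2-e3"}
junctions = {"txA:e1-e2": "txA", "txB:e2-e3": "txB"}
counts = tally_transcript_counts(first_pass, second_pass, junctions)
```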

Cloonan et al. 2008
Used SOLiD to generate 10 Gb of data from mouse embryonic stem cells and embryoid bodies. Used a library of exon junctions to map across known splice events.
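A library of exon junctions like the one described here can be built by joining the end of one exon to the start of the next. The Python below is a sketch under stated assumptions, not the authors' code:
- exon sequences are given in transcript order;
- only adjacent (known) exon pairs are joined;
- `flank` is a hypothetical parameter. It should be chosen from the read length so that a junction read must span both exons.

```python
def junction_library(transcripts, flank):
    """Build exon-junction sequences from known transcript models.

    transcripts: transcript id -> list of exon sequences in transcript order.
    flank: bases taken from each side of the junction (hypothetical; choose it
    from the read length). Exons shorter than flank are used whole.
    """
    if flank <= 0:
        raise ValueError("flank must be positive")  # exon[-0:] would return the whole exon
    library = {}
    for tx_id, exons in transcripts.items():
        for i in range(len(exons) - 1):
            left = exons[i][-flank:]       # end of the upstream exon
            right = exons[i + 1][:flank]   # start of the downstream exon
            library[f"{tx_id}:e{i + 1}-e{i + 2}"] = left + right
    return library


models = {"txA": ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]}
library = junction_library(models, flank=3)
```

Reads that fail to align to the genome can then be aligned against these junction sequences. Each hit is credited to the named transcript.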

Distribution of tags

Alignment strategy

Tag locations

Additional papers
Bainbridge et al. 2006: used 454 to investigate the transcriptome of ES cells.
Mortazavi et al. 2008: used Illumina to investigate transcription in liver cells.

Mortazavi et al. 2008

General issues
Coverage across a transcript may not be uniformly random. Some reads map to multiple locations, and some reads don’t map at all. Reads mapping outside known exons may represent new gene models or new genes.
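The read categories listed above can be made concrete with a small Python classifier. As a simplifying assumption, it looks only at each read's alignment start position(s). The known exons are a hypothetical list of half-open `(chrom, start, end)` intervals. A real pipeline would use full alignment spans and a standard annotation format.

```python
def classify_read(hits, exons):
    """Assign a read to one of the categories above.

    hits: list of (chrom, start) alignment positions reported for the read.
    exons: known exons as half-open (chrom, start, end) intervals.
    """
    if not hits:
        return "unmapped"
    if len(hits) > 1:
        return "multi-mapped"
    chrom, pos = hits[0]
    for ex_chrom, start, end in exons:
        if ex_chrom == chrom and start <= pos < end:
            return "known exon"
    return "outside known exons"  # candidate new gene model or new gene


known_exons = [("chr1", 100, 200), ("chr1", 300, 400)]
examples = {
    "r1": [],
    "r2": [("chr1", 150), ("chr5", 900)],
    "r3": [("chr1", 350)],
    "r4": [("chr1", 250)],
}
labels = {read: classify_read(hits, known_exons) for read, hits in examples.items()}
```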

Size of the transcriptome
Carter et al. (2005), using arrays, estimated 520,000 to 850,000 transcripts per cell. Taking the upper limit and an average transcript size of 2 kb gives a transcriptome of ~2 Gb. Sequencing a transcriptome therefore costs roughly as much as sequencing a genome.
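Worked through explicitly, the upper-limit estimate gives:

```latex
850\,000 \times 2\,000\ \text{bases} = 1.7 \times 10^{9}\ \text{bases} \approx 2\ \text{Gb}
```

Here Gb means gigabases. The transcriptome is therefore of the same order of magnitude as a mammalian genome, which is why the sequencing costs are comparable.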

The Boundome
DNA-binding proteins control genome function. Histones impact chromatin structure, and activators and repressors impact gene expression. The locations of these proteins help us understand how the genome works.

Finding protein binding sites
EMSA, ChIP, ChIP-chip, ChIP-seq

ChIP

ChIP-seq
Instead of probing against a chip, measure the bound DNA directly by sequencing. Basic workflow: align reads to the genome, identify clusters and peaks, and determine bound sites.
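The "identify clusters and peaks" step can be illustrated with a deliberately naive Python peak caller. It counts read starts in fixed windows, keeps windows above a threshold, and merges adjacent enriched windows. The `window` and `min_reads` values are hypothetical. This is a swapped-in toy technique: published callers such as MACS additionally model the background and use a control sample.

```python
from collections import Counter


def call_peaks(read_starts, window=100, min_reads=5):
    """Naive peak caller for one chromosome (illustration only).

    read_starts: aligned read start positions.
    Returns merged enriched regions as half-open (start, end, reads) tuples.
    """
    bins = Counter(pos // window for pos in read_starts)  # reads per window
    enriched = sorted(b for b, n in bins.items() if n >= min_reads)
    peaks = []
    for b in enriched:
        if peaks and peaks[-1][1] == b * window:
            # Window is adjacent to the previous peak: extend it.
            start, _, reads = peaks[-1]
            peaks[-1] = (start, (b + 1) * window, reads + bins[b])
        else:
            peaks.append((b * window, (b + 1) * window, bins[b]))
    return peaks


starts = [105, 110, 120, 150, 190, 210, 220, 230, 240, 250, 900]
peaks = call_peaks(starts, window=100, min_reads=5)
```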

Robertson et al. 2007
Used Illumina technology to find STAT1 binding sites. Comparisons with two ChIP-PCR data sets suggested that ChIP-seq sensitivity was between 70% and 92% and specificity was at least 95%.
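The sensitivity and specificity quoted here follow the standard definitions, sketched below in Python. The counts in the usage lines are invented for illustration and do not come from Robertson et al. This slide does not describe how their reference sites were counted.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reference binding sites that were recovered."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of reference non-sites that were correctly not called."""
    return true_neg / (true_neg + false_pos)


# Invented counts, chosen only to land inside the quoted ranges.
sens = sensitivity(true_pos=46, false_neg=4)   # 46 / 50
spec = specificity(true_neg=95, false_pos=5)   # 95 / 100
```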

Tag statistics

Typical Profile

Mikkelsen et al. 2007
Performed a comparison with ChIP-chip methods and found ~98% concordance.
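Concordance between two sets of binding sites is often expressed as the fraction of sites in one set that overlap a site in the other. The slide does not give Mikkelsen et al.'s exact definition. The Python below is therefore an assumed, simplified version. It uses half-open `(chrom, start, end)` intervals and a simple quadratic scan.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def concordance(sites_a, sites_b):
    """Fraction of sites in sites_a that overlap any site in sites_b."""
    if not sites_a:
        return 0.0
    hits = sum(1 for a in sites_a if any(overlaps(a, b) for b in sites_b))
    return hits / len(sites_a)


chip_seq = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50), ("chr2", 300, 400)]
chip_chip = [("chr1", 150, 250), ("chr2", 0, 20), ("chr2", 1000, 1100)]
fraction = concordance(chip_seq, chip_chip)
```

This measure is asymmetric. Swapping the arguments can give a different value, so it helps to state which set is the reference.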

Comparison with ChIP-seq

Johnson et al. 2007
A gene had been known to be regulated by NeuroD1 for many years, but traditional biochemistry and bioinformatics failed to find the binding site. The site was assumed to lie hundreds of kb upstream. ChIP-seq found a site with a weak match to the consensus motif in exon 1.
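A "weak match to the consensus motif" means a site that differs from the consensus at one or more positions. The Python sketch below scans one strand for windows that stay within a mismatch budget. The consensus string `GATTCA` is a hypothetical placeholder, not the NeuroD1 motif. Real analyses typically score sites with position weight matrices and scan both strands.

```python
def scan_motif(sequence, consensus, max_mismatches):
    """Return (start, mismatches) for every window within the mismatch budget."""
    k = len(consensus)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        mismatches = sum(1 for x, y in zip(window, consensus) if x != y)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


region = "CCGATACACC"  # made-up sequence
exact = scan_motif(region, "GATTCA", max_mismatches=0)
weak = scan_motif(region, "GATTCA", max_mismatches=1)
```

A strict search finds nothing here, while allowing one mismatch recovers the site at position 2.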

The Methylome
In DNA methylation, a methyl group is added to cytosines. This leads to silencing of genes in the methylated region, e.g. in X inactivation. It is yet another form of transcriptional control and, together with histone modifications, a key component of epigenetics.

Bisulphite sequencing
Bisulphite treatment converts unmethylated cytosines to uracil, which is read as thymine after PCR amplification. Methylated cytosines are protected and remain cytosine. The experimental procedure is difficult and sequence alignment is tricky, but the basic concepts hold.
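The chemistry described above can be simulated in a few lines of Python. The sketch also shows how methylation is read back out: at each reference C, a C in the read means methylated and a T means unmethylated. It covers the top strand only and assumes a gap-free read already aligned to the reference. The function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulphite treatment plus PCR on the top strand.

    Unmethylated C reads as T; C at a listed (methylated) position stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Call each reference C from a gap-free, aligned read of equal length."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = "methylated"
            elif read_base == "T":
                calls[i] = "unmethylated"
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
```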

Taylor et al. 2007
Targeted sequencing reduced alignment difficulties. Dynamic programming was used to align sequences against an in silico bisulphite-converted sequence of the target amplicon regions.
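One way to picture this approach is a Smith-Waterman local alignment against the in silico converted amplicon. The Python sketch below assumes that the read is also fully C-to-T converted before alignment. Methylated Cs, which remain C in the read, then do not count as mismatches. This is a common strategy, but it is an assumption here, not necessarily Taylor et al.'s exact scoring. The scoring values are hypothetical.

```python
def convert(seq):
    """In silico bisulphite conversion of the top strand: every C becomes T."""
    return seq.replace("C", "T")


def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment score and its end coordinates (read_end, ref_end).

    Linear gap penalty; keeps only two rows of the dynamic programming matrix.
    """
    prev = [0] * (len(ref) + 1)
    best = (0, 0, 0)
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


amplicon = "GATCGGACGTTA"
read = "TCGGATGT"  # made-up read: first C methylated (kept), second converted to T
result = smith_waterman(convert(read), convert(amplicon))
```

The alignment only locates the read. Methylation is then called by comparing the original, unconverted read with the original reference at that position.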

Cokus et al. 2008
Used Illumina shotgun sequencing. Reads were tested against every possible methylation pattern, and only unique hits were retained.
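Testing a read against every possible methylation pattern is equivalent to letting each reference C match either C (methylated) or T (unmethylated). All other bases must match exactly. The brute-force Python scan below applies that rule at every position of one strand. It keeps a read only if the read has exactly one compatible location. This illustrates the idea; it is not the indexed implementation a genome-scale study would need.

```python
def compatible(read, window):
    """True if the read could come from this window under SOME methylation pattern.

    Top strand only: a reference C may be read as C (methylated) or T
    (unmethylated); every other base must match exactly.
    """
    for r, g in zip(read, window):
        if g == "C":
            if r not in ("C", "T"):
                return False
        elif r != g:
            return False
    return True


def unique_hit(read, genome):
    """Start position of the read's single compatible location, or None."""
    k = len(read)
    hits = [i for i in range(len(genome) - k + 1) if compatible(read, genome[i:i + k])]
    return hits[0] if len(hits) == 1 else None


genome = "ACGTTAGCTAACGTCA"  # made-up reference
kept = unique_hit("GCTAA", genome)      # one compatible location
dropped = unique_hit("ATGTTA", genome)  # two compatible locations, so discarded
```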

The basic workflow
All of these analyses follow the same basic pattern: align reads, count, analyze.

Metagenomics
Craig Venter’s sequencing of the sea is one of the earliest and best-known examples; it used Sanger sequencing. Many recent studies have followed, including Angly et al. (the ocean virome) and Cox-Foster et al. (colony collapse disorder). These studies all use 454 for its longer read length and target amplification of 16S or 18S ribosomal RNA genes.

Summary
The basic processing algorithm is the same across these applications. Results are analyzed using standard statistical practices established with earlier experimental methods. Metagenomics covers a new type of sequencing not easily performed with Sanger.