Next-generation sequencing. Xusheng Wang, 4/29/2010

Outline: background; technologies; data analysis and applications

Why sequence DNA?

De novo genome sequencing: Homo sapiens, Mus musculus, Rattus norvegicus, Pan troglodytes, Macaca mulatta, Drosophila melanogaster, Danio rerio, Takifugu rubripes, Arabidopsis thaliana, Oryza sativa, Caenorhabditis elegans

Individual human genome sequencing

My interest: the BXD recombinant inbred (RI) strain set. C57BL/6J (B) and DBA/2J (D) are crossed to produce F1 and then F2 mice, and 20 generations of brother-sister matings yield the fully inbred strains BXD1, BXD2, …, BXD80. The parental strains are fully inbred, F1 mice are isogenic, F2 mice are heterogeneous, and siblings within each BXD strain are inbred and isogenic. These recombined genomes are needed for mapping.

Cancer genome sequencing

Map-and-count experiments: RNA-seq, ChIP-seq

History of DNA sequencing (Messing & Llaca, PNAS, 1998); the Sanger sequencer

Next-generation sequencing technologies: ABI SOLiD, Illumina GA2, Roche 454

Single-molecule sequencing technologies: Helicos; Single Molecule Real Time (SMRT) DNA sequencing

Comparison of NGS platforms (Michael L. Metzker, Nature Reviews Genetics 11, Jan 2010)

Roche (454) GS FLX sequencer

HiSeq 2000

Illumina Genome Analyzer: prepare library

Illumina Genome Analyzer: prepare clusters

Illumina Genome Analyzer: sequencing, reading one base (A, T, C, or G) per cycle

Data from the Illumina GA2. Each record carries the machine name (here "xwang"), lane, tile, X and Y coordinates of the cluster, index, read number (1: single end; 2: paired end), the read sequence ("." marks an uncalled base), an ASCII-encoded quality string, and a filter flag (0: no; 1: yes).
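
The record layout above can be made concrete with a small parser. This is a sketch under two assumptions the slide does not state: records are tab-separated in the qseq-style column order given in `QSEQ_COLUMNS` (which includes a run-number column not shown on the slide), and quality characters use a Phred+64 offset. The example record is made up.

```python
from dataclasses import dataclass
from typing import List

# ASSUMPTION: qseq-style, tab-separated columns in this order.
QSEQ_COLUMNS = ("machine", "run", "lane", "tile", "x", "y",
                "index", "read", "sequence", "quality", "filter")


@dataclass
class Read:
    machine: str
    lane: int
    tile: int
    x: int
    y: int
    read_number: int      # 1 = single-end (or first) read, 2 = second read of a pair
    sequence: str         # '.' marks a base that was not called
    qualities: List[int]  # Phred scores decoded from the ASCII string
    passed_filter: bool   # filter column: 1 = yes, 0 = no


def decode_phred64(quality: str) -> List[int]:
    """ASSUMPTION: each character encodes a Phred score offset by 64."""
    return [ord(ch) - 64 for ch in quality]


def parse_qseq_line(line: str) -> Read:
    values = line.rstrip("\n").split("\t")
    if len(values) != len(QSEQ_COLUMNS):
        raise ValueError(f"expected {len(QSEQ_COLUMNS)} columns, got {len(values)}")
    col = dict(zip(QSEQ_COLUMNS, values))
    return Read(
        machine=col["machine"],
        lane=int(col["lane"]),
        tile=int(col["tile"]),
        x=int(col["x"]),
        y=int(col["y"]),
        read_number=int(col["read"]),
        sequence=col["sequence"],
        qualities=decode_phred64(col["quality"]),
        passed_filter=col["filter"] == "1",
    )


# Hypothetical record for illustration only.
example = "\t".join(["xwang", "1", "4", "27", "99", "1062", "0", "1",
                     "ACG.T", "aa`DB", "1"])
print(parse_qseq_line(example))
```

With Phred+64, "a" decodes to 33 and "B" to 2, so quality strings dominated by lowercase letters indicate high-confidence calls.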

SOLiD system

SOLiD sequencing

Ligation-based sequencing

Decoding color space: raw error rate ~3%; corrected error rate ~0.1%
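
Color-space decoding can be sketched in a few lines. This assumes the commonly described SOLiD two-base code (A=0, C=1, G=2, T=3, with each color equal to the XOR of the two adjacent base codes) and a csfasta-style read made of the primer base followed by color digits; the reads used here are made up.

```python
# ASSUMPTION: SOLiD two-base encoding, color = code(base1) XOR code(base2).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colorspace(read: str) -> str:
    """Translate primer base + color digits into bases (primer base not returned)."""
    previous = BASE_TO_CODE[read[0]]
    bases = []
    for color in read[1:]:
        if color == ".":
            raise ValueError("missing color call; downstream bases are undefined")
        previous ^= int(color)          # next base code = previous code XOR color
        bases.append(CODE_TO_BASE[previous])
    return "".join(bases)


print(decode_colorspace("T0123"))   # correct colors
print(decode_colorspace("T0023"))   # same read with the second color miscalled
```

Miscalling a single color changes every base decoded after it ("TGAT" becomes "TTCG"), which is one reason color-space reads are typically aligned in color space rather than decoded first.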

Single SNP detection
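
The two-base code also explains how a single SNP is recognized: a true single-base change alters two adjacent colors, whereas an isolated measurement error alters only one. This sketch assumes the commonly described SOLiD code (A=0, C=1, G=2, T=3; color = XOR of adjacent base codes); the sequences are hypothetical.

```python
# ASSUMPTION: SOLiD two-base encoding, color = code(base1) XOR code(base2).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_colorspace(primer_base: str, seq: str) -> str:
    """Encode bases as colors, prefixed by the primer base as in csfasta."""
    colors = []
    previous = BASE_TO_CODE[primer_base]
    for base in seq:
        code = BASE_TO_CODE[base]
        colors.append(str(previous ^ code))
        previous = code
    return primer_base + "".join(colors)


def changed_colors(a: str, b: str) -> list:
    """Positions at which two equal-length color-space strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "TGATCA"
variant = "TGCTCA"  # hypothetical SNP: A -> C at the third base
ref_colors = encode_colorspace("T", reference)
var_colors = encode_colorspace("T", variant)
print(ref_colors, var_colors, changed_colors(ref_colors, var_colors))
```

Changing one base flips the color into it and the color out of it by the same XOR amount, so both neighbors always change together.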

Data from SOLiD (.csfasta): each read header (>4_27_99_F3, >4_27_1062_F3, >4_27_1570_F3, >4_28_935_F3, >4_28_1306_F3, >4_29_429_F3, >4_29_506_F3, >4_29_636_F3, >4_29_940_F3, >4_29_1957_F3, >4_31_522_F3, >4_31_1523_F3) is followed by a color-space read that begins with the primer base T.

Quality values (_QV.qual): >4_27_99_F, >4_27_1062_F, >4_27_1570_F, >4_28_935_F, >4_28_1306_F, >4_29_429_F

Comparing sequencers (figure: read length, from 10 bp to 1,000 bp, versus bases per machine run, from 1 Mb to 100 Gb, for the ABI capillary sequencer, the 454 pyrosequencer, and the Illumina and AB/SOLiD short-read sequencers)

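Comparing sequencers

| | Roche (454) | Illumina GA | SOLiD |
|---|---|---|---|
| Chemistry | Pyrosequencing | Reversible terminator | Ligation-based |
| Amplification | Emulsion PCR | Bridge amplification | Emulsion PCR |
| Paired ends / separation | Yes / 3 kb | Yes / 200 bp | Yes / 3 kb |
| Output per run | 100 Mb | 20 Gb | 100 Gb |
| Time per run | 7 h | 4 days / 8 days | 7 days / 14 days |
| Read length | 400 bp | … bp | 50 bp |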
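
The per-run output in the table translates into expected sequencing depth with simple arithmetic: mean coverage is total sequenced bases divided by genome size, and under the usual Poisson (Lander-Waterman) assumption the fraction of bases covered at least once is 1 - e^(-C). The ~2.7 Gb genome size is an approximate mouse value assumed here for illustration, and the estimate ignores unmappable and filtered reads.

```python
import math


def reads_per_run(bases_per_run, read_length):
    """Approximate number of reads: total output divided by read length."""
    return bases_per_run / read_length


def mean_coverage(bases_per_run, genome_size):
    """C = (N * L) / G, where N * L is the total number of sequenced bases."""
    return bases_per_run / genome_size


def fraction_covered(coverage):
    """Poisson (Lander-Waterman) estimate of bases covered at least once."""
    return 1.0 - math.exp(-coverage)


# Outputs per run from the table; GENOME_SIZE is an assumed approximate value.
GENOME_SIZE = 2.7e9
for name, output in [("Roche 454", 100e6), ("Illumina GA", 20e9), ("SOLiD", 100e9)]:
    c = mean_coverage(output, GENOME_SIZE)
    print(f"{name}: {c:.2f}x mean coverage, {fraction_covered(c):.1%} of bases covered")
```

For example, 100 Gb of 50 bp reads is about two billion reads per run.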

Data volume

Read mapping: read mapping is like doing a jigsaw puzzle. You get the pieces (the reads), and they give you the picture on the box (the reference genome). Problem is, some pieces are easier to place than others.

Alignment of reads: reads generated by sequencing are mapped to a reference genome. Conventional tools such as BLAST or BLAT do not work well with short sequence reads, so existing alignment algorithms are modified to handle them.
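
A toy mapper shows what handling short reads involves. It uses the pigeonhole principle: split a read into k+1 segments, look each up in a hash index of the reference, and verify candidates allowing at most k mismatches. This is a generic illustration, not the exact algorithm of any aligner named in this talk (Bowtie, for instance, uses a Burrows-Wheeler index), and it ignores gaps and base qualities. The reference and reads are made up.

```python
from collections import defaultdict


def build_index(reference, seg_len):
    """Hash every substring of length seg_len to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - seg_len + 1):
        index[reference[i:i + seg_len]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=1):
    """Return every start where `read` aligns with <= max_mismatches (no gaps).

    With max_mismatches + 1 segments, any such alignment matches at least
    one segment exactly, so exact segment hits give all candidate starts.
    """
    n_segments = max_mismatches + 1
    seg_len = len(read) // n_segments
    index = build_index(reference, seg_len)  # rebuilt per call only for brevity
    hits = set()
    for s in range(n_segments):
        offset = s * seg_len
        for pos in index.get(read[offset:offset + seg_len], []):
            start = pos - offset  # candidate start of the whole read
            if 0 <= start <= len(reference) - len(read):
                if hamming(read, reference[start:start + len(read)]) <= max_mismatches:
                    hits.add(start)
    return sorted(hits)


reference = "TTACGTTGCAGGCCTAACGTTGCATT"
print(map_read("ACGTTGCA", reference))  # read from a repeated segment
print(map_read("GGCCTTAC", reference))  # unique read with one mismatch
```

The first read lands at two positions because its sequence occurs twice in the reference: exactly the kind of piece that is hard to place.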

Alignment tools: ELAND, MAQ, Bowtie, SOAP, Bioscope/Corona Lite pipeline, TopHat, BFAST

SAM/BAM format: the header records the file format version (@HD VN), each reference sequence's name and length (@SQ SN and LN), and read groups (@RG).
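
The header fields named above can be read with standard-library Python. This sketch handles only plain-text SAM (BAM is BGZF-compressed binary and needs a library such as pysam or samtools), keeps just the 11 mandatory alignment columns, and decodes two FLAG bits (0x4 unmapped, 0x10 reverse strand). The example record is made up.

```python
from collections import defaultdict

# The 11 mandatory SAM alignment columns, in order.
SAM_COLUMNS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def parse_sam(text):
    """Split SAM text into header records (grouped by tag) and alignment records."""
    header = defaultdict(list)
    alignments = []
    for line in text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            tag = fields[0][1:]
            if tag == "CO":  # free-text comment, not TAG:VALUE pairs
                header[tag].append({"text": "\t".join(fields[1:])})
            else:
                header[tag].append(dict(f.split(":", 1) for f in fields[1:]))
        else:
            record = dict(zip(SAM_COLUMNS, fields[:11]))
            flag = int(record["FLAG"])
            record["FLAG"] = flag
            record["POS"] = int(record["POS"])  # 1-based leftmost position
            record["MAPQ"] = int(record["MAPQ"])
            record["unmapped"] = bool(flag & 0x4)
            record["reverse"] = bool(flag & 0x10)
            alignments.append(record)
    return dict(header), alignments


# Hypothetical SAM text for illustration.
sam_text = ("@HD\tVN:1.0\tSO:coordinate\n"
            "@SQ\tSN:chr1\tLN:1000\n"
            "@RG\tID:lane4\tSM:sampleA\n"
            "read1\t16\tchr1\t100\t37\t8M\t*\t0\t0\tACGTTGCA\tIIIIIIII\n")
header, alignments = parse_sam(sam_text)
print(header["SQ"], alignments[0]["reverse"])
```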

Sequence variations: SNPs, insertions, deletions, medium insertions

SNPs between C57BL/6J and DBA/2J: 4,553,000 (SOLiD and Illumina)
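
In its simplest form, a SNP between two inbred strains can be called by counting the bases that aligned reads show at each reference position. The thresholds below (minimum depth 5, at least 80% of reads supporting one non-reference base) are illustrative assumptions. The sketch expects a homozygous difference, as one would for inbred strains, and is not the pipeline behind the count above.

```python
from collections import Counter


def call_snp(reference_base, pileup_bases, min_depth=5, min_fraction=0.8):
    """Return the alternative base if the pileup supports a homozygous SNP, else None.

    pileup_bases holds the base each overlapping read shows at this position.
    ASSUMPTION: illustrative thresholds; one dominant non-reference base is required.
    """
    bases = [b for b in pileup_bases.upper() if b in "ACGT"]  # drop N / no-calls
    if len(bases) < min_depth:
        return None
    allele, count = Counter(bases).most_common(1)[0]
    if allele != reference_base.upper() and count / len(bases) >= min_fraction:
        return allele
    return None


print(call_snp("A", "GGGGGGA"))   # 6 of 7 reads show G
print(call_snp("A", "GGG"))       # too shallow to call
```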

Evaluating SNP calls: 5% adjacent SNPs versus 20% adjacent SNPs

Medium indel detection (sizes up to 1 million bases). Example deletion call: D 321 ChrID Supports:
Reference, with the deleted segment in lowercase:
TAAGAATGAGTTGGCAAATAAAGAGTTTGGTGAGTTTATAGAAATATAGGggccgataggACAAGGTACAAGGAATGGCTGAAGGAGAGAGGTTG
Supporting reads, each split into the part left of the deletion and the part right of it:
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG
TGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGG
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG
AGTTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGGA
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG
TTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAA
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAAT
GTTTATAGAAATATAGG ACAAGGTACAAGGAATGGC
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG
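
Each supporting read for this deletion aligns in two pieces around the deleted segment. A minimal split-read sketch, assuming exact matches for both pieces, a minimum anchor of 8 bases, and a hypothetical `max_deletion` limit, tries each split point and looks for the suffix a short distance downstream of the prefix. It uses the reference and the first supporting read from the slide.

```python
def find_deletion(read, reference, min_anchor=8, max_deletion=1000):
    """Return (start, end) of the reference interval deleted in `read`, or None.

    The read is split into prefix + suffix (each >= min_anchor bases); both must
    match the reference exactly, in order, separated by 1..max_deletion bases.
    """
    for k in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        p = reference.find(prefix)
        while p != -1:
            # suffix must start between 1 and max_deletion bases after the prefix
            q = reference.find(suffix, p + k + 1, p + k + max_deletion + len(suffix))
            if q != -1:
                return p + k, q  # deleted interval is reference[p + k:q]
            p = reference.find(prefix, p + 1)
    return None


reference = ("TAAGAATGAGTTGGCAAATAAAGAGTTTGGTGAGTTTATAGAAATATAGG"
             "GGCCGATAGGACAAGGTACAAGGAATGGCTGAAGGAGAGAGGTTG")
read = "GAGTTTATAGAAATATAGG" + "ACAAGGTACAAGGAATG"
start, end = find_deletion(read, reference)
print(start, end, reference[start:end])
```

Because the bases just before the deletion (ATAGG) repeat at its end, the reported 10-base interval is shifted left of the lowercase segment on the slide; both describe the same deletion.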

Large indel detection (K. Chen et al., Nature Methods 6, 2009): read pairs are labeled concordant, insertion, or deletion by comparing their mapped distance with the clone insert size.
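
Concordance comes down to comparing each pair's mapped distance with the clone insert size: a pair spanning a deletion in the sample maps farther apart on the reference than expected, and one spanning an insertion maps closer. This sketch uses a simple mean ± k standard deviations rule with made-up library parameters (mean 200 bp, SD 20 bp, k = 3); it is not the statistical model of Chen et al.

```python
def classify_pairs(insert_sizes, library_mean, library_sd, k=3.0):
    """Label each mapped pair by how its distance compares with the library."""
    labels = []
    for size in insert_sizes:
        if size > library_mean + k * library_sd:
            labels.append("deletion")     # reads farther apart than the clone allows
        elif size < library_mean - k * library_sd:
            labels.append("insertion")    # reads closer together than expected
        else:
            labels.append("concordant")
    return labels


print(classify_pairs([200, 500, 50], library_mean=200, library_sd=20))
```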

Indels between C57BL/6J and DBA/2J

Inversions detected by paired-end data (table: total inversions, and those spanning exon(s) or gene(s), in introns, or intergenic)
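
Paired-end reads reveal inversions through orientation. Assuming a forward-reverse library (left read on the forward strand, right read on the reverse strand), a pair whose reads map to the same strand suggests that one read lies inside an inverted segment. This is a simplified sketch; positions and strands are made up.

```python
def pair_orientation(pos1, reverse1, pos2, reverse2):
    """Classify a read pair by strand, assuming a forward-reverse (FR) library."""
    # Order the two reads by mapped position; keep only their strands.
    (_, left_rev), (_, right_rev) = sorted([(pos1, reverse1), (pos2, reverse2)])
    if left_rev == right_rev:
        return "same strand (inversion candidate)"
    if not left_rev and right_rev:
        return "normal (FR)"
    return "everted (RF)"


print(pair_orientation(400, True, 100, False))
print(pair_orientation(100, False, 400, False))
```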

Copy number variations (CNVs): 21,739 in total (7,182 gains, 14,557 losses) (Graubert et al., PLoS Genetics; Anderson et al., Genes & Immunity). Several members of the Klra gene family are deleted in DBA/2J.
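
The slide does not say how these CNVs were called. One common approach, read depth, compares normalized read counts in genomic windows between a sample and a reference strain. The sketch assumes equal-sized windows, a 0.5 pseudo-count, and illustrative log2-ratio cutoffs of ±0.5; the counts are made up.

```python
import math


def call_cnv_windows(sample_counts, reference_counts, gain=0.5, loss=-0.5, pseudo=0.5):
    """Label each window gain / loss / normal from a depth-normalized log2 ratio."""
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    calls = []
    for s, r in zip(sample_counts, reference_counts):
        # Normalize by total reads; the pseudo-count avoids log(0) and division by zero.
        ratio = ((s + pseudo) / sample_total) / ((r + pseudo) / reference_total)
        log2_ratio = math.log2(ratio)
        if log2_ratio >= gain:
            calls.append("gain")
        elif log2_ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


print(call_cnv_windows([100, 100, 200, 10], [100, 100, 100, 100]))
```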

De novo assembly: ABySS, ALLPATHS, Euler-SR, SHRAP, SSAKE, Velvet, SOAP
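
Several of the listed assemblers (for example Velvet, ABySS, and Euler-SR) are built on de Bruijn graphs, while SSAKE extends contigs greedily from overlaps. The toy de Bruijn assembler below assumes error-free reads and a genome without repeats, so the graph is a single path; real assemblers must also handle errors, repeats, and branches.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_linear(reads, k):
    """Walk the graph from its unique source; stop at a branch or dead end."""
    graph = de_bruijn(reads, k)
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    starts = [node for node in graph if indegree[node] == 0]
    if len(starts) != 1:
        raise ValueError("graph is not a single linear path")
    node = starts[0]
    contig, visited = node, {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break
        contig += nxt[-1]  # each step adds the last base of the next (k-1)-mer
        visited.add(nxt)
        node = nxt
    return contig


print(assemble_linear(["ACGTTG", "GTTGCA", "TGCAGG"], k=4))
```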

Variation viewed at a genome scale

RNA sequencing

RNA sequencing analysis pipeline: .csfasta reads → filtering (ribosomal RNA, tRNA, TTTT/AAAA stretches, adapters) → alignment to reference sequences → merging and sorting → counting reads → novel transcripts
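
The counting step is often followed by normalization for gene length and sequencing depth. RPKM (reads per kilobase of exon model per million mapped reads) is shown here as one common choice of that era; the slide does not say which normalization was used. Gene coordinates, read positions, and totals are hypothetical.

```python
def count_reads(read_starts, genes):
    """Count read start positions falling in each gene's half-open [start, end)."""
    return {name: sum(start <= p < end for p in read_starts)
            for name, (start, end) in genes.items()}


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


counts = count_reads([5, 15, 16, 40], {"geneA": (0, 10), "geneB": (10, 20)})
print(counts)
print(rpkm(500, 2000, 10_000_000))  # 500 reads on a 2 kb gene, 10 M mapped reads
```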

Alignment methods: transcriptome reads that cross splice junctions; the anchor-and-extend method
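
The anchor-and-extend idea can be sketched for a read crossing one splice junction: match a short anchor exactly, extend it base by base, then look for the rest of the read downstream, accepting only canonical GT...AG introns. The anchor length, intron limit, and GT-AG restriction are assumptions of this sketch, and the genome is a made-up string.

```python
def spliced_align(read, genome, anchor_len=8, max_intron=10000):
    """Align a read across at most one junction.

    Returns (exon1_start, intron_start, exon2_start), 0-based; (start, None, None)
    for an unspliced match; or None if no alignment is found.
    """
    start = genome.find(read[:anchor_len])           # 1. anchor: exact seed match
    if start == -1:
        return None
    ext = anchor_len                                 # 2. extend while bases agree
    while ext < len(read) and start + ext < len(genome) and read[ext] == genome[start + ext]:
        ext += 1
    if ext == len(read):
        return (start, None, None)
    # 3. Try split points from the end of the extension backwards, because the
    #    extension can run a few bases into the intron by chance.
    for split in range(ext, anchor_len - 1, -1):
        donor = start + split
        if genome[donor:donor + 2] != "GT":          # intron must start with GT
            continue
        rest = read[split:]
        limit = donor + max_intron + len(rest)
        q = genome.find(rest, donor + 4, limit)
        while q != -1:
            if genome[q - 2:q] == "AG":              # intron must end with AG
                return (start, donor, q)
            q = genome.find(rest, q + 1, limit)
    return None


genome = "TTTT" + "ACGTACGTACCA" + "GT" + "A" * 10 + "AG" + "GGCATGCAT" + "TTTT"
read = "ACGTACGTACCA" + "GGCATGCAT"
print(spliced_align(read, genome))
```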

Alternative splicing. Novel transcribed region (NTR), definition: a segment of genomic sequence that is transcribed but is not currently annotated as an exon in a database
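
Following the NTR definition, a minimal sketch finds runs of read coverage and keeps those that overlap no annotated exon. The depth and length thresholds are assumptions, intervals are half-open and 0-based, and a segment that partly overlaps an exon is discarded entirely here, which a real analysis might handle differently. The depth values and exon are made up.

```python
def covered_segments(depth, min_depth=2, min_len=3):
    """Half-open [start, end) runs where depth >= min_depth (min_depth must be >= 1)."""
    segments, start = [], None
    for i, d in enumerate(list(depth) + [0]):  # trailing 0 closes an open run
        if d >= min_depth and start is None:
            start = i
        elif d < min_depth and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


def novel_transcribed_regions(depth, exons, min_depth=2, min_len=3):
    """Covered segments that do not overlap any annotated exon interval."""
    return [(s, e) for s, e in covered_segments(depth, min_depth, min_len)
            if all(e <= ex_start or s >= ex_end for ex_start, ex_end in exons)]


depth = [0, 3, 3, 3, 0, 0, 5, 5, 5, 5, 0, 2, 0]
print(novel_transcribed_regions(depth, exons=[(1, 4)]))
```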

Our RNA-seq data on the UCSC Genome Browser: bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Williamslab&hgS_otherUserSessionName=eye_RNAseq

Finding the new SNP data

Finding the indel data

Using the new sequence data