Irys data analysis, January 10th, 2014



Irys Workflow – Data Analysis
Software: Irys™ ICS, IrysView™
Steps and files: Scanning; Image processing; Single molecule maps (.bnx); Assembly; Genome Map (.cmap); Sample Anchoring (.xmap)
Inputs for comparison: Short NGS Contigs; RefSeq Reference; Genome Map 2
Structural variation detection
–Using a reference (e.g. hg19)
–Using a second genome map
–Using NGS contigs
Sequence Assembly Validation
–Gross assembly quality (reiterate)
–Missing sites, extra sites, interval differences
–Structural differences
–Consed
Sequence contig scaffolding
–Alignment in IrysView
–Manual editing
–AGP output
–Conversion to FASTA
–Reimport superscaffolds to reiterate
Other applications: Integration Analysis; Sequence scaffolding without de novo assembly; Mapping based variant calling; Two color applications (epigenetics, DNA damage)

Workshops
–De novo assembly (using IrysView: Alex; using Python/command line: Heng/Ernest)
–SV detection: Warren/Andy

Core workflow: Data QC: basic molecule stats

Core workflow: Data QC: molecule quality report
–Always consider the mapping rate with respect to the stringency setting.
–Mapping rate helps us estimate the useful coverage depth as well as data quality.

Stretch normalization
Evaporation (increasing [salt]) during the prolonged scanning of version 2 chips results in shortening of molecules in the nanochannels. This can be corrected by measuring the average stretch in each scan and applying a normalization factor.
Determining average stretch:
–Internal ruler based normalization
–Reference mapping based normalization
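The per-scan correction can be sketched as follows. This is a minimal illustration, not the IrysView implementation: it assumes an internal ruler of known length is measured in every scan, treats "stretch" as the ratio of observed to expected ruler length, and uses hypothetical function names and made-up measurements.

```python
from statistics import mean

def normalization_factor(observed_ruler_lengths, expected_ruler_length):
    """Factor that rescales a scan so its average ruler length matches the expected one."""
    # Assumption: average stretch = mean observed ruler length / expected length.
    avg_stretch = mean(observed_ruler_lengths) / expected_ruler_length
    return 1.0 / avg_stretch

def normalize_scan(molecule_lengths, factor):
    """Apply one scan's normalization factor to every molecule length in that scan."""
    return [length * factor for length in molecule_lengths]

# Hypothetical data: scan 2 has shrunk by about 10% relative to scan 1.
EXPECTED_RULER_KB = 10.0
scans = {
    1: {"ruler": [10.0, 10.2, 9.8], "molecules": [150.0, 220.0]},
    2: {"ruler": [9.0, 9.1, 8.9], "molecules": [135.0, 198.0]},
}

normalized = {}
for scan_id, scan in scans.items():
    factor = normalization_factor(scan["ruler"], EXPECTED_RULER_KB)
    normalized[scan_id] = normalize_scan(scan["molecules"], factor)
    print(scan_id, round(factor, 3), [round(x, 1) for x in normalized[scan_id]])
```

Reference mapping based normalization would follow the same pattern, with the ruler replaced by intervals whose expected lengths come from the reference.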

Core workflow: De novo assembly: optArg
–From molecule quality report and .err file
–p value based on genome size or as stringent as possible
–Stringencies vary based on step

No reference?
With no reference, we can run a de novo assembly based on expectations and data QC observations:
–Expected genome size
–Site density (in silico)
–Label density (empirical)
–Molecule n50 (empirical)
Then:
–Run de novo assembly (relaxed)
–Use the result of the de novo assembly to run the molecule quality report
–Update error characteristics (stretch normalization) and rerun de novo assembly
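The two empirical quantities in this list can be computed directly from per-molecule lengths and label counts. The sketch below uses the standard n50 definition (the length at which molecules of that size or larger hold at least half of the total length) and expresses label density per 100 kb. The function names and input values are illustrative assumptions, and no .bnx parsing is attempted.

```python
def molecule_n50(lengths):
    """Length L such that molecules of length >= L contain at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    raise ValueError("no molecules given")

def label_density_per_100kb(label_counts, lengths_kb):
    """Total number of labels divided by total molecule length, scaled to 100 kb."""
    return 100.0 * sum(label_counts) / sum(lengths_kb)

# Hypothetical molecules: lengths in kb and the number of labels on each.
lengths_kb = [300, 250, 200, 150, 100]
label_counts = [30, 25, 20, 15, 10]

n50 = molecule_n50(lengths_kb)
density = label_density_per_100kb(label_counts, lengths_kb)
print(n50, density)
```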

De novo assembly QC
We started with 1.8 Gb of molecules (>100 kb) that mapped at 40%. We had a good quality reference, so we expect to use ~0.8 Gb.
–Genome has 14 chromosomes
–Expected size is 20 Mb
Map n50 is good; we may be able to improve it further with additional depth or optimized sample prep.
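The numbers on this slide translate into an approximate useful coverage depth. The sketch below assumes the simple relation useful coverage = total molecule length × mapping rate / genome size; the function name is hypothetical. Note that 40% of 1.8 Gb is 0.72 Gb, which the slide rounds to ~0.8 Gb, so both figures are shown.

```python
def useful_coverage(total_mb, mapping_rate, genome_size_mb):
    """Approximate coverage depth contributed by molecules that map."""
    return total_mb * mapping_rate / genome_size_mb

TOTAL_MB = 1800.0        # 1.8 Gb of molecules > 100 kb (from the slide)
MAPPING_RATE = 0.40      # 40% mapping rate (from the slide)
GENOME_SIZE_MB = 20.0    # expected genome size (from the slide)

from_mapping_rate = useful_coverage(TOTAL_MB, MAPPING_RATE, GENOME_SIZE_MB)
from_slide_estimate = 800.0 / GENOME_SIZE_MB   # using the slide's rounded ~0.8 Gb

print(round(from_mapping_rate, 1), round(from_slide_estimate, 1))
```

Either way the usable depth is on the order of 36x to 40x for a 20 Mb genome.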

De novo assembly QC

De novo assembly QC
Higher stringency assembly
The higher stringency assembly misses some of the genome but resolves the chimera.

Applications: Sequence anchoring

12 Mb Streptomyces Genome Assembly with Various Technologies
DNA sequence scaffolding
[Table. Columns: Total Mb, Contigs, N50 (kb). Rows: Short-Read NGS Only, NGS + Cosmids, 3rd-Gen Reads, BioNano Genomics. Of the cell values, only the fragment ",870" survives.]

Sequence anchoring
Illumina + cosmids: 11.38 Mb, 97 contigs, n50 length: 154 kb, 11.38 Mb anchored
Illumina: 9.08 Mb, 124 contigs, n50 length: 92 kb, 8.9 Mb anchored
PacBio: 11.63 Mb, 20 contigs, n50 length: 918 kb, 11.63 Mb anchored
Scale bar: 1 Mb
–Validate sequence assembly
–Find errors
–Scaffold/Orient/Size gaps
–Output FASTA or AGP (soon)
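As a quick check on the figures above, the fraction of each sequence assembly that was anchored to the genome map can be computed directly. This is plain arithmetic on the slide's numbers; the dictionary layout and variable names are illustrative.

```python
# (total assembly size in Mb, anchored size in Mb), taken from the slide
assemblies = {
    "Illumina + cosmids": (11.38, 11.38),
    "Illumina": (9.08, 8.9),
    "PacBio": (11.63, 11.63),
}

anchored_fraction = {
    name: anchored / total for name, (total, anchored) in assemblies.items()
}

for name, fraction in anchored_fraction.items():
    print(f"{name}: {fraction:.1%} anchored")
```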

Applications: Structural variation

Structural Variation: Insertion/Deletion Calls (vs hg19)
95 regions in BioNano genome maps correspond to N-based gaps in hg19 (not included in graph). These gaps may contain repeats and polymorphic regions, where SVs are enriched.

Structural Variant Examples: Insertions and Deletions
Tracks in each panel: Genome Map, hg19, Molecules
Panel labels: +4.9 kb (insertion), -176.265 kb (deletion)
Table header (shown with each example):
#h SmapEntryID QryContigID RefcontigID1 RefcontigID2 QryStartPos QryEndPos RefStartPos RefEndPos Orientation Confidence Type
#f int int float string float string
Deletion example: QryStartPos ,483,278; QryEndPos 1,488,217; RefStartPos 75,697,428; RefEndPos 75,878,632; Orientation +; Type delete; net size 176, [truncated]; 181.2 kb region
Insertion example: QryStartPos ,093,571; QryEndPos 1,111,027; RefStartPos 13,122,638; RefEndPos 13,135,195; Orientation +; Type insert; net size 4, [truncated]; 12.6 kb region
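The net sizes on this slide can be reproduced from the coordinates, as sketched below. Two assumptions are made loudly here: the net size is taken to be the query interval length minus the reference interval length (consistent with the slide's labels, but not confirmed as how the SV caller computes it), and the leading digit of each truncated QryStartPos is assumed to be 1. Function names are hypothetical.

```python
def net_size(qry_start, qry_end, ref_start, ref_end):
    """Query interval length minus reference interval length, in bp."""
    return (qry_end - qry_start) - (ref_end - ref_start)

def classify(net):
    """Positive net size: extra sequence in the genome map (insertion); negative: deletion."""
    if net > 0:
        return "insertion"
    if net < 0:
        return "deletion"
    return "no size change"

# ASSUMPTION: the slide text shows ",483,278" and ",093,571";
# a leading "1" is assumed for both query start positions.
deletion = net_size(1_483_278, 1_488_217, 75_697_428, 75_878_632)
insertion = net_size(1_093_571, 1_111_027, 13_122_638, 13_135_195)

print(deletion, classify(deletion))     # about -176.3 kb
print(insertion, classify(insertion))   # about +4.9 kb
```

Under these assumptions the results match the panel labels, and the reference intervals come out at 181.2 kb and 12.6 kb as stated on the slide.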

Workshops
De novo assembly (using IrysView: Alex; using Python/command line: Heng/Ernest)
–OptArg: iterations, stringencies, merging, ref mapping
–Output .err file
–Alignref
–Visualization of genome maps to molecules
–Identification of chimeras
SV detection: Warren/Andy
–Explain the SV detection application (consider IP issues)
–Discuss stringency parameters
–Show resulting table: ranges, explain types
