P. Tang ( 鄧致剛 ); RRC. Gan ( 甘瑞麒 ); PJ Huang ( 黄栢榕 ) Bioinformatics Center, Chang Gung University. Genome Sequencing Genome Resequencing De novo Genome.


P. Tang (鄧致剛); RRC. Gan (甘瑞麒); PJ Huang (黄栢榕), Bioinformatics Center, Chang Gung University.
Genome Sequencing; Genome Resequencing; De novo Genome Assembly; Bacteria Genome Analysis; Genome Annotation and Genome Browser

Overview of Genome Analysis

Criteria for selecting genomes for sequencing
Criteria include:
-- genome size (some plant genomes are far larger than the human genome)
-- cost
-- relevance to human disease (or other disease)
-- relevance to basic biological questions
-- relevance to agriculture

Criteria for selecting genomes for sequencing
Sequence one individual genome, or several? Try one…
-- Each genome center may study one chromosome from an organism
-- It is necessary to measure polymorphisms (e.g. SNPs) in large populations
For viruses, thousands of isolates may be sequenced. For the human genome, cost is the impediment.

Ancient DNA projects
Special challenges:
-- Ancient DNA is degraded by nucleases
-- The majority of DNA in samples derives from unrelated organisms such as bacteria that invaded after death
-- Samples are often contaminated by modern human DNA
-- Determination of authenticity requires special controls and analysis of multiple independent extracts

Metagenomics projects
Two broad areas:
-- Environmental (ecological), e.g. hot spring, ocean, sludge, soil
-- Organismal, e.g. human gut, feces, lung

Whole Genome Sequencing (WGS)
-- Multiple copies of DNA
-- Fragments of ,000 bases
-- No information is retained on which part of the DNA the fragments came from.

WGS sequencing: fragments
-- We start with millions of pairs of reads, bases each
-- Multiple copies of DNA provide multiple coverage by reads
-- The problem of genome assembly is to recover the original sequence of bases of the genome (as much as possible…).

Assembling a jigsaw puzzle 1
The task of assembly becomes the task of assembling a giant jigsaw puzzle. We look for reads whose sequences suggest that they came from the same place in the genome:

AGTGATTAGATGATAGTAGA
        ||||||||||||
        GATGATAGTAGAGGATAGATTTA
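
The overlap search on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the method of any particular assembler: the function name `suffix_prefix_overlap` and the `min_len` cutoff are assumptions introduced here, and the brute-force scan would be far too slow for millions of reads.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if no overlap of at least `min_len` exists."""
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[len(left) - k:] == right[:k]:
            return k
    return 0


# The two reads shown on the slide.
read_a = "AGTGATTAGATGATAGTAGA"
read_b = "GATGATAGTAGAGGATAGATTTA"

k = suffix_prefix_overlap(read_a, read_b)
print(k, read_b[:k])
```

For the two reads above, the shared stretch is the 12-base sequence GATGATAGTAGA at the end of the first read and the start of the second.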

Assembling a jigsaw puzzle 2
Then we put “overlapping” reads together:

AGTGATTAGATGATAGTAGA                          (read)
       AGATGATAGTAGAGATAGATAGACC              (read)
                     ATAGATAGACCACTCATCATAC   (read)
AGTGATTAGATGATAGTAGAGATAGATAGACCACTCATCATAC   (contig)

This yields a “contig”.
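
As a minimal sketch of how overlapping reads become a contig, the Python below merges the three reads from this slide in the order given. The helper names `overlap` and `merge_in_order` are hypothetical, and the sketch assumes the read order is already known; a real assembler must also work out the order and cope with errors and repeats.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Longest suffix of `left` that is also a prefix of `right` (0 if none)."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[len(left) - k:] == right[:k]:
            return k
    return 0


def merge_in_order(reads, min_len: int = 3) -> str:
    """Merge reads that are already in left-to-right order into one contig."""
    contig = reads[0]
    for read in reads[1:]:
        k = overlap(contig, read, min_len)
        if k == 0:
            raise ValueError(f"read {read!r} does not overlap the contig")
        # Append only the part of the read that extends past the overlap.
        contig += read[k:]
    return contig


reads = [
    "AGTGATTAGATGATAGTAGA",
    "AGATGATAGTAGAGATAGATAGACC",
    "ATAGATAGACCACTCATCATAC",
]
print(merge_in_order(reads))
```

Running it on the slide's reads reproduces the 43-base contig shown above.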

Assembling a jigsaw puzzle 3
We use read pairing information to order and orient contigs to produce scaffolds, the final product of assembly.
[Figure labels: pairs of reads belonging to the same fragment of DNA; contig]
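
One simple way to picture the use of pairing information is to count how many read pairs have their two mates on different contigs. The sketch below does only that. The function name, the `min_links` threshold, and the toy pair list are all assumptions made for illustration; it ignores orientation and insert size, which a real scaffolder would also use.

```python
from collections import Counter


def count_contig_links(pairs, min_links: int = 2):
    """Count read pairs whose mates land on two different contigs.

    `pairs` is an iterable of (contig_of_mate_1, contig_of_mate_2).
    Only contig pairs supported by at least `min_links` read pairs are kept.
    """
    links = Counter()
    for contig_1, contig_2 in pairs:
        if contig_1 != contig_2:  # pairs inside one contig carry no link
            links[tuple(sorted((contig_1, contig_2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


# Toy data (made up): where each mate of six read pairs was placed.
pairs = [
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig1", "contig1"),
    ("contig2", "contig3"),
    ("contig2", "contig3"),
    ("contig1", "contig3"),
]
print(count_contig_links(pairs))
```

In this toy input, contig1-contig2 and contig2-contig3 are each supported by two pairs, which suggests the order contig1, contig2, contig3; the single contig1-contig3 pair falls below the threshold.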

Difficulties in NGS assembly
Sequencing errors: two reads that came from the same place in the genome often have mismatching sequences:

AGTGATTAGATCATAGTAGAG
         || |||||||||
         ATGATAGTAGAGGATAGAT

Repetitive DNA (about half of human DNA is repetitive):

TTAGGGTTAGGGTTAGGGTTAGGGTTAGGG
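
To illustrate why sequencing errors complicate overlap detection, the sketch below repeats the suffix-prefix search but tolerates a small number of mismatches. The function name and the `max_mismatches` and `min_len` parameters are assumptions chosen for this example, not settings of any named tool.

```python
def overlap_with_mismatches(left: str, right: str,
                            max_mismatches: int = 1, min_len: int = 8):
    """Longest suffix/prefix overlap with at most `max_mismatches` differences.

    Returns (overlap_length, mismatch_count), or (0, 0) if none is found.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        suffix = left[len(left) - k:]
        prefix = right[:k]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatches:
            return k, mismatches
    return 0, 0


# The two reads from the slide; they differ at one base (C versus G).
read_a = "AGTGATTAGATCATAGTAGAG"
read_b = "ATGATAGTAGAGGATAGAT"

print(overlap_with_mismatches(read_a, read_b, max_mismatches=0))
print(overlap_with_mismatches(read_a, read_b, max_mismatches=1))
```

With exact matching the two reads are not joined at all, while allowing one mismatch recovers the 12-base overlap. Tolerating mismatches has a cost: it also makes near-identical repeat copies look like true overlaps.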

Repeat regions may cause omissions
A genome A-R-B-R-C, where R is a repeated region, may be assembled as A-R-C, omitting B.
(1) Long-insert library: 10 kb
(2) Mate-pair library
(3) Long reads: 3-4 kb from third-generation sequencers

Erroneous duplications
Two recently published assemblies of the cow genome: UMD2 and BosTau4.
-- Segmental duplications were a central theme of the BosTau4 genome paper.
-- The UMD2 assembly had far fewer duplications.
-- We examined duplications with >99.5% identity and length >5000 bp that have one copy in the UMD2 assembly and two copies in BosTau4.
-- Each base in the genome is covered by 6 reads, on average.
-- A way to judge which assembly is correct is to compute the average read coverage for these regions.
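
The coverage argument can be made concrete with a little arithmetic. If a region truly occurs twice in the genome but an assembly contains it once, reads from both copies pile onto that one copy, so its depth should be about twice the genome-wide average; if the region truly occurs once but is assembled twice, each assembled copy should get about half. The sketch below encodes that reasoning. The function names and the observed depths are hypothetical, and it assumes reads are spread evenly over the assembled copies.

```python
def expected_depth(genome_depth: float, true_copies: int,
                   assembled_copies: int) -> float:
    """Expected read depth per assembled copy of a region."""
    return genome_depth * true_copies / assembled_copies


def likely_true_copies(observed_depth: float, assembled_copies: int,
                       genome_depth: float = 6.0, candidates=(1, 2)) -> int:
    """Pick the true copy number whose expected depth is closest to the
    observed depth."""
    return min(
        candidates,
        key=lambda c: abs(observed_depth
                          - expected_depth(genome_depth, c, assembled_copies)),
    )


# Made-up observations for a region with one copy in one assembly
# and two copies in the other, at 6x average coverage.
print(likely_true_copies(observed_depth=6.2, assembled_copies=1))   # near 6x
print(likely_true_copies(observed_depth=11.5, assembled_copies=1))  # near 12x
print(likely_true_copies(observed_depth=3.1, assembled_copies=2))   # near 3x
```

Depth near 6x on the single-copy assembly, or near 3x per copy on the two-copy assembly, points to one true copy; depth near 12x on the single copy points to a real duplication that was collapsed.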

Next Gen vs. Sanger Sequencing

De novo Sequencing vs Re-sequencing
Assembly tools (assembly): ABySS, ALLPATHS, Edena, Euler-SR, SHARCGS, SHRAP, SSAKE, Velvet
Alignment tools (mapping): Cross_match, ELAND, Exonerate, MAQ, Mosaik, SHRiMP, SOAP, Zoom
CLC Genomics

When has a genome been fully sequenced?
[Figure: % sequenced versus coverage]
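
A common way to reason about this question is the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random and the expected fraction of the genome covered at average coverage c is 1 - e^(-c). Whether the slide's plot was drawn from this model is an assumption; the sketch below only shows what the model predicts, and the function names are introduced here for illustration.

```python
import math


def fraction_sequenced(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read,
    assuming reads are placed uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-coverage)


def coverage_needed(fraction: float) -> float:
    """Average coverage at which the model predicts `fraction` is covered."""
    return -math.log(1.0 - fraction)


for c in (1, 2, 5, 10):
    print(f"{c:>2}x coverage -> {fraction_sequenced(c):.4%} sequenced")

print(f"99% of bases covered at about {coverage_needed(0.99):.2f}x")
```

Under this model the covered fraction approaches, but never reaches, 100%, which is one reason "fully sequenced" is a matter of degree. Real data deviate from the model because of sequencing bias and repeats.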

Read coverage
Read lengths:
-- Sanger sequencing: ~1000 bp
-- NGS sequencing: Solexa ~100 bp; SOLiD ~70 bp
For 99.75% % accuracy, 60X to 100X coverage is needed.
[Figure: % sequenced versus coverage]
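
Average coverage is the total number of sequenced bases divided by the genome size, so the read count needed for a target coverage follows directly from the read length. The sketch below works this through for the coverage range and read lengths on the slide. The 3,000,000,000-base genome size is a round, human-scale figure assumed for the example, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(coverage * genome_size / read_length)


GENOME_SIZE = 3_000_000_000  # assumed round figure for illustration

for platform, read_length in (("Solexa", 100), ("SOLiD", 70)):
    for target in (60, 100):
        n = reads_needed(target, GENOME_SIZE, read_length)
        print(f"{platform} ({read_length} bp), {target}X: {n:,} reads")
```

Shorter reads need proportionally more reads for the same coverage, which is why the short-read platforms listed above rely on very high throughput.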