Assembly of Solexa tomato reads


María José Truco
Bioinformatics and Genomics Program
CRG-Centre de Regulació Genòmica, Barcelona

RESULTS: Testing of the three tomato run concentrations

Methodology:
- Mixture of 9 BACs (2 incomplete). Three concentration runs: 1 pM, 2 pM and 4 pM
- Alignment to reference genomes using ELAND (Solexa). A read is counted as aligned if it has no more than 2 mismatches and aligns to only a single location in the reference genome.

~30% of the sequences are contamination from E. coli and vector.

                           1 pM                     2 pM                     4 pM
                   Sequences  % sequences   Sequences  % sequences   Sequences  % sequences
E. coli               632242         25.8     1095147         25.5     1330215         23.4
pBeloBAC11             96316          3.9      170772          4.0      220972
Yeast                  84480          3.5      151431                   194927          3.4
Human BACs*               84          0.0          68                       75
Tomato BACs          1002996         41.0     1852042         43.1     2658390         46.8
Repeats**             414523         16.9      702356         16.3      854140         15.0
Non-matching***       218044          8.9      336869          7.8      418374          7.4
All reads            2448685                  4301227                  5677093

*   Testing for contamination during sample preparation
**  Reads matching 2 or more sites in the reference genomes
*** Reads with more than 2 mismatches or not aligning to any reference genome
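The read categories and the percentage column can be illustrated with a short Python sketch. This is not ELAND itself: `classify_read` is a hypothetical helper that takes, for one read, the mismatch count at each candidate location and applies the rule stated on the slide (at most 2 mismatches, exactly one location). The counts are the 1 pM figures from the table; percentages that the transcript leaves out can be recomputed the same way from the counts and the "All reads" total.

```python
def classify_read(hit_mismatches, max_mismatches=2):
    """Classify one read from the mismatch counts of its candidate hits.

    hit_mismatches is an assumed input format (one integer per candidate
    location); it is not the output format of ELAND.
    """
    good = [m for m in hit_mismatches if m <= max_mismatches]
    if not good:
        return "non-matching"   # no hit, or every hit has > 2 mismatches
    if len(good) == 1:
        return "unique"         # counted as aligned on the slide
    return "repeat"             # matches 2 or more sites


def percentages(counts, total):
    """Percentage of all reads per category, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Read counts for the 1 pM run, copied from the table.
counts_1pM = {
    "E. coli": 632242,
    "pBeloBAC11": 96316,
    "Yeast": 84480,
    "Human BACs": 84,
    "Tomato BACs": 1002996,
    "Repeats": 414523,
    "Non-matching": 218044,
}
total_1pM = 2448685

pct_1pM = percentages(counts_1pM, total_1pM)

# E. coli plus vector: the "~30% contamination" quoted on the slide.
contamination_1pM = round(
    100 * (counts_1pM["E. coli"] + counts_1pM["pBeloBAC11"]) / total_1pM, 1
)

if __name__ == "__main__":
    for name, pct in pct_1pM.items():
        print(f"{name:15s} {pct:5.1f}")
    print("E. coli + vector:", contamination_1pM)
```

The category counts for the 1 pM run sum exactly to the "All reads" total, so the percentages are fractions of all reads in the run.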

RESULTS: BAC coverage before assembly

[Figure: coverage plot; axis ticks 0 to 100000 and 100 to 500]
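A coverage profile like the one plotted here can be computed from read alignment positions. The sketch below is only an illustration under assumptions: the slide does not say how coverage was calculated, `per_base_coverage` is a hypothetical helper, positions are 0-based, and the read length in the example is arbitrary because the slide does not state it.

```python
def per_base_coverage(read_starts, read_length, ref_length):
    """Return the number of reads covering each base of a reference.

    read_starts: 0-based start position of each aligned read (assumed input).
    Uses a difference array, so the cost is linear in reads + ref_length.
    """
    diff = [0] * (ref_length + 1)
    for start in read_starts:
        if 0 <= start < ref_length:
            end = min(start + read_length, ref_length)  # clip at reference end
            diff[start] += 1
            diff[end] -= 1

    coverage = []
    depth = 0
    for delta in diff[:ref_length]:
        depth += delta
        coverage.append(depth)
    return coverage


if __name__ == "__main__":
    # Toy example: three reads of length 3 on a 6-base reference.
    print(per_base_coverage([0, 2, 2], read_length=3, ref_length=6))
```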

RESULTS: BAC recovery after assembly

Assembly with VELVET (Zerbino & Birney; http://www.ebi.ac.uk/~zerbino/velvet/) using sequences from the 4 pM run (3920769 sequences) and the 1 pM run (1627900 sequences).

4 pM run: BAC recovery 66.4-89% (81.3-94.6%)
1 pM run: BAC recovery 45.1-82.4% (66-97.6%)

[Figure: plot; axis ticks 0 to 100000 and 500 to 2500]
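The slide does not define "BAC recovery". One plausible reading, assumed here purely for illustration, is the fraction of a BAC's bases covered by at least one assembled contig once the contigs are aligned back to the BAC. The helper `covered_fraction` and the half-open `(start, end)` interval convention are assumptions of this sketch.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of a reference covered by the union of aligned intervals.

    intervals: iterable of (start, end) pairs, 0-based and half-open,
    giving where each contig aligns on the BAC (assumed input format).
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Previous merged block is finished; count it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length


if __name__ == "__main__":
    # Toy example: contigs cover 0-60 and 80-100 of a 100-base reference.
    print(covered_fraction([(0, 40), (30, 60), (80, 100)], 100))
```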

RESULTS: Gap filling of incomplete BAC EF606852

Methodology:
- Selection of two fragments of ~4 kb flanking the gap
- BLAST of the assembled contigs against the two sequences flanking the gap

[Figure: assembled contig aligned to the left and right gap-flanking regions across the gap; coordinate labels 25, 4121, 4153, 7297 and 12372; legend: perfect match, mismatch]
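The selection step behind this approach can be sketched as follows: after running BLAST with the contigs as queries and the two flanking fragments as subjects, keep the contigs that hit both flanks, since those are the candidates for bridging the gap. The sketch assumes tab-separated BLAST output whose first two columns are the query and subject identifiers; the subject names `left_flank` and `right_flank` and the contig names are hypothetical.

```python
def contigs_spanning_gap(blast_lines, left_id="left_flank", right_id="right_flank"):
    """Return contigs (queries) with hits to both gap-flanking sequences.

    blast_lines: lines of tab-separated BLAST output, query id in column 1
    and subject id in column 2. Comment lines starting with '#' are skipped.
    """
    subjects_by_contig = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        subjects_by_contig.setdefault(query, set()).add(subject)

    return sorted(
        contig
        for contig, subjects in subjects_by_contig.items()
        if left_id in subjects and right_id in subjects
    )


if __name__ == "__main__":
    # Made-up hits for illustration only; not data from the presentation.
    example = [
        "contig_1\tleft_flank\t100.00\t4000",
        "contig_1\tright_flank\t99.90\t3100",
        "contig_2\tleft_flank\t98.50\t800",
    ]
    print(contigs_spanning_gap(example))
```

A contig returned by this filter still has to be inspected at the alignment level, as in the figure, to confirm that it joins the two flanks in the expected order and orientation.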