Bioinformatics and Sequencing Relevant to SolCAP

Slides:

Advertisements

Similar presentations

Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence carry out dideoxy sequencing connect seqs. to make whole chromosomes.

Advertisements

DNA polymorphisms Insertion-deletion length polymorphism – INDEL Single nucleotide polymorphism – SNP Simple sequence repeat length polymorphism – mini-

The 454 and Ion PGM at the Genomics Core Facility Dr. Deborah Grove, Director for Genetic Analysis Genomics Core Facility Huck Institutes of the Life Sciences.

What Is Genomics? Genomics is the study of how the entire genome of a species functions as a unit and evolves over time. It is the study of life’s blueprint,

Mining SNPs from EST Databases Picoult-Newberg et al. (1999)

16 and 20 February, 2004 Chapter 9 Genomics Mapping and characterizing whole genomes.

PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)

Genomics. Gene expression DNA (Genome) pre-mRNA mRNA mRNA (Transcriptome) Proteins (Proteome) Metabolites (Metabolome) Regulation Nucleus Cytoplasm Chromatography.

High Throughput Sequencing

CS 6293 Advanced Topics: Current Bioinformatics

DNA Sequencing Today, laboratories routinely sequence the order of nucleotides in DNA. DNA sequencing is done to: Confirm the identity of genes isolated.

Reading the Blueprint of Life

DNA Technology- Cloning, Libraries, and PCR 17 November, 2003 Text Chapter 20.

Finishing the Human Genome

Genetic and Molecular Epidemiology Lecture III: Molecular and Genetic Measures Jan 19, 2009 Joe Wiemels HD 274 (Mission Bay)

Analyzing your clone 1) FISH 2) “Restriction mapping” 3) Southern analysis : DNA 4) Northern analysis: RNA tells size tells which tissues or conditions.

6.3 Advanced Molecular Biological Techniques 1. Polymerase chain reaction (PCR) 2. Restriction fragment length polymorphism (RFLP) 3. DNA sequencing.

Plant Molecular Systematics Michael G. Simpson

AP Biology Ch. 20 Biotechnology.

High Throughput Sequencing Methods and Concepts

Genomics BIT 220 Chapter 21.

Biotechnology Methods Producing Recombinant DNAProducing Recombinant DNA Locating Specific GenesLocating Specific Genes Studying DNA SequencesStudying.

Module 1 Section 1.3 DNA Technology

Genomics & Proteomics Analysis Chapter 20 Overview of topics to be discussed  How to sequence genomic DNA (we will have to touch briefly on polymerase.

High Throughput Sequencing Methods and Concepts Cedric Notredame adapted from S.M Brown.

A Sequenciação em Análises Clínicas Polymerase Chain Reaction.

Chapter 14 The Techniques of Molecular Genetics

CHAPTER 7 DNA SEQUENCING - INTRODUCTION - SANGER DIDEOXY METHOD - AUTOMATED SEQUENCING - NEXT GENERATION OF SEQUENCING METHODS MISS NUR SHALENA SOFIAN.

Stratton Nature 45: 719, 2009 Evolution of DNA sequencing technologies to present day DNA SEQUENCING & ASSEMBLY.

Comparative analyses of the potato and tomato transcriptomes

PHYSICAL MAPPING AND POSITIONAL CLONING. Linkage mapping – Flanking markers identified – 1cM, for example Probably ~ 1 MB or more in humans Need very.

GENE SEQUENCING. INTRODUCTION CELL The cells contain the nucleus. The chromosomes are present within the nucleus.

Ultra-High Throughput DNA Sequencing on the 454/Roche GS-FLX

6.3 Advanced Molecular Biological Techniques 1. Polymerase chain reaction (PCR) 2. Restriction fragment length polymorphism (RFLP) 3. DNA sequencing.

DNA Technology and Genomics

Genomics Part 1. Human Genome Project  G oal is to identify the DNA sequence of every gene in humans Genome  all the DNA in one cell of an organism.

Chapter 20 DNA Technology and Genomics. Biotechnology is the manipulation of organisms or their components to make useful products. Recombinant DNA is.

Plan A Topics? 1.Making a probiotic strain of E.coli that destroys oxalate to help treat kidney stones in collaboration with Dr. Lucent and Dr. VanWert.

Gene Technologies and Human ApplicationsSection 3 Section 3: Gene Technologies in Detail Preview Bellringer Key Ideas Basic Tools for Genetic Manipulation.

1 Comparative analyses of the potato and tomato transcriptomes David Francis, AllenVan Deynze, John Hamilton, Walter De Jong, David Douches, Sanwen Huang,

Title: Studying whole genomes Homework: learning package 14 for Thursday 21 June 2016.

Topic Cloning and analyzing oxalate degrading enzymes to see if they dissolve kidney stones with Dr. VanWert.

Cse587A/Bio 5747: L2 1/19/06 1 DNA sequencing: Basic idea Background: test tube DNA synthesis DNA polymerase (a natural enzyme) extends 2-stranded DNA.

DNA Sequencing First generation techniques

Next-generation sequencing technology

DNA Sequencing Second generation techniques

Next generation sequencing

DNA Sequencing -sayed Mohammad Amin Nourion -A’Kia Buford

DNA Technologies (Introduction)

Next-generation sequencing technology

Genomic Analysis Chapter 19

Section 3: Gene Technologies in Detail

Chapter 20: DNA Technology and Genomics

Sequencing Technologies

Chapter 20 – DNA Technology and Genomics

Relationship between Genotype and Phenotype

The Human Genome Project

DNA Sequence Determination (Sanger)

Polymerase Chain Reaction (PCR)

Screening a Library for Clones Carrying a Gene of Interest

DNA Sequencing The DNA from the genome is chopped into bits- whole chromosomes are too large to deal with, so the DNA is broken into manageably-sized overlapping.

Genomic Analysis Chapter 19-20

Massively Parallel Sequencing: The Next Big Thing in Genetic Medicine

A Sequenciação em Análises Clínicas

Plant Biotechnology Lecture 2

Introduction to Sequencing

Chapter 20: DNA Technology and Genomics

Standard (Sanger) sequencing

SBI4U0 Biotechnology.

Presentation transcript:

Bioinformatics and Sequencing Relevant to SolCAP C. Robin Buell Department of Plant Biology Michigan State University East Lansing MI 48824 buell@msu.edu

An Overview of DNA Sequencing

Prokaryotic DNA Plasmid DNA occurs in long pieces called chromosomes. Chromosomes consist of the DNA and proteins. The DNA is tightly coiled. To see the DNA better this picture shows the DNA uncoiled. DNA exits as a double-helix. Untwisting the double-helix enables us to see it’s chemical structure. http://en.wikipedia.org/wiki/Image:Prokaryote_cell_diagram.svg

Eukaryotic DNA DNA occurs in long pieces called chromosomes. Chromosomes consist of the DNA and proteins. The DNA is tightly coiled. To see the DNA better this picture shows the DNA uncoiled. DNA exits as a double-helix. Untwisting the double-helix enables us to see it’s chemical structure. http://en.wikipedia.org/wiki/Image:Plant_cell_structure_svg.svg

DNA Structure The two strands of a DNA molecule are held together by weak bonds (hydrogen bonds) between the nitrogenous bases, which are paired in the interior of the double helix. The two strands of DNA are antiparallel; they run in opposite directions. The carbon atoms of the deoxyribose sugars are numbered for orientation. The uprights of this “ladder” are alternating molecules of a deoxyribose sugar (the blue pentagons) and a phosphate molecule (the yellow circles). The rungs of the “ladder” are the nitrogen bases: adenine, cytosine, guanine, and thymine. We abbreviate these base names as A, C, G, and T. Adenine only forms a bond with Thymine. Cytosine only bonds with Guanine. DNA easily splits apart up the middle of the ladder. Hydrogen bonds are easy to break. It does this naturally when it makes a copy of itself (replicates). Raising the temperature of the DNA will also cause it to come apart (denature). We take advantage of these properties of DNA to discover the order of the bases (sequencing). http://en.wikipedia.org/wiki/Image:DNA_chemical_structure.png

Sequencing DNA The goal of sequencing DNA is to tell the order of the bases, or nucleotides, that form the inside of the double-helix molecule. High throughput sequencing methods -Sanger/Dideoxy -Next Generation (NextGen)

Whole Genome Shotgun Sequencing Start with a whole genome Shear the DNA into many different, random segments. Sequence each of the random segments. Then, put the pieces back together again in their original order using a computer For now just think of a BAC as part of a genome.

SEQUENCER OUTPUT ASSEMBLE FRAGMENTS Fill in any gaps Annotate genes TAGCTAGC AGCTAGC AGCTAGGCTC AGCTCGCTAGCTAGCTAGCTAGCTAGGCTC AGCTCGCTA TAGCTAGCTA CTAGCTAGCTAGGCTC GCTAGCTAGCT CTCGCTAGCTAG AGCTCGCTAGCTA GCTAGCTAGC Gene 1 Gene 2 Gene 3…… Fill in any gaps Annotate genes

Theory Behind Shotgun Sequencing Haemophilus influenzae 1.83 Mb base Coverage unsequenced (%) 1X 37% 2X 13% 5X 0.67% 6X 0.25 7X 0.09% For 1.83 Mb genome, 6X coverage is 10.98 Mb of sequence, or 22,000 sequencing reactions, 11000 clones (1.5-2.0 kb insert), 500 bp average read.

Sanger Dideoxy Sequencing reactions -Initial dideoxy sequencing involved use of radioactive dATP and 4 separate reactions (ddATP, ddTTP, ddCTP, ddGTP) & separation on 4 separate lanes on an acrylamide gel with detection through autoradiogram -New techologies use 4 fluorescently labeled bases and separation on capillaries and detection through a CCD camera

Sanger Dideoxy DNA sequencing

Data Analysis An chromatogram is produced and the bases are called Software assign a quality value to each base Phred & TraceTuner Read DNA sequencer traces Call bases Assign base quality values Write basecalls and quality values to output files. JTrace and Phred softwares call the bases according to the peaks in the chromatogram and assign a quality value for each of the bases called.

GOOD BAD

454 Genome Sequencing System Library prep, amplification and sequencing: 2-4 days Single sample preparation from bacterial to human genomic DNA Single amplification per genome with no cloning or cloning artifacts Picoliter volume molecular biology 400 Mb per run (4-5 hr); less than $ 15,000 per run Read lengths 200-230 bases; new Titanium platform, 400 Mb per run, 400-500 bases per reads Massively parallel imaging, fluidics and data analysis Requires high genome coverage for good assembly Error rate of 1-2% Problem with homopolymers

454-Pyrosequencing Perform emulsion PCR Construct Depositing DNA Beads into the PicoTiter™Plate Construct Single stranded adaptor liagated DNA Sequencing by Synthesis: Simultaneous sequencing of the entire genome in hundreds of thousands of picoliter-size wells Pyrophosphate signal generation

Solexa/lIlumina Sequencing Sequencing by synthesis (not chain termination) Generate up to 12 Gb per run

Other “Next Generation” Sequencing Technologies SoLiD by Applied Biosystems- short reads (~25-75 nucleotides) Helicos- short reads (<50 nucleotides) Pacific Biosystems-LONG reads (several kilobases)

Potato Wheat: Rice: 430 Mb 850 Mb 16,000 Mb 5 Mb How much sequences are needed to assemble a eukaryotic genome? -Depends on the genome size of the organism (genes plus repeats), ploidy level, heterozygosity, desired quality Potato 850 Mb Wheat: 16,000 Mb Rice: 430 Mb 5 Mb Arabidopsis 130 Mb John Doe 2,500 Mb

Eukaryotic Genomes and Gene Structures Intergenic Region

Expressed Sequence Tags (ESTs): Sampling the Transcriptome and Genic Regions What is an EST? single pass sequence from cDNA specific tissue, stage, environment, etc. pick individual clones T7 T3 template prep cDNA library in E.coli Insert in pBluescript Multiple tissues, states.. with enough sequences, can ask quantitative questions

Uses of EST sequencing: -Gene discovery -Digital northerns/insights into transcriptome -Genome analyses, especially annotation of genomic DNA -SNP discovery in genic regions Issues with EST sequencing: -Inherent low quality due to single pass nature -Not 100 % full length cDNA clones -Redundant sequencing of abundant transcripts Address through clustering/ assembly to build consensus sequences = Gene Index, Unigene Set, Transcript Assembly

Locus/Gene Gene models Full length cDNAs Expressed Sequence Tags

Types of Genomic/DNA-based Diagnostic Markers Restriction Fragment Length Polymorphisms (RFLPs) Random Amplification of Polymorphic DNA (RAPDs) Cleaved Amplified Polymorphisms (CAPs) Amplified Fragment Length Polymorphisms (AFLPs) Simple Sequence Repeats (SSRs; microsatellites) Single Nucleotide Polymorphisms (SNPs)

-Look for size polymorphisms SSRs -Specific primers that flank simple sequence repeat (mono-, di-, tri-, tetra-, etc) which has a higher likelihood of a polymorphism -Amplify genomic DNA -Separate on gel -Look for size polymorphisms http://cropandsoil.oregonstate.edu/classes/css430/images/0902.jpg http://www.nal.usda.gov/pgdic/Probe/v2n1/chart.gif

SSRs -Computational prediction of SSRs in potato transcriptome data -http://solanaceae.plantbiology.msu.edu/analyses_ssr_query.php

-Detect mismatch (many methods for this) SNPs -Specific primers -Amplify genomic DNA -Detect mismatch (many methods for this) http://cmbi.bjmu.edu.cn/cmbidata/snp/images/SNP.gif

Potato SNPs: Intra-varietal and inter-varietal -Bulk of sequence data from ESTs (Sanger derived) -Use computational methods to identify SNPs within existing potato ESTs -http://solanaceae.plantbiology.msu.edu/analyses_snp.php

Illumina Paired End RNA-Seq Potato Varieties: Atlantic, Premier, Snowden Two Paired End RNA-Seq runs were performed. Reads are 61bp long Insert sizes: Atlantic: 350bp Premier: 300bp Snowden: 300bp Paired End Sequencing is carried out by an Illumina module that regenerates the clusters after the first run and sequences the clusters from the other end.

Velvet Assemblies of Potato Illumina Sequences With a read length of 31 and a minimum contig length of 150bp: Atlantic: 45214 contigs Premier: 54917 contigs Snowden: 58754 contigs

Sequence quality: Viewing a Atlantic potato contig from the Velvet assembly

Identify intra-varietal SNPs Query SNPs Filtered SNPs Atlantic Asm 224748 150669 Premier Asm 265673 181800 Snowden Asm 258872 166253 A/C SNP

Hawkeye Viewer – Visualizing SNPs G/T SNP

Analyses in progress SNP Identification: -Identify inter-varietal SNPs using draft genome sequence from S. phureja -Identify only biallelic SNPs -Identify high confidence SNPs -Identify SNPs that meet Infinium design requirements SNP Selection: -Annotate transcripts for gene function -Identify candidate genes within SNP set