Dale Beach, Longwood University; Lisa Scheifele, Loyola University Maryland

Next-generation sequencing has revolutionized both biological research and clinical medicine; entire human genomes are being sequenced to predict drug responsiveness and to diagnose disease (for example, Choi 2009).

In contrast to traditional Sanger sequencing, next-generation sequencing produces shorter reads with higher error rates. This creates challenges for downstream analysis: because a single run yields so many reads, even a small per-base error rate leaves a large number of reads containing errors. For example, Illumina MiSeq data have an error rate of about 0.1% per base (Glenn 2011), yet this means only ~86% of 150 bp sequencing reads are error-free ($(1 - 0.001)^{150} \approx 0.86$).

[Figure: Sequencing error in read]
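The error-free fraction above comes from treating each base call as an independent trial that fails with probability p, so a read of length L is error-free with probability (1 - p)^L. A minimal Python sketch of that arithmetic, assuming independent errors at a uniform per-base rate (an idealization: real Illumina quality tends to drop toward the end of a read):

```python
def fraction_error_free(error_rate: float, read_length: int) -> float:
    """Probability that a read contains no errors, assuming each base is
    miscalled independently with the same probability `error_rate`."""
    return (1.0 - error_rate) ** read_length


def expected_errors_per_read(error_rate: float, read_length: int) -> float:
    """Mean number of miscalled bases per read under the same assumption."""
    return error_rate * read_length


if __name__ == "__main__":
    p = 0.001  # 0.1% per-base error rate, as quoted for MiSeq in the text
    for length in (50, 100, 150, 250):
        print(f"{length:>3} bp: {fraction_error_free(p, length):.1%} error-free, "
              f"{expected_errors_per_read(p, length):.2f} expected errors per read")
```

At p = 0.001 and 150 bp this gives about 0.86 and 0.15 expected errors per read; multiplied over millions of reads, that is a large absolute number of reads carrying at least one error.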

This module is designed for a genetics or molecular biology class. It will require 3 lecture/seminar class periods, with optional additional Linux-based lab activities.

Prior to beginning this module, students should be familiar with:
• Sample preparation techniques for DNA sequencing
• DNA replication and the enzymes that synthesize DNA
• Nucleic acid and nucleotide structure

• Initial evaluation of the quality of eukaryotic genome sequencing data
• Implementation of error correction techniques
• Comparison of the quality of sequencing data before and after error correction
• Data: a completed sequencing run of a small eukaryotic genome on the Illumina platform
• If students will not be performing command-line programming themselves, these data should be analyzed in advance with:
  ▪ Jellyfish, to produce k-mer frequency data that students can use to generate a histogram in Excel
  ▪ Quake, to perform error correction so that students can be provided with pre- and post-error-correction datasets
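Jellyfish counts how often each k-mer occurs, and those counts can be summarized as a histogram: how many distinct k-mers occur once, twice, and so on. A small plain-Python sketch of that idea for teaching-sized data follows. It is not Jellyfish's algorithm (Jellyfish uses a lock-free hash table built for fast parallel counting), it counts forward-strand k-mers only (Jellyfish can also count a k-mer and its reverse complement together), and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer in the reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_histogram(counts: Counter) -> dict[int, int]:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    # Toy reads: the third differs from the first by one substitution (G->C at index 6).
    reads = ["ACGTACGTGA", "CGTACGTGAC", "ACGTACCTGA"]
    hist = kmer_histogram(count_kmers(reads, k=4))
    print("multiplicity\tdistinct_kmers")  # tab-separated, pastes directly into Excel
    for multiplicity, n in hist.items():
        print(f"{multiplicity}\t{n}")
```

In the toy data, the four 4-mers that overlap the substituted base in the third read each appear only once. In a real dataset, such low-multiplicity k-mers form a peak at the left of the histogram, separate from the peak near the sequencing coverage.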

At the completion of this module, students will be able to:
• Describe the important differences between high-throughput and traditional (low-throughput) experiments
• Explain the reasons for variations in the quality of high-throughput datasets
• Use computational tools to quantify errors in sequencing data
• Interpret the quality of a sequencing experiment and implement effective quality control measures

• Excel or another analysis package to create a k-mer frequency distribution
• Galaxy to create a boxplot of PHRED33 scores
• Optional: Quake and Jellyfish on a Linux system to generate k-mer data and perform error correction

This module will develop students' abilities to:
• Apply the process of science
  ▪ Design experiments, from methodological design through data analysis
  ▪ Analyze and interpret data
• Use modeling and simulation
  ▪ Design experimental strategies and predict outcomes
• Use quantitative reasoning
  ▪ Depict data using histograms and boxplots
  ▪ Interpret graphs and use the results of their analysis to modify error correction strategies

• Introduction to sequencing history and platforms
• Discuss typical sources of error in sequencing reads
• Discuss sequence output formats and PHRED33 scores
• Upload raw data to Galaxy
• Optional: use Quake in Linux to manipulate parameters and improve quality
essays/bringing-it-all-back-home-next-generation-sequencing-technology-and-you#
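Each FASTQ record has four lines: an @ header, the sequence, a + separator, and a quality string with one character per base. In PHRED+33 (PHRED33) encoding, the score Q is the character's ASCII code minus 33, and Q corresponds to an error probability of 10^(-Q/10). A minimal Python sketch, assuming unwrapped four-line records; `read_fastq`, `expected_errors`, and the example record are illustrative names, not part of Galaxy or any library:

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred33_scores(quality: str) -> list[int]:
    """Decode a PHRED+33 quality string: Q = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with score Q is wrong: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quality: str) -> float:
    """Expected number of miscalled bases in a read: sum of per-base error probabilities."""
    return sum(error_probability(q) for q in phred33_scores(quality))


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII5#\n")  # hypothetical record
    for rec in read_fastq(demo):
        print(rec.name, phred33_scores(rec.quality), f"expected errors = {expected_errors(rec.quality):.3f}")
```

For the demo record, `I`, `5`, and `#` decode to Q40, Q20, and Q2 (error probabilities 0.0001, 0.01, and about 0.63), which is why a single low-quality base can dominate a read's expected error count.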

• Introduce software packages that can be used to assess data quality
• Demonstrate breaking sequencing reads into k-mers
• Use Excel or Jellyfish to create a k-mer graph
• Use Excel or Jellyfish to create a k-mer graph after changing error correction parameters (e.g., k-mer size)
[Figure: K-mer frequency distribution]
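Breaking a read into k-mers means taking every overlapping substring of length k, so a read of length L yields L - k + 1 k-mers, and a single substitution alters up to k of them. The Python sketch below uses a hypothetical read pair to show how the choice of k changes both effects, which is what students explore when they vary k-mer size:

```python
def kmers(read: str, k: int) -> list[str]:
    """Break a read into its len(read) - k + 1 overlapping k-mers."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmers_touched_by_error(read_length: int, k: int, pos: int) -> int:
    """Number of k-mers in a read that overlap the base at index `pos`."""
    first = max(0, pos - k + 1)       # leftmost k-mer start that still covers pos
    last = min(pos, read_length - k)  # rightmost k-mer start that fits in the read
    return max(0, last - first + 1)


if __name__ == "__main__":
    true_read = "ACGTTGCAAGGCTT"   # hypothetical error-free read
    error_read = "ACGTTGCTAGGCTT"  # same read with an A->T substitution at index 7
    for k in (3, 5, 7):
        true_set = set(kmers(true_read, k))
        novel = [km for km in kmers(error_read, k) if km not in true_set]
        print(f"k={k}: {len(kmers(error_read, k))} k-mers, "
              f"{kmers_touched_by_error(len(error_read), k, 7)} overlap the error, "
              f"{len(novel)} not found in the error-free read")
```

For k = 3, 5, and 7 the error touches 3, 5, and 7 k-mers. At k = 3 only two of the three look novel, because the erroneous 3-mer GCT also occurs elsewhere in the error-free read. Short k-mers are not unique enough to expose every error, while longer k-mers let one error corrupt more of them.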

• Discuss using PHRED33 scores to assess data quality
• Create boxplots of PHRED33 scores in Galaxy for the raw data
• Create boxplots of PHRED33 scores in Galaxy for the data after Quake correction
• Optionally, have students compare outcomes of Quake correction run with different parameters
[Figure panels: Raw data; Data post Quake correction]
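A per-base quality boxplot summarizes, at each read position, the distribution of PHRED scores across all reads. The Python sketch below computes those numbers so students can see what the plot draws. Two assumptions: quartiles use `statistics.quantiles(..., method="inclusive")`, which may differ slightly from the method Galaxy's plotting tool uses, and the whiskers here are simply the minimum and maximum.

```python
import statistics
from typing import Sequence


def per_position_summary(quality_strings: Sequence[str], offset: int = 33) -> list[dict]:
    """Summarize PHRED scores at each read position: the minimum, quartiles,
    median and maximum that a per-base quality boxplot draws."""
    max_len = max(len(q) for q in quality_strings)
    summary = []
    for pos in range(max_len):
        # Decode the score at this position for every read long enough to have one.
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        if len(scores) >= 2:
            # "inclusive" keeps the quartiles within the observed range of scores.
            q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            q1 = median = q3 = float(scores[0])
        summary.append({"position": pos + 1, "min": min(scores), "q1": q1,
                        "median": median, "q3": q3, "max": max(scores)})
    return summary


if __name__ == "__main__":
    quals = ["IIIII5", "IIII5#", "III55#"]  # hypothetical PHRED+33 quality strings
    for row in per_position_summary(quals):
        print(row)
```

Running the same summary on raw and Quake-corrected reads gives a numeric counterpart to the two boxplot panels: positions whose median or lower quartile rises after correction are where the improvement happened.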

• Why has next-generation sequencing technology led to a revolution in biology and medicine?
• Discuss and predict how chemical and physical mechanisms lead to errors.
• Compare the improvement in sequence quality obtained with different parameters.
• How do software packages determine which base is in error and which is correct when sequencing reads conflict?
• Why is it important to have a numerical measure of error in addition to the nucleotide sequence?
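One way software decides which base is wrong comes from the k-mer spectrum. K-mers seen many times across the dataset are "trusted," while a rare k-mer most likely contains an error. The Python sketch below is a deliberately simplified version of that idea, not Quake's actual algorithm. It uses raw counts with a fixed cutoff (a hypothetical parameter here) and tries single-base substitutions inside the region shared by all untrusted k-mers. Quake, as described in its paper, instead weights k-mer counts by base quality and chooses corrections with a likelihood model.

```python
from collections import Counter
from typing import Iterable, Optional


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_starts(read: str, counts: Counter, k: int, cutoff: int) -> list[int]:
    """Start indices of k-mers whose dataset-wide count is below the cutoff."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < cutoff]


def correct_read(read: str, counts: Counter, k: int, cutoff: int) -> Optional[str]:
    """Return the read unchanged if every k-mer is trusted; the unique
    single-substitution variant whose k-mers are all trusted; or None if
    there is no such variant or more than one (the read would be flagged)."""
    bad = untrusted_starts(read, counts, k, cutoff)
    if not bad:
        return read
    # A single error must lie inside every untrusted k-mer: intersect their spans.
    lo, hi = max(bad), min(bad) + k - 1
    candidates = []
    for pos in range(lo, hi + 1):
        for base in "ACGT":
            if base == read[pos]:
                continue
            fixed = read[:pos] + base + read[pos + 1:]
            if not untrusted_starts(fixed, counts, k, cutoff):
                candidates.append(fixed)
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    true_read = "TTGACCATGCAGT"   # hypothetical genome fragment
    error_read = "TTGACCGTGCAGT"  # same read with an A->G error at index 6
    counts = count_kmers([true_read] * 4 + [error_read], k=5)
    print(correct_read(error_read, counts, k=5, cutoff=2))  # prints TTGACCATGCAGT
```

Returning None when several substitutions would work is a conservative design choice. Resolving such ties is exactly where the numerical quality scores become useful, because they indicate which of the competing bases was called with less confidence.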

This module will be performed as a team-based project, with students preparing and handing in a report at the end. Students will be able to:
• Predict the predominant types or sources of error based on experimental design and sequencing platform
• Prepare a boxplot in Galaxy for an example dataset and use the boxplot to evaluate the quality of the sequence data
• Effectively improve the quality of any set of NGS reads prior to assembly

• Kelley DR, Schatz MC, Salzberg SL. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116.
• Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27. [Jellyfish program]
• hspotlight_sequencing.pdf
• mss-2586.pdf