Sequencing Technologies and Human Genetic Variation
By Alfonso Farrugio, Hieu Nguyen, and Antony Vydrin



Overview
• Introduction
• Simulating genomic variation and sequencing
• Analyzing and comparing different sequencing technologies
• Algorithms for detecting human genetic variation

Introduction
• Different people carry different mutations in their genomes
• A recent study (Nature 453, 56-64, 1 May 2008) compared 8 human genomes and found 1,695 structural variants

• Whole-genome shotgun sequencing allows fast and relatively cheap sequencing of human genomes
• New technologies are being developed to allow accurate detection of human genomic variation
• Most of these technologies use short paired reads
• How long should the reads be to optimize the detection of human genomic variation?
• What algorithms can be used to detect variations in a new individual’s genome?

Simulating Genomic Variation
• A program takes a human genome and adds randomly distributed inversions, insertions, deletions, and SNPs
• The user can control the number of mutations and their mean lengths
• For simplicity, no two mutations may overlap (SNPs are the exception)

[Figure: Original genome → inversions, insertions, deletions → “intermediate” mutated genome]

[Figure: “Intermediate” mutated genome → subtract deletions]

[Figure: “Intermediate” mutated genome → SNPs → output mutated genome]
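The mutation simulator described on these slides could be sketched roughly as follows. This is not the authors' program, only a minimal illustration. The function names (`mutate`, `place_events`, `draw_length`), the exponential length distribution, treating an inversion as a reverse complement, and anchoring each insertion on a single base are all assumptions; the slides specify only the mutation types, the user-controlled counts and mean lengths, and the no-overlap rule with SNPs as the exception.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    # Symbols other than A/C/G/T (for example N) are emitted as N.
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def draw_length(rng, mean_len):
    # Assumption: exponentially distributed lengths with the requested mean.
    return max(1, round(rng.expovariate(1.0 / mean_len)))


def place_events(rng, genome_len, counts, mean_lens, max_tries=100_000):
    """Choose non-overlapping (start, end, kind, size) events on the genome."""
    events = []
    tries = 0
    for kind, wanted in counts.items():
        placed = 0
        while placed < wanted:
            tries += 1
            if tries > max_tries:
                raise RuntimeError("could not place all mutations without overlap")
            size = draw_length(rng, mean_lens[kind])
            # An insertion is anchored on one base; other events cover `size` bases.
            footprint = 1 if kind == "insertion" else size
            if footprint > genome_len:
                continue
            start = rng.randrange(genome_len - footprint + 1)
            end = start + footprint
            # Keep the event only if it touches no previously placed event.
            if all(end <= s or start >= e for s, e, _, _ in events):
                events.append((start, end, kind, size))
                placed += 1
    return sorted(events)


def mutate(genome, counts, mean_lens, n_snps, seed=0):
    """Return (mutated genome, list of structural events)."""
    rng = random.Random(seed)
    events = place_events(rng, len(genome), counts, mean_lens)
    pieces, pos = [], 0
    for start, end, kind, size in events:
        pieces.append(genome[pos:start])  # unchanged stretch before the event
        segment = genome[start:end]
        if kind == "inversion":
            pieces.append(reverse_complement(segment))
        elif kind == "insertion":
            new_bases = "".join(rng.choice("ACGT") for _ in range(size))
            pieces.append(segment + new_bases)
        # kind == "deletion": the segment is simply dropped
        pos = end
    pieces.append(genome[pos:])
    mutated = list("".join(pieces))
    # SNPs are applied last and may land inside the other mutations.
    for i in rng.sample(range(len(mutated)), n_snps):
        mutated[i] = rng.choice([b for b in "ACGT" if b != mutated[i]])
    return "".join(mutated), events
```

A call such as `mutate(genome, {"inversion": 2, "insertion": 3, "deletion": 3}, {"inversion": 20, "insertion": 10, "deletion": 10}, n_snps=5)` returns the mutated sequence together with the list of placed events, which is useful later for checking whether a detection algorithm recovered them.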

Simulating Genomic Sequencing  Program to take a human genome and create paired reads (output read pairs to a file)  The read lengths are all identical, and the separation between reads in a pair is picked randomly based on a normal distribution  The program can simulate sequencing errors when creating the paired reads

Simulating Genomic Sequencing  The user can control the total number of reads, read lengths, the mean of the read separations, and sequencing error rate

[Figure: Genome to be sequenced. Choose uniformly distributed random locations.]

[Figure: Create a read pair at each location and choose a random direction for each read. Each pair consists of two reads of length L separated by a distance d (d1, d2, d3). L is a constant, while d is random (normally distributed).]

[Figure: Resulting paired reads.]
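The read-pair construction shown in these figures might look something like the sketch below: a uniformly chosen location, two reads of constant length L, a normally distributed separation d, and a random direction per read. The function name `simulate_read_pairs`, the `sd_sep` parameter (the slides mention only the mean separation as user-controlled), the redraw of pairs that do not fit on the genome, and returning a list instead of writing a file are assumptions made for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    # Symbols other than A/C/G/T (for example N) are emitted as N.
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def simulate_read_pairs(genome, n_pairs, read_len, mean_sep, sd_sep, seed=0):
    """Return a list of dicts, one per read pair."""
    if 2 * read_len > len(genome):
        raise ValueError("genome is too short for one read pair")
    rng = random.Random(seed)
    pairs = []
    attempts = 0
    while len(pairs) < n_pairs:
        attempts += 1
        if attempts > 100 * n_pairs:
            raise RuntimeError("separations rarely fit on this genome")
        # d: separation between the two reads, normally distributed.
        sep = round(rng.gauss(mean_sep, sd_sep))
        span = 2 * read_len + sep
        if sep < 0 or span > len(genome):
            continue  # assumption: redraw separations that cannot be placed
        # Uniformly distributed location for the left end of the pair.
        start = rng.randrange(len(genome) - span + 1)
        second = start + read_len + sep
        read1 = genome[start:start + read_len]
        read2 = genome[second:second + read_len]
        # Random direction for each read: forward or reverse complement.
        forward1 = rng.random() < 0.5
        forward2 = rng.random() < 0.5
        pairs.append({
            "start": start,
            "separation": sep,
            "forward1": forward1,
            "forward2": forward2,
            "read1": read1 if forward1 else reverse_complement(read1),
            "read2": read2 if forward2 else reverse_complement(read2),
        })
    return pairs
```

Keeping `start` and `separation` alongside each pair is a convenience for evaluation only; a real sequencer would of course not report them.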

[Figure: Paired reads with simulated sequencing errors.]
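One simple way to simulate sequencing errors at a given error rate is sketched below. The slides do not say which error model the program uses, so the model here is an assumption: each base is independently replaced, with probability `error_rate`, by one of the other three bases (substitutions only, no indels). The function name `add_sequencing_errors` is likewise hypothetical.

```python
import random


def add_sequencing_errors(read, error_rate, rng):
    """Return a copy of `read` with independent per-base substitution errors."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            # Replace with one of the other bases, chosen uniformly.
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)
```

Passing a seeded `random.Random` instance as `rng` keeps a simulation run reproducible, which helps when comparing detection results across different read lengths.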