BioSci D145 Lecture #4 Bruce Blumberg

Presentation transcript:

BioSci D145 Lecture #4
Bruce Blumberg (blumberg@uci.edu)
  4103 Nat Sci 2 - office hours Tu, Th 3:30-5:00 (or by appointment), phone 824-8573
TA - Riann Egusquiza (regusqui@uci.edu)
  4351 Nat Sci 2 - office hours M 1:45-3:45, phone 824-6873
Check e-mail and noteboard daily for announcements, etc.
Please use the course noteboard for discussions of the material
Updated lectures will be posted on web pages after lecture
  http://blumberg-lab.bio.uci.edu/biod145-w2018
  http://blumberg.bio.uci.edu/biod145-w2018/
Last year’s midterm is now posted.
Term paper outlines due Friday (2/2) by midnight. Dropbox is open
BioSci D145 lecture 1 page 1 ©copyright Bruce Blumberg 2014. All rights reserved

Term paper outline
Title of your proposal
A paragraph introducing your topic and explaining why it is important, i.e., what impact the knowledge gained will have
  Why should any funding agency give you money to pursue this research?
  NIH now requires a statement of human health relevance for all grant applications
  NSF wants to know the intellectual merit of your proposed research and its broader impacts
Present your hypothesis
  A supposition or conjecture put forth to account for known facts; esp. in the sciences, a provisional supposition from which to draw conclusions that shall be in accordance with known facts, and which serves as a starting-point for further investigation by which it may be proved or disproved and the true theory arrived at.
Enumerate 2-3 specific aims in the form of questions that test your hypothesis
  At least one of these aims needs to have a strong “whole genome” component

Modern DNA sequence analysis
Cycle sequencing
  Virtually all small-scale DNA sequencing (i.e., when we send DNA to be sequenced) is done by cycle sequencing with fluorescent ddNTPs
  ABI Big Dye chemistry
  Capillary sequencers are the predominant form of technology in use
But next generation sequencing has displaced the old technology in genome centers
  Solexa (Illumina)
  454 sequencing (Roche)
  Ion Torrent (some University cores still use this)
3rd generation sequencing (individual DNA molecule) now available, e.g.,
  Pacific Biosciences (sequence reads of 1,000-10K bases)
  Oxford Nanopore (sequence reads of up to 100K bases)

Other sequencing technologies
Sequencing by hybridization
  Construct a high-density microchip with all possible combinations of a short oligonucleotide
    Up to 25-mers
    By photolithography
    Synthesized on chip directly
  Label and hybridize fragment to be sequenced
  Wash stringently
  Read fluorescent spots
  Reconstruct sequence by computer
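The "reconstruct sequence by computer" step can be sketched as follows. This is a minimal illustration, not the chip vendor's algorithm: it assumes an ideal chip that reports exactly the set of k-mers present in the fragment (no hybridization errors), and it assembles them by walking an Eulerian path through overlapping (k-1)-mers. The function names `spectrum` and `reconstruct` are hypothetical.

```python
from collections import defaultdict

def spectrum(seq, k):
    """Set of all k-mers in seq: the probes an ideal chip would light up."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reconstruct(kmers):
    """Rebuild a sequence from its k-mer spectrum via an Eulerian path.

    Assumes an error-free spectrum in which each k-mer occurs once.
    """
    graph = defaultdict(list)   # (k-1)-mer prefix -> list of (k-1)-mer suffixes
    indeg = defaultdict(int)
    for kmer in sorted(kmers):
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        indeg[suffix] += 1
    nodes = list(graph)
    # The path starts at the node with one more outgoing than incoming edge.
    start = next((n for n in nodes if len(graph[n]) - indeg[n] == 1), nodes[0])
    # Hierholzer's algorithm: follow unused edges, backtrack when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])

fragment = "ATGGCGTGCA"
print(reconstruct(spectrum(fragment, 4)))
```

With k = 4 every 3-mer overlap in this fragment is unique, so the path is unambiguous. With k = 3 the same fragment shares its spectrum with ATGCGTGGCA, which illustrates why repeats make reconstruction from short probes ambiguous.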

Other sequencing technologies (contd)
Sequencing by hybridization is rarely used for de novo sequencing
  Extremely fast and useful for resequencing: you already know the sequence but want to identify mutations
    Disease-causing changes, e.g., in mitochondrial DNA
    SNP discovery
  Works best for examining sequences of <10 kb

Other sequencing technologies (contd)
http://www.affymetrix.com/products/arrays/index.affx
SNP discovery
  Photo shows mitochondrial chip
  Right panel shows pairs of normal (top) vs. disease (bottom) (Leber’s Hereditary Optic Neuropathy)
    Top 3: disease mutations
    Bottom: control with no change

Other sequencing technologies - Next Generation sequencing
2nd generation = high throughput, short sequences
3rd generation = single molecule sequencing
  Small number of sequence templates (thousands) but very long reads (~10^5 bp)
What is the immediate implication of this technology for genome assembly?
  We should now be able to completely sequence large insert clones directly and avoid fragmentation by repetitive elements!
Key review is Metzker, M.L. (2010) Sequencing technologies - the next generation, Nature Reviews Genetics 11, 31-46.

3rd generation

Other sequencing technologies (contd)
Illumina (Solexa) sequencing
https://www.illumina.com/content/dam/illumina-marketing/documents/products/illumina_sequencing_introduction.pdf
  Based on synthesis of complementary strand to a template (like Sanger)
  Detection of elongation with labeled terminators
  Steps
    Library generation - fragment genome to appropriate size (depends on application) and add adapters to each end
    Cluster generation - capture fragments on lawn of oligos and amplify
    Sequencing - reversible terminator
    Data analysis - align reads to reference genome
  Analysis of reads

Other sequencing technologies (contd)
Illumina sequencing (contd)
  Library preparation - fragment target and add adapters
  Can multiplex to gain additional capacity
    That is, HiSeq X can generate 1.8 Tb of sequence per run, but we don’t need this much for most applications, so use different adapters to “bar-code” samples
    This way, you can get many sequences from one run and then deconvolute them
    Multiplexing also has the advantage of removing batch effects
      Can directly compare all sequences with each other because they come from the same run of the machine

Bar coding sequence analysis
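The deconvolution of bar-coded samples can be sketched roughly as below. This is a simplified illustration under stated assumptions: the barcode is taken to be the first 6 bases of each read and must match exactly, whereas real instruments typically read the index separately and tolerate mismatches. The barcode sequences and sample names are made up.

```python
from collections import defaultdict

def demultiplex(reads, barcodes, barcode_len=6):
    """Split pooled reads into per-sample bins by their leading barcode.

    barcodes maps barcode sequence -> sample name (hypothetical values).
    Reads whose barcode is not recognized go to "undetermined".
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "undetermined")
        bins[sample].append(insert)   # keep only the insert, barcode trimmed
    return dict(bins)

pooled = ["ACGTACTTGACC", "TTAGGCAAACCC", "ACGTACGGGG", "NNNNNNAC"]
samples = {"ACGTAC": "control", "TTAGGC": "treated"}
print(demultiplex(pooled, samples))
```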

Other sequencing technologies (contd)
Deep sequencing - what is the point?
Can generate huge number of reads in parallel
  iSeq 100 - 1.2 Gb (4 million reads/run, 2 x 150 bp)
  MiniSeq - 7.5 Gb (25 million reads/run, 2 x 150 bp)
  MiSeq - 15 Gb (15 million reads/run, 2 x 300 bp)
  NextSeq - 120 Gb (400 million reads/run, 2 x 150 bp)
  HiSeq - 1.5 Tb (5 billion reads/run, 2 x 150 bp)
  HiSeq X - 1.8 Tb (6 billion reads/run, 2 x 150 bp)
  NovaSeq - 6.0 Tb (20 billion reads/run, 2 x 150 bp)
What is massively parallel sequencing good for?
  Rapid sequencing of genomes, or resequencing of known sequences
  Ancient DNA (even dinosaurs? - Svante Pääbo says ~200K years is limit)
  ChIP-sequencing (week 6)
  Sequencing ESTs or other tags
  Determining microbial diversity in field samples
  Transcriptome sequencing
  Identifying variations in viral populations
  Gene sequences in mixed populations
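The per-run output figures follow from simple arithmetic, sketched below. The assumption here is that "reads/run" counts read pairs, so yield = pairs x read length x 2; the 3.1 Gb human genome size and the 30x target depth are illustrative values that do not come from the slide.

```python
def run_yield_bp(read_pairs, read_len, paired=True):
    """Total bases per run, assuming read_pairs counts paired-end clusters."""
    return read_pairs * read_len * (2 if paired else 1)

def genomes_per_run(yield_bp, genome_bp, coverage):
    """How many genomes of a given size fit in one run at a target depth."""
    return int(yield_bp // (genome_bp * coverage))

novaseq = run_yield_bp(20_000_000_000, 150)          # 20 billion pairs, 2 x 150 bp
print(novaseq / 1e12, "Tb")                          # 6.0 Tb, matching the slide
print(genomes_per_run(novaseq, 3_100_000_000, 30))   # human genomes at 30x
```

Under the same assumption the MiSeq entry (15 million reads, 2 x 300 bp) works out to 9 Gb rather than 15 Gb, so the read count or the output on that line may reflect a different kit or counting convention.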

Amplicon sequencing
Idea is to sequence many copies of the same thing
  Gene sequence
  mRNA transcript

Amplicon sequencing (contd)
What is amplicon sequencing good for?
  Discovery of rare somatic mutations in complex samples (e.g., cancerous tumors mixed with germline DNA) based on ultra-deep sequencing of amplicons
  Sequencing collections of exons from populations of individuals to identify diversity
  Sequencing collections of human exons from populations of individuals for the identification of rare alleles associated with disease
  Analysis of viral quasispecies present within infected populations in the context of epidemiological studies
  Evolutionary biology in populations

The human genome
On Feb 12, 2001, Celera and the Human Genome Project published “draft” human genome sequences
  Celera -> 39114
  Ensembl -> 29691
  Consensus from all sources ~30K
Number of genes
  C. elegans - 19,000
  Arabidopsis - 25,000
  Predictions had been from 50-140K human genes
What’s up with that?
  Are we only slightly more complicated than a weed?
  How can we possibly get a human with less than 2x the number of genes as C. elegans?
Implications?
  UNRAVELING THE DNA MYTH: The spurious foundation of genetic engineering, Barry Commoner, Harper’s Magazine, Feb 2002

The human genome
The answer - gene sets don’t overlap completely (duh)
  Floor is 42K (= 42113)
130,029 UniGene clusters, build #236 (from EST and mRNA sequencing)
http://www.ncbi.nlm.nih.gov/unigene
  Up from 123,459 in 2013 (85,793, 105,680, 128,826, 123,891 in previous years) (“final” count)
Important questions to be answered about what constitutes a “gene”
  Crick genes? DNA-RNA-protein
  How about RNAs? miRNAs? Antisense transcripts? lncRNAs?

Genome sequencing (contd)
Whole genome shotgun sequencing (Celera)
  Premise is that rapid generation of draft sequence is valuable
  Why bother trying to clone and sequence difficult regions?
    Basically just forget regions of repetitive DNA - not cost effective
  Using this approach, genomes are rarely completely finished
    Rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%
  Problems
    Sequence may never be complete the way the C. elegans sequence is
    Much redundant sequence, with many sparse regions and lots of gaps
    Fragment assembly for regions of highly repetitive DNA is dubious at best
    “Finished” fly and human genomes lack more than a few already characterized genes

Genome sequencing (contd)
Knowing what we know now - how to approach a large new genome?
Xenopus tropicalis, 1.7 Gb (about ½ human)
  BAC end sequencing
  Whole genome shotgun
  HAPPY mapping and radiation hybrid mapping to order scaffolds
  Gaps closed with BACs
  8.5x coverage (but > 9000 scaffolds for 18 chromosomes)
  Finishing now in process
    But how “finished” will it be?
2016 update - now version 9.0
  FINALLY integrated BAC end sequences
  Integrated genetic map
  50% of contigs > 72 kb
Xenopus laevis - v9.1
  >90% of genome in chromosomal scaffolds
  2 “subgenomes” fully characterized
  Annotation remains a big challenge
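One way to put the 8.5x coverage figure in perspective is the idealized Poisson (Lander-Waterman style) model, sketched below. It assumes reads land uniformly at random, which real shotgun libraries do not satisfy (repeats, cloning bias), so the result is best read as an optimistic lower bound on missing sequence rather than a prediction for Xenopus.

```python
import math

def uncovered_fraction(coverage):
    """Probability that a given base is hit by no read, assuming
    reads fall uniformly at random (Poisson model)."""
    return math.exp(-coverage)

def expected_uncovered_bp(genome_bp, coverage):
    """Expected number of bases left unsequenced under the same model."""
    return genome_bp * uncovered_fraction(coverage)

# 1.7 Gb genome at 8.5x coverage, the figures quoted on the slide
print(round(expected_uncovered_bp(1.7e9, 8.5)))
```

Even under this ideal model a few hundred kilobases go unsampled at 8.5x; the thousands of scaffolds that remain in practice point to repeats and assembly problems rather than raw sampling depth.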

Functional Genomics - Analysis of gene function on a whole genome basis
Genome projects
  DNA sequencing
    Human genome, mouse, rat, Drosophila, C. elegans “finished”
    Model organisms progressing rapidly (axolotl just finished)
  Lots of new genes, but many lack known function
Functional genomics
  Identification of gene functions
    Associate functions with new genes coming from genome projects
    Function of genes identified from characterizing diseases or mutants
  Identification of genes by their function
    Discovery of new genes

Methods of profiling gene expression - large scale to whole genome
What are the possibilities?
  Array - micro or macro
  Sequence sampling (EST generation)
  SAGE - serial analysis of gene expression
  Massively parallel signature sequencing (RNA-seq, Illumina, 454)
DNA microarray analysis was, until now, the totally dominant method
Two basic flavors
  Spotted (spot DNA onto support)
    cDNA microarrays
    Oligonucleotide arrays
    Moderately expensive
  Synthesized (use photolithography to synthesize oligos onto silicon or other suitable support)
    Affymetrix GeneChips dominate
    VERY expensive
Both are in wide use and suitable for whole genome analysis

Spotted arrays
Source material is prepared
  cDNAs are PCR amplified, OR
  Oligonucleotides are synthesized
Spotted onto treated glass slides
RNA prepared from 2 sources
  Test and control
Labeled probes prepared from RNAs
  Incorporate label directly
  Or incorporate modified NTP and label later
  Or chemically label mRNA directly
Hybridize, wash, scan slide
Express as ratio of one channel to the other after processing
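The final step, expressing each spot as a ratio of one channel to the other, might look like the sketch below. The processing shown (optional background subtraction, log2 transform, global median centering) is one common and simple choice and is an assumption here, not necessarily the pipeline used in the lecture; it presumes most genes do not change between test and control.

```python
import math
from statistics import median

def log_ratios(test, control, background=0.0):
    """Per-spot log2(test/control), centered so the median spot is 0.

    test/control are fluorescence intensities from the two channels;
    background is a single illustrative offset subtracted from both.
    """
    raw = [math.log2((t - background) / (c - background))
           for t, c in zip(test, control)]
    center = median(raw)   # global normalization: assume the typical gene is unchanged
    return [r - center for r in raw]

cy5 = [200, 400, 800]   # test channel intensities (made-up values)
cy3 = [200, 200, 200]   # control channel intensities (made-up values)
print(log_ratios(cy5, cy3))
```

Working in log2 makes up- and downregulation symmetric: a two-fold increase is +1 and a two-fold decrease is -1.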

DNA microarray types
Stanford type microarrayer
http://cmgm.stanford.edu/pbrown/mguide/index.html
Printing method
  Reminiscent of fountain pen

Strategy to identify RAR target genes
Agonist - TTNPB; Antagonist - AGN193109
Harvest st 18
Poly A+ RNA
Amino-allyl labeled 1st strand cDNA
Alexa Fluor 555 (cy3), Alexa Fluor 647 (cy5)
Probe microarrays
Upregulated, downregulated

DNA microarray
Statistical analysis of output - VERY IMPORTANT!
  Replicates are very important
  Preprocessing of data is needed
    To remove spurious signals

DNA microarray
Advantages
  Custom arrays possible and affordable
  Ratio of fluorescence is robust and reproducible
Disadvantages
  Availability of chips
  Expense of production on your own
  Technical details in preparation

Affymetrix GeneChips
High density arrays are synthesized directly on support
  4 masks required per cycle -> 100 masks per chip (25-mers)
  Modern CPU (core i7) requires about 50 masks
G.P. Li in Engineering directs a UCI facility that can make just about anything using photolithography

Affymetrix GeneChips
Streptavidin/phycoerythrin

Affymetrix GeneChips
Each gene is represented by a series of oligonucleotide pairs
  One perfect match
  One with a single mismatch
Only hybridization to the perfect match but not the mismatch is considered to be real
Gene is considered “detected” if > ½ of oligo pairs are positive
Number of pairs depends on organism and how well characterized array behavior is
  Human uses 8 pairs
  Xenopus uses 16 pairs
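The detection rule above can be sketched in a few lines. The slide does not say how a pair is scored as "positive", so the `min_ratio` threshold below (perfect match must exceed mismatch by a fixed factor) is a hypothetical stand-in, and the intensities are invented; the actual Affymetrix software uses its own statistical call.

```python
def pair_positive(pm, mm, min_ratio=1.5):
    """A probe pair counts as positive when the perfect-match signal
    clearly exceeds the mismatch signal (min_ratio is an assumed cutoff)."""
    return pm > mm * min_ratio

def gene_detected(pairs, min_ratio=1.5):
    """Gene is called detected if more than half of its pairs are positive."""
    positives = sum(pair_positive(pm, mm, min_ratio) for pm, mm in pairs)
    return positives > len(pairs) / 2

# 8 probe pairs (PM, MM), as for a human array; 5 of them are positive
probe_pairs = [(900, 200), (850, 300), (400, 390), (1200, 150),
               (300, 310), (700, 100), (250, 240), (950, 400)]
print(gene_detected(probe_pairs))
```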

Affymetrix GeneChips
Result is in single color
  Always need two chips - control and experimental for each condition
  Also need replicates for each condition
    For diverse biological samples (e.g., humans), 10 replicates required!
    For less diverse samples (cell lines), probably 5 replicates needed
Advantages
  Commercially available
  Standardized
Disadvantages
  About $700 to buy, probe and process each chip (at UCI)!
    About $500 elsewhere
  May not be available for your organism of interest
  No ability to compare probes directly on the same chip
  Must rely on technology

DNA microarrays
What are they good for?
  Identifying genes expressed in one condition vs. another
    One tissue vs. another (heart vs. liver)
    Tissue vs. tumor (liver vs. hepatocarcinoma)
    In response to a treatment (e.g., RA)
    In response to disease (e.g., after viral infection)
  Building expression profiles
    Tissues
    Cancers
    Developmental stages
    Expressed genes
  Identifying organisms in food
    Array can identify which animals are present in a mix
    http://www.dnavision.com/files/FOODIDBrosh%20En.pdf

DNA microarrays
What are they good for? (contd)
  Response of animal to drugs or chemicals
    Toxicogenomics
    Pharmacogenomics
  Diagnostics
    SNP analysis to identify disease loci
    Specific testing for known diseases

DNA microarrays
What are the limitations of microarray technology? What sorts of factors might confound the experiment?
  Signal intensity (or signal/noise)
    Improved dyes, label uniformly
  Biological variation (samples are inherently different)
    Sufficient # of replicates is key
    Keep individuals separate
  Not all mRNAs will be present at sufficient levels to detect
    Amplification, but beware of bias
Good statistical analysis is required
  Bayesian statistics are best (Pierre Baldi is local expert)
    Calculating the probability of a new event on the basis of earlier probability estimates which have been derived from empirical data
    i.e., don’t assume random distribution in datasets, calculate probability based on real data
  Bayesian approach is great for small numbers of replicates, converges on t-test at high numbers of replicates
  http://cybert.microarray.ics.uci.edu/
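The idea that a Bayesian test helps with few replicates and converges on the t-test with many can be illustrated with a variance-shrinkage sketch. This is not the Cyber-T implementation: the weighting formula, the `prior_var` value (imagined as a variance pooled from genes of similar intensity), and the `prior_weight` of 10 pseudo-observations are all assumptions chosen for illustration.

```python
import math
from statistics import mean, variance

def regularized_t(a, b, prior_var, prior_weight=10):
    """t-like statistic whose per-group variance is shrunk toward a prior.

    prior_var: background variance estimate (hypothetical input).
    prior_weight: how many pseudo-observations the prior is worth.
    """
    def shrunk_var(x):
        n = len(x)
        return (prior_weight * prior_var + (n - 1) * variance(x)) / (prior_weight + n - 1)
    va, vb = shrunk_var(a), shrunk_var(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def plain_t(a, b):
    """Ordinary unequal-variance t statistic (prior switched off)."""
    return regularized_t(a, b, prior_var=0.0, prior_weight=0)

control = [1.0, 1.1]   # two replicates that agree closely by chance
treated = [2.0, 2.1]
print(plain_t(control, treated), regularized_t(control, treated, prior_var=0.25))
```

With two replicates the plain statistic is inflated by an accidentally tiny sample variance, while the shrunk version is far more conservative; as the number of replicates grows, the data term dominates the prior and the two statistics approach each other.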