BioSci D145 Lecture #4 Bruce Blumberg


1 Bruce Blumberg (blumberg@uci.edu)
BioSci D145 Lecture #4
Bruce Blumberg
4103 Nat Sci 2 - office hours Tu, Th 3:30-5:00 (or by appointment)
phone
TA – Riann Egusquiza
4351 Nat Sci 2 – office hours M 1-3
Phone
check and noteboard daily for announcements, etc.
Please use the course noteboard for discussions of the material
Updated lectures will be posted on web pages after lecture
Last year’s midterm is now posted.
Term paper outlines due Friday (2/3) by midnight.
BioSci D145 lecture 1 page 1 ©copyright Bruce Blumberg All rights reserved

2 Term paper outline
Title of your proposal
A paragraph introducing your topic and explaining why it is important, i.e., what impact the knowledge gained will have.
  Why should any funding agency give you money to pursue this research?
  NIH now requires a statement of human health relevance for all grant applications
  NSF wants to know the intellectual merit of your proposed research and its broader impacts
Present your hypothesis
  A supposition or conjecture put forth to account for known facts; esp. in the sciences, a provisional supposition from which to draw conclusions that shall be in accordance with known facts, and which serves as a starting-point for further investigation by which it may be proved or disproved and the true theory arrived at.
Enumerate 2-3 specific aims in the form of questions that test your hypothesis
  At least one of these aims needs to have a strong “whole genome” component

3 Modern DNA sequence analysis
Cycle sequencing
  Virtually all commercial DNA sequencing today is done by cycle sequencing with fluorescent ddNTPs
  ABI Big Dye chemistry
  Template preparation still tedious for small scale
    TempliPhi used in genome centers (obviated need for most automation)
  Capillary sequencers are the predominant form of technology in use
But next generation sequencing is already coming online and will rapidly displace old technology in genome centers.
  454 sequencing (Roche)
  Solexa (Illumina)
  SOLiD (Applied Biosystems)
3rd generation sequencing (individual DNA molecule) now available
  e.g., Pacific Biosciences (sequence reads of 1,000-10K bases)

4 Other sequencing technologies
Sequencing by hybridization
  Construct a high-density microchip with all possible combinations of a short oligonucleotide
    Up to 25-mers
    By photolithography
    Synthesized on chip directly
  Label and hybridize fragment to be sequenced
  Wash stringently
  Read fluorescent spots
  Reconstruct sequence by computer
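The "reconstruct sequence by computer" step can be illustrated with a toy sketch. It assumes an idealized chip that reports exactly the set of k-mers present in the fragment, and it only handles the easy case where each (k-1)-base overlap is unique. The function names `kmer_spectrum` and `reconstruct` are hypothetical, and real reconstruction software has to cope with repeats and hybridization errors.

```python
def kmer_spectrum(sequence, k):
    """Return the set of all k-mers in a sequence (what an ideal chip reports)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence by chaining k-mers that overlap by k-1 bases.

    Assumes every (k-1)-base prefix occurs in exactly one k-mer, so the
    chain is unambiguous; raises ValueError otherwise.
    """
    kmers = set(kmers)
    by_prefix = {kmer[:-1]: kmer for kmer in kmers}
    if len(by_prefix) != len(kmers):
        raise ValueError("ambiguous spectrum: a (k-1)-mer prefix is shared")
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the one whose prefix is not the suffix of any k-mer.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("spectrum does not have a single starting k-mer")
    current = starts[0]
    sequence = current
    seen = {current}
    while current[1:] in by_prefix:
        current = by_prefix[current[1:]]
        if current in seen:
            raise ValueError("spectrum contains a cycle")
        seen.add(current)
        sequence += current[-1]  # each step extends the sequence by one base
    if len(seen) != len(kmers):
        raise ValueError("spectrum does not form a single chain")
    return sequence


if __name__ == "__main__":
    spectrum = kmer_spectrum("ATGGCGTGCA", 4)
    print(reconstruct(spectrum))
```

With 3-mers the same fragment becomes ambiguous (two 3-mers share the prefix TG), which is one way to see why short probes and repeated sequence limit de novo use of the method.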

5 Other sequencing technologies (contd)
Sequencing by hybridization is rarely used for de novo sequencing
Extremely fast and useful for resequencing: sequencing something whose sequence you already know in order to identify mutations
  Disease-causing changes, e.g., in mitochondrial DNA
  SNP discovery
Works best for examining sequences of <10 kb

6 Other sequencing technologies (contd)
SNP discovery
Photo shows mitochondrial chip
Right panel shows pairs of normal (top) vs. disease (bottom) (Leber’s Hereditary Optic Neuropathy)
  Top 3: disease mutations
  Bottom: control with no change

7 Other sequencing technologies – Next Generation sequencing
2nd generation = high throughput, short sequences
3rd generation = single molecule sequencing
  Small number of sequence templates (thousands) but very long reads (~10^5 bp)
What is the immediate implication of this technology for genome assembly?
  We should now be able to completely sequence large insert clones directly and avoid fragmentation by repetitive elements!
Key review is Metzker, M.L. (2010) Sequencing technologies - the next generation, Nature Reviews Genetics 11

8 3rd generation

9 Other sequencing technologies (contd)
Illumina (Solexa) sequencing
  Based on synthesis of complementary strand to a template (like Sanger)
  Detection of elongation with labeled terminators
Steps
  Library generation – fragment genome to appropriate size (depends on application) and add adapters to each end
  Cluster generation – capture fragments on lawn of oligos and amplify
  Sequencing – reversible terminator
  Data analysis – align reads to reference genome
Analysis of reads

10 Other sequencing technologies (contd)
Illumina sequencing (contd)
Library preparation – fragment target and add adapters.
Can multiplex to gain additional capacity
  That is, the HiSeq X can generate 1.8 Tb of data per run, which is more than most applications need, so different adapters are used to “bar-code” samples that share a run.
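As a rough illustration of how bar-coded samples are separated again after a shared run, the sketch below sorts reads into per-sample bins. It assumes, for simplicity, that the barcode is the first few bases of each read and must match exactly; the barcode sequences, sample names, and the `demultiplex` function are hypothetical, and production pipelines typically read the index separately and tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by an in-line barcode at the start of each read.

    reads    : iterable of read sequences (strings)
    barcodes : dict mapping barcode sequence -> sample name
               (all barcodes must have the same length)
    Returns a dict: sample name -> list of reads with the barcode removed.
    Reads with an unknown barcode are kept whole under "unassigned".
    """
    lengths = {len(barcode) for barcode in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    n = lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:n])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[n:])
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}  # made-up barcodes
    reads = ["ACGTGGGCCC", "TGCAAATTT", "NNNNCCCC"]
    for sample, seqs in demultiplex(reads, barcodes).items():
        print(sample, seqs)
```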

11 Bar coding sequence analysis

12 Other sequencing technologies (contd)
Deep sequencing
What is the point? Can generate huge number of reads in parallel
  MiniSeq – 7.5 Gb (25 million reads/run, 2 x 150 bp)
  MiSeq – 15 Gb (25 million reads/run, 2 x 300 bp)
  NextSeq – 120 Gb (400 million reads/run, 2 x 150 bp)
  HiSeq – 1.5 Tb (5 billion reads/run, 2 x 150 bp)
  HiSeq X – 1.8 Tb (6 billion reads/run, 2 x 150 bp)
What is massively parallel sequencing good for?
  Rapid sequencing of genomes, or resequencing of known sequences
  Ancient DNA (even dinosaurs? – Svante Pääbo says ~200K years is limit)
  ChIP-sequencing (week 6)
  Sequencing ESTs or other tags
  Determining microbial diversity in field samples
  Transcriptome sequencing
  Identifying variations in
    Viral populations
    Gene sequences in mixed populations
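The per-run figures follow from simple arithmetic: yield is roughly reads per run times read length, doubled for paired-end ("2 x") runs. The sketch below reproduces that calculation and converts a yield into average depth of coverage; the 3 Gb genome size used in the example is an assumed round number for illustration, and the function names are hypothetical.

```python
def run_yield_bases(reads_per_run, read_length, paired_end=True):
    """Approximate bases per run: reads x length, doubled for paired-end runs."""
    ends = 2 if paired_end else 1
    return reads_per_run * read_length * ends


def mean_coverage(yield_bases, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return yield_bases / genome_size


if __name__ == "__main__":
    # 25 million reads, 2 x 150 bp
    print(run_yield_bases(25_000_000, 150) / 1e9, "Gb")
    # 6 billion reads, 2 x 150 bp, spread over an assumed 3 Gb genome
    big_run = run_yield_bases(6_000_000_000, 150)
    print(big_run / 1e12, "Tb")
    print(mean_coverage(big_run, 3_000_000_000), "x coverage")
```

A run that covers an entire large genome hundreds of times over is far more than a single small experiment needs, which is the motivation for sharing runs between bar-coded samples.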

13 Amplicon sequencing
Idea is to sequence many copies of the same thing
  Gene sequence
  mRNA transcript

14 Amplicon sequencing (contd)
What is amplicon sequencing good for?
  Discovery of rare somatic mutations in complex samples (e.g., cancerous tumors mixed with germline DNA) based on ultra-deep sequencing of amplicons
  Sequencing collections of exons from populations of individuals to identify diversity
  Sequencing collections of human exons from populations of individuals for the identification of rare alleles associated with disease
  Analysis of viral quasispecies present within infected populations in the context of epidemiological studies
  Evolutionary biology in populations
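To make "ultra-deep sequencing of amplicons" concrete, the sketch below counts the bases observed at one amplicon position across many reads and reports non-reference bases above a frequency cutoff. The cutoffs (`min_fraction`, `min_reads`) and the function name are hypothetical choices for illustration; a real analysis would also model the sequencing error rate, since errors at the percent level can look like rare variants.

```python
from collections import Counter


def variant_fractions(bases, reference_base, min_fraction=0.01, min_reads=5):
    """Report non-reference bases seen at one position of an amplicon.

    bases          : the base called at this position in each read
    reference_base : the expected (germline/reference) base
    Returns a dict: variant base -> fraction of reads carrying it,
    keeping only variants that pass both cutoffs.
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    variants = {}
    for base, count in counts.items():
        if base == reference_base:
            continue
        if count >= min_reads and count / depth >= min_fraction:
            variants[base] = count / depth
    return variants


if __name__ == "__main__":
    # 1000 reads covering the position: 980 reference A, 20 variant T
    column = "A" * 980 + "T" * 20
    print(variant_fractions(column, "A"))
```

The deeper the amplicon is sequenced, the lower the variant fraction that can be separated from noise, which is why depth matters more than breadth for this application.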

15 The human genome
In Feb , Celera and Human Genome project published “draft” human genome sequences
  Celera -> 39114
  Ensembl -> 29691
  Consensus from all sources ~30K
Number of genes
  C. elegans – 19,000
  Arabidopsis – 25,000
  Predictions had been from k human genes
What’s up with that?
  Are we only slightly more complicated than a weed?
  How can we possibly get a human with less than 2x the number of genes as C. elegans?
Implications?
  UNRAVELING THE DNA MYTH: The spurious foundation of genetic engineering, Barry Commoner, Harper’s Magazine, Feb. 2002

16 The human genome
The answer – Gene sets don’t overlap completely (duh)
  Floor is 42K (= 42113)
UniGene build #236: 130029 clusters (from EST and mRNA sequencing)
  Up from 123,459 in 2013 (85,793, 105,680, 128,826, 123,891 previous years)
  (“final” count)
Important questions to be answered about what constitutes a “gene”
  Crick genes? DNA-RNA-protein
  How about RNAs? miRNAs? Antisense transcripts? lncRNAs?

17 Genome sequencing(contd)
Whole genome shotgun sequencing (Celera)
  Premise is that rapid generation of draft sequence is valuable
  Why bother trying to clone and sequence difficult regions?
    Basically just forget regions of repetitive DNA - not cost effective
  Using this approach, genomes rarely are completely finished
    Rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%
  Problems
    Sequence may never be complete, as C. elegans is
    Much redundant sequence with many sparse regions and lots of gaps.
    Fragment assembly for regions of highly repetitive DNA is dubious at best
    “Finished” fly and human genomes lack more than a few already characterized genes

18 Genome sequencing (contd)
Knowing what we know now – how to approach a large new genome?
Xenopus tropicalis 1.7 Gb (about ½ human)
  BAC end sequencing
  Whole genome shotgun
  HAPPY mapping and radiation hybrid mapping to order scaffolds
  Gaps closed with BACs
  8.5x coverage (but > 9000 scaffolds for 18 chromosomes)
  Finishing now in process
    But how “finished” will it be?
2016 update – now version 9.0
  FINALLY integrated BAC end sequences
  Integrated genetic map
  50% of contigs > 72 kb
Xenopus laevis – v9.1 – >90% of genome in chromosomal scaffolds
  2 “subgenomes” fully characterized.
  Annotation remains a big challenge.
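A back-of-the-envelope calculation shows why 8.5x coverage can still leave thousands of scaffolds. The sketch below uses an idealized model that is an assumption of this example, not something stated in the lecture: reads land uniformly at random, so the chance that a given base is never sequenced is about e^(-coverage). Repeats and cloning bias, which the model ignores, make real assemblies considerably worse.

```python
import math


def fraction_uncovered(coverage):
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_size, coverage):
    """Expected number of bases never sequenced at the given mean coverage."""
    return genome_size * fraction_uncovered(coverage)


if __name__ == "__main__":
    genome_size = 1_700_000_000  # X. tropicalis, 1.7 Gb
    coverage = 8.5
    print(f"fraction uncovered: {fraction_uncovered(coverage):.6f}")
    print(f"bases uncovered:    {expected_uncovered_bases(genome_size, coverage):,.0f}")
```

Even in this best case a few hundred thousand bases are expected to be missed, scattered across the genome, so gaps are unavoidable without targeted finishing.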

19 Functional Genomics - Analysis of gene function on a whole genome basis
Genome projects
  DNA sequencing
    Human genome, mouse, rat, Drosophila, C. elegans “finished”
    model organisms progressing rapidly
  Lots of new genes, but many lack known function
Functional genomics
  Identification of gene functions
    associate functions with new genes coming from genome projects
    function of genes identified from characterizing diseases or mutants
  Identification of genes by their function
    discovery of new genes

20 *Methods of profiling gene expression – large scale to whole genome
What are the possibilities
  Array – micro or macro
  Sequence sampling (EST generation)
  SAGE – serial analysis of gene expression
  Massively parallel signature sequencing (RNA-seq, Illumina, 454)
DNA microarray analysis was, until now, the totally dominant method
Two basic flavors
  Spotted (spot DNA onto support)
    cDNA microarrays
    Oligonucleotide arrays
    Moderately expensive
  Synthesized (use photolithography to synthesize oligos onto silicon or other suitable support)
    Affymetrix Gene Chips dominate
    VERY expensive
Both are in wide use and suitable for whole genome analysis

21 Spotted arrays
Source material is prepared
  cDNAs are PCR amplified OR
  Oligonucleotides synthesized
Spotted onto treated glass slides
RNA prepared from 2 sources
  Test and control
Labeled probes prepared from RNAs
  Incorporate label directly
  Or incorporate modified NTP and label later
  Or chemically label mRNA directly
Hybridize, wash, scan slide
Express as ratio of one channel to other after processing
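The last step, expressing each spot as a ratio of one channel to the other, is usually reported on a log2 scale so that a 2-fold increase and a 2-fold decrease are symmetric (+1 and -1). The sketch below is a minimal version; the pseudocount and the median-centering used as the "processing" step are assumptions made for illustration, and real pipelines also subtract background and apply more careful normalization.

```python
import math
import statistics


def log2_ratios(test, control, pseudocount=1.0):
    """Per-spot log2(test / control); the pseudocount avoids division by zero."""
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(test, control)
    ]


def median_center(values):
    """Shift log ratios so their median is 0 (a simple global normalization)."""
    middle = statistics.median(values)
    return [v - middle for v in values]


if __name__ == "__main__":
    test_channel = [399, 99, 49]      # made-up spot intensities
    control_channel = [99, 99, 199]
    ratios = log2_ratios(test_channel, control_channel)
    print(median_center(ratios))
```

Median-centering rests on the assumption that most genes on the array do not change between the two samples.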

22 DNA microarray types
Stanford type microarrayer
Printing method
  Reminiscent of fountain pen

23 Strategy to identify RAR target genes
Agonist - TTNPB
Antagonist - AGN193109
Harvest st 18
Poly A+ RNA (for each treatment)
Amino-allyl labeled 1st strand cDNA (for each treatment)
Alexa Fluor 555 (cy3), Alexa Fluor 647 (cy5) (for each treatment)
Probe microarrays
upregulated, downregulated

24 DNA microarray
Statistical analysis of output – VERY IMPORTANT!
  Replicates are very important
  Preprocessing of data is needed
    To remove spurious signals

25 DNA microarray
Advantages
  Custom arrays possible and affordable
  Ratio of fluorescence is robust and reproducible
Disadvantages
  Availability of chips
  Expense of production on your own
  Technical details in preparation

26 Affymetrix GeneChips
High density arrays are synthesized directly on support
  4 masks required per cycle -> 100 masks per chip (25-mers)
    Pentium IV requires about 30 masks
G.P. Li in Engineering directs a UCI facility that can make just about anything using photolithography

27 Affymetrix GeneChips
Streptavidin/phycoerythrin

28 Affymetrix GeneChips
Each gene is represented by a series of oligonucleotide pairs
  One perfect match
  One with a single mismatch
Only hybridization to perfect match but not mismatch is considered to be real
Gene is considered “detected” if > ½ of oligo pairs are positive
Number of pairs depends on organism and how well characterized array behavior is
  Human uses 8 pairs
  Xenopus uses 16 pairs
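The detection rule can be written out directly: score each perfect-match/mismatch pair, then call the gene detected if more than half of the pairs are positive. In the sketch below, the criterion for a "positive" pair (perfect-match signal at least 1.5 times the mismatch signal) is a hypothetical threshold chosen for illustration; the vendor's actual detection algorithm is more elaborate.

```python
def pair_is_positive(perfect_match, mismatch, min_ratio=1.5):
    """A pair counts as positive when PM signal clearly exceeds MM signal.

    min_ratio is an assumed threshold, not a published one.
    """
    return perfect_match > mismatch * min_ratio


def gene_detected(probe_pairs, min_ratio=1.5):
    """Gene is 'detected' if more than half of its probe pairs are positive.

    probe_pairs: list of (perfect_match_signal, mismatch_signal) tuples
    """
    positives = sum(
        pair_is_positive(pm, mm, min_ratio) for pm, mm in probe_pairs
    )
    return positives > len(probe_pairs) / 2


if __name__ == "__main__":
    # 8 made-up probe pairs, as for a human array; 5 of 8 are positive
    pairs = [(900, 200), (850, 300), (700, 250), (600, 100), (500, 150),
             (300, 280), (250, 260), (200, 210)]
    print(gene_detected(pairs))
```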

29 Affymetrix GeneChips
Result is in single color
  Always need two chips – control and experimental for each condition
  Also need replicates for each condition
    For diverse biological samples (e.g., humans) 10 replicates required!
    For less diverse samples (cell lines) probably 5 replicates needed
Advantages
  Commercially available
  Standardized
Disadvantages
  About $700 to buy, probe and process each chip (at UCI)!
    About $500 elsewhere
  May not be available for your organism of interest
  No ability to compare probes directly on the same chip
    Must rely on technology

30 DNA microarrays
What are they good for?
  Identifying genes expressed in one condition vs. another
    One tissue vs. another (heart vs. liver)
    Tissue vs. tumor (liver vs. hepatocarcinoma)
    In response to a treatment (e.g., RA)
    In response to disease (e.g., after viral infection)
  Building expression profiles
    Tissues
    Cancers
    Developmental stages
    Expressed genes
  Identifying organisms in food
    Array can identify which animals are present in a mix

31 DNA microarrays
What are they good for? (contd)
  Response of animal to drugs or chemicals
    Toxicogenomics
    Pharmacogenomics
  Diagnostics
    SNP analysis to identify disease loci
    Specific testing for known diseases

32 DNA microarrays
What are the limitations of microarray technology? What sorts of factors might confound the experiment?
  Signal intensity (or signal/noise)
    Improved dyes, label uniformly
  Biological variation (samples are inherently different)
    Sufficient # of replicates is key
    keep individuals separate
  Not all mRNAs will be present at sufficient levels to detect
    Amplification, but beware of bias
  Good statistical analysis is required
    Bayesian statistics are best (Pierre Baldi is local expert)
      calculating the probability of a new event on the basis of earlier probability estimates which have been derived from empiric data
      i.e., don’t assume random distribution in datasets, calculate probability based on real data
    Bayesian approach great for small number of replicates, converges on t-test at high number of replicates
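One way to see why a Bayesian approach helps with few replicates and converges on the t-test with many is a variance estimate that blends each gene's own sample variance with a prior estimate drawn from real data (for example, the typical variance of genes with similar intensity). The sketch below is a generic weighted-average form written for illustration; the formula, the parameter names `prior_variance` and `prior_weight`, and the functions are assumptions of this example and are not presented as the specific method the lecture refers to.

```python
import math
import statistics


def regularized_variance(values, prior_variance, prior_weight):
    """Blend the sample variance with a prior variance.

    prior_weight acts like a number of pseudo-observations: with few real
    replicates the prior dominates; with many, the sample variance does.
    Requires at least 2 values.
    """
    n = len(values)
    sample_variance = statistics.variance(values)
    return (prior_weight * prior_variance + (n - 1) * sample_variance) / (
        prior_weight + n - 1
    )


def regularized_t(a, b, prior_variance, prior_weight):
    """t-like statistic using regularized variances for both groups."""
    var_a = regularized_variance(a, prior_variance, prior_weight)
    var_b = regularized_variance(b, prior_variance, prior_weight)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


if __name__ == "__main__":
    control = [1.0, 3.0]   # made-up log expression values, 2 replicates each
    treated = [5.0, 7.0]
    print(regularized_t(control, treated, prior_variance=1.0, prior_weight=3))
```

Setting `prior_weight` to 0 recovers the ordinary unequal-variance t statistic, and as the number of replicates grows the prior term becomes negligible, which matches the convergence described above.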

33 Other methods of transcriptome analysis - parallel
Microarray was once the dominant method
Direct RNA sequencing methods are rapidly displacing microarrays
  SAGE (serial analysis of gene expression)
    Nanostring is modern implementation
    Short sequences
  RNAseq
    Directly sequence large numbers of RNAs
    Longer sequences
SAGE
  Relies on generating many very short sequences and matching these to the genome
    10 bp = short SAGE
    17 bp = “long” SAGE

34 Other methods of transcriptome analysis - parallel
SAGE (continued)
What is the obvious shortcoming of this method?
  Sequences may not be unique and could be difficult to map to the genome
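The uniqueness problem can be estimated with simple counting: there are 4^k possible tags of length k, so in a genome of G bases a given tag is expected to occur by chance about G / 4^k times. The sketch below assumes a random, evenly composed genome and a round 3 Gb genome size, both simplifications made for illustration; real genomes are repetitive, so the true ambiguity is worse.

```python
def possible_tags(tag_length):
    """Number of distinct DNA sequences of the given length."""
    return 4 ** tag_length


def expected_chance_matches(genome_size, tag_length):
    """Expected occurrences of one specific tag in a random genome."""
    return genome_size / possible_tags(tag_length)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # assumed round figure for a mammalian genome
    for k in (10, 17):
        print(f"{k} bp tag: {possible_tags(k):,} possible tags, "
              f"~{expected_chance_matches(genome_size, k):.2f} chance matches")
```

Under these assumptions a 10 bp tag is expected to match thousands of genomic positions, while a 17 bp tag is expected to match fewer than one, which is the rationale for "long" SAGE.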

35 Other methods of transcriptome analysis - parallel
RNA seq – Ali Mortazavi is local expert
Use of massively parallel sequencing allows precise quantitation of transcript
  Also allows discovery of rare splice forms
  Discovery of unexpected transcripts
Main problem is in mapping sequence calls to genome
  Sequencing has 1-2% errors which can make mapping to genome fail or induce “in silico cross-hybridization”
    Mapping to incorrect genomic location
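To get a feel for what a 1-2% error rate means for mapping, the sketch below computes the chance that a read contains no errors at all and the expected number of errors per read. It assumes errors are independent and equally likely at every base, which is a simplification (real error rates typically vary along the read); the function names are hypothetical.

```python
def prob_error_free(read_length, error_rate):
    """Probability that every base in the read is called correctly."""
    return (1 - error_rate) ** read_length


def expected_errors(read_length, error_rate):
    """Average number of miscalled bases per read."""
    return read_length * error_rate


if __name__ == "__main__":
    for rate in (0.01, 0.02):
        p = prob_error_free(150, rate)
        e = expected_errors(150, rate)
        print(f"error rate {rate:.0%}: {p:.1%} of 150 bp reads are error-free, "
              f"{e:.1f} errors per read on average")
```

Under this model most 150 bp reads carry at least one error, so aligners must tolerate mismatches, and tolerating mismatches is what opens the door to reads mapping to the wrong, similar-looking location.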

36 Microarray vs. RNAseq
Microarray
  Assumes you know all the transcripts
    Any sequence you did not know was expressed will not be there,
    except on whole genome tiling arrays – Kapranov paper
  Detection limit issues
    Signal-noise ratio
  Well validated, expression analysis can be quantitative
RNAseq
  No assumption re transcripts, but need genome sequence
  Can discover novel sequences or new splice forms not yet characterized (if you have genome)
  Detection limits are not a problem – can detect small #
  Getting better, expression analysis can be quantitative

