Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to genome biology and microarrays experiment

Similar presentations


Presentation on theme: "Introduction to genome biology and microarrays experiment"— Presentation transcript:

1 Introduction to genome biology and microarrays experiment
Statistics for Microarray Data Analysis – Lecture 1 The Fields Institute for Research in Mathematical Sciences May 25, 2002

2 The human genome The cell is the fundamental working unit of every living organism. Humans: trillions of cells (metazoa); other organisms like yeast: one cell (protozoa). Cells are of many different types (e.g. blood, skin, nerve cells), but all can be traced back to a single cell, the fertilized egg.

3 Genes The human genome is distributed along 23 pairs of chromosomes.
22 autosomal pairs; the sex chromosome pair, XX for females and XY for males. In each pair, one chromosome is paternally inherited, the other maternally inherited. Chromosomes are made of compressed and entwined DNA. A (protein-coding) gene is a segment of chromosomal DNA that directs the synthesis of a protein.

4 Chromosomes and DNA

5 DNA A deoxyribonucleic acid or DNA molecule is a double-stranded polymer composed of four basic molecular units called nucleotides. Each nucleotide comprises a phosphate group, a deoxyribose sugar, and one of four nitrogen bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The two chains are held together by hydrogen bonds between nitrogen bases. Base-pairing occurs according to the following rule: G pairs with C, and A pairs with T.

6

7 Genetic and physical maps

8 Genetic and physical maps
Physical distance: number of base pairs (bp). Genetic distance: expected number of crossovers between two loci, per chromatid, per meiosis. Measured in Morgans (M) or centiMorgans (cM). 1cM ~ 1 million bp (1Mb) in humans

9 Central dogma The expression of the genetic information stored in the DNA molecule occurs in two stages: (i) transcription, during which DNA is transcribed into mRNA; (ii) translation, during which mRNA is translated to produce a protein. DNA  mRNA  protein Other important aspects of regulation: methylation, alternative splicing, etc. The correspondence between DNA's four-letter alphabet and a protein's twenty-letter alphabet is specified by the genetic code, which relates nucleotide triplets to amino acids.

10 Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell.
Measuring protein might be better, but is currently harder.

11 Differential expression
Each cell contains a complete copy of the organism's genome. Cells are of many different types and states E.g. blood, nerve, and skin cells, dividing cells, cancerous cells, etc. What makes the cells different? Differential gene expression, i.e., when, where, and in what quantity each gene is expressed. On average, 40% of our genes are expressed at any given time.

12 RNA A ribonucleic acid or RNA molecule is a nucleic acid similar to DNA, but single-stranded; ribose sugar rather than deoxyribose sugar; uracil (U) replaces thymine (T) as one of the bases. RNA plays an important role in protein synthesis and other chemical activities of the cell. Several classes of RNA molecules, including messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and other small RNAs.

13 Exons and introns Genes comprise only about 2% of the human genome; the rest consists of non-coding regions, whose functions may include providing chromosomal structural integrity and regulating when, where, and in what quantity proteins are made (regulatory regions). The terms exon and intron refer to coding (translated into a protein) and non-coding DNA, respectively.

14 Functional genomics The various genome projects have yielded the complete DNA sequences of many organisms. E.g. human, mouse, yeast, fruitfly, etc. Human: 3 billion base-pairs, thousand genes. Challenge: go from sequence to function, i.e., define the role of each gene and understand how the genome functions as a whole.

15 Introduction to microarrays

16 Gene expression assays
DNA microarrays rely on the hybridization properties of nucleic acids to monitor DNA or RNA abundance on a genomic scale in different types of cells. The main types of gene expression assays: High-density nylon membrane arrays; Serial analysis of gene expression (SAGE); Short oligonucleotide arrays (Affymetrix); Long oligonucleotide arrays (Agilent); Fibre optic arrays (Illumina); cDNA arrays (Brown/Botstein)*.

17 Applications of microarrays
Measuring transcript abundance; Genotyping; Estimating DNA copy number; Determining identity by descent; Measuring mRNA decay rates; Identifying protein binding sites; Determining sub-cellular localization of gene products; ……..

18

19 The process Building the chip: RNA preparation: Hybing the chip:
PCR PURIFICATION and PREPARATION MASSIVE PCR PREPARING SLIDES PRINTING RNA preparation: Hybing the chip: CELL CULTURE AND HARVEST POST PROCESSING ARRAY HYBRIDIZATION RNA ISOLATION DATA ANALYSIS cDNA PRODUCTION PROBE LABELING Taken from Schena & Davis

20 bacterial glycerol stocks)
Building the chip PCR amplification Directly from colonies with SP6-T7 primers in 96-well plates Consolidate into 384-well plates Arrayed Library (96 or 384-well plates of bacterial glycerol stocks) Slide 4 Ideally, you will have a slides consist of all genes in the genomeThe design of the array / template is an imporatnt bioinformatics question. As you are only able to detect expression of genes on this template. Spot as microarray on glass slides Ngai Lab, UC Berkeley)

21 384 well plate Glass Slide cDNA clones Pins collect cDNA from wells
Contains cDNA probes cDNA clones Spotted in duplicate Print-tip group 1 Glass Slide Array of bound cDNA probes 4x4 blocks = 16 print-tip groups Print-tip group 6

22 Sample preparation

23 Hybridization Binding of cDNA target samples to cDNA probes on the slide
cover slip Hybridize for 5-12 hours

24 Hybridization chamber
3XSSC HYB CHAMBER ARRAY LIFTERSLIP SLIDE LABEL SLIDE LABEL Humidity Temperature Formamide (Lowers the Tm)

25 Expression profiling with DNA microarrays
cDNA “B” Cy3 labeled cDNA “A” Cy5 labeled Laser Laser 2 Hybridization Scanning Slide 5 You will have another sample of interests consists of a portion of the genes and the questions is to find out what genes are being expressed in this sample. The simplified version of the experiments is to label the sample with a color dye hybridize the sample on the slides and scan the slides under some laser to see which spots that light up. This provides a list of genes and it’s corresponding expression levels in our samples. Our interest is on comparing the expression levels between the two samples. + Analysis Image Capture

26 RGB overlay of Cy3 and Cy5 images

27 Raw data Human cDNA arrays ~ 43K spots;
16–bit TIFFs: ~ 20Mb per channel; ~ 2,000 x 5,500 pixels per image; Spot separation: ~ 136um; For a “typical” array: Mean = 43, med = 32, SD = 26 pixels per spots

28 Image analysis The raw data from a cDNA microarray experiment consist of pairs of image files, 16-bit TIFFs, one for each of the dyes. Image analysis is required to extract measures of the red and green fluorescence intensities for each spot on the array.

29 Image analysis GenePix

30 Image analysis R and G for each spot on the array.
1. Addressing. Estimate location of spot centers. 2. Segmentation. Classify pixels as foreground (signal) or background. 3. Information extraction. For each spot on the array and each dye signal intensities; background intensities; quality measures. Motivation behind background adjustment: A spot’s measured fluorescence intensity includes a contribution that is not specifically due to the hybridization of the target to the probe, but to something else, e.g. the chemical treatment of the slide, autofluorescence etc. Want to estimate and remove this unwanted contribution. R and G for each spot on the array.

31 Spot Batch automatic addressing.
Segmentation. Seeded region growing (Adams & Bischof 1994): adaptive segmentation method, no restriction on the size or shape of the spots. Intensity extraction Foreground. Mean of pixel intensities within a spot. Background. Morphological opening: non-linear filter which generates an image of the estimated background intensity for the entire slide. Spot quality measures. Software package. Spot, built on the freely available software package R.

32 Quality measures Spot One channel, R or G Signal/noise ratio;
Variation in pixel intensities; Identification of “bad spots” (no signal), etc. Two channels, R/G Brightness: foreground/background ratio; Uniformity: variation in pixel intensities and ratios of intensities; Morphology: area, perimeter, circularity. Array (slide) Percentage of spots with no signal; Range of intensities; Distribution of spot signal area, etc. How to use quality measures in subsequent analyses?

33 Terminology Probe: DNA spotted on the array, aka. spot, immobile substrate. Target: DNA hybridized to the array, mobile substrate. Sector: collection of spots printed using the same print-tip (or pin), aka. print-tip-group, pin-group, spot matrix, grid. Batch: collection of slides with the same probe layout. The terms slide or array are often used to refer to the printed microarray.

34 Microarray Life Cycle Biological Question Data Analysis & Modelling
Sample Preparation Microarray Life Cycle MicroarrayDetection Microarray Reaction Taken from Schena & Davis

35 WWW resources Complete guide to “microarraying”
Parts and assembly instructions for printer and scanner; Protocols for sample prep; Software; Forum, etc. Animation:

36 Acknowledgments Introduction to biology: based on Bioconductor short course lecture 1 ( and Temple short course lecture 1 with pictures from Introduction to microarray: based on IPAM, UCLA Thanks also to: Natalie Thorne (WEHI) Ingrid Lönnstedt (Uppsala)


Download ppt "Introduction to genome biology and microarrays experiment"

Similar presentations


Ads by Google