CZ5211 Topics in Computational Biology Lecture 2: Gene Expression Profiles and Microarray Data Analysis Prof. Chen Yu Zong Tel: 6874-6877


CZ5211 Topics in Computational Biology Lecture 2: Gene Expression Profiles and Microarray Data Analysis Prof. Chen Yu Zong Tel: Room 07-24, level 7, SOC1, NUS

2 Biology and Cells All living organisms consist of cells. Humans have trillions of cells. Yeast - one cell. Cells are of many different types (blood, skin, nerve), but all arose from a single cell (the fertilized egg) Each* cell contains a complete copy of the genome (the program for making the organism), encoded in DNA.

3 DNA DNA molecules are long double-stranded chains; 4 types of bases are attached to the backbone: adenine (A), guanine (G), cytosine (C), and thymine (T). A pairs with T, C with G. A gene is a segment of DNA that specifies how to make a protein. Human DNA has about 25-35K genes; Rice about 50-60K but shorter genes.

4 Exons and Introns Exons are coding DNA (translated into a protein) and make up only about 2% of the human genome. Introns are non-coding DNA, which provide structural integrity and regulatory (control) functions. Exons can be thought of as program data, while introns provide the program logic. Humans have much more control structure than rice.

5 Gene Expression Cells are different because of differential gene expression. About 40% of human genes are expressed at one time. Gene is expressed by transcribing DNA into single-stranded mRNA mRNA is later translated into a protein Microarrays measure the level of mRNA expression

6 Molecular Biology Overview Cell Nucleus Chromosome Protein Gene (DNA) Gene (mRNA), single strand cDNA

7 Gene Expression Genes control cell behavior by controlling which proteins are made by a cell House keeping genes vs. cell/tissue specific genes Regulation: Transcriptional (promoters and enhancers) Post Transcriptional (RNA splicing, stability, localization - small non coding RNAs)

8 Gene Expression Regulation: Translational (3’UTR repressors, poly A tail) Post Transcriptional (RNA splicing, stability, localization - small non coding RNAs) Post Translational (Protein modification: carbohydrates, lipids, phosphorylation, hydroxylation, methylation, precursor protein) cDNA

9 Gene Expression Measurement mRNA expression represents the dynamic aspects of the cell. mRNA expression can be measured with current technology. mRNA is isolated and labeled with a fluorescent dye. The labeled mRNA is hybridized to the target; the level of hybridization corresponds to the light emitted when the array is scanned with a laser.

10 Traditional Methods Northern Blotting –Single RNA isolated –Probed with labeled cDNA RT-PCR –Primers amplify specific cDNA transcripts

11 Microarray Technology Microarray: –New Technology (first paper: 1995) Allows study of thousands of genes at same time – Glass slide of DNA molecules Molecule: string of bases (25 bp – 500 bp) uniquely identifies gene or unit to be studied

12 Gene Expression Microarrays The main types of gene expression microarrays: Short oligonucleotide arrays (Affymetrix) cDNA or spotted arrays (Brown/Botstein). Long oligonucleotide arrays (Agilent Inkjet); Fiber-optic arrays...

13 Fabrications of Microarrays Size of a microscope slide Images:

14 Differing Conditions Ultimate Goal: –Understand expression level of genes under different conditions Helps to: –Determine genes involved in a disease –Pathways to a disease –Used as a screening tool

15 Gene Conditions Cell types (brain vs. liver) Developmental (fetal vs. adult) Response to stimulus Gene activity (wild vs. mutant) Disease states (healthy vs. diseased)

16 Expressed Genes Genes under a given condition –mRNA extracted from cells –mRNA labeled –Labeled mRNA is mRNA present in a given condition –Labeled mRNA will hybridize (base pair) with corresponding sequence on slide

17 Two Different Types of Microarrays Custom spotted arrays (up to 20,000 sequences) –cDNA –Oligonucleotide High-density (up to 100,000 sequences) synthetic oligonucleotide arrays –Affymetrix (25 bases) –SHOW AFFYMETRIX LAYOUT

18 Custom Arrays Mostly cDNA arrays 2-dye (2-channel) –RNA from two sources (cDNA created) Source 1: labeled with red dye Source 2: labeled with green dye

19 Two Channel Microarrays Microarrays measure gene expression Two different samples: –Control (green label) –Sample (red label) Both are washed over the microarray –Hybridization occurs –Each spot is one of 4 colors

20 Microarray Technology

21 Microarray Image Analysis Microarrays detect gene interactions: 4 colors: –Green: high control –Red: High sample –Yellow: Equal –Black: None Problem is to quantify image signals
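
The four-color reading of a two-channel spot can be sketched in a few lines of Python. This is only an illustration: the function name `spot_color`, the background cutoff, and the two-fold threshold are assumptions chosen for the example, not values given in the lecture.

```python
def spot_color(red, green, background=100.0, fold=2.0):
    """Classify one spot from its red (sample) and green (control) intensities.

    `background` and `fold` are hypothetical thresholds for illustration only.
    """
    # Neither channel rises above background: nothing hybridized.
    if red < background and green < background:
        return "black"
    # Add 1 to both channels so the ratio is defined when one channel is zero.
    ratio = (red + 1.0) / (green + 1.0)
    if ratio >= fold:
        return "red"      # higher in the sample
    if ratio <= 1.0 / fold:
        return "green"    # higher in the control
    return "yellow"       # roughly equal in both


for r, g in [(5000, 500), (500, 5000), (3000, 3200), (10, 20)]:
    print(r, g, spot_color(r, g))
```

Real image analysis works on pixel intensities inside each segmented spot, so a rule like this would apply only after spot finding and background correction.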

22 Single Color Microarrays Prefabricated –Affymetrix (25mers) Custom –cDNA (500 bases or so) –Spotted oligos (70-80 bases)

23 Microarray Animations Davidson University: Imagecyte:

24 Basic idea of Microarray Construction –Place array of probes on microchip Probe (for example) is oligonucleotide ~25 bases long that characterizes gene or genome Each probe has many, many clones Chip is about 2cm by 2cm Application principle –Put (liquid) sample containing genes on microarray and allow probe and gene sequences to hybridize and wash away the rest – Analyze hybridization pattern

25 Microarray analysis Operation Principle: Samples are tagged with fluorescent material to show pattern of sample-probe interaction (hybridization) Microarray may have 60K probes

26 Microarray Processing sequence

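27 Gene Expression Data Gene expression data on p genes for n samples, arranged as a matrix with genes as rows and mRNA samples as columns (sample1, sample2, sample3, sample4, sample5, …). Gene expression level of gene i in mRNA sample j = Log(Red intensity / Green intensity) for two-channel arrays, or Log(Avg. PM - Avg. MM) for Affymetrix arrays (PM = perfect match probes, MM = mismatch probes).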
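
As a rough sketch, the two expression measures on this slide can be computed as follows. The slide does not state the base of the logarithm; base 2 is assumed here, and the `floor` parameter (which guards against taking the log of a non-positive difference) is also an assumption added for the example.

```python
import math


def two_channel_log_ratio(red, green):
    """Log2(Red intensity / Green intensity) for one spot on a two-channel array."""
    return math.log2(red / green)


def affy_signal(pm, mm, floor=1.0):
    """Log2(Avg. PM - Avg. MM) for one probe set.

    `pm` and `mm` are lists of perfect-match and mismatch intensities.
    `floor` is a hypothetical lower bound so the logarithm stays defined
    when the mismatch average exceeds the perfect-match average.
    """
    avg_pm = sum(pm) / len(pm)
    avg_mm = sum(mm) / len(mm)
    return math.log2(max(avg_pm - avg_mm, floor))


print(two_channel_log_ratio(800.0, 200.0))        # four-fold higher in the red channel
print(affy_signal([1200.0, 1000.0], [100.0, 52.0]))
```

With base 2, a value of +1 means a two-fold increase and -1 a two-fold decrease, which makes up- and down-regulation symmetric around zero.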

28 Some possible applications Sample from specific organ to show which genes are expressed Compare samples from healthy and sick host to find gene-disease connection Probes are sets of human pathogens for disease detection

29 Huge amount of data from single microarray If just two color, then amount of data on array with N probes is $2^N$ Cannot analyze pixel by pixel Analyze by pattern – cluster analysis

30 Major Data Mining Techniques Link Analysis –Associations Discovery –Sequential Pattern Discovery –Similar Time Series Discovery Predictive Modeling –Classification –Clustering

31 Strengthens signal when averages are taken within clusters of genes (Eisen) Useful (essential ?) when seeking new subclasses of cells, tumours, etc. Leads to readily interpreted figures Cluster Analysis: Grouping Similarly Expressed Genes, Cell Samples, or Both

32 Some clustering methods and software Partitioning: K-Means, K-Medoids, PAM, CLARA, … Hierarchical: Cluster, HAC, BIRCH, CURE, ROCK Density-based: CAST, DBSCAN, OPTICS, CLIQUE, … Grid-based: STING, CLIQUE, WaveCluster, … Model-based: SOM (self-organized map), COBWEB, CLASSIT, AutoClass, … Two-way Clustering Block clustering

33 Assessment of various methods Algorithmic Approaches to Clustering Gene Expression Data, Ron Shamir School of Computer Science, Tel-Aviv University Tel-Aviv – Conclusion: hierarchical clustering exceptional

34 Partitioning

35 Density-based clustering

36 Hierarchical (used most often)

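37 Hierarchical Clustering: grouping similarly expressed genes Gene Expression Profile Analysis (figure: gene-by-sample expression matrix, samples A, B, C, …)

38 After Clustering Gene Expression Profile Analysis (figure: the same gene-by-sample matrix, samples A, B, C, …, with rows reordered by cluster)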

39 Eisen et al. Proc. Natl. Acad. Sci. USA 95 (1998) (figure panels: clustered data; data randomized by row, by column, and by both; horizontal axis: time)

40 Distance measurements Correlation coefficients Association coefficients Probabilistic similarity coefficients Types of Similarity Measurements

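41 Correlation Coefficients The most popular correlation coefficient is the Pearson correlation coefficient (1892). Correlation between $X=\{X_1, X_2, \ldots, X_n\}$ and $Y=\{Y_1, Y_2, \ldots, Y_n\}$: $s_{XY} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \, \sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$ –where $\bar{X}$ and $\bar{Y}$ are the means of X and Y, and $s_{XY}$ is the similarity between X & Y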
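
A minimal sketch of the Pearson correlation between two expression profiles, using only the Python standard library. The function name `pearson` and the example profiles are illustrative and not part of the lecture.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered cross-products and centered squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape as gene_a, different scale
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend
print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
```

Because the measure is scale-invariant, two genes whose profiles rise and fall together score close to 1 even when their absolute expression levels differ.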

42 Use of Similarity for Tree Construction Normalize similarity so that $s_{XX} = 1$. Then have n×n similarity matrix S whose diagonal elements are 1. Define distance matrix by (for example) D = 1 – S. Diagonal elements of D are 0. Now use the distance matrix to build the tree (using some tree-building software; recall lecture on Phylogeny).
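
The step from similarities to distances can be sketched as below, assuming Pearson correlation is the similarity measure. The helper names (`_pearson`, `similarity_matrix`, `distance_matrix`) and the toy profiles are hypothetical.

```python
import math


def _pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(profiles):
    """n x n matrix S with S[i][j] = correlation of profiles i and j."""
    n = len(profiles)
    return [[_pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def distance_matrix(S):
    """Element-wise D = 1 - S, so identical profiles are at distance 0."""
    return [[1.0 - s for s in row] for row in S]


profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
D = distance_matrix(similarity_matrix(profiles))
for row in D:
    print([round(d, 3) for d in row])
```

With correlation as the similarity, D ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated); whether anti-correlated genes should count as far apart is a modeling choice.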

43 A dendrogram (tree) for clustered genes E.g. p=5: Cluster 6=(1,2), Cluster 7=(1,2,3), Cluster 8=(4,5), Cluster 9=(1,2,3,4,5). Let p = number of genes. 1. Calculate within class correlation. 2. Perform hierarchical clustering, which will produce (2p-1) clusters of genes. 3. Average within clusters of genes. 4. Perform testing on averages of clusters of genes as if they were single genes.
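
The count of 2p-1 clusters comes from the p single genes plus the p-1 merges. The sketch below illustrates this with a small agglomerative procedure. Average linkage is an assumption (the slide does not name a linkage rule), and the distance matrix is toy data built from made-up positions on a line so that the merges reproduce the p=5 numbering on the slide.

```python
def hierarchical_clusters(D):
    """Average-linkage agglomerative clustering on a distance matrix D.

    Genes are numbered 1..p; each merge creates the next cluster id,
    so the result holds 2p-1 clusters in total.
    """
    p = len(D)
    clusters = {i + 1: (i + 1,) for i in range(p)}
    active = set(clusters)
    next_id = p + 1

    def avg_dist(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        total = sum(D[i - 1][j - 1] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(active) > 1:
        pairs = [(a, b) for a in active for b in active if a < b]
        a, b = min(pairs, key=lambda pair: avg_dist(pair[0], pair[1]))
        clusters[next_id] = tuple(sorted(clusters[a] + clusters[b]))
        active -= {a, b}
        active.add(next_id)
        next_id += 1
    return clusters


def cluster_average(profiles, members):
    """Step 3: average the expression profiles of the genes in one cluster."""
    rows = [profiles[g - 1] for g in members]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy distances: genes placed at hypothetical positions on a line.
positions = [0.0, 1.0, 3.0, 10.0, 13.0]
D = [[abs(a - b) for b in positions] for a in positions]
tree = hierarchical_clusters(D)
for cid in sorted(tree):
    print(cid, tree[cid])
```

For this toy input, clusters 6 to 9 come out as (1,2), (1,2,3), (4,5) and (1,2,3,4,5). The cluster averages from `cluster_average` can then be tested as if they were single genes, as in step 4.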

44 A real case Nature, Feb 2000. Paper by Alizadeh, A. et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling

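45 Validation Techniques: Hubert’s Γ Statistics X = [X(i, j)] and Y = [Y(i, j)] are two n × n matrices –X(i, j): similarity of gene i and gene j –Y(i, j) = 1 if genes i and j are in the same cluster, 0 otherwise –Hubert’s Γ statistic represents the point serial correlation: $\Gamma = \frac{1}{M}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{(X(i,j)-\bar{X})(Y(i,j)-\bar{Y})}{\sigma_X \sigma_Y}$, where M = n (n - 1) / 2 and $\bar{X}$, $\bar{Y}$, $\sigma_X$, $\sigma_Y$ are the means and standard deviations of the entries of X and Y over the pairs i < j –A higher value of Γ represents better clustering quality.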
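
A hedged sketch of the normalized Hubert's Γ, computed as the correlation between the pairwise similarities X(i, j) and the cluster co-membership indicator Y(i, j) over the M = n(n - 1)/2 gene pairs. The 1/0 coding of Y and the toy similarity matrix are assumptions made for this example.

```python
import math


def hubert_gamma(X, labels):
    """Normalized Hubert's Gamma for a similarity matrix X and cluster labels.

    Y(i, j) is taken to be 1 when genes i and j share a label and 0 otherwise
    (an assumed coding); only pairs with i < j are used.
    """
    n = len(X)
    xs, ys = [], []
    for i in range(n - 1):
        for j in range(i + 1, n):
            xs.append(X[i][j])
            ys.append(1.0 if labels[i] == labels[j] else 0.0)
    M = len(xs)  # equals n * (n - 1) / 2
    mean_x, mean_y = sum(xs) / M, sum(ys) / M
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / M)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / M)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Gamma is undefined when X or Y is constant")
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (M * sd_x * sd_y)


# Toy similarities: genes 0,1 resemble each other, and so do genes 2,3.
X = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
print(hubert_gamma(X, [0, 0, 1, 1]))  # clustering that matches the similarities
print(hubert_gamma(X, [0, 1, 0, 1]))  # clustering that splits the similar pairs
```

The matching clustering scores higher than the mismatched one, which is the sense in which a higher Γ indicates better agreement between the clusters and the data.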

46 Discovering sub-groups

47 Time Course Data Gene Expression is Time-Dependent

48 Sample of time course of clustered genes (figure: expression of clustered genes plotted against time)

49 Limitations Cluster analyses: –Usually outside the normal framework of statistical inference –Less appropriate when only a few genes are likely to change –Needs lots of experiments Single gene tests: –May be too noisy in general to show much –May not reveal coordinated effects of positively correlated genes –Hard to relate to pathways

50 Useful Links Affymetrix Michael Eisen Lab at LBL (hierarchical clustering software “Cluster” and “Tree View” (Windows)) rana.lbl.gov/ Review of Currently Available Microarray Software ArrayExpress at the EBI Stanford MicroArray Database Yale Microarray Database Microarray DB