Lab 5: Unsupervised and supervised clustering
Feb 22nd, 2012
Daniel Fernandez, Alejandro Quiroz



Outline
– Unsupervised
  – Hierarchical clustering
  – Principal component analysis
– Supervised
  – LIMMA package: Linear Models for MicroArray data

Before any high-level analysis…
– Download the data set used in lab 4: go to GEO and download GSE10940.
– Load the .CEL files using the custom CDF annotation from lab 4: "drosophila2dmrefseqcdf".
– Perform RMA normalization and store the expression intensities in a matrix.
– Obtain the genes that are up- or down-regulated with a fold change of at least 2, and store their expression values (genes in rows, arrays in columns) in X.top.
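A minimal base-R sketch of the fold-change filter. It assumes the RMA values are on the log2 scale (which is what RMA returns), so a 2-fold change corresponds to an absolute log2 difference of at least 1. The matrix is simulated and the control/mutant column layout in `group` is hypothetical; with the real data, `X` would instead be the matrix returned by `exprs()` on the output of `rma()` from the affy package.

```r
# Simulated stand-in for the RMA matrix (log2 scale): 200 genes x 6 arrays.
# HYPOTHETICAL layout: columns 1-3 are controls, columns 4-6 are p24 mutants.
set.seed(42)
X <- matrix(rnorm(200 * 6, mean = 8, sd = 0.25), nrow = 200,
            dimnames = list(paste0("gene", 1:200), paste0("array", 1:6)))
X[1:10, 4:6]  <- X[1:10, 4:6] + 2    # 10 genes higher in the mutant
X[11:20, 4:6] <- X[11:20, 4:6] - 2   # 10 genes lower in the mutant
group <- factor(rep(c("Ctl", "p24"), each = 3))

# On the log2 scale, log2(fold change) is the difference of the group means
log2fc <- rowMeans(X[, group == "p24"]) - rowMeans(X[, group == "Ctl"])

# |log2 FC| >= 1 is the same as a fold change of at least 2 in either direction
up   <- names(log2fc)[log2fc >= 1]
down <- names(log2fc)[log2fc <= -1]

# Keep the expression rows of the selected genes (genes in rows, arrays in columns)
X.top <- X[c(up, down), ]
dim(X.top)
```

Keeping the full expression rows (rather than only the gene IDs) is what the later `dist(t(X.top))` and `prcomp()` steps need.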

The data set
Secretory and transmembrane proteins traverse the endoplasmic reticulum (ER) and Golgi compartments for final maturation prior to reaching their functional destinations. Members of the p24 protein family function in trafficking some secretory proteins in yeast and higher eukaryotes. Yeast p24 mutants have minor secretory defects and induce an ER stress response that likely results from accumulation of proteins in the ER due to disrupted trafficking. Test the hypothesis that loss of Drosophila melanogaster p24 protein function causes a transcriptional response characteristic of ER stress activation.

Supervised method: LIMMA
LIMMA (Linear Models for MicroArray data)
– A package for differential expression analysis of microarray data.
– Uses linear models to describe the expression of each gene.
– Uses empirical Bayes and other shrinkage methods to borrow information across genes, which makes the analyses stable even for experiments with a small number of arrays.
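A sketch of the usual limma call sequence, assuming the Bioconductor limma package is installed (the limma part is skipped otherwise). The data are simulated and the group labels "Ctl" and "p24" are hypothetical; with the lab data, `X` would be the RMA matrix and `group` would follow the actual sample layout of GSE10940.

```r
# Simulated log2 expression: 100 genes x 6 arrays.
# HYPOTHETICAL layout: 3 control ("Ctl") arrays, then 3 p24-mutant ("p24") arrays.
set.seed(1)
X <- matrix(rnorm(600, mean = 8), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("array", 1:6)))
X[1:5, 4:6] <- X[1:5, 4:6] + 5   # spike in 5 genes that are higher in the mutant
group <- factor(rep(c("Ctl", "p24"), each = 3))

if (requireNamespace("limma", quietly = TRUE)) {
  # Design matrix: one indicator column per group
  design <- model.matrix(~ 0 + group)
  colnames(design) <- levels(group)

  fit  <- limma::lmFit(X, design)                           # per-gene linear models
  cont <- limma::makeContrasts(p24 - Ctl, levels = design)  # comparison of interest
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont))    # moderated t-statistics
  top  <- limma::topTable(fit2, coef = 1, number = 10)      # 10 top-ranked genes
  print(top)
} else {
  message("limma is not installed; skipping the limma fit")
}
```

The same model is fitted to every gene in one call; eBayes() is where information is borrowed across genes.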

LIMMA uses linear models to analyze microarray data. The approach requires the definition of 2 matrices:
– Design matrix
  – Represents how the different factors are distributed across the arrays.
  – A linear model $E(y_j) = X\alpha_j$ is assumed, where $y_j$ contains the expression values for gene $j$ and $X$ is the design matrix.
  – The estimates of $\alpha_j$ are provided by lmFit().
– Contrast matrix
  – Defines the comparisons between the factors of interest.
  – If the parameters of interest are $\beta_j = C^T\alpha_j$, then $C$ is the contrast matrix.
  – These parameters are estimated by contrasts.fit().
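To make the two matrices concrete, here is a base-R sketch for a single gene: it estimates $\alpha_j$ in $E(y_j) = X\alpha_j$ by least squares and then forms $\beta_j = C^T\alpha_j$. This only mirrors the idea behind lmFit() and contrasts.fit(); it is not limma's implementation (no weights, no missing-value handling). The expression values and the "Ctl"/"p24" layout are made up for illustration.

```r
# Made-up expression values for one gene on 6 arrays (HYPOTHETICAL: 3 Ctl, 3 p24)
y <- c(7.9, 8.1, 8.0, 9.9, 10.1, 10.0)
group <- factor(rep(c("Ctl", "p24"), each = 3))

# Design matrix X: one indicator column per group (group-means parametrization)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Least-squares estimate of alpha_j; here these are the two group means (8 and 10)
alpha <- qr.solve(design, y)

# Contrast matrix C: one column per comparison of interest (p24 minus Ctl)
C <- matrix(c(-1, 1), ncol = 1, dimnames = list(colnames(design), "p24-Ctl"))

# beta_j = C^T alpha_j, the log2 fold change between the groups
beta <- t(C) %*% alpha
beta
```

Adding more columns to `C` (for example, with a third condition) gives several comparisons from a single fit, which is the reason for separating the design from the contrasts.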

Given the large number of linear model fits arising from a microarray experiment, there is a pressing need to take advantage of the parallel structure, whereby the same model is fitted to each gene. Using a hierarchical framework, a moderated t-statistic is computed:
– Standard errors are shrunk towards a common value using a Bayesian model. This borrows information across genes for the inference on each individual gene.
– The degrees of freedom are increased, reflecting the greater reliability of the smoothed standard errors.
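In limma's moderated t-statistic, each gene's residual variance $s_j^2$ (on $d_j$ degrees of freedom) is replaced by the weighted average $\tilde s_j^2 = (d_0 s_0^2 + d_j s_j^2)/(d_0 + d_j)$, and the statistic $\tilde t_j = \hat\beta_j / (\tilde s_j \sqrt{v_j})$ is referred to a t distribution with $d_0 + d_j$ degrees of freedom. The base-R sketch below computes this for a two-group comparison on simulated data. ASSUMPTION: the prior values $d_0$ and $s_0^2$ are fixed by hand here, whereas eBayes() estimates them from all genes.

```r
# Simulated log2 data: 50 genes x 6 arrays (HYPOTHETICAL: 3 Ctl, then 3 p24)
set.seed(7)
n_genes <- 50
group <- rep(c("Ctl", "p24"), each = 3)
X <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.5), nrow = n_genes)

# Per-gene fit: difference of group means and pooled residual variance s_j^2
beta_hat <- rowMeans(X[, group == "p24"]) - rowMeans(X[, group == "Ctl"])
resid_var <- function(x) {
  (sum((x[group == "Ctl"] - mean(x[group == "Ctl"]))^2) +
   sum((x[group == "p24"] - mean(x[group == "p24"]))^2)) / (length(x) - 2)
}
s2 <- apply(X, 1, resid_var)
d  <- ncol(X) - 2            # residual degrees of freedom d_j (= 4)
v  <- 1/3 + 1/3              # unscaled variance of beta_hat: 1/n1 + 1/n2

# HYPOTHETICAL prior: d0 prior degrees of freedom, s0^2 prior variance
d0  <- 4
s02 <- median(s2)

# Shrink each variance toward s0^2, then form ordinary and moderated t
s2_tilde <- (d0 * s02 + d * s2) / (d0 + d)
t_ord <- beta_hat / sqrt(s2 * v)
t_mod <- beta_hat / sqrt(s2_tilde * v)

# The moderated t uses d0 + d degrees of freedom (the increase noted above)
p_mod <- 2 * pt(-abs(t_mod), df = d0 + d)
```

Because $\tilde s_j^2$ is a weighted average, genes with an unusually small sample variance can no longer produce huge t-statistics from a tiny denominator.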

Unsupervised method: Hierarchical clustering
– First, calculate all the pairwise distances between arrays: D = dist(t(X.top))
– Then, perform the hierarchical clustering with different linkage methods:
  H1 = hclust(D, method = "single")
  H2 = hclust(D, method = "complete")
  H3 = hclust(D, method = "average")
– Plot each tree with plot(H1), plot(H2) and plot(H3).
Is there anything odd about the clustering?
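A runnable version of the steps above on simulated data. `X.top` here is a made-up stand-in (40 genes x 6 arrays) and the array names with a Ctl/p24 split are hypothetical; with the lab data, use the `X.top` built from GSE10940.

```r
# Simulated stand-in for X.top: 40 selected genes x 6 arrays (genes in rows).
# HYPOTHETICAL layout: arrays 1-3 are controls, arrays 4-6 are p24 mutants.
set.seed(3)
X.top <- matrix(rnorm(40 * 6, mean = 8), nrow = 40,
                dimnames = list(paste0("gene", 1:40),
                                c(paste0("Ctl", 1:3), paste0("p24_", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 2   # shift the mutant arrays on every gene

# t() puts arrays in rows, so dist() gives Euclidean distances between arrays
D <- dist(t(X.top))

# Three linkage rules for merging clusters
H1 <- hclust(D, method = "single")
H2 <- hclust(D, method = "complete")
H3 <- hclust(D, method = "average")

# Draw the three dendrograms side by side
op <- par(mfrow = c(1, 3))
plot(H1, main = "Single linkage")
plot(H2, main = "Complete linkage")
plot(H3, main = "Average linkage")
par(op)

# Cut each tree into two groups and compare with the known layout
sapply(list(single = H1, complete = H2, average = H3), cutree, k = 2)
```

Comparing the `cutree()` memberships with the known sample labels is a quick way to check whether the trees separate the conditions.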

Unsupervised method: MDS
Multidimensional scaling (MDS) is a set of related statistical techniques used to explore similarities in data*.
*Wikipedia.
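One member of that family is classical (metric) MDS, available in base R as `cmdscale()`. The sketch below places the arrays in two dimensions so that their distances approximate the distances in `D`; the `X.top` matrix is simulated and its Ctl/p24 layout is hypothetical.

```r
# Simulated stand-in for X.top: 40 genes x 6 arrays (HYPOTHETICAL: 3 Ctl, 3 p24)
set.seed(3)
X.top <- matrix(rnorm(40 * 6, mean = 8), nrow = 40,
                dimnames = list(paste0("gene", 1:40),
                                c(paste0("Ctl", 1:3), paste0("p24_", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 2

# Distances between arrays, then a 2-D configuration that approximates them
D <- dist(t(X.top))
mds <- cmdscale(D, k = 2)

# Plot the arrays by name in the MDS coordinates
plot(mds, type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2")
text(mds, labels = rownames(mds))
```

With Euclidean distances, classical MDS gives the same configuration as the PCA scores (up to sign), so this plot should resemble the first two principal components of the arrays.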

Unsupervised method: Principal component analysis
– In R, the function prcomp() performs principal component analysis.
– In our context, the idea is to visualize the effect of reducing the dimension of the gene space.
– Important: remember that prcomp() expects the genes as columns and the samples as rows.
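A minimal sketch of that step on simulated data: transposing `X.top` puts samples in rows and genes in columns, as prcomp() expects. The matrix and its Ctl/p24 layout are made up; centering without scaling is a choice made here, not something the slide specifies.

```r
# Simulated stand-in for X.top: 40 genes x 6 arrays (HYPOTHETICAL: 3 Ctl, 3 p24)
set.seed(3)
X.top <- matrix(rnorm(40 * 6, mean = 8), nrow = 40,
                dimnames = list(paste0("gene", 1:40),
                                c(paste0("Ctl", 1:3), paste0("p24_", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 2

# Samples as rows, genes as columns; center each gene, no scaling
pca <- prcomp(t(X.top), center = TRUE, scale. = FALSE)

# Proportion of total variance explained by each component (also in summary(pca))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Arrays projected onto the first two principal components
plot(pca$x[, 1:2], type = "n")
text(pca$x[, 1:2], labels = rownames(pca$x))
```

If most of the variance sits in the first one or two components, the 40-dimensional gene space can be reduced to a 2-D plot with little loss.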