Statistics for Microarrays

Presentation transcript:

Statistics for Microarrays: Experimental Design, Normalization, and Exploratory Data Analysis. Class web site: http://statwww.epfl.ch/davison/teaching/Microarrays/

Workflow: Biological question (differentially expressed genes, sample class prediction, etc.) -> Experimental design -> Microarray experiment -> 16-bit TIFF files -> Image analysis -> (Rfg, Rbg), (Gfg, Gbg) -> Normalization -> R, G -> Estimation, Testing, Clustering, Discrimination -> Biological verification and interpretation

Some Considerations for cDNA Microarray Experiments (I) Scientific (Aims of the experiment) Specific questions and priorities How will the experiments answer the questions Practical (Logistic) Types of mRNA samples: reference, control, treatment, mutant, etc Source and Amount of material (tissues, cell lines) Number of slides available

Some Considerations for cDNA Microarray Experiments (II) Other Information Experimental process prior to hybridization: sample isolation, mRNA extraction, amplification, labelling,… Controls planned: positive, negative, ratio, etc. Verification method: Northern, RT-PCR, in situ hybridization, etc.

Aspects of Experimental Design Applied to Microarrays (I) Array Layout Which cDNA sequences are printed Spatial position Allocation of samples to slides Design layouts A vs B: Treatment vs control Multiple treatments Factorial Time series There are two broad aspects to designing a microarray experiment. The first is designing the array: which cDNA sequences to print, which library to spot, and which quality controls to include. This part of the design is more of a bioinformatics question. The second is the allocation of samples to slides: assigning dye labels to the samples and deciding which samples should be paired and hybridized on the same slide. Later in the talk, we'll see the design choices available in each experimental setting. Other general issues that affect design choices are replication and extensibility. The number and type of replicates often determine the precision and generalizability of the experiment. Extensibility refers to the ability to compare essentially arbitrarily many sources of data. In the interest of time, I will focus on how the precision of estimates varies across design layouts in four experimental settings.

Aspects of Experimental Design Applied to Microarrays (II) Other considerations Replication Physical limitations: the number of slides and the amount of material Sample Size Extensibility - linking

Layout options The main issue is the use of reference samples, typically labelled green. Standard statistical design principles can lead to more efficient layouts; use of dye-swaps can also help. Sample size determination is more than usually difficult, as there are 1,000s of possible changes, each with its own SD.

Natural design choice (Diagrams: treatments T1 to T4 each compared with control C; samples T1, T2, ..., Tn-1, Tn each compared with Ref.) Case 1: Meaningful biological control (C). Samples: liver tissue from four mice treated with cholesterol-modifying drugs. Question 1: Genes that respond differently between the T and the C. Question 2: Genes that respond similarly across two or more treatments relative to control. Case 2: Use of universal reference. Samples: different tumor samples. Question: to discover tumor subtypes. In some cases, given the nature of the experiment and the material available, one design stands out as preferable to all others. For example, if we wish to study mRNA from cells, each treated by a different drug, and the primary comparisons of interest are those of the treated cells versus the untreated cells, then the appropriate design is clear: the untreated cells become a de facto reference, and all hybridizations involve one treated set of cells and the untreated cells. Remember that in a two-color microarray system every measurement is a pairwise comparison: we cannot simply observe the effect of T1 or T2; rather, we need to observe the expression of T1 relative to something else. In this case, the expression of T1 relative to C is a natural choice. These are examples where, given the nature of the scientific question, there is a natural design choice. With most experiments, a number of designs can be devised which seem suitable for use, and we need some principles for choosing one from the set of possibilities.

Treatment vs Control Two samples, e.g. KO vs. WT or mutant vs. WT. Direct (T and C on the same slides): average (log (T/C)), variance σ²/2. Indirect (T vs Ref and C vs Ref on separate slides): log (T / Ref) – log (C / Ref), variance 2σ². This is a very common question and an active area of discussion in genomics circles. The heart of the design issue with cDNA microarrays is the decision between direct and indirect comparisons, that is, between making expression comparisons within slides rather than between slides. We begin by discussing this comparison in the simplest case of treatment T versus control C.
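The two variances for this comparison can be checked with a short calculation. This is a sketch under assumptions not spelled out on the slide: each slide yields a log ratio M with independent error of variance σ², and each design uses two slides (the symbols M_1, M_2, M_{T/Ref} and M_{C/Ref} are introduced here for illustration).

```latex
\begin{aligned}
\text{Direct:}\quad
  & \operatorname{Var}\!\left(\tfrac{1}{2}\,(M_1 + M_2)\right)
    = \tfrac{1}{4}\left(\sigma^2 + \sigma^2\right)
    = \frac{\sigma^2}{2}, \\
\text{Indirect:}\quad
  & \operatorname{Var}\!\left(M_{T/\mathrm{Ref}} - M_{C/\mathrm{Ref}}\right)
    = \sigma^2 + \sigma^2
    = 2\sigma^2 .
\end{aligned}
```

Under these assumptions the direct design is four times as precise as the indirect one for the same number of slides.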


One-way layout: one factor, k levels (samples A, B, C; reference ref)
Design               I) Common reference   II) Common reference   III) Direct comparison
Number of slides     N = 3                 N = 6                  N = 3
Units of material    A = B = C = 1         A = B = C = 2          A = B = C = 2
Ave. variance        2                     1                      0.67
Here we illustrate how a combination of indirect and direct comparisons is often a practical solution. All pairwise comparisons are of equal importance. We compare three sources of mRNA. In all the illustrations that follow, the variance for one gene in a direct hybridization is taken as σ², and table entries are in units of σ². If the number of slides is the limitation, compare designs with the same number of slides (I and III). If there are no limitations, compare designs that use the same amount of material (II and III). When k gets large, it is difficult to do all pairwise comparisons. For k = 3, efficiency ratio (Design I / Design III) = 3. In general, efficiency ratio = 2k / (k-1). (But may not be achievable due to lack of independence.)
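The average variances for the one-way layout can be reproduced with a small least-squares calculation. The following is a minimal sketch, assuming every slide gives a log ratio with independent error of variance σ² (so results are in units of σ²); the function names and the way designs are encoded as (red, green) pairs are illustrative choices, not part of the original slides.

```python
from fractions import Fraction
from itertools import combinations


def invert(matrix):
    """Invert a small square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def average_variance(slides, samples, baseline):
    """Average variance of all pairwise contrasts among `samples`.

    Each slide (red, green) measures log(red) - log(green).  The `baseline`
    node is fixed at 0 so the least-squares problem has a unique solution.
    """
    nodes = sorted({s for pair in slides for s in pair} - {baseline})
    col = {name: i for i, name in enumerate(nodes)}
    p = len(nodes)

    # Accumulate X'X, where X has one row per slide (+1 red, -1 green).
    xtx = [[Fraction(0)] * p for _ in range(p)]
    for red, green in slides:
        row = [Fraction(0)] * p
        if red in col:
            row[col[red]] += 1
        if green in col:
            row[col[green]] -= 1
        for i in range(p):
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    inv = invert(xtx)

    # Var(c' beta) = c' (X'X)^{-1} c for each contrast c = sample_a - sample_b.
    total, count = Fraction(0), 0
    for s1, s2 in combinations(samples, 2):
        c = [Fraction(0)] * p
        if s1 in col:
            c[col[s1]] += 1
        if s2 in col:
            c[col[s2]] -= 1
        total += sum(c[i] * inv[i][j] * c[j] for i in range(p) for j in range(p))
        count += 1
    return total / count


design_I = [("A", "Ref"), ("B", "Ref"), ("C", "Ref")]
design_II = design_I * 2
design_III = [("A", "B"), ("B", "C"), ("C", "A")]

print(average_variance(design_I, "ABC", "Ref"))    # Design I
print(average_variance(design_II, "ABC", "Ref"))   # Design II
print(average_variance(design_III, "ABC", "C"))    # Design III
```

The same function can be pointed at any other list of hybridizations, which is one way to compare candidate layouts before running an experiment.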

Illustration from one experiment Design I A B C Ref Design III A B C MAD is a robust measure of SD. Since we don’t have enough sample to estimate v for every gene, we use the sample variance of the log-ratios to approximate v. Box plots of log ratios: direct still ahead

Factorial experiments Treated cell lines Possible experiments CTL OSM OSM & EGF EGF Here interest is not in genes for which there is an O or an E (main) effect, but in which there is an OE interaction, i.e. in genes for which log(O&E/O)-log(E/C) is large or small.

2 x 2 factorial: some design options. Designs I) to IV) on samples C, A, B and A.B, ranging from indirect to a balance of direct and indirect comparisons. # Slides: N = 6. Table entry: variance (assuming all log ratios uncorrelated). Main effect A: 0.5, 0.67, NA. Main effect B: 0.43, 0.3. Int A.B: 1.5, 1. The choice depends on the question of interest: interaction only, main effect only, or a combination of both.

Some Design Possibilities for Detecting Interaction Samples: treated tumor cell lines at 4 time points (30 minutes, 1 hour, 4 hours, 24 hours) Question: Which genes contribute to the enhanced inhibitory effect of OSM when it is combined with EGF? Role of time? Design A: ctl Design B: ctl OSM Design A uses less mRNA (2 units per source, compared to 6), but larger variance 2 OSM & EGF OSM & EGF EGF OSM EGF

Combining Estimates (Diagram nodes: A, D, M, L, V, P.) Different ways of estimating the same contrast, e.g. A compared to P: Direct = A-P; Indirect = (A-M) + (M-P), or (A-D) + (D-P), or -(L-A) - (P-L). How do we combine these?

Time Course Experiments Number of time points Which differences are of highest interest (e.g. between initial time and later times, between adjacent times) Number of slides available

Design choices in time series. Table entry: variance.
Columns: t vs t+1 (T1T2, T2T3, T3T4); t vs t+2 (T1T3, T2T4); t vs t+3 (T1T4); Ave.
N=3: A) T1 as common reference: 1, 2, 1.5. B) Direct hybridization: 3, 1.67.
N=4: C) Common reference. D) T1 as common ref + more: .67, 1.06. E) Direct hybridization choice 1: .75, .83. F) Direct hybridization choice 2.
Generating all possible hybridization patterns is difficult once the number of mRNA sources becomes large.

Replication. Why? To reduce variability; to increase generalizability. What is it? Duplicate spots; duplicate slides; technical replicates; biological replicates.

Technical Replicates: Labeling. 3 sets of self-self hybridizations. Data 1 and Data 2 were labeled together and hybridized on two slides separately. Data 3 was labeled separately.

Sample Size Variance of individual measurements (X) Effect size(s) to be detected (X) Acceptable false positive rate Desired power (probability of detecting an effect of at least the specified size)
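The four ingredients listed above combine in the textbook sample-size formula for comparing two group means. The sketch below assumes a two-sided z-test with known, equal standard deviation `sigma` for a single gene; the function name and defaults are illustrative, and this is not presented as the method used in the course.

```python
import math
from statistics import NormalDist


def replicates_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Replicates per group needed to detect a mean log-ratio difference `delta`.

    sigma : SD of an individual measurement (assumed known and equal in both groups)
    delta : effect size to be detected
    alpha : acceptable two-sided false positive rate for one gene
    power : desired probability of detecting an effect of size delta
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)


# One gene, SD of 1 on the log2 scale, looking for a 2-fold change (delta = 1).
print(replicates_per_group(sigma=1.0, delta=1.0))

# A Bonferroni-style alpha for 10,000 genes demands many more replicates.
print(replicates_per_group(sigma=1.0, delta=1.0, alpha=0.05 / 10000))
```

Because each gene has its own SD, a calculation like this would in practice be repeated over a range of plausible `sigma` values rather than run once.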

Extensibility “Universal” common reference for arbitrary undetermined number of (future) experiments Provides extensibility of the series of experiments (within and between labs) Linking experiments necessary if common reference source diminished/depleted

Summary Balance of direct and indirect comparisons Optimize precision of the estimates among comparisons of interest Must satisfy scientific and physical constraints of the experiment When generating all possible design combinations is not feasible, the task becomes finding an algorithm that estimates locally optimal designs.

(BREAK)

Mini-Review: How to make a cDNA microarray

Pins collect cDNA from wells 384 well plate -- Contains cDNA probes cDNA clones Spotted in duplicate Print-tip group 1 Glass Slide Array of bound cDNA probes 4x4 blocks = 16 print-tip groups Print-tip group 6

Building the chip Ngai Lab arrayer, UC Berkeley Print-tip head

Microarray Experiment

Hybridization Binding cDNA samples (targets) to cDNA probes on slide cover slip Hybridise for 5-12 hours

Quantification of expression For each spot on the slide we calculate Red intensity = Rfg - Rbg and Green intensity = Gfg - Gbg (fg = foreground, bg = background), and combine them in the log (base 2) ratio log2(Red intensity / Green intensity).
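As a minimal sketch of the calculation above, the function below computes the background-corrected log ratio for one spot. Returning `None` when a corrected intensity is not positive is an assumption made here for illustration; the slides do not say how such spots are handled.

```python
import math


def log2_ratio(r_fg, r_bg, g_fg, g_bg):
    """Background-corrected log2(Red / Green) for a single spot."""
    red = r_fg - r_bg      # Red intensity = Rfg - Rbg
    green = g_fg - g_bg    # Green intensity = Gfg - Gbg
    if red <= 0 or green <= 0:
        return None        # the log ratio is undefined for this spot
    return math.log2(red / green)


print(log2_ratio(1100, 100, 600, 100))  # red is twice green after correction
print(log2_ratio(90, 100, 600, 100))    # red foreground below background
```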

Background matters From Spot From GenePix

Quality Measurements Array Correlation between spot intensities Percentage of spots with no signals Distribution of spot signal area Spot Signal / Noise ratio Variation in pixel intensities Identification of “bad spot” (spots with no signal) Ratio (2 spots combined) Circularity

Affymetrix Oligo Chips Only one “color” Different technology, different normalization issues Affy chip normalization is an active research area – see http://www.stat.berkeley.edu/users/terry/zarray/Affy/affy_index.html

Preprocessing: Data Visualization Was the experiment a success? Are there any specific problems? What analysis tools should be used?

Tools for Microarray Normalization and Analysis Both commercial and free software The labs for this course use the R package sma Upcoming release (29 April 2002) of Bioconductor (http://www.bioconductor.org/)

Red/Green overlay images Co-registration and overlay offers a quick visualization, revealing information on color balance, uniformity of hybridization, spot uniformity, background, and artefacts such as dust or scratches Bad: high bg, ghost spots, little d.e. Good: low bg, lots of d.e.

Scatterplots: always log, always rotate. Plot log2 R vs log2 G, then rotate to M = log2(R/G) vs A = log2 √(RG).
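The rotation from the (log2 G, log2 R) scatterplot to the MA-plot is a simple change of coordinates. The sketch below assumes `red` and `green` are positive, background-corrected intensities; the function name is illustrative.

```python
import math


def ma_transform(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = log2(sqrt(R*G))."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g            # log ratio: vertical axis of the MA-plot
    a = 0.5 * (log_r + log_g)    # average log intensity: horizontal axis
    return m, a


print(ma_transform(8, 2))
```

Spots with equal red and green intensity land on the horizontal line M = 0, which makes intensity-dependent dye bias easier to see than on the diagonal of the unrotated plot.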

Histograms Signal/Noise = log2(spot intensity/background intensity)

Boxplots of log2R/G Liver samples from 16 mice: 8 WT, 8 ApoAI KO

Spatial plots: background from the two slides

Highlighting extreme log ratios Top (black) and bottom (green) 5% of log ratios

Pin group (sub-array) effects Lowess lines through points from pin groups Boxplots of log ratios by pin group

Boxplots and highlighting pin group effects Log-ratios Print-tip groups Clear example of spatial bias

Plate effects

Clearly visible plate effects KO #8 Probes: ~6,000 cDNAs, including 200 related to lipid metabolism. Arranged in a 4x4 array of 19x21 sub-arrays.

Time of printing effects spot number Green channel intensities (log2G). Printing over 4.5 days. The previous slide depicts a slide from this print run.

Preprocessing: Normalization Why? To correct for systematic differences between samples on the same slide, or between slides, which do not represent true biological variation between samples. How do we know it is necessary? By examining self-self hybridizations, where no true differential expression is occurring. We find dye biases which vary with overall spot intensity, location on the array, plate origin, pins, scanning parameters,….

Self-self hybridizations False color overlay Boxplots within pin-groups Scatter (MA-)plots

Similar patterns apparent in non self-self hybridizations From the NCI60 data set (Stanford web site)

From Lawrence Berkeley National Laboratory

Normalization Methods (I) Normalization based on a global adjustment: log2 R/G -> log2 R/G - c = log2 R/(kG). Choices for k or c = log2 k are c = median or mean of log ratios for a particular gene set (e.g. housekeeping genes), or total intensity normalization, where k = ∑Ri / ∑Gi. Intensity-dependent normalization: here, run a line through the middle of the MA plot, shifting the M value of the pair (A, M) by c = c(A), i.e. log2 R/G -> log2 R/G - c(A) = log2 R/(k(A)G). One estimate of c(A) is made using the LOWESS function of Cleveland (1979): LOcally WEighted Scatterplot Smoothing.
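The two global adjustments can be sketched in a few lines. The function names and the optional `subset` argument (indices of a chosen gene set such as housekeeping genes) are assumptions for illustration only.

```python
import math
from statistics import median


def median_normalize(m_values, subset=None):
    """Subtract c = median log ratio (over all spots or a chosen subset) from every M."""
    reference = m_values if subset is None else [m_values[i] for i in subset]
    c = median(reference)
    return [m - c for m in m_values], c


def total_intensity_constant(red, green):
    """Return c = log2(k) with k = sum(R_i) / sum(G_i)."""
    k = sum(red) / sum(green)
    return math.log2(k)


normalized, c = median_normalize([0.5, 1.0, 1.5])
print(normalized, c)
print(total_intensity_constant([4, 4], [1, 1]))
```

Both versions shift every log ratio by the same constant, so neither can remove a dye bias that changes with intensity A.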

Normalization Methods (II) Within print-tip group normalization In addition to intensity-dependent variation in log ratios, spatial bias can also be a significant source of systematic error. Most normalization methods do not correct for spatial effects produced by hybridization artefacts or print-tip or plate effects during the construction of the microarrays. It is possible to correct for both print-tip and intensity-dependent bias by performing LOWESS fits to the data within print-tip groups, i.e. log2 R/G -> log2 R/G - ci(A) = log2 R/(ki(A)G), where ci(A) is the LOWESS fit to the MA-plot for the ith grid only.
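The within-print-tip-group idea can be sketched as follows. This is a simplified stand-in for LOWESS, assuming a single pass of tricube-weighted local linear regression without the robustness iterations of Cleveland's full procedure; the function names, the `frac` smoothing parameter and its default are illustrative choices. The fit is quadratic in the number of spots per group, so it is meant for small examples only.

```python
import math
from collections import defaultdict


def local_linear_fit(a, m, frac=0.4):
    """Fit m as a smooth function of a; return the fitted value at each a[i]."""
    n = len(a)
    k = min(n, max(3, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for x0 in a:
        # Bandwidth: distance to the k-th nearest point (the point itself counts).
        h = sorted(abs(x - x0) for x in a)[k - 1] or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(a, m):
            u = abs(x - x0) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3  # tricube weight
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw + slope * x0)
    return fitted


def print_tip_normalize(a, m, tip_group, frac=0.4):
    """Subtract a separately fitted curve c_i(A) from M within each print-tip group i."""
    members = defaultdict(list)
    for idx, group in enumerate(tip_group):
        members[group].append(idx)
    normalized = list(m)
    for idx_list in members.values():
        curve = local_linear_fit([a[i] for i in idx_list],
                                 [m[i] for i in idx_list], frac)
        for i, c in zip(idx_list, curve):
            normalized[i] = m[i] - c
    return normalized


# Two print-tip groups with different intensity-dependent biases.
a = [i / 2 for i in range(10)] * 2
groups = [0] * 10 + [1] * 10
m = [0.3 * x + 0.1 for x in a[:10]] + [-0.2 * x + 1.0 for x in a[10:]]
print(print_tip_normalize(a, m, groups))
```

Passing a single group label for every spot reduces this to a global intensity-dependent normalization.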

Normalization: Which Spots to use? The LOWESS lines can be run through many different sets of points, and each strategy has its own implicit set of assumptions justifying its applicability. For example, the use of a global LOWESS approach can be justified by supposing that, when stratified by mRNA abundance, a) only a minority of genes are expected to be differentially expressed, or b) any differential expression is as likely to be up-regulation as down-regulation. Pin-group LOWESS requires stronger assumptions: that one of the above applies within each pin-group. The use of other sets of genes, e.g. control or housekeeping genes, involves similar assumptions.

Use of Control Slides: M vs A Plot. Axes: M = log R/G = log R - log G against A = (log R + log G)/2. Plotted: lowess curve, blanks, positive controls, negative controls.

Normalization makes a difference Global scale, global lowess, pin-group lowess; spatial plot after, smooth histograms of M after

Normalization by controls: Microarray Sample Pool titration series Pool the whole library Control set to aid intensity- dependent normalization Different concentrations in titration series Spotted evenly spread across the slide in each pin-group

Comparison of Normalization Schemes (courtesy of Jason Goncalves) No consensus on best normalization method Experiment done to assess the common normalization methods Based on reciprocal labeling experimental data for a series of 140 replicate experiments on two different arrays each with 19,200 spots

DESIGN OF RECIPROCAL LABELING EXPERIMENT Replicate experiment in which we assess the same mRNA pools but invert the fluors used. The replicates are independent experiments and are scanned, quantified and normalized as usual


Scale normalization: between slides Boxplots of log ratios from 3 replicate self-self hybridizations. Left panel: before normalization Middle panel: after within print-tip group normalization Right panel: after a further between-slide scale normalization.

The “NCI 60” experiments (no bg) Some scale normalization seems desirable

Scale normalization: another data set Log-ratios Only small differences in spread apparent. No action required.

One way of taking scale into account Assumption: all slides have the same spread in M. The true log ratio is mij, where i indexes slides and j indexes spots. The observed value is Mij, where Mij = ai mij. A robust estimate of ai is MADi = medianj { |Mij - medianj(Mij)| }.
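A minimal sketch of MAD-based scale normalization between slides follows. Dividing each MAD by the geometric mean of all the MADs, so that the scale factors multiply to 1, is an assumption added here for illustration; the slide itself only names MADi as the estimate of ai. Each slide is assumed to have a non-zero MAD.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def scale_normalize(slides):
    """Rescale the log ratios of each slide so that all slides share one spread.

    slides : list of lists, one list of M values per slide.
    """
    mads = [mad(m_values) for m_values in slides]
    geometric_mean = math.prod(mads) ** (1.0 / len(mads))
    factors = [value / geometric_mean for value in mads]  # estimated a_i
    return [[m / a_i for m in m_values]
            for m_values, a_i in zip(slides, factors)]


slides = [[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]]
for normalized in scale_normalize(slides):
    print(normalized, mad(normalized))
```

The same function applies unchanged to print-tip groups within one slide: pass one list of M values per group instead of one per slide.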

A slightly harder normalization problem Global lowess doesn’t do the trick here

Print-tip-group normalization helps

But not completely Still a lot of scatter in the middle in a WT vs KO comparison

Effects of previous normalization Before normalization After print-tip-group normalization

Within print-tip-group box plots of M after print-tip-group normalization

Taking scale into account, cont. Assumption: all print-tip-groups have the same spread in M. The true log ratio is mij, where i indexes print-tip-groups and j indexes spots. The observed value is Mij, where Mij = ai mij. A robust estimate of ai is MADi = medianj { |Mij - medianj(Mij)| }.

Effect of location & scale normalization Clearly care is needed in making decisions like this

A comparison of three M v A plots: unnormalized; print-tip normalization; print-tip & scale normalization

The same normalization on another data set: before and after.

Normalization: Summary Reduces systematic (not random) effects Makes it possible to compare several arrays Use logratios (M vs A-plots) Lowess normalization (dye bias) MSP titration series – composite normalization Pin-group location normalization Pin-group scale normalization Between slide scale normalization Control Spots Normalization introduces more variability Outliers (bad spots) are handled with replication

Pre-processed cDNA Gene Expression Data On p genes for n slides: p is O(10,000), n is O(10-100), but growing.
Genes   slide 1   slide 2   slide 3   slide 4   slide 5   …
1        0.46      0.30      0.80      1.51      0.90     ...
2       -0.10      0.49      0.24      0.06      0.46     ...
3        0.15      0.74      0.04      0.10      0.20     ...
4       -0.45     -1.03     -0.79     -0.56     -0.32     ...
5       -0.06      1.06      1.35      1.09     -1.09     ...
Gene expression level of gene 5 in slide 4 = Log2( Red intensity / Green intensity). These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.

First Steps: QQ-Plots Used to assess whether a sample follows a particular (e.g. normal) distribution (or to compare two samples). A method for looking for outliers when data are mostly normal. Axes: Sample (the sample quantile, e.g. at 0.125) against Theoretical (the value from the Normal distribution which yields a quantile of 0.125).
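The points of a normal QQ-plot can be computed directly. The sketch below assumes plotting positions (i - 0.5)/n for the i-th smallest observation, one of several common conventions; the function name is illustrative. With four observations the smallest value is assigned probability 0.125.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical, observed) pairs for a normal QQ-plot."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()
    return [(standard_normal.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(ordered)]


for theoretical, observed in normal_qq_points([0.3, -1.2, 0.9, 2.1]):
    print(round(theoretical, 3), observed)
```

If the log ratios are roughly normal, these points fall near a straight line; genes far off the line at either end are candidates for outliers or differential expression.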

Acknowledgments Terry Speed (UCB and WEHI) Jean Yee Hwa Yang (UCB) Sandrine Dudoit (UCB) Ben Bolstad (UCB) Natalie Thorne (WEHI) Ingrid Lönnstedt (Uppsala) Henrik Bengtsson (Lund) Jason Goncalves (Iobion) Matt Callow (LLNL) Percy Luu (UCB) John Ngai (UCB) Vivian Peng (UCB) Dave Lin (Cornell)