Continuous Representations of Time Gene Expression Data Ziv Bar-Joseph, Georg Gerber, David K. Gifford MIT Laboratory for Computer Science J. Comput. Biol.,10,341-356,


1 Continuous Representations of Time Gene Expression Data Ziv Bar-Joseph, Georg Gerber, David K. Gifford MIT Laboratory for Computer Science J. Comput. Biol.,10,341-356, 2003

2 Outline Splines Estimating Unobserved Expression Values and Time Points Model Based Clustering Algorithm for Temporal Data Aligning Temporal Data Results

3 Splines The word “spline” comes from the shipbuilding industry

4 Splines Splines are piecewise polynomials with boundary continuity and smoothness constraints. The typical way to represent a piecewise cubic curve :

5 Splines – We have cubic polynomial : – equations are required : – Interpolating splines
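The equations on slides 4 and 5 did not survive in this transcript. A standard textbook formulation is given here as a sketch, not as the authors' exact notation; the coefficient names a_i, b_i, c_i, d_i and the knots x_0 < ... < x_n are assumptions.

```latex
% Segment i of a piecewise cubic on knots x_0 < x_1 < \dots < x_n (assumed notation)
y_i(t) = a_i + b_i (t - x_i) + c_i (t - x_i)^2 + d_i (t - x_i)^3,
\qquad x_i \le t \le x_{i+1}, \quad i = 0, \dots, n-1

% Counting unknowns and conditions for an interpolating spline:
%   unknowns:                4n coefficients
%   interpolation:           y_i(x_i) = D_i, \; y_i(x_{i+1}) = D_{i+1}   (2n equations)
%   smoothness at interior knots:
%                            y_i'(x_{i+1}) = y_{i+1}'(x_{i+1}), \;
%                            y_i''(x_{i+1}) = y_{i+1}''(x_{i+1})         (2(n-1) equations)
%   boundary conditions:     e.g. y_0''(x_0) = y_{n-1}''(x_n) = 0        (2 equations)
```

With n segments this gives 4n equations for 4n coefficients, which is why an interpolating cubic spline is uniquely determined once the two boundary conditions are chosen.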

6 Splines B-splines – Expressed in terms of a set of normalized basis functions Application to fitting curves to gene expression time-series data – The B-spline basis makes it convenient to obtain approximating or smoothing splines – Uses fewer basis coefficients than there are observed data points – Avoids overfitting

7 Splines The basis coefficients : – Interpreted geometrically as control points – The vertices of a polygon that control the shape of the spline but are not interpolated by the curve – The curve lies entirely within the convex hull of this controlling polygon. – Each vertex exerts only a local influence on the curve.
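The geometric properties listed above follow from three facts about the basis functions: they are non-negative, they sum to one wherever the curve is defined, and each is non-zero only over a few knot intervals. A minimal sketch using the Cox-de Boor recursion can check these numerically; the function name `bspline_basis` and the knot vector 0..7 are illustrative assumptions, not taken from the slides.

```python
def bspline_basis(i, k, t, knots):
    """Value at t of the i-th B-spline basis function of order k
    (degree k-1), computed with the Cox-de Boor recursion."""
    if k == 1:
        # Order 1: indicator function of the half-open knot interval.
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k - 1] - knots[i]
    right_den = knots[i + k] - knots[i + 1]
    left = 0.0
    right = 0.0
    if left_den != 0:
        left = (t - knots[i]) / left_den * bspline_basis(i, k - 1, t, knots)
    if right_den != 0:
        right = (knots[i + k] - t) / right_den * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


# Assumed example: uniform knots 0..7 give four cubic (k = 4) basis functions.
knots = [float(x) for x in range(8)]
k = 4
n_basis = len(knots) - k

inside = [bspline_basis(i, k, 3.5, knots) for i in range(n_basis)]
outside = [bspline_basis(i, k, 2.5, knots) for i in range(n_basis)]

print(inside, sum(inside))    # all four functions are active and sum to 1
print(outside, sum(outside))  # only three are active, so the sum is below 1
print(bspline_basis(0, k, 5.0, knots))  # local support: zero outside [0, 4)
```

Because the weights are non-negative and sum to one on 3 ≤ t < 4, every curve point there is a convex combination of four control points, which is the convex-hull property; the zero value at t = 5 illustrates the local influence of a single control point.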

8 Splines

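9 – Within any knot interval [x_i, x_{i+1}], S(t) is a polynomial of degree k-1 – S(t) has continuous derivatives of order 1, 2, …, k-2 – For a given value of k – Over the valid range of t, b_{i,k} ≥ 0, and each b_{i,k} has exactly one maximum; except for k = 1, 2, every b_{i,k} is a continuous, smooth curve. [Figure: basis functions b_{i,1}, b_{i,2}, b_{i,3} plotted against t over the knots x_i, x_{i+1}, x_{i+2}, x_{i+3}]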

10 Splines A uniform knot vector is one in which the entries are evenly spaced – i.e. – The basis functions are then translates of each other, i.e. – For a periodic cubic B-spline (k=4), the equation specifying the curve :
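The curve equation itself is missing from the transcript. For reference, the standard segment form of a uniform cubic B-spline is written out below; the control-point names C_{i-1}, …, C_{i+2} and the local parameter t in [0, 1] are assumed notation and may differ from what the slide showed.

```latex
% One segment of a uniform cubic B-spline (k = 4), local parameter 0 \le t \le 1
y(t) = \frac{1}{6}\Big[ (1 - t)^3 \, C_{i-1}
     + (3t^3 - 6t^2 + 4) \, C_i
     + (-3t^3 + 3t^2 + 3t + 1) \, C_{i+1}
     + t^3 \, C_{i+2} \Big]
```

The four weights are non-negative on [0, 1] and sum to 6 before the 1/6 factor, so each curve point is a weighted average of four neighbouring control points.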

11 B-splines – The B-spline is defined only in the shaded region 3 ≤ t ≤ 4

12 Estimating Unobserved Expression Values and Time Points To obtain a continuous-time formulation, use cubic B-splines – Obtain the value of the splines at a set of control points in the time series. Re-sample the curve to estimate expression values at any time point. Spline functions are not fit to each gene individually – Because of noise and missing values, this would lead to over-fitting. Instead, constrain the spline coefficients of co-expressed genes to share the same covariance matrix – Use other genes in the same class to estimate the missing values of a specific gene.
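Once spline coefficients are available, re-sampling amounts to evaluating the curve at the desired time. The sketch below assumes a uniform cubic B-spline with unit knot spacing; the function name `eval_uniform_cubic_bspline`, the knot placement, and the example coefficients are all illustrative assumptions, and the step of estimating the coefficients from data is not shown.

```python
def eval_uniform_cubic_bspline(coeffs, t):
    """Evaluate a uniform cubic B-spline with unit knot spacing at time t.

    Assumption: coeffs[j] multiplies the basis function supported on
    [j, j + 4], so the curve is defined for 3 <= t <= len(coeffs).
    """
    n = len(coeffs)
    if n < 4:
        raise ValueError("need at least four coefficients")
    if not 3 <= t <= n:
        raise ValueError("t is outside the region where the spline is defined")
    seg = min(int(t), n - 1)   # t lies in the knot interval [seg, seg + 1]
    u = t - seg                # local parameter in [0, 1]
    c0, c1, c2, c3 = coeffs[seg - 3:seg + 1]
    return ((1 - u) ** 3 * c0
            + (3 * u ** 3 - 6 * u ** 2 + 4) * c1
            + (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) * c2
            + u ** 3 * c3) / 6.0


# Hypothetical coefficients for one gene; estimate expression at unobserved times.
coeffs = [0.0, 1.0, 2.0, 3.0, 4.0]
resampled = [eval_uniform_cubic_bspline(coeffs, t) for t in (3.0, 3.5, 4.25, 5.0)]
print(resampled)
```

With these linearly increasing coefficients the curve is the straight line t - 2, which makes the output easy to check by hand.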

13 Estimating Unobserved Expression Values and Time Points A probabilistic model of time series expression data – Assume a set of genes are grouped together, using prior biological knowledge or a clustering algorithm
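The model equations for this part of the talk are not preserved in the transcript. One random-effects spline model that is consistent with the shared covariance matrix described on slide 12 is sketched below. All symbols (Y_i, S_i, μ_j, γ_i, Γ_j, ε_i, σ) are assumed notation and should be checked against the published paper.

```latex
% Assumed form of the class-based spline model for gene i in class j
Y_i = S_i \, (\mu_j + \gamma_i) + \epsilon_i,
\qquad \gamma_i \sim \mathcal{N}(0, \Gamma_j),
\qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2 I)

% Y_i      : vector of observed expression values for gene i
% S_i      : spline basis functions evaluated at the times gene i was observed
% \mu_j    : mean spline coefficients of class j
% \gamma_i : gene-specific deviation from the class mean
% \Gamma_j : covariance matrix shared by all genes in class j
% \sigma^2 : observation noise variance
```

Under this form, a gene with missing time points still borrows strength from its class through μ_j and Γ_j, which is the mechanism the slide describes for estimating unobserved values.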

14 Estimating Unobserved Expression Values and Time Points

15 To learn the parameters of this model (μ, Γ, σ, and γ) – Use the observed values, and maximize the likelihood of the input data

16 Estimating Unobserved Expression Values and Time Points – Decompose the probability : If the γ values were observed, decompose the probability :

17 Estimating Unobserved Expression Values and Time Points – Use EM E step: find the best estimate for γ using the current values of σ², Γ, and μ. M step: maximize
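The expressions for the two steps are missing from the transcript. As a hedged illustration, suppose the model is Y_i = S_i(μ_j + γ_i) + ε_i with γ_i ~ N(0, Γ_j) and ε_i ~ N(0, σ²I); this notation is an assumption, not recovered from the slide. Then the E-step estimate of γ_i is the posterior mean of a linear Gaussian model:

```latex
% E step under the assumed model: posterior mean of the gene-specific deviation
\hat{\gamma}_i
  = \left( \sigma^2 \, \Gamma_j^{-1} + S_i^{T} S_i \right)^{-1}
    S_i^{T} \left( Y_i - S_i \, \mu_j \right)

% Posterior covariance of \gamma_i given Y_i:
\operatorname{Cov}(\gamma_i \mid Y_i)
  = \left( \Gamma_j^{-1} + \sigma^{-2} \, S_i^{T} S_i \right)^{-1}
```

A large σ² shrinks the estimate towards zero, so a noisy gene stays close to its class mean, while a small σ² lets the gene's own observations dominate.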

18 Model Based Clustering Algorithm for Temporal Data A new clustering algorithm that simultaneously solves the parameter estimation and class assignment problems – EM algorithm: E step, M step
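The class-assignment half of such an E step can be illustrated generically: given each class's log-likelihood for one gene and the class mixing weights, Bayes' rule gives the probability that the gene belongs to each class. This is a generic mixture-model sketch, not the authors' exact update; the function name `class_posteriors` and the example numbers are hypothetical.

```python
import math


def class_posteriors(log_likelihoods, priors):
    """Posterior class probabilities for one gene.

    log_likelihoods[j] is log P(data | class j) and priors[j] is the mixing
    weight of class j. The largest score is subtracted before exponentiating
    to avoid numerical underflow.
    """
    scores = [math.log(p) + ll for p, ll in zip(priors, log_likelihoods)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical gene that is three times more likely under class 0 than class 1.
posteriors = class_posteriors([math.log(3.0), math.log(1.0)], [0.5, 0.5])
print(posteriors)
```

In a full algorithm these posteriors would weight each gene's contribution when the class parameters are re-estimated in the M step.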

19 Model Based Clustering Algorithm for Temporal Data

20 Aligning Temporal Data Assume we have two sets of time-series gene expression profiles – Splines for the reference set – Splines for the set to be warped A mapping – Linear transformation

21 Aligning Temporal Data The error of the alignment: – Averaged squared distance between the two curves – Takes into account the degree of overlap between the curves The error for a set of genes S of size n – Find the parameters a and b that minimize it
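The alignment idea can be sketched as a small search over the two parameters of a linear time warp. Everything below is an illustrative assumption: the mapping is taken to be T(s) = (s - b) / a, curves are plain Python callables instead of fitted splines, the integral is approximated with the midpoint rule, and a coarse grid search stands in for whatever optimizer the authors used. The sketch normalizes by the overlap length but does not otherwise penalize small overlaps.

```python
import math


def alignment_error(ref, warped, ref_range, warped_range, a, b, steps=400):
    """Averaged squared distance between ref(s) and warped(T(s)), with
    T(s) = (s - b) / a, over the interval where both curves are defined.
    Returns None when the curves do not overlap."""
    if a <= 0:
        return None
    # The warped curve's time range maps onto the reference axis as a*t + b.
    lo = max(ref_range[0], a * warped_range[0] + b)
    hi = min(ref_range[1], a * warped_range[1] + b)
    if hi <= lo:
        return None
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        s = lo + (k + 0.5) * h          # midpoint of the k-th sub-interval
        total += (ref(s) - warped((s - b) / a)) ** 2
    return total / steps                # integral divided by overlap length


def set_error(pairs, ref_range, warped_range, a, b):
    """Mean alignment error over a set of (reference, warped) gene curves."""
    errors = [alignment_error(r, w, ref_range, warped_range, a, b) for r, w in pairs]
    if any(e is None for e in errors):
        return None
    return sum(errors) / len(errors)


def best_alignment(pairs, ref_range, warped_range, a_grid, b_grid):
    """Grid search for the (a, b) pair with the smallest set error."""
    best = None
    for a in a_grid:
        for b in b_grid:
            e = set_error(pairs, ref_range, warped_range, a, b)
            if e is not None and (best is None or e < best[0]):
                best = (e, a, b)
    return best


# Synthetic check: the warped curves are the references run at twice the
# speed with a shifted start, so the true parameters are a = 2, b = 1.
pairs = [
    (math.sin, lambda t: math.sin(2 * t + 1)),
    (math.cos, lambda t: math.cos(2 * t + 1)),
]
error, a, b = best_alignment(
    pairs, (0.0, 10.0), (0.0, 5.0),
    a_grid=[1.0, 1.5, 2.0, 2.5], b_grid=[0.0, 0.5, 1.0, 1.5],
)
print(error, a, b)
```

Averaging over several genes matters here: a single periodic curve can match well under more than one warp, while a set of curves constrains a and b much more tightly.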

22 Aligning Temporal Data

23 Results 800 genes in Saccharomyces cerevisiae with five groups Unobserved data estimation

24 Results Clustering – Explore the effect of non-uniform sampling Two synthetic curves :

25 Results


