A New Approach to Analyzing Gene Expression Time Series Data Ziv Bar-Joseph Georg Gerber David K. Gifford Tommi S. Jaakkola Itamar Simon Learning Seminar:

Presentation transcript:

A New Approach to Analyzing Gene Expression Time Series Data. Ziv Bar-Joseph, Georg Gerber, David K. Gifford, Tommi S. Jaakkola, Itamar Simon. Learning Seminar: Bioinformatics & Other Applications, Prof. Nathan Intrator. Presented by: Adam Segoli Schubert, May 16, 2005.

Overview: Gene Expression; Time Series; Statistical Analysis of Time-Series; DNA Microarray; Gene Expression Time-Series; Analyzing Gene Expression Time-Series Data; Estimating Unobserved Expression Values and Time Points; What is a Spline?; Using the Splines; Parameters Analysis; Aligning Time-Series Data; Aligning Temporal Data Using Splines; Results – Unobserved Data Estimation; Result - Aligning Temporal Data; References.

Gene Expression

Time-Series: A series of values of a variable taken at successive points in time. Time points. Sampling intervals (constant / non-constant). A well-established area of statistical data analysis is dedicated to the study of time series.

Statistical Analysis of Time-Series: Two main goals: identifying the nature of the phenomenon, and predicting unobserved values of the time-series variable.

DNA Microarray: Allows the monitoring of expression levels of thousands of genes under a variety of conditions. The data from microarray experiments usually takes the form of a large matrix. Very expensive.

Gene Expression Time-Series: Determined by measuring mRNA levels or protein concentrations. Commonly very short (e.g. 4 to 20 samples). Usually unevenly sampled. The measuring techniques are extremely noise-prone and/or subject to bias in the biological measurements.

Analyzing Gene Expression Time-Series Data: Estimating unobserved expression values and time points. Aligning time-series data.

Estimating Unobserved Expression Values and Time Points: Row average or filling with zeros; singular value decomposition (SVD); weighted K-nearest neighbors; linear interpolation.
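
Two of the simpler baselines listed above, row average and linear interpolation, can be sketched in a few lines. This is a minimal illustration only: the function names, the use of `None` for a missing value, and the toy numbers are assumptions, not taken from the talk.

```python
def row_average_fill(values):
    """Replace None entries with the mean of the observed entries in the row."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolate(times, values, t_new):
    """Estimate the value at t_new from the two observed neighbours around it."""
    points = [(t, v) for t, v in zip(times, values) if v is not None]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t_new <= t1:
            frac = (t_new - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    raise ValueError("t_new lies outside the observed time range")


# Toy, unevenly sampled series for one gene (made-up numbers).
times = [0, 10, 20, 40]
values = [1.0, None, 3.0, 5.0]

filled = row_average_fill(values)                 # uses the mean of 1.0, 3.0, 5.0
estimate = linear_interpolate(times, values, 10)  # halfway between t=0 and t=20
```

Row averaging ignores time entirely, while linear interpolation only looks at the two nearest observations. Neither borrows information from other genes.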

A New Analysis Approach: Using cubic splines.

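What is a Spline? A special curve defined piecewise by polynomials. Given k points $t_i$, called knots, in an interval $[a,b]$ with $a = t_0 < t_1 < \dots < t_{k-1} = b$, the parametric curve $S : [a,b] \to \mathbb{R}$ is called a spline of degree $n$ if $S \in C^{n-1}[a,b]$ and $S|_{[t_i, t_{i+1}]} \in P_n$ for $i = 0, \dots, k-2$, where $P_n$ is the set of polynomials of degree at most $n$. It is a cubic spline if $n = 3$.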

Using the Splines: We obtain a continuous-time formulation by using cubic splines to represent gene expression curves. Spline control points are uniformly spaced. We constrain the spline coefficients of co-expressed genes to have the same covariance matrix.
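
To make the continuous-time representation concrete, here is a minimal sketch that assumes the curves are expressed in a cubic B-spline basis over uniformly spaced knots. The Cox-de Boor recursion is the standard way to evaluate such a basis; the function names, knot vector and coefficients below are hypothetical.

```python
def bspline_basis(i, degree, knots, t):
    """Cox-de Boor recursion: value of the i-th B-spline basis function at t."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    span_left = knots[i + degree] - knots[i]
    if span_left > 0:
        left = (t - knots[i]) / span_left * bspline_basis(i, degree - 1, knots, t)
    span_right = knots[i + degree + 1] - knots[i + 1]
    if span_right > 0:
        right = ((knots[i + degree + 1] - t) / span_right
                 * bspline_basis(i + 1, degree - 1, knots, t))
    return left + right


def basis_row(t, knots, degree=3):
    """All q basis functions evaluated at one time point t."""
    q = len(knots) - degree - 1
    return [bspline_basis(i, degree, knots, t) for i in range(q)]


def evaluate_curve(t, coefficients, knots, degree=3):
    """Curve value at t: weighted sum of basis functions (weights = control points)."""
    return sum(c * b for c, b in zip(coefficients, basis_row(t, knots, degree)))


# Uniform knots -3, -2, ..., 7 give q = 7 cubic basis functions; the curve is
# fully defined for t in [0, 4).
knots = list(range(-3, 8))
coefficients = [0.0, 1.0, 3.0, 2.0, 0.5, 1.5, 1.0]  # hypothetical control points

row = basis_row(1.5, knots)                       # one row of a basis matrix
value = evaluate_curve(1.5, coefficients, knots)  # expression estimate at t = 1.5
```

Because the curve is a function of continuous time, it can be evaluated at any `t` in the valid range, not only at the sampled time points. Stacking `basis_row` for each of a gene's m observation times would give an m x q matrix of basis values.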

Estimating Unobserved Data Using Splines: Given c gene classes, the value of gene i (of class j) as observed at time t can be written as
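
A sketch of one form this expression can take, assuming a mixed-effects spline model built from the quantities the talk names (class average, gene-specific variation, class covariance matrix). All symbols here are assumptions: $s(t)$ is the row of spline basis functions evaluated at $t$, $\mu_j$ the average spline coefficients of class $j$, $\gamma_i$ the gene-specific variation with class covariance $\Gamma_j$, and $\epsilon_i$ measurement noise.

```latex
Y_i(t) = s(t)\,(\mu_j + \gamma_i) + \epsilon_i,
\qquad \gamma_i \sim \mathcal{N}(0, \Gamma_j),
\qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2)
```

Under the same assumptions, stacking the m observed time points of gene i gives the vector form $Y_i = S_i(\mu_j + \gamma_i) + \epsilon_i$, with $S_i$ an m x q matrix of basis values.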

Estimating Unobserved Data Using Splines: Resampling gene i at any time t' of an unobserved time point. Estimating missing values: averaging of the observed values using the class covariance matrix, the class average, and the gene-specific variation, where these quantities are determined by a probabilistic model.

Estimating Unobserved Data Using Splines, Parameters Analysis: $Y_i$: vector of observed expression values for gene i. $S_i$: m x q matrix for the m observations.

Aligning Time-Series Data: Dynamic Time Warping. Developed for voice recognition purposes in the 1970s. Uses dynamic programming. The method of John Aach & George M. Church operates on individual genes.
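
For reference, the textbook dynamic-programming form of dynamic time warping can be sketched as follows. This is the generic algorithm on two numeric sequences, not the specific variant of Aach & Church; the function name and toy series are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The same expression pattern sampled at two different speeds.
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
fast = [0.0, 1.0, 2.0, 1.0]
same_shape = dtw_distance(slow, fast)
shifted = dtw_distance([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

The warping works on discrete sample points of a single pair of series, which is why it is applied gene by gene.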

Aligning Temporal Data Using Splines: Operates on a set of genes. Assume we have two spline curves for gene i. We define a mapping function T(s) = t between their time scales.

Aligning Temporal Data Using Splines: We define the alignment error for each gene. Alignment limits: starting point and ending point.

Aligning Temporal Data Using Splines: We define the error for a set of genes S of size n as a weighted sum of the per-gene alignment errors, with weighting coefficients that sum to one (uniform / nonuniform).
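
One way to write these two error terms is sketched below. It assumes the per-gene error is the averaged squared difference between the two spline curves under the mapping T. The symbols are assumptions: $g_i^1$ and $g_i^2$ are the two curves for gene i, $\alpha$ and $\beta$ are the starting and ending points of the alignment, and $w_i$ are the weighting coefficients.

```latex
e_i^2 = \frac{\int_{\alpha}^{\beta} \left[ g_i^2(s) - g_i^1\big(T(s)\big) \right]^2 \, ds}{\beta - \alpha},
\qquad
E_S = \sum_{i=1}^{n} w_i \, e_i^2,
\qquad
\sum_{i=1}^{n} w_i = 1
```

Dividing by $\beta - \alpha$ keeps the error comparable across mappings that overlap the two series over intervals of different length.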

Aligning Temporal Data Using Splines: The mapping function (T(s) = t) can then be found by minimizing this error, using standard non-linear optimization techniques.
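
The minimization can be illustrated with a small numerical sketch. Several simplifying assumptions are made here: the mapping is taken to be linear, T(s) = a*s + b; the alignment limits are fixed rather than derived from the mapping; the integral is approximated with the midpoint rule; and a plain grid search stands in for a real non-linear optimizer. The curves are toy functions rather than fitted splines, and all names are hypothetical.

```python
import math


def alignment_error(g1, g2, mapping, start, end, steps=200):
    """Averaged squared difference between g2(s) and g1(T(s)) on [start, end]."""
    width = (end - start) / steps
    total = 0.0
    for k in range(steps):
        s = start + (k + 0.5) * width  # midpoint of the k-th sub-interval
        total += (g2(s) - g1(mapping(s))) ** 2 * width
    return total / (end - start)


def set_error(curve_pairs, weights, mapping, start, end):
    """Weighted sum of per-gene alignment errors (weights should sum to one)."""
    return sum(w * alignment_error(g1, g2, mapping, start, end)
               for w, (g1, g2) in zip(weights, curve_pairs))


def grid_search(curve_pairs, weights, start, end, stretches, shifts):
    """Try every (a, b) on the grid and keep the mapping with the lowest error."""
    best = None
    for a in stretches:
        for b in shifts:
            err = set_error(curve_pairs, weights, lambda s: a * s + b, start, end)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best


# Two toy "genes"; the second series is the first one seen through T(s) = 2s + 1.
curve_pairs = [
    (math.sin, lambda s: math.sin(2 * s + 1)),
    (math.cos, lambda s: math.cos(2 * s + 1)),
]
weights = [0.5, 0.5]  # uniform weights

best_error, best_a, best_b = grid_search(
    curve_pairs, weights, start=0.0, end=3.0,
    stretches=[0.5, 1.0, 1.5, 2.0, 2.5], shifts=[0.0, 0.5, 1.0, 1.5])
```

Because the error is summed over a set of genes, a single mapping is chosen for all of them, which is the main contrast with gene-by-gene time warping.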

Results – Unobserved Data Estimation: Comparison of the new approach with: linear interpolation; spline interpolation using individual genes; K-nearest neighbors (KNN), k = 20.

Result - Aligning Temporal Data: Aligned three yeast cell-cycle gene expression time series.

Thank You! Any Questions?

References:
C. S. Moller-Levet. Clustering of Gene Expression Time-Series Data.
N. A. Campbell, J. B. Reece, and L. G. Mitchell. Biology. Fifth Edition.
J. Aach and G. M. Church. Aligning gene expression time series with time warping algorithms. Bioinformatics, 17.
C. de Boor. A Practical Guide to Splines. Springer.
P. D'haeseleer, X. Wen, S. Fuhrman, and R. Somogyi. Linear modeling of mRNA expression levels during CNS development and injury. In PSB99.
G. James and T. Hastie. Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society, to appear.
R. Sharan and R. Shamir. Algorithmic approaches to clustering gene expression data. Current Topics in Computational Biology, to appear.
O. Troyanskaya, M. Cantor, et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 2001.