Data Processing Technologies for DNA Microarray. Nini Rao, School of Life Science and Technology, UESTC, 14/11/2004.

1 Data Processing Technologies for DNA Microarray. Nini Rao, School of Life Science and Technology, UESTC, 14/11/2004

2 Introduction; The Applications of SVD Technology; The Applications of NMF Technology; Summarization

3 Introduction
1. Gene and Genomes
Gene ---- the basic unit of genetic function.
Gene Expression ---- the process by which genetic information at the DNA level is converted into functional proteins.

4 Introduction Genome Structure ---- each organism contains a unique genomic sequence with a unique structure.

5 Gene structure

11 Genome data with unknown biological meaning are increasing exponentially, so there is a need to mine these data.

12 Analysis of these new data requires mathematical tools that are adaptable to the large quantities of data, while reducing the complexity of the data to make them comprehensible.

13 2. A Microarray
A small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology. This technology was introduced in the 1990s.

14 3. Microarray Analysis
The process of using microarrays for scientific exploration. Many technologies for microarray analysis have been adopted since the early 1990s.

15 4. Types of Microarray

16 5. The Roles of Microarray
To monitor gene expression levels on a genomic scale
To enhance fundamental understanding of life on the molecular level:
  regulation of gene expression
  gene function
  cellular mechanisms
  medical diagnosis, treatment, drug design

17 The microarray data form a matrix

18 Applications of SVD
Mathematical definition of the SVD: for an m x n data matrix X, $X = U S V^T$, where
U is an m x n matrix
S is an n x n diagonal matrix
$V^T$ is also an n x n matrix
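A minimal sketch of this definition in NumPy may help make the shapes concrete. The matrix here is synthetic, and the sizes (6 genes, 4 arrays) are assumptions chosen only for illustration; `np.linalg.svd` with `full_matrices=False` returns the "thin" factors whose shapes match the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                         # hypothetical sizes: 6 genes, 4 arrays
X = rng.normal(size=(m, n))         # synthetic stand-in for expression data

# Thin SVD: U is m x n, s holds the n singular values, Vt is n x n
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)                      # n x n diagonal matrix

# Multiplying the factors back together recovers X (up to rounding)
X_rebuilt = U @ S @ Vt
print(U.shape, S.shape, Vt.shape)
print(np.allclose(X, X_rebuilt))
```

The columns of U and the rows of Vt are orthonormal, which is what allows individual patterns to be separated or removed later.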

19 One important result of the SVD of X

20 $X^{(l)}$ is the closest rank-$l$ matrix to $X$.
The term "closest" means that $X^{(l)}$ minimizes the sum of the squares of the differences between the elements of $X$ and $X^{(l)}$:
$\sum_{ij} |x_{ij} - x^{(l)}_{ij}|^2 = \min$
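The closest rank-l matrix can be sketched by keeping only the l largest singular values and their vectors. The helper name `rank_l_approximation` and the synthetic 8 x 5 matrix below are assumptions for illustration. The example also checks a known property of this truncation: the minimized sum of squares equals the sum of the squared singular values that were discarded.

```python
import numpy as np

def rank_l_approximation(X, l):
    """Keep the l largest singular values of X and drop the rest."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))         # synthetic data, for illustration only
l = 2
X_l = rank_l_approximation(X, l)

# Sum of squared element-wise differences between X and X_l
squared_error = np.sum((X - X_l) ** 2)

# Sum of the squared singular values that were discarded
s = np.linalg.svd(X, compute_uv=False)
discarded = np.sum(s[l:] ** 2)

print(np.isclose(squared_error, discarded))
```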

21 SVD analysis of gene expression data

22 The results for Elutriation Dataset

23 Pattern Inference

24 The result analysis for Pattern Inference
(a) Raster display of v', the expression of 14 eigengenes in 14 arrays.
(b) Bar chart of the fractions of eigenexpression.
(c) Line-joined graphs of the expression levels of r1 (red) and r2 (blue) in the 14 arrays fit dashed graphs of normalized sine (red) and cosine (blue) of period T = 390 min and phase 2π/13, respectively.
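The quantities in panels (b) and (c) can be sketched on synthetic data. The block below assumes that the fraction of eigenexpression of each eigengene is its squared singular value divided by the sum of all squared singular values, and that the 14 arrays are sampled every 30 minutes; the data, the noise level, and the helper `captured` are hypothetical and are not the elutriation dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_arrays = 200, 14
T = 390.0                              # period in minutes, as on the slide
t = 30.0 * np.arange(n_arrays)         # assumed sampling times

sine = np.sin(2 * np.pi * t / T)
cosine = np.cos(2 * np.pi * t / T)

# Synthetic genes: random mixtures of the two oscillations plus small noise
a = rng.normal(size=(n_genes, 1))
b = rng.normal(size=(n_genes, 1))
X = a * sine + b * cosine + 0.01 * rng.normal(size=(n_genes, n_arrays))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Fraction of the overall expression captured by each eigengene (row of Vt)
fractions = s**2 / np.sum(s**2)

def captured(pattern, basis):
    """Length of the unit-normalized pattern projected onto orthonormal rows."""
    unit = pattern / np.linalg.norm(pattern)
    return np.linalg.norm(basis @ unit)

top2 = Vt[:2]                          # the two dominant eigengenes
sine_captured = captured(sine, top2)
cosine_captured = captured(cosine, top2)
print(fractions[:3], sine_captured, cosine_captured)
```

On data built this way, the first two eigengenes carry almost all of the expression and together span the sine and cosine patterns, although each eigengene alone may be a rotated mixture of the two.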

25 Data Sorting

26 The results analysis for data sorting
Fig. 3. Genes sorted by relative correlation with r1 and r2 of normalized elutriation.
(a) Normalized elutriation expression of the sorted 5,981 genes in the 14 arrays, showing a traveling wave of expression.
(b) Eigenarray expression; the expression of a1 and a2, the eigenarrays corresponding to r1 and r2, displays the sorting.
(c) Expression levels of a1 (red) and a2 (green) fit normalized sine and cosine functions of period Z = N - 1 = 5,980 and phase Q = 2π/13 (blue), respectively.
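One way to sketch this kind of sorting is to project every gene onto the two patterns r1 and r2 and order the genes by the resulting angle. This is an illustrative assumption about how "relative correlation" might be turned into an ordering, not necessarily the exact procedure behind the figure; the function name `sort_by_phase` and the synthetic data are hypothetical.

```python
import numpy as np

def sort_by_phase(X, r1, r2):
    """Order the rows of X by the angle of their projections onto r1 and r2."""
    c1 = X @ r1                      # projection of each gene onto r1
    c2 = X @ r2                      # projection of each gene onto r2
    angle = np.arctan2(c2, c1)       # angle in the (r1, r2) plane
    order = np.argsort(angle, kind="stable")
    return order, angle

# Synthetic genes that oscillate with random phases over 14 arrays
rng = np.random.default_rng(0)
n_genes, n_arrays = 50, 14
t = np.arange(n_arrays)
phases = rng.uniform(-np.pi, np.pi, size=n_genes)
X = np.cos(2 * np.pi * t / 13 - phases[:, None])

# Use the two dominant eigengenes as r1 and r2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
order, angle = sort_by_phase(X, Vt[0], Vt[1])
X_sorted = X[order]                  # rows now progress smoothly in phase
```

Displaying `X_sorted` as a raster would be expected to show the peak of expression drifting across the arrays from one row to the next, which is the traveling-wave picture described in panel (a).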

27 Other Applications for SVD
Missing data
Comparison between two genomic sequences

28 The Applications of NMF
Mathematical definition of the NMF: $V_{n \times m} = W_{n \times r} \, H_{r \times m}$
In general, r is chosen so that (n+m)r < nm.
NMF can be used to extract the features that are hidden in a dataset.
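The slide does not say which algorithm computes the factorization. As one possible sketch, the block below uses the widely known multiplicative update rules for the squared-error objective, under which W and H stay non-negative and the product W H approximates V rather than matching it exactly. The function name `nmf`, the iteration count, the small constant `eps`, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def nmf(V, r, n_iter=1000, eps=1e-9, seed=0):
    """Approximate a non-negative V (n x m) by W (n x r) @ H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps      # non-negative random start
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative data with two hidden features
rng = np.random.default_rng(0)
n, m, r = 20, 10, 2
V = rng.random((n, r)) @ rng.random((r, m))

W, H = nmf(V, r)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# The factors store (n + m) * r numbers instead of n * m
print((n + m) * r, n * m, rel_error)
```

With n = 20, m = 10 and r = 2 the factors hold 60 numbers against 200 in V, which illustrates the condition (n+m)r < nm.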

30 Comparison with SVD

31 The results for Elutriation Dataset

32 The results for α-factor Dataset

33 Summarization
1. SVD: normalization; no data limitation.
   NMF: no normalization; positive data.
2. SVD: missing data, cluster, pattern inference, weak pattern extraction, comparison.
   NMF: pattern inference, cluster, finding similarity.
3. ICA is used for mining DNA microarray data.

34 Thanks a lot!

