Published by Joleen Goodwin. Modified over 9 years ago.
1
Data Processing Technologies for DNA Microarray
Nini Rao, School of Life Science and Technology, UESTC
14/11/2004
2
Introduction
The Applications of SVD Technology
The Applications of NMF Technology
Summarization
3
Introduction
1. Gene and Genomes
Gene ---- The basic unit of genetic function.
Gene Expression ---- The process by which genetic information at the DNA level is converted into functional proteins.
4
Introduction Genome Structure ---- each organism contains a unique genomic sequence with a unique structure.
5
Gene structure
11
Genome data with unknown biological meaning are increasing exponentially, so there is a need to mine these data.
12
Analysis of these new data requires mathematical tools that are adaptable to the large quantities of data, while reducing the complexity of the data to make them comprehensible.
13
2. A Microarray
A small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology. This technology was introduced in the 1990s.
14
3. Microarray Analysis
The process of using microarrays for scientific exploration. Many technologies for microarray analysis have been adopted since the early 1990s.
15
4. Type of Microarray
16
5. The Roles of Microarray
To monitor gene expression levels on a genomic scale.
To enhance fundamental understanding of life on the molecular level: regulation of gene expression, gene function, cellular mechanisms, medical diagnosis and treatment, drug design.
17
The microarray data form a matrix
18
Applications of SVD
Mathematical definition of the SVD: an m x n data matrix X is factored as $X = U S V^{T}$, where
U is an m x n matrix,
S is an n x n diagonal matrix,
V^T is also an n x n matrix.
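The shapes listed above correspond to the "thin" SVD. The following is a minimal sketch, assuming NumPy and a small random matrix standing in for real microarray data (the sizes `m = 6` genes and `n = 4` arrays are hypothetical), that checks the factor shapes and the reconstruction.

```python
import numpy as np

# Hypothetical toy data: m genes (rows) by n arrays (columns).
rng = np.random.default_rng(0)
m, n = 6, 4
X = rng.normal(size=(m, n))

# full_matrices=False gives the thin SVD: U is m x n, Vt is n x n.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)  # n x n diagonal matrix of singular values

print(U.shape, S.shape, Vt.shape)
print("reconstruction matches X:", np.allclose(X, U @ S @ Vt))
```

NumPy returns the singular values as a vector in non-increasing order, so `np.diag` is only needed when the explicit diagonal matrix is wanted.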
19
One important result of the SVD of X
20
$X^{(l)}$ is the closest rank-$l$ matrix to $X$. The term "closest" means that $X^{(l)}$ minimizes the sum of the squares of the differences between the elements of $X$ and $X^{(l)}$: $\sum_{ij} |x_{ij} - x^{(l)}_{ij}|^{2} = \min$.
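The closest rank-l matrix can be built by keeping only the l largest singular values. The sketch below assumes NumPy and random toy data; the helper name `rank_l_approx` is made up for illustration. It also checks the standard fact that the minimized sum of squares equals the sum of the squared discarded singular values.

```python
import numpy as np

def rank_l_approx(X, l):
    """Truncated SVD: keep the l largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :l] * s[:l]) @ Vt[:l, :]

# Hypothetical toy data: 8 genes by 5 arrays.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
l = 2

X_l = rank_l_approx(X, l)
err = np.sum((X - X_l) ** 2)            # sum over ij of |x_ij - x^(l)_ij|^2
s = np.linalg.svd(X, compute_uv=False)  # all singular values of X

print("rank of approximation:", np.linalg.matrix_rank(X_l))
print("error equals discarded energy:", np.isclose(err, np.sum(s[l:] ** 2)))
```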
21
SVD analysis of gene expression data
22
The results for Elutriation Dataset
23
Pattern Inference
24
The result analysis for Pattern Inference
(a) Raster display of v', the expression of 14 eigengenes in 14 arrays.
(b) Bar chart of the fractions of eigenexpression.
(c) Line-joined graphs of the expression levels of r1 (red) and r2 (blue) in the 14 arrays fit dashed graphs of normalized sine (red) and cosine (blue) of period T = 390 min and phase 2π/13, respectively.
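The fractions of eigenexpression in panel (b) can be sketched as follows, assuming they are defined as each squared singular value divided by the sum of all squared singular values (the slides do not give the formula). The data are synthetic: 200 hypothetical genes with sine and cosine components of period 390 min, sampled at 14 assumed time points 30 min apart, plus a little noise.

```python
import numpy as np

def eigenexpression_fractions(X):
    """Assumed definition: p_k = s_k^2 / sum_j s_j^2 over the singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return s**2 / np.sum(s**2)

# Synthetic stand-in for the elutriation data (all values hypothetical).
rng = np.random.default_rng(0)
t = 30.0 * np.arange(14)        # 14 arrays, assumed 30 min apart
T = 390.0                       # period in minutes, as on the slide
a = rng.normal(size=(200, 1))   # per-gene sine amplitude
b = rng.normal(size=(200, 1))   # per-gene cosine amplitude
X = a * np.sin(2 * np.pi * t / T) + b * np.cos(2 * np.pi * t / T)
X += 0.05 * rng.normal(size=X.shape)

p = eigenexpression_fractions(X)
print("fractions:", np.round(p, 3))
print("share of the first two eigengenes:", round(float(p[:2].sum()), 3))
```

With data built from two periodic components, the first two fractions dominate, which is the kind of pattern a bar chart of these fractions is meant to expose.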
25
Data Sorting
26
The results analysis for data sorting
Fig. 3. Genes sorted by relative correlation with r1 and r2 of normalized elutriation. (a) Normalized elutriation expression of the sorted 5,981 genes in the 14 arrays, showing traveling wave of expression. (b) Eigenarrays expression; the expression of a1 and a2, the eigenarrays corresponding to r1 and r2, displays the sorting. (c) Expression levels of a1 (red) and a2 (green) fit normalized sine and cosine functions of period Z = N - 1 = 5,980 and phase Q = 2π/13 (blue), respectively.
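One way to sort genes by their relative correlation with two eigengenes is to project each gene onto both and order the genes by the resulting angle. The slides do not state the exact rule, so the `arctan2` ordering below is an assumption, and the data are synthetic (300 hypothetical genes with random phases).

```python
import numpy as np

# Synthetic periodic data: each gene is a cosine with its own random phase.
rng = np.random.default_rng(0)
n_genes, n_arrays = 300, 14
t = 30.0 * np.arange(n_arrays)   # assumed sampling times in minutes
T = 390.0
phase = rng.uniform(0, 2 * np.pi, size=(n_genes, 1))
X = np.cos(2 * np.pi * t / T - phase)
X += 0.05 * rng.normal(size=X.shape)

# The two leading right singular vectors play the role of r1 and r2.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r1, r2 = Vt[0], Vt[1]

# Projection of every gene onto each eigengene (correlation up to scaling).
c1 = X @ r1
c2 = X @ r2

# Assumed sorting rule: order genes by the angle of (c1, c2).
angle = np.arctan2(c2, c1)
order = np.argsort(angle)
X_sorted = X[order]

print("sorted matrix shape:", X_sorted.shape)
```

Plotting `X_sorted` as a raster would show the peak of expression drifting across the arrays from the first gene to the last, the traveling-wave picture described in panel (a).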
27
Other Applications for SVD
Missing data
Comparison between two genomic sequences
28
The Applications of NMF
Mathematical definition of the NMF: $V_{n \times m} \approx W_{n \times r} H_{r \times m}$, where all entries of V, W and H are non-negative.
In general, r is chosen so that (n + m)r < nm, so W and H together hold fewer entries than V.
It can be used to extract the features that are hidden in a dataset.
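The slides do not say which NMF algorithm was used. The following sketch assumes the widely used multiplicative update rules of Lee and Seung for the squared-error objective, applied to synthetic non-negative data; the function name `nmf` and all sizes are hypothetical.

```python
import numpy as np

def nmf(V, r, n_iter=1000, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into W (n x r) and H (r x m).

    Uses multiplicative updates for the squared-error objective; eps avoids
    division by zero. This is an assumed algorithm, not taken from the slides.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative data with an exact rank-3 structure.
rng = np.random.default_rng(42)
n, m, r = 20, 12, 3
V = rng.random((n, r)) @ rng.random((r, m))

W, H = nmf(V, r)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print("entries stored by W and H:", (n + m) * r, "versus V:", n * m)
print("relative reconstruction error:", rel_err)
```

Because every update multiplies by a non-negative ratio, W and H stay non-negative throughout, which is what distinguishes the resulting features from SVD components that can take either sign.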
30
Comparison with SVD
31
The results for Elutriation Dataset
32
The results for a - factor Dataset
33
Summarization
1. SVD: normalization; no data limitation. NMF: no normalization; positive data.
2. SVD: missing data, clustering, pattern inference, weak pattern extraction, comparison. NMF: pattern inference, clustering, finding similarity.
3. ICA is used to mine DNA microarray data.
34
Thanks a lot!