Data Processing Technologies for DNA Microarray
Nini Rao
School of Life Science and Technology, UESTC
14/11/2004


Introduction
The Applications of SVD Technology
The Applications of NMF Technology
Summary

Introduction
1. Genes and Genomes
Gene: the basic unit of genetic function.
Gene expression: the process by which genetic information at the DNA level is converted into functional proteins.

Introduction
Genome structure: each organism contains a unique genomic sequence with a unique structure.

Gene structure

Genome data whose biological meaning is unknown are increasing exponentially, so there is a need to mine these data.

Analyzing these new data requires mathematical tools that scale to large quantities of data while reducing the data's complexity enough to make them comprehensible.

2. Microarrays
A microarray is a small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology. The technology was introduced in the 1990s.

3. Microarray Analysis
Microarray analysis is the process of using microarrays for scientific exploration. Many technologies for microarray analysis have been adopted since the early 1990s.

4. Types of Microarrays

5. The Roles of Microarrays
To monitor gene expression levels on a genomic scale.
To enhance fundamental understanding of life at the molecular level: regulation of gene expression, gene function, and cellular mechanisms.
Medical diagnosis and treatment.
Drug design.

Microarray data form a matrix, with one row per gene and one column per array.

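Applications of SVD
Mathematical definition of the SVD: an m x n data matrix X (m ≥ n) is factored as
$$X = U S V^{T}$$
where U is an $m \times n$ matrix with orthonormal columns, S is an $n \times n$ diagonal matrix of singular values $s_1 \ge s_2 \ge \dots \ge s_n \ge 0$, and $V^{T}$ is an $n \times n$ orthogonal matrix.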
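A quick numerical check of the definition, sketched with NumPy's `np.linalg.svd`. The matrix is random data standing in for an m x n expression matrix (the sizes and values are illustrative assumptions, not the datasets in this talk); `full_matrices=False` requests the "thin" SVD, whose shapes match the definition.

```python
import numpy as np

# Illustrative stand-in for an expression matrix: m genes (rows) x n arrays (columns), m >= n.
rng = np.random.default_rng(0)
m, n = 50, 14
X = rng.normal(size=(m, n))

# Thin SVD: U is m x n, s holds the n diagonal entries of S, Vt is V^T (n x n).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)
print(U.shape, S.shape, Vt.shape)  # (50, 14) (14, 14) (14, 14)

# X is recovered (up to rounding) as U S V^T.
assert np.allclose(X, U @ S @ Vt)
# U has orthonormal columns, V^T is orthogonal, and s is sorted in decreasing order.
assert np.allclose(U.T @ U, np.eye(n))
assert np.allclose(Vt @ Vt.T, np.eye(n))
assert np.all(np.diff(s) <= 0)
```

In the terminology of the later slides, the rows of `Vt` are the eigengenes (patterns across arrays) and the columns of `U` are the eigenarrays (patterns across genes).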

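One important result of the SVD of X: keeping only the l largest singular values, with the corresponding columns $u_k$ of U and $v_k$ of V, gives
$$X^{(l)} = \sum_{k=1}^{l} s_k \, u_k v_k^{T}$$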

$X^{(l)}$ is the closest rank-l matrix to X. The term “closest” means that $X^{(l)}$ minimizes the sum of the squares of the differences between the elements of X and $X^{(l)}$:
$$\sum_{i,j} \lvert x_{ij} - x^{(l)}_{ij} \rvert^{2} = \min_{\operatorname{rank}(Y) \le l} \sum_{i,j} \lvert x_{ij} - y_{ij} \rvert^{2}$$
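This property is known as the Eckart-Young theorem, and it can be checked numerically. The sketch below uses random data with arbitrary sizes, and `rank_l_approx` is a hypothetical helper name. It truncates the SVD, confirms that the squared error equals the sum of the discarded squared singular values, and checks that another rank-l matrix does no better.

```python
import numpy as np

def rank_l_approx(X, l):
    """Return X^(l) = sum_{k<=l} s_k u_k v_k^T, the SVD truncated to l terms."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
l = 3
Xl = rank_l_approx(X, l)

s = np.linalg.svd(X, compute_uv=False)
sq_err = np.sum((X - Xl) ** 2)  # sum_ij |x_ij - x^(l)_ij|^2
# Eckart-Young: the minimum equals the sum of the discarded squared singular values.
assert np.isclose(sq_err, np.sum(s[l:] ** 2))
# X^(l) has rank l: its singular values beyond the l-th are numerically zero.
assert np.allclose(np.linalg.svd(Xl, compute_uv=False)[l:], 0.0, atol=1e-10)

# Any other rank-l matrix, e.g. a random one, cannot give a smaller error.
Y = rng.normal(size=(30, l)) @ rng.normal(size=(l, 8))
assert np.sum((X - Y) ** 2) >= sq_err
```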

SVD analysis of gene expression data

Results for the elutriation dataset

Pattern Inference

Analysis of results: pattern inference
(a) Raster display of $V^{T}$, the expression of 14 eigengenes in 14 arrays.
(b) Bar chart of the fractions of eigenexpression.
(c) Line-joined graphs of the expression levels of r1 (red) and r2 (blue) in the 14 arrays fit dashed graphs of normalized sine (red) and cosine (blue) of period T = 390 min and phase 2π/13, respectively.
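The quantities in panels (b) and (c) can be illustrated on synthetic data. This sketch is an assumption-laden stand-in, not the elutriation data. It builds a hypothetical "traveling wave" in which each gene oscillates with period T = 390 min at its own random phase, plus noise. It assumes the 14 arrays are sampled every 30 min, so consecutive arrays differ in phase by 2π/13, and it takes the fraction of eigenexpression of the k-th eigengene to be $s_k^2 / \sum_j s_j^2$. It then measures how much of r1 and r2 lies in the span of normalized sine and cosine of period T.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_arrays = 500, 14
T = 390.0                                  # period in minutes, from the slide
t = 30.0 * np.arange(n_arrays)             # ASSUMPTION: one array every 30 min (0..390 min)
theta = 2 * np.pi * t / T                  # phase step of 2*pi/13 between arrays

# Hypothetical traveling wave: gene g follows cos(theta - phi_g) plus Gaussian noise.
phi = rng.uniform(0, 2 * np.pi, size=n_genes)
X = np.cos(theta[None, :] - phi[:, None]) + 0.1 * rng.normal(size=(n_genes, n_arrays))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# (b) Fraction of eigenexpression carried by each eigengene (rows of Vt).
p = s**2 / np.sum(s**2)
print("fractions of eigenexpression:", p[:3].round(3))

# (c) Normalized sine and cosine templates. Over these 14 samples they are orthogonal,
# so B has orthonormal columns and B @ B.T projects onto their span.
B = np.column_stack([np.sin(theta), np.cos(theta)])
B /= np.linalg.norm(B, axis=0)
for k in (0, 1):
    in_span = np.linalg.norm(B @ (B.T @ Vt[k]))  # 1.0 means r_k lies entirely in the span
    print(f"r{k + 1}: norm of projection onto sine/cosine span = {in_span:.3f}")
```

Because the synthetic genes have uniformly spread phases, r1 and r2 may come out as any rotation of the sine/cosine pair, which is why the sketch checks the shared span rather than each template separately.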

Data Sorting

Analysis of results: data sorting
Fig. 3. Genes sorted by relative correlation with r1 and r2 of the normalized elutriation data.
(a) Normalized elutriation expression of the 5,981 sorted genes in the 14 arrays, showing a traveling wave of expression.
(b) Eigenarray expression: the expression of a1 and a2, the eigenarrays corresponding to r1 and r2, displays the sorting.
(c) Expression levels of a1 (red) and a2 (green) fit normalized sine and cosine functions of period Z = N-1 = 5,980 and phase Q = 2π/13 (blue), respectively.
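The sorting step can be sketched under an explicit assumption: "relative correlation with r1 and r2" is read here as the angle of each gene's point (correlation with r1, correlation with r2), and genes are ordered by that angle. The data are again a hypothetical traveling wave rather than the elutriation dataset, and `sort_by_eigengenes` is an illustrative name.

```python
import numpy as np

def sort_by_eigengenes(X, Vt):
    """Order genes (rows of X) by the angle atan2(corr(g, r2), corr(g, r1)).
    ASSUMPTION: this angle is our reading of 'relative correlation with r1 and r2'."""
    r1, r2 = Vt[0], Vt[1]
    c1 = np.array([np.corrcoef(g, r1)[0, 1] for g in X])
    c2 = np.array([np.corrcoef(g, r2)[0, 1] for g in X])
    angle = np.arctan2(c2, c1)
    return np.argsort(angle)

# Hypothetical traveling wave over 14 arrays (ASSUMED phase step 2*pi/13), genes in random order.
rng = np.random.default_rng(4)
theta = 2 * np.pi * np.arange(14) / 13
phi = rng.uniform(0, 2 * np.pi, size=300)
X = np.cos(theta[None, :] - phi[:, None]) + 0.1 * rng.normal(size=(300, 14))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
order = sort_by_eigengenes(X, Vt)
X_sorted = X[order]

# After sorting, neighbouring genes should have very similar profiles (a smooth wave).
neighbour_corr = [np.corrcoef(X_sorted[i], X_sorted[i + 1])[0, 1]
                  for i in range(len(X_sorted) - 1)]
print(f"mean correlation between neighbouring genes: {np.mean(neighbour_corr):.3f}")
```

Displaying `X_sorted` as a raster would show the diagonal "traveling wave" pattern described in panel (a).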

Other Applications of SVD: missing data; comparison between two genomic sequences.
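For the missing-data application, one simple approach (an illustrative choice, since the talk does not say which algorithm it used) is to fill missing entries with row means and then repeatedly replace them with the values of a rank-l SVD approximation until they stop changing. `svd_impute`, the rank `l`, and the synthetic data are assumptions for illustration, and the sketch assumes every gene has at least one observed value.

```python
import numpy as np

def svd_impute(X, l, n_iter=500, tol=1e-10):
    """Estimate NaN entries of X with an iterated rank-l SVD approximation (illustrative)."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: each gene's mean over its observed arrays.
    filled = np.where(missing, np.nanmean(X, axis=1, keepdims=True), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]
        change = np.max(np.abs(approx[missing] - filled[missing]), initial=0.0)
        filled[missing] = approx[missing]  # observed entries are never modified
        if change < tol:
            break
    return filled

rng = np.random.default_rng(5)
A = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 10))  # exactly rank-2 "true" data
mask = rng.random(A.shape) < 0.05                         # about 5% of entries go missing
X_obs = np.where(mask, np.nan, A)

X_hat = svd_impute(X_obs, l=2)
mean_fill = np.broadcast_to(np.nanmean(X_obs, axis=1, keepdims=True), A.shape)
err_svd = np.sqrt(np.mean((X_hat[mask] - A[mask]) ** 2))
err_mean = np.sqrt(np.mean((mean_fill[mask] - A[mask]) ** 2))
print(f"RMSE on missing entries: SVD fill {err_svd:.2e}, row-mean fill {err_mean:.2e}")
```

On real data the rank `l` is unknown and would have to be chosen, for example from the fractions of eigenexpression.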

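The Applications of NMF
Mathematical definition of the NMF: a nonnegative n x m data matrix V is approximated by the product of two nonnegative matrices,
$$V_{n \times m} \approx W_{n \times r} \, H_{r \times m}$$
In general, r is chosen so that $(n+m)r < nm$, i.e., W and H together have fewer entries than V, so the factorization compresses the data. NMF can therefore be used to extract features that are hidden in a dataset.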
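One standard way to compute such a factorization is the multiplicative update rule of Lee and Seung for the squared Frobenius error; it is not necessarily the algorithm used in the talk. The sketch below uses a hypothetical `nmf` helper and synthetic nonnegative data, and adds a small `eps` to avoid division by zero. It also prints the parameter counts behind the inequality $(n+m)r < nm$.

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0, eps=1e-10):
    """Approximate nonnegative V (n x m) by W (n x r) @ H (r x m), W, H >= 0,
    using Lee-Seung multiplicative updates for ||V - WH||_F^2."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(6)
n, m, r = 60, 14, 3
V = rng.random((n, r)) @ rng.random((r, m))  # synthetic nonnegative data

W0, H0 = nmf(V, r, n_iter=0)                 # the random starting point (same seed)
W, H = nmf(V, r)
err0 = np.linalg.norm(V - W0 @ H0)           # Frobenius norm
err = np.linalg.norm(V - W @ H)
print(f"Frobenius error: start {err0:.3f}, after updates {err:.3f}")
print("entries in W and H:", (n + m) * r, "vs entries in V:", n * m)  # 222 vs 840
```

Unlike the SVD factors, which generally have mixed signs, every entry of `W` and `H` stays nonnegative, so each column of `W` can be read as an additive expression feature.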

Comparison with SVD

Results for the elutriation dataset

Results for the α-factor dataset

Summary
1. SVD: normalization required; no restriction on the data.
NMF: no normalization needed; requires nonnegative data.
2. SVD: missing data, clustering, pattern inference, weak pattern extraction, comparison.
NMF: pattern inference, clustering, finding similarity.
3. ICA has also been used to mine DNA microarray data.

Thanks a lot!