Published by Camron Booth. Modified over 8 years ago.
1
Identification of cell cycle-related regulatory motifs using a kernel canonical correlation analysis
Presented by Rhee, Je-Keun
Graduate Program in Bioinformatics
Center for Biointelligence Technology (CBIT)
Biointelligence Laboratory, Seoul National University
2
Contents
– Introduction
– Kernel canonical correlation analysis (kernel CCA)
– Datasets & Experiments
– Experimental results
– Conclusion
(c) 2009 Biointelligence Laboratory, Seoul National University
3
Introduction
One of the major challenges in gene regulation studies is to identify regulators affecting the expression of their target genes in specific biological processes.
In the present study, we propose a kernel-based approach to efficiently identify core regulatory elements involved in specific biological processes using gene expression profiles.
Using yeast cell cycle data, we explored significant relationships between motifs and expression profiles, and searched for regulatory motifs and their pairs correlated with specific expression patterns.
[Figure: the cell cycle phases G1, S, G2 and M]
4
Kernel methods
The kernel trick solves a non-linear problem by implicitly mapping the original observations into a higher-dimensional feature space, where a linear method can be applied. The mapping is never computed explicitly; only inner products between mapped points are needed.
Φ: x → φ(x)
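As a small illustration of the mapping Φ: x → φ(x), the sketch below checks that a degree-2 polynomial kernel gives the same value as an inner product in an explicitly constructed feature space. The choice of kernel, the function names and the toy vectors are assumptions made for illustration only; the slides do not state which kernel was used.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2

# Hypothetical toy inputs, not taken from the presentation.
x, y = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(y))   # inner product after mapping into 3-D
implicit = poly_kernel(x, y)     # same value, computed in the original 2-D space

print(explicit, implicit)
```

The kernel evaluates the inner product in the feature space without ever building φ(x), which is what makes high-dimensional (or infinite-dimensional) feature spaces practical.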
5
Canonical correlation analysis (CCA)
Canonical correlation analysis (CCA) is a classical multivariate statistical method for finding linearly correlated features from a pair of datasets.
Given a pair of multivariate variables x_i and x_j, CCA finds a pair of linear transformations a_i and a_j such that the correlation coefficient between the extracted features u_i and u_j is maximized.
[Figure: x_i and x_j are projected by a_i and a_j onto the features u_i and u_j]
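The maximization described on this slide is usually written as follows. This is the standard textbook form rather than an equation shown in the presentation; the covariance symbols C_ii, C_jj (within-set) and C_ij (between-set) are notation assumed here.

```latex
\rho \;=\; \max_{a_i,\, a_j}\;
\frac{a_i^{\top} C_{ij}\, a_j}
     {\sqrt{a_i^{\top} C_{ii}\, a_i}\;\sqrt{a_j^{\top} C_{jj}\, a_j}},
\qquad
u_i = a_i^{\top} x_i, \quad u_j = a_j^{\top} x_j
```

Because the ratio is invariant to rescaling of a_i and a_j, the problem is typically solved with the two variances fixed to 1, which leads to a generalized eigenvalue problem.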
6
Kernel canonical correlation analysis (kernel CCA)
Kernel CCA overcomes the linearity limitation by projecting the data into a higher-dimensional feature space. While CCA is limited to linear features, kernel CCA can capture nonlinear relationships.
[Figure: sequence data x_seq and expression profiles x_exp are mapped by Φ_seq and Φ_exp into feature spaces, where f_seq and f_exp yield the features u_seq and u_exp]
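In the kernelized setting the projection directions are expressed through the training samples, so the objective can be stated purely in terms of kernel matrices. The form below is the commonly used dual formulation, given here as a sketch; K_seq and K_exp (centered kernel matrices over the n genes) and the coefficient vectors α and β are assumed notation, and the slides do not say which variant or regularization the authors used.

```latex
f_{seq}(x) = \sum_{k=1}^{n} \alpha_k \, k_{seq}(x_k, x), \qquad
f_{exp}(x) = \sum_{k=1}^{n} \beta_k \, k_{exp}(x_k, x)

\rho \;=\; \max_{\alpha,\, \beta}\;
\frac{\alpha^{\top} K_{seq} K_{exp}\, \beta}
     {\sqrt{\alpha^{\top} K_{seq}^{2}\, \alpha}\;\sqrt{\beta^{\top} K_{exp}^{2}\, \beta}}
```

Without regularization this problem tends to have trivial solutions with ρ = 1 whenever the kernel matrices are invertible, so in practice a regularization term is added to the denominator.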
7
Preparation of datasets
Gene expression datasets
– Expression profiles of all ORFs (open reading frames) during the yeast cell cycle, measured at 18 time points by Spellman et al.
Sequence datasets
1. Upstream sequences of ORFs scanned for the presence of 42 known motifs extracted by Pilpel et al. using the AlignACE program
2. Raw upstream sequences: the ~1 kb region upstream of each gene
8
Experiments
Identification of the relationship between gene expression and known motifs using a set of motifs extracted by AlignACE
– 42 motifs
Identification of cell cycle-related motifs from raw upstream sequences
– A total of 1,024 features (window size l=5, i.e. 4^5 possible 5-mers)
Combinatorial effects of regulatory motifs
– Searching for motif pairs that have synergistic or co-regulatory effects in the yeast cell cycle
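The 1,024 features for the raw-sequence experiment correspond to the 4^5 possible DNA words of length 5. A minimal sketch of how such a feature vector could be built from one upstream sequence is shown below. The use of raw occurrence counts, the single-strand scan, the function names and the toy sequence are all assumptions; the slides do not specify the exact encoding.

```python
from itertools import product

def kmer_vocabulary(k=5, alphabet="ACGT"):
    """All possible k-mers in a fixed lexicographic order (4**k entries)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def kmer_counts(sequence, k=5):
    """Count every k-mer seen in a sliding window of width k."""
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        pos = index.get(window)
        if pos is not None:          # windows with ambiguous bases (e.g. N) are skipped
            counts[pos] += 1
    return counts

# Hypothetical toy upstream sequence, far shorter than the ~1 kb regions in the study.
toy_upstream = "ACGCGTACGCGTNNACGCG"
features = kmer_counts(toy_upstream)

print(len(features), sum(features))
```

Each gene then contributes one 1,024-dimensional vector, which plays the role of the sequence-side input to the kernel CCA.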
9
Known regulatory motifs in yeast
Motifs
RAP1     RPN4          GCN4    MCB       HAP234
MIG1     AFT1          STRE’   CCA       CSRE
PHO4     STE12         HSE     ABF1      ATRepeat
GAL      Leu3          LYS14   MET31-32  OAF1
PAC      PDR           PHO     REB1      STRE
ECB      ndt80 (MSE)   Yap1    SCB       Gcr1
zap1     MCM1’         MCM1    SFF       SFF’
BAS1     Ume6 (URS1)   SWI5    ALPHA1’   ALPHA1
ALPHA2’  ALPHA2
10
Relationship between gene expression and sequence motifs
11
Top-ranked motifs identified by the kernel CCA
Motif    Weight   Function
SWI5     0.89026  Transcription activation in G1 phase
SFF’     0.45399  FKH1 binding site that regulates the cell cycle
MCB      0.29633  MBF binding site that activates in late G1 phase
LYS14    0.21796  Lysine biosynthesis pathway
ALPHA2   0.16532  Encoding a homeobox-domain
12
Weight distributions for motifs derived from cell cycle and non cell cycle-related datasets
[Figure: two panels of weight distributions; the labeled motifs are SWI5, SFF’ and MCB]
13
Correlation between expression profiles and motifs derived by using the raw upstream sequence data
14
High-scoring motifs in the first and the second components using 5-mer raw upstream sequences
Sequence  Motif description        Weight    Component  Rank
GCGTG     MCB (ACGCGT)             0.079567  1          1
CGTGT     MATalpha2 (CRTGTWWWW)    0.075340  1          2
CATGT     MATalpha2 (CRTGTWWWW)    0.046299  1          12
CCACG     SCB (CACGAAA)            0.018992  2          4
CGCGT     MCB (ACGCGT)             0.017870  2          5
GTGTT     MATalpha2 (CRTGTWWWW)    0.016595  2          9
15
Measurement of the effect of motif pairs
ECRScore (Expression Coherence coRrelation Score)
– It is based on the Pearson correlation coefficients of the expression profiles of all possible pairs of genes whose upstream regions contain both motifs, m_i and m_j.
– N(m_i ∩ m_j) is the number of all pairs of genes whose upstream regions contain both motifs.
– N_τ(m_i ∩ m_j) is the number of gene pairs whose correlation coefficient is larger than the threshold τ.
– The threshold τ was chosen based on the fifth percentile of the distribution of correlation coefficients of randomly sampled gene pairs.
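The formula itself did not survive in this transcript. From the two counts defined above, a natural reading is the ratio N_τ(m_i ∩ m_j) / N(m_i ∩ m_j), i.e. the fraction of gene pairs sharing both motifs whose expression profiles are correlated above τ. The sketch below implements that reading; the ratio, the function names, the toy profiles and the value of τ are assumptions, not taken from the slides.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def ecr_score(profiles, tau):
    """Assumed ECRScore: share of gene pairs with correlation above tau.

    `profiles` holds the expression profiles of the genes whose upstream
    regions contain both motifs of the pair being scored.
    """
    pairs = list(combinations(profiles, 2))   # N(m_i ∩ m_j) gene pairs
    if not pairs:
        return 0.0
    coherent = sum(1 for a, b in pairs if pearson(a, b) > tau)  # N_tau
    return coherent / len(pairs)

# Hypothetical toy profiles (4 time points instead of 18) and threshold.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 3.0, 2.0, 1.0],
]
score = ecr_score(toy_profiles, tau=0.5)
print(score)
```

In the study the threshold would come from the correlation distribution of randomly sampled gene pairs rather than being fixed by hand as it is here.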
16
Heat map of weight values of motif pairs related to cell cycle regulation
17
Combinatorial effects of regulatory motifs
Weight  Motif pair        ECRScore  # of ORFs
2.5368  MCB, MCM1         0.390     15
2.5018  MCB, ECB          0.439     12
2.0177  PHO, MCM1’        0.088     17
1.848   ECB, ALPHA2       0.088     14
1.7535  MCM1, ALPHA2      0.074     17
1.7263  ATRepeat, MCM1    0.076     12
1.6995  PHO, ECB          0.127     11
1.6823  REB1, SWI5        0.099     14
1.6476  REB1, MCM1’       0.115     13
1.4256  REB1, ALPHA1      0.067     15
18
Conclusion
We presented a novel method that can identify candidate condition-specific regulatory motifs by employing kernel-based methods.
In summary, given expression profiles, our method was able to identify regulatory motifs involved in specific biological processes.
The method could be applied to the elucidation of unknown regulatory mechanisms associated with complex gene regulatory processes.
In future research, we will apply the proposed method to diverse gene expression datasets, especially cancer-related datasets.