1
Study of Arabidopsis’ Copper Regulation by High Throughput Sequence Data Analysis
Steven A. Cardenas, SoCal BSI
Dr. Pellegrini, PI, UCLA
Dr. Casero Diaz-Cano, Post Doc, UCLA
2
Objective of Project
Analysis of Sets of Differentially Expressed Genes in Plus and Minus Copper Conditions For Arabidopsis WT
Identify Spl7 Regulated Genes
Potential Upstream Motifs That Regulate the Genes
3
Project Significance
To Further the Development of Techniques Used in High Throughput Analysis.
The Study of Copper Regulation in Arabidopsis.
This Data Could Be Used to Help Increase Our Understanding of Copper Regulation in the Human Body.
4
Outline of Presentation
Arabidopsis Thaliana
Tools Used
Solexa Sequencing
Low Level Data Analysis
Downstream Data Analysis
Future Work
5
Arabidopsis thaliana
A Small Flowering Plant Related to Cabbage and Mustard
Found in Europe, Asia, and Northwestern Africa
First Plant Genome to Be Sequenced, and It Is Well Annotated
Image: http://www.steve.gb.com/images/science/arabidopsis_thaliana.jpg
6
Tools Used
TAIR: www.arabidopsis.org
SOAP: http://soap.genomics.org.cn
MATLAB: www.mathworks.com
Excel: www.microsoft.com
www.pythonwin.org
7
Solexa Sequencing
1. Prepare Genomic DNA Sample
2. Attach DNA to Flow Cell Surface
3. Amplification
4. Determine First Base
5. Image First Base
6. Determine Second Base
7. Sequence Reads Over Multiple Chemistry Cycles
Source: http://seqanswers.com/forums/showthread.php?t=21
8
Illumina mRNA Sample Preparation by Whole Transcriptome Analysis (WTA)
Workflow diagram labels:
1. mRNA (AAAA)
2. Metal Catalyzed Fragmentation (60 – 200 nt)
3. Random Hexamer Primed 1st Strand cDNA Synthesis
4. 2nd Strand cDNA Synthesis
5. End Repair and Adaptor Ligation
6. Size Selection (200 bp)
7. PCR
8. Sequence (33 nt sequence, > 250 – 500 Mb)
9
Experimental Conditions of Analyzed Data
Arabidopsis
  Wild Type: Root Cell, Shoot Cell
  Spl7 Mutant: Root Cell, Shoot Cell
+Cu and -Cu
10
Data Analysis
1. Solexa Data
2. Align Data (TAIR Refseq)
3. Calculate Hits per Gene
4. Normalize
5. Regularize
6. Check For Reproducibility
7. Differentially Expressed Gene Statistical Analysis
8. Spl7 Motif Statistical Analysis
9. Spreadsheet of Results
Tools: SOAP, MATLAB, Excel
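The normalize and regularize steps can be sketched as follows. This is a minimal illustration, not the presenter's MATLAB code: it assumes normalization means scaling to hits per million (the unit used in the results table) and that regularization means raising low values to a floor of 0.5. The function names, the floor value, and the sample counts are all hypothetical.

```python
def hits_per_million(gene_hits):
    """Scale raw per-gene alignment hits so the sample total is one million."""
    total = sum(gene_hits.values())
    if total == 0:
        raise ValueError("sample has no aligned hits")
    return {gene: 1e6 * hits / total for gene, hits in gene_hits.items()}


def regularize(values, floor=0.5):
    """Raise every value to at least `floor` (an assumed pseudo-count) so that
    ratios and logarithms stay defined for genes with zero hits."""
    return {gene: max(value, floor) for gene, value in values.items()}


# Hypothetical raw hit counts for one sample.
raw = {"geneA": 150, "geneB": 0, "geneC": 850}
normalized = regularize(hits_per_million(raw))
```

Scaling every sample to the same total makes hit counts comparable across lanes with different sequencing depth, and the floor keeps fold changes finite when a gene has no hits in one condition.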
11
Data Reproducibility
Arabidopsis WT Root Cell, Minus Copper Condition
[Figure axes: Replicate 1 (Alignment Hits per Million), Replicate 2 (Alignment Hits per Million)]
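One way to put a number on replicate agreement is the Pearson correlation of log-transformed hits per million. The slide shows only a replicate-versus-replicate plot, so the choice of statistic here is an assumption, and the function name and sample values are hypothetical.

```python
import math


def log_pearson(rep1, rep2):
    """Pearson correlation of log10 hits per million between two replicates.
    Values must be positive, e.g. after a regularization floor is applied."""
    x = [math.log10(v) for v in rep1]
    y = [math.log10(v) for v in rep2]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hits per million for the same four genes in two replicates.
rep1 = [0.5, 14.0, 218.0, 5162.0]
rep2 = [0.5, 26.0, 155.0, 4478.0]
r = log_pearson(rep1, rep2)
```

Working on the log scale keeps a handful of very highly expressed genes from dominating the correlation, which matters given the large dynamic range of the data.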
12
Statistical Analysis for Differential Expression
Differential Expression of Genes in Plus Copper vs. Minus Copper
Statistical Problems
  Only Two Replicates
  Large Dynamic Range of Data
13
Statistical Analysis for Differential Expression
Student’s T-test
  Fails With Large Dynamic Range
Bayesian T-test
  Makes Use of Genes With Similar Expression Levels
  Currently Still Fails With Large Dynamic Range
Binomial Test
  Combined Replicates
  Fails When Reproducibility is Bad
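A minimal sketch of a binomial test on combined replicates follows. It assumes integer hit counts and equal sequencing depth in both conditions, so that under the null hypothesis each hit falls in the +Cu sample with probability 0.5. The function names and the counts are hypothetical, and the result is reported as log10(P value) and computed in log space so that very small P values do not underflow.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def binomial_log10_p(hits_plus, hits_minus, p=0.5):
    """Two-sided exact binomial test, returned as log10(P value)."""
    n = hits_plus + hits_minus
    if n == 0:
        return 0.0
    observed = log_binom_pmf(hits_plus, n, p)
    # Keep every outcome that is no more likely than the observed one.
    tail = [lp for lp in (log_binom_pmf(k, n, p) for k in range(n + 1))
            if lp <= observed + 1e-9]
    peak = max(tail)
    log_p = peak + math.log(sum(math.exp(lp - peak) for lp in tail))
    return min(log_p / math.log(10), 0.0)


# Hypothetical hit counts for one gene; replicates are summed per condition.
plus_cu = [3, 2]
minus_cu = [20, 25]
log10_p = binomial_log10_p(sum(plus_cu), sum(minus_cu))
```

Because the replicates are summed before testing, variation between replicates never enters the statistic, which is consistent with the slide's caveat that the test fails when reproducibility is bad.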
14
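Top Differentially Expressed Genes with Binomial Test
Root
Reference mRNA Sequence Hits per million (unique hits)

Wild type (WT)
Gene | WT +Cu GAN1 | WT +Cu GAN5 | WT -Cu GAN2 | WT -Cu GAN6 | WT log10 (P value) Bayesian | WT log10 (P value) Binomial | WT log10 (P value) t-test | WT log2 (fold change)
Glycosyl Hydrolase Family 17 protein (AT4G16260) | 591 | 336 | 934 | 1833 | -2.93 | -400.00 | -0.73 | -1.58
Copper Ion Transporter (COPT2) | 14 | 26 | 478 | 531 | -8.61 | -400.00 | -2.51 | -4.66
Copper Chaperone (CCH) | 218 | 155 | 1032 | 1055 | -7.44 | -400.00 | -2.81 | -2.49
Ferric-Chelate Reductase (ATFRO5/FRO5) | 0.50 | [missing] | 1114 | 1141 | -8.49 | -400.00 | -3.84 | -11.14
Zinc Ion Transporter (ZIP2) | 21 | 32 | 425 | 486 | -8.23 | -308.36 | -2.28 | -4.10
Peroxidase, Putative (AT1G49570) | 273 | 195 | 734 | 985 | -5.43 | -280.62 | -1.38 | -1.88
Pentatricopeptide (PPR) Repeat-Containing Protein (AT1G07590) | 5162 | 4478 | 6953 | 6625 | -3.96 | -279.75 | -1.45 | -0.49
Manganese Ion Binding (GLP5) | 804 | 871 | 1734 | 1805 | -4.71 | -267.44 | -2.56 | -1.08
Copper Ion Binding (UCC2) | 2333 | 1888 | 996 | 1202 | -2.36 | -257.78 | -1.27 | 0.94
Peroxidase, Putative (AT5G19890) | 1209 | 1620 | 2077 | 2586 | -2.24 | -184.61 | -0.97 | -0.72

spl7 mutant
Gene | spl7 +Cu GAN4 | spl7 +Cu GAN8 | spl7 -Cu GAN3 | spl7 -Cu GAN7 | spl7 log10 (P value) Bayesian | spl7 log10 (P value) Binomial | spl7 log10 (P value) t-test | spl7 log2 (fold change)
Glycosyl Hydrolase Family 17 protein (AT4G16260) | 352 | 444 | 3133 | [missing] | -8.53 | -400.00 | -3.55 | -2.98
Copper Ion Transporter (COPT2) | 11 | 20 | 33 | 38 | -2.83 | -7.17 | -1.29 | -1.22
Copper Chaperone (CCH) | 141 | 130 | 94 | 116 | -0.94 | -4.19 | -0.91 | 0.37
Ferric-Chelate Reductase (ATFRO5/FRO5) | 0.50 | 1.45 | 0.50 | [missing] | -0.43 | -0.38 | -0.37 | 0.97
Zinc Ion Transporter (ZIP2) | 2.87 | 4.32 | 6.34 | 4.97 | -0.63 | -0.91 | -0.76 | -0.65
Peroxidase, Putative (AT1G49570) | 261 | 209 | 2144 | 2635 | -6.94 | -400.00 | -1.89 | -3.35
Pentatricopeptide (PPR) Repeat-Containing Protein (AT1G07590) | 3649 | 4474 | 3649 | 5982 | -0.68 | -55.87 | -0.22 | -0.25
Manganese Ion Binding (GLP5) | 871 | 1006 | 1265 | 1306 | -1.42 | -46.38 | -1.41 | -0.45
Copper Ion Binding (UCC2) | 1873 | 2111 | 1943 | 2434 | -0.34 | -9.03 | -0.26 | -0.14
Peroxidase, Putative (AT5G19890) | 1309 | 1595 | 3998 | 4478 | -6.88 | -400.00 | -2.00 | -1.55

Min: Bayesian -13.87, Binomial -inf, Student T-test -5.63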
15
Motifs Analysis: The First Approach
1. Select Potential Targets of transcription factor SPL7
2. Retrieve Promoter Sequences From the Genome
3. Calculate Word Count For SPL7 Motif
4. Statistical Test
   Background Distribution Derived From Word Counts In the Whole Genome
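The word-count approach above can be sketched as follows. The slides do not give the motif sequence or the exact test, so this example makes two assumptions: the word GTAC stands in for the SPL7 motif, and enrichment is scored with a z-score under a binomial model whose per-position rate comes from a background sequence. All names and sequences are hypothetical.

```python
import math


def count_word(sequence, word):
    """Count overlapping occurrences of `word` in `sequence` (one strand only)."""
    sequence, word = sequence.upper(), word.upper()
    return sum(1 for i in range(len(sequence) - len(word) + 1)
               if sequence[i:i + len(word)] == word)


def enrichment_z(promoters, background, word):
    """Compare the word count in the promoters with the count expected from
    the per-position rate observed in the background sequence."""
    bg_positions = len(background) - len(word) + 1
    rate = count_word(background, word) / bg_positions
    positions = sum(max(len(p) - len(word) + 1, 0) for p in promoters)
    observed = sum(count_word(p, word) for p in promoters)
    expected = positions * rate
    sd = math.sqrt(positions * rate * (1 - rate))
    z = (observed - expected) / sd if sd > 0 else float("nan")
    return observed, expected, z


# Hypothetical promoter sequences of candidate target genes.
promoters = ["AAGTACGTACTT", "CCGTACAA"]
# Hypothetical stand-in for the whole-genome background.
background = "ACGTTGCA" * 12 + "GTAC"
observed, expected, z = enrichment_z(promoters, background, "GTAC")
```

A real analysis would also scan the reverse strand and use the whole-genome word counts as the background, as the slide describes; the z-score is only one of several reasonable statistics for this comparison.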
16
Future Work
Research New Statistical Methods to Better Identify Differentially Expressed Genes
Use of Non Fixed Window For Bayesian T-test
Finish Analysis of Motifs That Regulate the Differentially Expressed Genes
Identify Transcribed Non Coding RNAs (e.g. microRNAs)
17
Acknowledgements
UCLA and the Pellegrini Lab (www.ucla.edu)
  Dr. Matteo Pellegrini
  Dr. David Casero Díaz-Cano
  Dr. Shawn Cokus
Collaborators
  Ute Krammer, University of Heidelberg, Germany
  Sabeeha Merchant, University of California Los Angeles
SoCalBSI Instructors and Fellow Researchers (http://instructional1.calstatela.edu/jmomand2/index.html)
  Dr. Jamil Momand
  Dr. Sandy Sharp
  Dr. Nancy Water-Perez
  Dr. Wendie Johnston
  Dr. Beverly Krilowicz
  Dr. Silvia Heubach
  Dr. Jennifer Faust
Funding
  National Institutes of Health
  National Science Foundation
  Economic & Workforce Development
  The Department of Energy