
Study of Arabidopsis’ Copper Regulation by High Throughput Sequence Data Analysis Steven A. Cardenas, SoCal BSI Dr. Pellegrini, PI, UCLA Dr. Casero Diaz-Cano,


1 Study of Arabidopsis’ Copper Regulation by High Throughput Sequence Data Analysis
Steven A. Cardenas, SoCal BSI
Dr. Pellegrini, PI, UCLA
Dr. Casero Diaz-Cano, Post Doc, UCLA

2 Objective of Project
- Analysis of Sets of Differentially Expressed Genes in Plus and Minus Copper Conditions for Arabidopsis WT
- Identify Spl7 Regulated Genes
- Potential Upstream Motifs That Regulate the Genes

3 Project Significance
- To Further the Development of Techniques Used in High Throughput Analysis
- The Study of Copper Regulation in Arabidopsis
- These Data Could Be Used to Help Increase Our Understanding of Copper Regulation in the Human Body

4 Outline of Presentation
- Arabidopsis Thaliana
- Tools Used
- Solexa Sequencing
- Low Level Data Analysis
- Downstream Data Analysis
- Future Work

5 Arabidopsis Thaliana
- A Small Flowering Plant Related to Cabbage and Mustard
- Found in Europe, Asia, and Northwestern Africa
- First Plant Genome to Be Sequenced, and It Is Well Annotated
Image: http://www.steve.gb.com/images/science/arabidopsis_thaliana.jpg

6 Tools Used
- TAIR: www.arabidopsis.org
- SOAP: http://soap.genomics.org.cn
- MATLAB: www.mathworks.com
- Excel: www.microsoft.com
- www.pythonwin.org

7 Solexa Sequencing
1. Prepare Genomic DNA Sample
2. Attach DNA to Flow Cell Surface
3. Amplification
4. Determine First Base
5. Image First Base
6. Determine Second Base
7. Sequence Reads Over Multiple Chemistry Cycles
Source: http://seqanswers.com/forums/showthread.php?t=21

8 Illumina mRNA Sample Preparation by Whole Transcriptome Analysis (WTA)
[Workflow diagram. Labels:]
- mRNA with Poly-A Tail (AAAA)
- Metal Catalyzed Fragmentation (60 – 200 nt)
- Random Hexamer Primed 1st Strand cDNA Synthesis
- 2nd Strand cDNA Synthesis
- End Repair and Adaptor Ligation
- Size Selection (200 bp)
- PCR
- Sequence (33 nt sequence, > 250 – 500 Mb)

9 Experimental Conditions of Analyzed Data
[Diagram. Arabidopsis, +Cu and -Cu:]
- Wild Type: Root Cell, Shoot Cell
- Spl7 Mutant: Root Cell, Shoot Cell

10 Data Analysis
Input: Solexa Data
- Align Data (TAIR Refseq)
- Calculate Hits per Gene
- Normalize
- Regularize
- Check for Reproducibility
- Differentially Expressed Gene Statistical Analysis
- Spl7 Motif Statistical Analysis
Output: Spreadsheet of Results
Tools: SOAP, MATLAB, Excel
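The "Normalize" and "Regularize" steps are not spelled out on the slide. A minimal sketch of one plausible reading follows: counts are scaled to hits per million aligned reads, then a small floor is applied so that zero counts do not break later log ratios. The function names and the floor of 0.5 are assumptions made for illustration (the results table does show values of 0.50, which is consistent with such a floor but does not confirm it).

```python
def hits_per_million(raw_hits):
    """Scale per-gene alignment hits to hits per million total hits.

    raw_hits: dict mapping gene id -> raw hit count for one sample.
    """
    total = sum(raw_hits.values())
    if total == 0:
        raise ValueError("sample has no aligned hits")
    return {gene: 1e6 * hits / total for gene, hits in raw_hits.items()}


def regularize(values, floor=0.5):
    """Clamp values from below (assumed floor) so log ratios stay finite."""
    return {gene: max(value, floor) for gene, value in values.items()}


# Hypothetical toy sample: three genes, 100 aligned hits in total.
sample = {"geneA": 30, "geneB": 0, "geneC": 70}
normalized = regularize(hits_per_million(sample))
print(normalized)
```

Scaling to a common depth lets samples with different numbers of aligned reads be compared directly, which is what the later fold-change and test columns rely on.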

11 Data Reproducibility
[Scatter plot: Replicate 1 against Replicate 2, both axes in Alignment Hits per Million. Sample: Arabidopsis WT Root Cell, Minus Copper Condition]
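The slide checks reproducibility visually. One way to put a number on the same comparison, offered here only as a sketch and not as the method used in the project, is a Pearson correlation on log-scaled hits per million. The floor of 0.5 and the function name are assumptions.

```python
import math


def log_pearson(rep1, rep2, floor=0.5):
    """Pearson correlation of log10 hits per million between two replicates.

    rep1, rep2: equal-length sequences of per-gene values in the same order.
    floor: assumed lower bound applied before taking logs.
    """
    if len(rep1) != len(rep2) or len(rep1) < 2:
        raise ValueError("need two equal-length replicates with >= 2 genes")
    xs = [math.log10(max(v, floor)) for v in rep1]
    ys = [math.log10(max(v, floor)) for v in rep2]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a replicate has no variation")
    return cov / math.sqrt(var_x * var_y)


# Four WT +Cu values per replicate, as read from the results table.
print(log_pearson([14, 218, 21, 273], [26, 155, 32, 195]))
```

Working on the log scale keeps a handful of very highly expressed genes from dominating the statistic, which matters given the large dynamic range noted on the following slides.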

12 Statistical Analysis for Differential Expression
- Differential Expression of Genes in Plus Copper vs. Minus Copper
- Statistical Problems
  - Only Two Replicates
  - Large Dynamic Range of Data

13 Statistical Analysis for Differential Expression
- Student’s T-test
  - Fails With Large Dynamic Range
- Bayesian T-test
  - Makes Use of Genes With Similar Expression Levels
  - Currently Still Fails With Large Dynamic Range
- Binomial Test
  - Combined Replicates
  - Fails When Reproducibility Is Bad
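To make the binomial test on combined replicates concrete, here is a minimal sketch under stated assumptions: replicate hits for each condition are summed and rounded to integer counts, and under the null hypothesis each hit is equally likely to come from either condition (p = 0.5, reasonable only if both conditions are normalized to the same depth). The two-sided p-value is taken as twice the smaller tail, capped at 1. The slides do not give the exact formulation, so the function names and these choices are illustrative.

```python
import math


def _log_binom_pmf(k, n, p):
    """Natural log of the binomial probability P(X = k) for X ~ Bin(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def _log10_tail(ks, n, p):
    """log10 of the summed probability over the given k values (log-sum-exp)."""
    logs = [_log_binom_pmf(k, n, p) for k in ks]
    top = max(logs)
    return (top + math.log(sum(math.exp(v - top) for v in logs))) / math.log(10)


def binomial_log10_p(plus_hits, minus_hits, p=0.5):
    """log10 two-sided p-value for plus vs. minus condition, replicates combined."""
    k = round(sum(plus_hits))
    n = k + round(sum(minus_hits))
    if n == 0:
        return 0.0  # no hits in either condition: no evidence either way
    lower = _log10_tail(range(0, k + 1), n, p)
    upper = _log10_tail(range(k, n + 1), n, p)
    return min(0.0, min(lower, upper) + math.log10(2))


# COPT2 wild-type hits as read from the results table: +Cu (14, 26), -Cu (478, 531).
print(binomial_log10_p([14, 26], [478, 531]))
```

Summing the replicates is what makes this test sensitive to poor reproducibility: a single outlying replicate shifts the combined count and can produce an extreme p-value on its own. The values printed here are not expected to match the table, since the counts and any clipping used there are not stated.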

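14 Top Differentially Expressed Genes with Binomial Test
Root. Reference mRNA Sequence Hits per Million (Unique Hits)

WT
| Gene | +Cu GAN1 | +Cu GAN5 | -Cu GAN2 | -Cu GAN6 | log10(P value) Bayesian | log10(P value) Binomial | log10(P value) t-test | log2(fold change) |
| Glycosyl Hydrolase Family 17 Protein (AT4G16260) | 591 | 336 | 934 | 1833 | -2.93 | -400.00 | -0.73 | -1.58 |
| Copper Ion Transporter (COPT2) | 14 | 26 | 478 | 531 | -8.61 | -400.00 | -2.51 | -4.66 |
| Copper Chaperone (CCH) | 218 | 155 | 1032 | 1055 | -7.44 | -400.00 | -2.81 | -2.49 |
| Ferric-Chelate Reductase (ATFRO5/FRO5) | 0.50 | [missing] | 1114 | 1141 | -8.49 | -400.00 | -3.84 | -11.14 |
| Zinc Ion Transporter (ZIP2) | 21 | 32 | 425 | 486 | -8.23 | -308.36 | -2.28 | -4.10 |
| Peroxidase, Putative (AT1G49570) | 273 | 195 | 734 | 985 | -5.43 | -280.62 | -1.38 | -1.88 |
| Pentatricopeptide (PPR) Repeat-Containing Protein (AT1G07590) | 5162 | 4478 | 6953 | 6625 | -3.96 | -279.75 | -1.45 | -0.49 |
| Manganese Ion Binding (GLP5) | 804 | 871 | 1734 | 1805 | -4.71 | -267.44 | -2.56 | -1.08 |
| Copper Ion Binding (UCC2) | 2333 | 1888 | 996 | 1202 | -2.36 | -257.78 | -1.27 | 0.94 |
| Peroxidase, Putative (AT5G19890) | 1209 | 1620 | 2077 | 2586 | -2.24 | -184.61 | -0.97 | -0.72 |

spl7
| Gene | +Cu GAN4 | +Cu GAN8 | -Cu GAN3 | -Cu GAN7 | log10(P value) Bayesian | log10(P value) Binomial | log10(P value) t-test | log2(fold change) |
| Glycosyl Hydrolase Family 17 Protein (AT4G16260) | 352 | 444 | 3133 | [missing] | -8.53 | -400.00 | -3.55 | -2.98 |
| Copper Ion Transporter (COPT2) | 11 | 20 | 33 | 38 | -2.83 | -7.17 | -1.29 | -1.22 |
| Copper Chaperone (CCH) | 141 | 130 | 94 | 116 | -0.94 | -4.19 | -0.91 | 0.37 |
| Ferric-Chelate Reductase (ATFRO5/FRO5) | 0.50 | 1.45 | 0.50 | [missing] | -0.43 | -0.38 | -0.37 | 0.97 |
| Zinc Ion Transporter (ZIP2) | 2.87 | 4.32 | 6.34 | 4.97 | -0.63 | -0.91 | -0.76 | -0.65 |
| Peroxidase, Putative (AT1G49570) | 261 | 209 | 2144 | 2635 | -6.94 | -400.00 | -1.89 | -3.35 |
| Pentatricopeptide (PPR) Repeat-Containing Protein (AT1G07590) | 3649 | 4474 | 3649 | 5982 | -0.68 | -55.87 | -0.22 | -0.25 |
| Manganese Ion Binding (GLP5) | 871 | 1006 | 1265 | 1306 | -1.42 | -46.38 | -1.41 | -0.45 |
| Copper Ion Binding (UCC2) | 1873 | 2111 | 1943 | 2434 | -0.34 | -9.03 | -0.26 | -0.14 |
| Peroxidase, Putative (AT5G19890) | 1309 | 1595 | 3998 | 4478 | -6.88 | -400.00 | -2.00 | -1.55 |

Min log10(P value): Bayesian -13.87, Binomial -inf, Student T-test -5.63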
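The slide does not define the log2(fold change) column. The reported values appear consistent with the log2 ratio of summed +Cu hits to summed -Cu hits, and the sketch below checks that reading against two wild-type rows, taking the COPT2 hits as 14, 26, 478, 531 and the ZIP2 hits as 21, 32, 425, 486. Both the formula and the function name are inferences, not something the slides state.

```python
import math


def log2_fold_change(plus_cu, minus_cu):
    """log2 of (sum of +Cu replicate hits) / (sum of -Cu replicate hits)."""
    return math.log2(sum(plus_cu) / sum(minus_cu))


copt2 = log2_fold_change([14, 26], [478, 531])
zip2 = log2_fold_change([21, 32], [425, 486])
print(round(copt2, 2), round(zip2, 2))  # -4.66 -4.1
```

Under this reading a negative value means the gene is more highly expressed when copper is absent, as seen for the copper and zinc transporters in wild type.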

15 Motifs Analysis: The First Approach
[Flow diagram. Labels:]
- Select Potential Targets of Transcription Factor SPL7
- Retrieve Promoter Sequences From the Genome
- Calculate Word Count for SPL7 Motif
- Background Distribution Derived From Word Counts in the Whole Genome
- Statistical Test
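The word-count approach above can be sketched roughly as follows: count a motif word in the candidate promoters, estimate how often the same word occurs per position in a background set, and ask how surprising the observed count is. The slides name neither the motif word nor the statistical test, so the word "GTAC" used in the example, the Poisson upper-tail test, and all function names here are assumptions chosen for illustration. The sketch scans one strand only and is meant for small inputs.

```python
import math


def count_word(seq, word):
    """Count overlapping occurrences of word in seq (one strand, case-insensitive)."""
    seq, word = seq.upper(), word.upper()
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def word_rate(seqs, word):
    """Return (hits, scanned positions, hits per position) over a set of sequences."""
    positions = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    hits = sum(count_word(s, word) for s in seqs)
    return hits, positions, (hits / positions if positions else 0.0)


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected); suitable for small counts only."""
    below = sum(math.exp(-expected) * expected ** i / math.factorial(i)
                for i in range(observed))
    return max(0.0, 1.0 - below)


def motif_enrichment(promoters, background_seqs, word):
    """Observed count, expected count from background rate, and upper-tail p-value."""
    observed, positions, _ = word_rate(promoters, word)
    _, _, background = word_rate(background_seqs, word)
    expected = positions * background
    return observed, expected, poisson_upper_tail(observed, expected)


# Hypothetical toy sequences; real input would be promoter and genome sequences.
print(motif_enrichment(["AAGTACGTACTT"], ["ACGTACGT" * 10], "GTAC"))
```

Deriving the expected count from the whole genome, as the slide describes, corrects for words that are simply common in Arabidopsis sequence and would otherwise look enriched in any promoter set.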

16 Future Work
- Research New Statistical Methods to Better Identify Differentially Expressed Genes
  - Use of Non Fixed Window for Bayesian T-test
- Finish Analysis of Motifs That Regulate the Differentially Expressed Genes
- Identify Transcribed Non Coding RNAs (e.g. microRNAs)

17 Acknowledgements
- UCLA and the Pellegrini Lab
  - Dr. Matteo Pellegrini
  - Dr. David Casero Díaz-Cano
  - Dr. Shawn Cokus
- Collaborators
  - Ute Krammer, University of Heidelberg, Germany
  - Sabeeha Merchant, University of California Los Angeles
- SoCalBSI Instructors and Fellow Researchers
  - Dr. Jamil Momand
  - Dr. Sandy Sharp
  - Dr. Nancy Water-Perez
  - Dr. Wendie Johnston
  - Dr. Beverly Krilowicz
  - Dr. Silvia Heubach
  - Dr. Jennifer Faust
- Funding
  - National Institutes of Health
  - National Science Foundation
  - Economic & Workforce Development
  - The Department of Energy
Links: www.ucla.edu, http://instructional1.calstatela.edu/jmomand2/index.html

