Public data - available for projects
6 data sets:
– Human Tissues
– Leukemia
– Spike-in
– Arabidopsis mutants
– Yeast Cell Cycle
– Yeast Rosetta

Human Tissue Atlas
79 human tissues (in duplicates)
Among the tissues are:
– brain samples
– heart and liver samples
– fetal samples
probe sets (HGU133A subset)
Preprocessed data:
– normalized
– expression index calculated
Used for investigation of global trends and chromosomal organization of transcription, evaluation of gene prediction
Su et al.,

Leukemia Data
84 bone marrow samples from children with acute lymphoblastic leukemia (ALL)
70 B-cell ALL, with 4 different subtypes:
– 15 BCR-ABL
– 18 E2A-PBX1
– 17 Hyperdiploid
– 20 TEL-AML
14 T-cell
Platform: Affymetrix HGU133A
Dataset has previously been used for classification problems
For more information: Ross et al., Blood,

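Since this data set has been used for classification, a first step in a project is usually to set up one class label per sample. The sketch below builds such a label vector from the subtype counts on the slide and checks that they add up to 84. The function name `build_labels` and the grouping of samples by subtype are assumptions for illustration; the order of samples in the actual data files is not given here.

```python
from collections import Counter

# Subtype counts as stated on the slide.
SUBTYPE_COUNTS = {
    "BCR-ABL": 15,
    "E2A-PBX1": 18,
    "Hyperdiploid": 17,
    "TEL-AML": 20,
    "T-cell": 14,
}


def build_labels(counts):
    """Return one class label per sample, grouped by subtype.

    The grouping is hypothetical; reorder to match the real sample sheet.
    """
    labels = []
    for subtype, n in counts.items():
        labels.extend([subtype] * n)
    return labels


labels = build_labels(SUBTYPE_COUNTS)
b_cell_total = sum(n for name, n in SUBTYPE_COUNTS.items() if name != "T-cell")
label_counts = Counter(labels)

print("B-cell samples:", b_cell_total)
print("All samples:", len(labels))
```

The two totals printed should match the 70 B-cell and 84 overall samples quoted above.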
Spike In Dataset
Subset of the SPIKE-IN HGU95 Latin square data
"Normal" sample + spike-in of transcripts that hybridize to 14 probe sets (the concentrations of the spike-ins are known)
2 series of concentrations: each probe set is spiked in at two different concentrations (pM).
12 replicates for each series: four replicates on each of three GeneChip batches (24 GeneChip CEL files are available in total)
Previous usage: a benchmark data set for preprocessing methods
[Table: probe sets A to N with their spike-in concentrations in Series 1 and Series 2]

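The layout described above (2 series x 3 batches x 4 replicates) can be written out as a small sample sheet, which makes it easy to see where the 24 CEL files come from. This is only a sketch: the function `spike_in_design` and the file-name pattern are hypothetical and do not reflect the actual CEL file names in the Data.sets directory.

```python
from itertools import product

SERIES = (1, 2)            # two concentration series
BATCHES = (1, 2, 3)        # three GeneChip batches
REPLICATES = (1, 2, 3, 4)  # four replicates per batch


def spike_in_design():
    """Enumerate one record per array in the series x batch x replicate layout."""
    design = []
    for series, batch, rep in product(SERIES, BATCHES, REPLICATES):
        design.append({
            "series": series,
            "batch": batch,
            "replicate": rep,
            # Hypothetical naming scheme, for illustration only.
            "name": f"series{series}_batch{batch}_rep{rep}",
        })
    return design


design = spike_in_design()
per_series = {s: sum(1 for d in design if d["series"] == s) for s in SERIES}

print("Arrays in total:", len(design))
print("Arrays per series:", per_series)
```

Keeping the batch as an explicit column is useful when comparing preprocessing methods, since batch effects can then be separated from the series effect.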
Arabidopsis Mutants Data Set
Samples (3x):
– WT: wild type
– mpk4: MAP Kinase 4
– ctr1: Constitutive Triple Response
– mpk4/ctr1: double mutant
Platform:
– All data is from Affymetrix ATH1 GeneChip®
– 22810 probe sets, covering nearly all genes
Background:
– MPK4 is central to the response to the plant hormone salicylic acid (SA).
– CTR1 plays a key role in ethylene (ET) perception.
– SA and ET are partially antagonistic.
– MPK4 may play a key role in this mechanism.

[Figure: Arabidopsis systemic immunity pathways. Labels in the diagram: SA, NPR1, PR1, PR2, PR5, local response, biotroph resistance, MPK4, MPK6, ET, ETR1-s, CTR1, EINs-ERFs, PDF1.2, b-CHI, GST, necrotroph resistance, and a "?" marking an open question.]

Yeast Cell Cycle Data
The experiment:
– Three time series, where samples were taken from a synchronized yeast cell culture as it progressed through the cell cycle.
Three different synchronization methods to arrest the cell cycle:
– Two temperature-sensitive mutant strains (Cdc15 and Cdc28) that cannot progress through the cell cycle at high temperature
– Rapid removal of mating factor alpha from the culture, which releases it from arrest.
Aim of the original studies:
– to determine the genes that fluctuate in expression during the cell cycle
– to characterize when in the cell cycle these genes are expressed and repressed.
The data set:
– three separate files, normalized and preprocessed data.

Yeast Rosetta Compendium
Dataset consisting of a compendium of expression profiles:
– 276 deletion mutants (69 of which were unknown at the time)
– 11 tetracycline-regulatable essential genes
– 13 compound treatments
Data:
– P-values and logratios
– generated by comparison with 63 control experiments.
Data originally used for identifying gene clusters and profiling of unknown ORFs and drug targets.
For more information: Hughes et al., Cell, 2000

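Because each profile comes with both a logratio and a p-value per gene, one simple way to get started is to keep only the genes that pass a cutoff on both. The sketch below shows this for a single profile. The function name `select_responsive_genes`, the cutoffs (`max_p`, `min_abs_logratio`), and the toy gene names and values are all assumptions made for illustration; they are not taken from the compendium or from Hughes et al.

```python
# Profile counts as stated on the slide.
PROFILE_COUNTS = {
    "deletion mutants": 276,
    "tetracycline-regulatable essential genes": 11,
    "compound treatments": 13,
}
total_profiles = sum(PROFILE_COUNTS.values())


def select_responsive_genes(logratios, pvalues, max_p=0.01, min_abs_logratio=0.3):
    """Return the genes in one profile that pass both cutoffs.

    logratios, pvalues: dicts mapping gene name -> value for one experiment.
    Both cutoffs are hypothetical defaults, not values from the original study.
    """
    selected = []
    for gene, logratio in logratios.items():
        p = pvalues.get(gene)
        if p is None:
            continue  # no p-value available for this gene
        if p <= max_p and abs(logratio) >= min_abs_logratio:
            selected.append(gene)
    return sorted(selected)


# Toy values, invented for illustration only.
toy_logratios = {"YFG1": 0.85, "YFG2": -0.05, "YFG3": -0.60, "YFG4": 0.40}
toy_pvalues = {"YFG1": 0.0004, "YFG2": 0.70, "YFG3": 0.002, "YFG4": 0.20}

hits = select_responsive_genes(toy_logratios, toy_pvalues)
print("Profiles in the compendium:", total_profiles)
print("Genes passing both cutoffs:", hits)
```

Taking the absolute value of the logratio keeps both induced and repressed genes; whether that is appropriate depends on the question asked in the project.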
Data Set Overview

| data set | # genes / probe sets | platform | organism | # samples | data |
| Human Tissue | 22215 | Affymetrix HGU133A (custom) | Homo sapiens | 158 (2 x 79) | expression values |
| Leukemia | 22215 | Affymetrix HGU133A | Homo sapiens | 84 ( ) | expression values |
| Spike-in | 12559 | Affymetrix HGU95A | Homo sapiens | 24 (3 x 4 x 2) | CEL files |
| Arabidopsis mutants | 22746 | Affymetrix ATH1 | Arabidopsis thaliana | 12 (3 x 4) | CEL files |
| Cell cycle | ~6000 | cDNA arrays, Affymetrix | S. cerevisiae | 59 ( ) | scaled expression values |
| Rosetta | 6251 | cDNA arrays | S. cerevisiae | 300 ( ) | logratios + p-values |

Practical stuff
Where: Data.sets directory, see link in your home directories
When:
Week 1:
– Wednesday: Problem formulation
– Thursday: Public data - available for project, discussion (Human tissue, Spike-in, Cell cycle)
Week 2:
– Monday: Public data - available for project, discussion (Leukemia, Plant mutants, Rosetta compendium)
– Tuesday: Project outline
– Wednesday: 13:00, deadline for problem formulation - hand in written P.F.