Using ArrayStar with a public dataset Jean-Yves Sgro
ArrayStar Part of Lasergene Genomics Suite for analysis: Global gene expression Variation across groups Optional Qseq module to analyze RNA-Seq, ChIP-Seq, CNV, and miRNA data.
Public repositories 2 cross referenced world databases for microarray and Next Gen data: USA: Gene Omnibus (GEO) (www.ncbi.nlm.nih.gov/geo) Europe (UK): ArrayExpress (EMBL/EBI) (www.ebi.ac.uk/arrayexpress)
Search example: rhinovirus www.ebi.ac.uk/arrayexpress
Today: GEO series GSE27973 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27973 Or short URL version: (see page 14) 1.usa.gov/WiUkmP
Today: GEO series GSE27973 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27973
4 groups Our analysis today with 2 groups: find the difference between medium alone and RV16 alone
Sample files 2 types: CEL CHP We want the CEL (raw data) files
Essential: rename files RENAME GSM692115.CEL medium_Donor_1.CEL RENAME GSM692116.CEL RV16_Donor_1.CEL RENAME GSM692117.CEL CSE_Donor_1.CEL RENAME GSM692118.CEL RV16+CSE_Donor_1.CEL RENAME GSM692119.CEL medium_Donor_2.CEL RENAME GSM692120.CEL RV16_Donor_2.CEL RENAME GSM692121.CEL CSE_Donor_2.CEL RENAME GSM692122.CEL RV16+CSE_Donor_2.CEL RENAME GSM692123.CEL medium_Donor_3.CEL RENAME GSM692124.CEL RV16_Donor_3.CEL RENAME GSM692125.CEL CSE_Donor_3.CEL RENAME GSM692126.CEL RV16+CSE_Donor_3.CEL RENAME GSM692127.CEL medium_Donor_4.CEL RENAME GSM692128.CEL RV16_Donor_4.CEL RENAME GSM692129.CEL CSE_Donor_4.CEL RENAME GSM692130.CEL RV16+CSE_Donor_4.CEL Files are ALREADY RENAMED and on the Windows Desktop
Replicate names: automatic
Grouped automatically
Register with Affymetrix Affymetrix.com
From home / outside Biochem department Step 1: VPN connection Step 2: Lasergene software: ArrayStar See also page 8 dept-ra-cssc.vpn.wisc.edu
High Density Oligonucleotide Affymetrix Arrays http://www.digizyme.com/portfolio/3d/dnachip.html
http://www. affymetrix http://www.affymetrix.com/estore/about_affymetrix/outreach/educator/microarray_anim_ppt_resources.affx The next six slides will provide a simple graphical explanation of the interaction of molecules in the GeneChip® microarray at a single feature. They will help explain how this interaction leads to a fluorescing feature that illustrates a gene is expressed in a cell. Most of the molecules in this example are simplified, such as the probes being shown as 7 bases long instead if 25 base pairs. This slide shows the dimensions of a typical chip and the connection from chip to feature to probe
When the tagged (red ball) RNA (in purple) is added to the array, the RNAs “look” for their complimentary match
Match | no match This illustrates what happens when there is no match The RNA essentially bounces off the probes Even one base pair is enough to prevent the match
Match | no match The visualization occurs because the fluorescent molecules are washed over the entire array and stick to a RNA fragment tagged with the Biotin When hit with a laser in the scanner, any place the fluorescent molecule sticks, it will glow and be picked up by the computer
Credit: slide IBC course 2004 http://bioinf.wehi.edu.au/marray/ibc2004/lect2-affy.pdf
Credit: slide IBC course 2004
CEL and CDF files Image analysis of each scanned Affy array produces a CEL file containing the mean intensity (and other statistics) for each square probe cell To do anything with this file you also need the CDF (Chip Description File) which specifies the probe and the probe set that each cell belongs to CDF information is provided by Affymetrix (download) Credit: slide IBC course 2004
RMA Histogram of log(PM) with fitted model Robust Multichip Average (RMA) PM data on log2 scale: raw and fitted model Histogram of log(PM) with fitted model Credit: slide IBC course 2004
RMA normalization Force the empirical distribution of probe intensities to be the same for every chip in an experiment The common distribution is obtained by averaging each quantile across chips Credit: slide IBC course 2004