Published by Ashley Beasley. Modified over 9 years ago.
1
Analysis of GEO datasets using GEO2R
Parthav Jailwala
CCR Collaborative Bioinformatics Resource
CCR/NCI/NIH
2
Outline
  – Background on GEO datasets
  – What is GEO2R and how can it help you
  – How to use GEO2R
  – Options and features
  – Limitations and caveats
  – Hands-on exercise
3
GEO (Gene Expression Omnibus)
An international public repository that archives and freely distributes high-throughput microarray and NGS data submitted by the scientific community
About a billion individual gene expression measurements, derived from over 100 organisms and addressing a wide range of biological issues
Data can be explored, queried and visualized using user-friendly Web-based tools
4
GEO data organization
[ GPLxxx ] Platform
[ GSMxxx ] Sample
[ GSExxx ] Series
[ GDSxxx ] DataSet
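The four accession prefixes on this slide map onto GEO's record types (Platform, Sample, Series, DataSet). A minimal sketch of telling them apart by prefix follows; the function name `classify_accession` is hypothetical and not part of any GEO tool, and the pattern assumes an accession is simply the prefix followed by digits.

```python
import re

# Prefix -> record type for the four GEO record kinds named on the slide.
GEO_RECORD_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

# Assumption: an accession is one of the four prefixes followed by digits only.
_ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def classify_accession(accession):
    """Return the GEO record type for an accession, or raise ValueError."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return GEO_RECORD_TYPES[match.group(1)]


if __name__ == "__main__":
    for acc in ("GSE18388", "GPL570", "GSM1", "GDS5"):
        print(acc, "->", classify_accession(acc))
```

GEO2R works at the Series level, so a check like this can be used to reject a Sample or Platform accession before going further.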
5
What kinds of data does GEO host?
GEO was designed around the common features of most of the high-throughput and parallel molecular abundance-measuring technologies in use today. These include:
  – Gene expression profiling by microarray or next-generation sequencing
  – Non-coding RNA profiling by microarray or next-generation sequencing
  – Chromatin immunoprecipitation (ChIP) profiling by microarray or next-generation sequencing
  – Genome methylation profiling by microarray or next-generation sequencing
  – Genome variation profiling by array (arrayCGH)
  – SNP arrays
  – Serial Analysis of Gene Expression (SAGE)
  – Protein arrays
6
What is GEO2R?
Interactive web tool that allows users to compare two or more groups of Samples in a GEO Series in order to identify genes that are differentially expressed across experimental conditions
Uses the GEOquery and Limma R packages from the Bioconductor project
Simple interface that allows users to perform R statistical analysis without command-line expertise
Does not rely on curated ‘DataSets’ and interrogates the original Series Matrix data file directly
7
How to use GEO2R
Enter a Series accession number
  – Follow a link from a Series record OR
  – Enter a Series accession number
Define Sample groups
  – At least 2 and up to 10 groups can be defined
Assign Samples to each group
  – Not all Samples in a Series need to be selected
Perform the test
  – Assess Sample value distributions
  – Edit default test parameters
Interpret the results
  – Table of the top 250 genes ranked by p-value
  – Select columns to be included in the output table
  – Edit the test parameters -> Recalculate to apply edits
  – Download the tab-delimited table and open in Excel
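The workflow above can be sketched offline to make the constraints concrete: 2 to 10 groups, a results table limited to the top 250 rows by p-value, and tab-delimited output. This is an illustrative sketch, not GEO2R's own code. The helper names are hypothetical, the `P.Value` column name is an assumption, and the rule that a Sample belongs to at most one group is also an assumption.

```python
import csv
import io

MIN_GROUPS, MAX_GROUPS = 2, 10  # group limits stated on the slide
TOP_N = 250                     # results-table size stated on the slide


def validate_groups(groups):
    """Check a mapping of group name -> list of Sample accessions."""
    if not MIN_GROUPS <= len(groups) <= MAX_GROUPS:
        raise ValueError(f"need {MIN_GROUPS}-{MAX_GROUPS} groups, got {len(groups)}")
    seen = set()
    for name, samples in groups.items():
        if not samples:
            raise ValueError(f"group {name!r} has no Samples")
        # Assumption: a Sample may be assigned to one group only.
        overlap = seen.intersection(samples)
        if overlap:
            raise ValueError(f"Samples in more than one group: {sorted(overlap)}")
        seen.update(samples)


def top_table(rows, n=TOP_N):
    """Return the n rows with the smallest p-values, in ascending order."""
    return sorted(rows, key=lambda row: row["P.Value"])[:n]


def to_tsv(rows, columns):
    """Render the selected columns as tab-delimited text (opens in Excel)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    validate_groups({"control": ["GSM1", "GSM2"], "treated": ["GSM3", "GSM4"]})
    results = [{"ID": "a", "P.Value": 0.03}, {"ID": "b", "P.Value": 0.001}]
    print(to_tsv(top_table(results), ["ID", "P.Value"]))
```

Samples left out of every group are simply ignored here, matching the point that not all Samples in a Series need to be selected.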
8
Options and features
Value distribution
  – Number summary or boxplot
  – Median-centered values indicate that the data are normalized and cross-comparable
Options
  – Apply adjustment of p-values
  – Apply log transformation to the data
  – Category of Platform annotation to display on results (NCBI generated (preferred) or Submitter supplied)
Profile graph
R script
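The slide does not name the p-value adjustment method. As an illustration of what such an adjustment does, here is a minimal sketch of the Benjamini-Hochberg false discovery rate procedure, one widely used choice. The function name is hypothetical, and the code is a plain reimplementation for teaching purposes, not the routine GEO2R calls.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw={p:.3f} adjusted={q:.3f}")
```

Adjusted values are always at least as large as the raw ones, which is why a gene that looks significant on its raw p-value can drop out after adjustment.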
9
Limitations & caveats
Check that Sample values are comparable
  – Assess the value distribution boxplot
  – Review the GEO Series experiment description
Data type restriction
  – Some GEO data do not have data tables (e.g. high-throughput sequencing or genome tiling arrays)
Within-Series restriction
  – No cross-Series comparisons
255 Sample limit
10-minute timeout
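The comparability check can be approximated numerically: compute the same number summary a boxplot draws for each Sample and see whether the medians line up. The sketch below assumes values are already available as plain lists. The `tolerance` threshold is a hypothetical choice for illustration, not a GEO2R setting, and all function names are made up for this example.

```python
import statistics

MAX_SAMPLES = 255  # Sample limit stated on the slide


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers behind a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def medians_comparable(samples, tolerance=0.5):
    """True if per-Sample medians differ by no more than `tolerance`.

    `samples` maps a Sample accession to its list of expression values.
    The default tolerance is an arbitrary illustrative threshold.
    """
    medians = [statistics.median(values) for values in samples.values()]
    return max(medians) - min(medians) <= tolerance


def within_sample_limit(samples):
    """True if the number of selected Samples respects the stated limit."""
    return len(samples) <= MAX_SAMPLES


if __name__ == "__main__":
    data = {"GSM1": [6.8, 7.1, 7.4, 8.0, 9.2], "GSM2": [6.9, 7.0, 7.5, 8.1, 9.0]}
    for accession, values in data.items():
        print(accession, five_number_summary(values))
    print("comparable:", medians_comparable(data))
```

Medians that disagree strongly suggest the Series was not normalized across Samples, in which case the experiment description is the place to find out how the values were processed.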
10
Summary statistics from Limma
11
Hands-on exercise
Google: GSE18388
Microarray Analysis of Space-flown Murine Thymus Tissue
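Instead of searching, the exercise Series can also be opened by URL. The sketch below builds links for a Series accession, assuming the URL patterns used on the NCBI site (`/geo/geo2r/?acc=...` for GEO2R and `/geo/query/acc.cgi?acc=...` for the Series record). These patterns are not given on the slides and may change, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed URL patterns on the NCBI site; not taken from the slides.
GEO2R_BASE = "https://www.ncbi.nlm.nih.gov/geo/geo2r/"
SERIES_BASE = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def _check_series(accession):
    """GEO2R works on Series records, so only GSE accessions are accepted."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a Series accession: {accession!r}")
    return accession


def geo2r_url(accession):
    """Link that opens GEO2R with the Series accession pre-filled."""
    return f"{GEO2R_BASE}?{urlencode({'acc': _check_series(accession)})}"


def series_url(accession):
    """Link to the Series record page."""
    return f"{SERIES_BASE}?{urlencode({'acc': _check_series(accession)})}"


if __name__ == "__main__":
    print(geo2r_url("GSE18388"))
    print(series_url("GSE18388"))
```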
13
Further learning resources on GEO2R
Full description:
  – http://www.ncbi.nlm.nih.gov/geo/info/geo2r.html
YouTube video:
  – https://www.youtube.com/watch?v=EUPmGWS8ik0
Example walkthrough:
  – http://www.bioinformatics.polimi.it/masseroli/bcbmm/material/practices/E2_GEO2R_Bioconductor_Tutorial.docx