ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

Slides:



Advertisements
Similar presentations
ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
Advertisements

Oncomine Database Lauren Smalls-Mantey Georgia Institute of Technology June 19, 2006 Note: This presentation contains animation.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Basic Genomic Characteristic  AIM: to collect as much general information as possible about your gene: Nucleotide sequence Databases ○ NCBI GenBank ○
Gene Ontology John Pinney
Welcome to mini-symposium on ontologies for biological sample description EMBL-EBI Wellcome Trust Genome Campus Deceber 5, 2001.
Evidence-Based Information Retrieval in Bioinformatics
Gene ontology & hypergeometric test Simon Rasmussen CBS - DTU.
Kate Milova MolGen retreat March 24, Microarray experiments: Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Biological Databases Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado.
Use of Ontologies in the Life Sciences: BioPax Graciela Gonzalez, PhD (some slides adapted from presentations available at
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Modeling Functional Genomics Datasets CVM Lesson 1 13 June 2007Bindu Nanduri.
NCBI resources III: GEO and expression data analysis Yanbin Yin Fall
Using ArrayExpress. ArrayExpress is an international public repository for well-annotated microarray data, including gene expression, comparative genomic.
GCB/CIS 535 Microarray Topics John Tobias November 15 th, 2004.
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
Gene expression services: ArrayExpress and the Gene Expression Atlas Contact: Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
Microrray Data Standardisation Microarray Gene Expression Database group -- MGED December, 2000.
Epigenome 1. 2 Background: GWAS Genome-Wide Association Studies 3.
Support for MAGE-TAB in caArray 2.0 Overview and feedback MAGE-TAB Workshop January 24, 2008.
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL
Gene Expression Omnibus (GEO)
The MGED Society Facilitating Data Sharing and Integration with Standards CTSA Omics Data Standards Working Group Chris Stoeckert Dept. of Genetics and.
EBI is an Outstation of the European Molecular Biology Laboratory. EBI Bioinformatics Roadshow ILRI/BecA Nairobi Campus 2 nd - 3 rd March 2011.
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
Copyright OpenHelix. No use or reproduction without express written consent1.
ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
Ranking-Aware Integration and Explorative Search of Distributed Bio-Data Dipartimento di Elettronica e Informazione NETTAB 2012 Integrated Bio-Search November.
The European Bioinformatics Institute MGED ontology for consistent annotation of microarray experiments Manchester Bioinformatics Week Ontologies Workshop1.
Abstract BarleyBase is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression data from the 22K Affymetrix.
Intralab Workshop - Reactome CMAP Chang-Feng Quo June 29 th, 2006.
GENOME-CENTRIC DATABASES Daniel Svozil. NCBI Gene Search for DUT gene in human.
Copyright OpenHelix. No use or reproduction without express written consent1.
1 maxdLoad The maxd website: © 2002 Norman Morrison for Manchester Bioinformatics.
What is an Ontology? An ontology is a specification of a conceptualization that is designed for reuse across multiple applications and implementations.
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.
PROGNOCHIP-BASE, FORTH-ICS 1 PrognoChip-BASE: An Information System for the Management of Spotted DNA MicroArray Experiments Extension of BASE v
Copyright OpenHelix. No use or reproduction without express written consent1.
Alvis Brazma, Johan Rung, Ugis Sarkans, Thomas Schlitt, Jaak Vilo European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge,
The Stanley Neuropathology Consortium Integrative Database: A novel web-based tool for exploring neuropathological traits, gene expression and associated.
Protein Data Bank: An Introduction Learning to Use the RCSB PDB Portal.
Analysis of GEO datasets using GEO2R Parthav Jailwala CCR Collaborative Bioinformatics Resource CCR/NCI/NIH.
Gene Expression Omnibus (GEO)
1 Outline Standardization - necessary components –what information should be exchanged –how the information should be exchanged –common terms (ontologies)
Master headline RDFizing the EBI Gene Expression Atlas James Malone, Electra Tapanari
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
Mining Functional Genomics Data ArrayExpress and Gene Expression Atlas: Amy Tang, PhD ArrayExpress Production Team Functional Genomics.
ArrayExpress and Expression Atlas: Mining Functional Genomics data Dr Sarah Morgan Training team
Applied Bioinformatics Week 9 Jens Allmer. Theory I Gene Expression Microarray.
Introduction and Applications of Microarray Databases Chen-hsiung Chan Department of Computer Science and Information Engineering National Taiwan University.
Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.
Describing Bioinformatic Metadata at EBI James Malone
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
Gene Set Analysis using R and Bioconductor Daniel Gusenleitner
Affymetrix User’s Group Meeting Boston, MA May 2005 Keynote Topics: 1. Human genome annotations: emergence of non-coding transcripts -tiling arrays: study.
Bioinformatics Shared Resource Introduction to Gene Expression Omnibus (GEO) bsrweb.sanfordburnham.org
ArrayExpress Ugis Sarkans EMBL - EBI
CCRC Cancer Conference November 8, 2015.
Microarray Technology and Data Analysis Roy Williams PhD Sanford | Burnham Medical Research Institute.
GEO (Gene Expression Omnibus) Deepak Sambhara Georgia Institute of Technology 21 June, 2006.
ArrayExpress and Gene Expression Atlas:
Tutorial 6 : RNA - Sequencing Analysis and GO enrichment
Using ArrayExpress.
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
How to store and visualize RNA-seq data
Gene Expression Omnibus (GEO)
Presentation transcript:

ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

What is functional genomics (FG)? The aim of FG is to understand the function of genes and other parts of the genome FG experiments typically utilize genome-wide assays to measure and track many genes (or proteins) in parallel under different conditions High-throughput technologies such as microarrays and high-throughput sequencing (HTS) are frequently used in this field to interrogate the transcriptome 2ArrayExpress

What biological questions is FG addressing? When and where are genes expressed? How do gene expression levels differ in various cell types and states? What are the functional roles of different genes and in what cellular processes do they participate? How are genes regulated? How do genes and gene products interact? How is gene expression changed in various diseases or following a treatment? 3ArrayExpress

Components of a FG experiment ArrayExpress4

FG public repositories: ArrayExpress  Is a public repository for FG data, which provides easy access to well annotated data in a structured and standardized format  Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ  Facilitates the sharing of experimental information associated with the data such as microarray designs, experimental protocols,……  Based on community standards: MIAME guidelines & MAGE-TAB format for microarray, MINSEQE guidelines for HTS data ( ArrayExpress5

Community standards for data requirement  MIAME = Minimal Information About a Microarray Experiment  MINSEQE = Minimal Information about a high-throughput Nucleotide SEQuencing Experiment  The checklist: ArrayExpress6 RequirementsMIAMEMINSEQE 1. Experiment design / background description 2. Sample annotation and experimental factor 3. Array design annotation (e.g. probe sequence) 4. All protocols (wet-lab bench and data processing) 5. Raw data files (from scanner or sequencing machine) 6. Processed data files (normalised and/or transformed)

MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray or HTS experiments IDFInvestigation Description Format file, contains top-level information about the experiment including title, description, submitter contact details and protocols. SDRFSample and Data Relationship Format file contains the relationships between samples and arrays, as well as sample properties and experimental factors, as provided by the data submitter. ADF Array Design Format file that describes probes on an array, e.g. sequence, genomic mapping location (for array data ony) Data filesRaw and processed data files. The ‘raw’ data files are the trace data files (.srf or.sff). Fastq format files are also accepted, but SRF format files are preferred. The trace data files that you submit to ArrayExpress will be stored in the European Nucleotide Archive (ENA). The processed data file is a ‘data matrix’ file containing processed values, e.g. files in which the expression values are linked to genome coordinates.European Nucleotide Archive Standards for microarray & sequencing MAGE-TAB format ArrayExpress7

What is an experimental factor?  The main variable(s) studied in the experiment  It often is the independent variable of the microarray or HTS experiment. Values of the experimental factor (“factor values”) should vary.  Examples: ArrayExpress8 Experiment designFactor Factor ValuesNot factor  human blood samples vs mouse blood samples organismHomo sapiens, Mus musculus organism part (blood only) lung samples from male C57BL/6 mice vs lung samples from male 129 mice strainC57BL/6, 129 organism part (lung only), sex (male only)

ArrayExpress9 ArrayExpress – two databases

What is the difference between them? ArrayExpress Archive Central object: experiment Query to retrieve experimental information and associated data Expression Atlas Central object: gene/condition Query for gene expression changes across experiments and across platforms ArrayExpress10

ArrayExpress – two databases ArrayExpress11

ArrayExpress Archive – when to use it? Find FG experiments that might be relevant to your research Download data and re-analyze it. Often data deposited in public repositories can be used to answer different biological questions from the one asked in the original experiments. Submit microarray or HTS data that you want to publish. Major journals will require data to be submitted to a public repository like ArrayExpress as part of the peer-review process. ArrayExpress12

How much data in AE Archive? (as of mid-September 2012) ArrayExpress13

HTS data in AE Archive (as of mid-September 2012) Microarray vs HTS RNA-, DNA-, ChIP- seq breakdown ArrayExpress14

ArrayExpress ArrayExpress15

Browsing the AE Archive Search results in tabular format; each row represents an experiment. For each experiment data is available for download and metadata for further exploration ArrayExpress16 Filters are available to refine a search as well as a search box to search experiments for any term of interest, using the Experimental Factor Ontology (EFO)

Searching AE with the Experimental factor ontology (EFO)  Application focused ontology modeling the relationship between experimental factors (EFs) in AE  Developed to: increase the richness of annotations that are currently made in AE Archive to promote consistency to facilitate automatic annotation and integrate external data  EFs are transformed into an ontological representation, forming classes and relationships between those classes  Combine terms from a subset of well-maintained and compatible ontologies, e.g. Gene Ontology, NCBI Taxonomy ArrayExpress17

Building EFO An example sarcoma cancer neoplasm disease Kaposi’s sarcoma Take all experimental factors sarcoma cancer neoplasm Kaposi’s sarcoma disease is the parent term is a type of disease is synonym of neoplasm is a type of cancer is a type of sarcoma Find the logical connection between them disease neoplasm cancer sarcoma Kaposi’s sarcoma [-] Organize them in an ontology ArrayExpress18

Exploring EFO An example ArrayExpress19 More information at:

Hands-on exercise 1 Find RNA-seq experiments studying human prostate adenocarcinoma ArrayExpress20

ArrayExpress – two databases ArrayExpress21

Expression Atlas – when to use it? Find out if the expression of a gene (or a group of genes with a common gene attribute, e.g. GO term) change(s) across all the experiments available in the Expression Atlas; Discover which genes are differentially expressed in a particular biological condition that you are interested in. ArrayExpress 22

Array (platform) designs relating to the experiment must be provided. Probe annotation must be adequate to enable re- annotation of external references (e.g. Ensembl gene ID, Uniprot ID) At least 3 replicates for each value of the experimental factor Maximum 4 experimental factors Adequate sample annotation using EFO terms Presence of data files: CEL raw data files for Affymetrix assays, processed data files for non-Affymetrix ones Expression Atlas construction Experiment selection criteria during curation ArrayExpress23

Expression Atlas construction Analysis pipeline genes Cond.1Cond.2Cond.3 Linear model* (Bio/C Limma ) Cond.1 Cond.2 Cond.3 Input data (Affy CEL, non-Affy processed) 1= differentially expressed 0 = not differentially expressed Output: 2-D matrix * More information about the statistical methodology: ArrayExpress24

Expression Atlas construction Analysis pipeline “Is gene X differentially expressed in condition 1 in this experiment?” Cond.1 mean Cond.2 mean Cond.3 mean Mean of all samples = a single expression value for gene X Compare and calculate statistic ArrayExpress25

genes Cond.1Cond.2Cond.3 Exp.1 genes Cond.4Cond.5Cond.6 Exp. 2 genes Cond.XCond.YCond.Z Exp. n Statistical test Statistical test Statistical test Each experiment has its own “verdict” or “vote” on whether a gene is differentially expressed or not under a certain condition ArrayExpress26 Expression Atlas construction

Summary of the “verdicts” from different experiments ArrayExpress27

Expression Atlas ArrayExpress28

Expression Atlas home page Query for genes Query for conditions Restrict query by direction of differential expression The ‘advanced query’ option allows building more complex queries ArrayExpress29

Hands-on exercise 2 Find genes in the “androgen receptor signaling pathway” which are (i) expressed in prostate carcinoma and (ii) involved in regulation of transcription from RNA Pol II Hands-on exercise 3 Find information on Tbx1 expression in human brain ArrayExpress30