ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL

Slides:

Advertisements

Similar presentations

The National Center for Biotechnology Information (NCBI) a primary resource for molecular biology information Database Resources.

Advertisements

The Rice Functional Genomics Program of China cDNA microarray database (RIFGP-CDMD) consists of complete datasets, including the probe sequences, microarray.

1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.

Basic Genomic Characteristic  AIM: to collect as much general information as possible about your gene: Nucleotide sequence Databases ○ NCBI GenBank ○

The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.

MIAME Minimum Information About a Microarray Experiment

Kate Milova MolGen retreat March 24, Microarray experiments: Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.

Biological Databases Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado.

Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.

Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.

NCBI resources III: GEO and expression data analysis Yanbin Yin Fall

Using ArrayExpress. ArrayExpress is an international public repository for well-annotated microarray data, including gene expression, comparative genomic.

Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.

MyiLibrary® ‘Search & View’ Website Training June 8, 2010.

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

Gene expression services: ArrayExpress and the Gene Expression Atlas Contact: Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

SWIS Digital Inspections Project (SWIS DIP) Chris Allen, Information Management Branch California Integrated Waste Management Board November 5, 2008 The.

Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.

Copyright © 2006 Knovel Corporation Streamline Your Science and Engineering Research

Support for MAGE-TAB in caArray 2.0 Overview and feedback MAGE-TAB Workshop January 24, 2008.

Gene Expression Omnibus (GEO)

The MGED Society Facilitating Data Sharing and Integration with Standards CTSA Omics Data Standards Working Group Chris Stoeckert Dept. of Genetics and.

Test1 April 2004 Microarray Data Management Jianwei (Jerry) Li.

EBI is an Outstation of the European Molecular Biology Laboratory. EBI Bioinformatics Roadshow ILRI/BecA Nairobi Campus 2 nd - 3 rd March 2011.

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

Copyright OpenHelix. No use or reproduction without express written consent1.

ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

1 Welcome to the GrameneMart Tutorial A tool for batch data sequence retrieval 1.Select a Gramene dataset to search against. 2.Add filters to the dataset.

Abstract BarleyBase is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression data from the 22K Affymetrix.

CANDID: A candidate gene identification tool Janna Hutz March 19, 2007.

GENOME-CENTRIC DATABASES Daniel Svozil. NCBI Gene Search for DUT gene in human.

Management Information Systems MS Access MS Access is an application software that facilitates us to create Database Management Systems (DBMS)

Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics Lab v1 | Saurabh Sinha1 Powerpoint by Casey Hanson.

Introduction to Databases Trisha Cummings. What is a database? A database is a tool for collecting and organizing information. Databases can store information.

Copyright OpenHelix. No use or reproduction without express written consent1.

1 maxdLoad The maxd website: © 2002 Norman Morrison for Manchester Bioinformatics.

3/24/2005 TIGP 1 Bioinformatics for Microarray Studies at IBS Pei-Ing Hwang, Ph.D. Mar. 24, 2005.

Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.

ITGS Databases.

PROGNOCHIP-BASE, FORTH-ICS 1 PrognoChip-BASE: An Information System for the Management of Spotted DNA MicroArray Experiments Extension of BASE v

Alvis Brazma, Johan Rung, Ugis Sarkans, Thomas Schlitt, Jaak Vilo European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge,

Analysis of GEO datasets using GEO2R Parthav Jailwala CCR Collaborative Bioinformatics Resource CCR/NCI/NIH.

Gene Expression Omnibus (GEO)

Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics | Saurabh Sinha | PowerPoint by Casey Hanson.

This tutorial will describe how to navigate the section of Gramene that provides descriptions of alleles associated with morphological, developmental,

1 Outline Standardization - necessary components –what information should be exchanged –how the information should be exchanged –common terms (ontologies)

UoS Libraries 2011 EndNote X5 - basic graduate session.

Master headline RDFizing the EBI Gene Expression Atlas James Malone, Electra Tapanari

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

ArrayExpress and Expression Atlas: Mining Functional Genomics data Dr Sarah Morgan Training team

Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.

Applied Bioinformatics Week 9 Jens Allmer. Theory I Gene Expression Microarray.

CaArray User Community Meeting Feature Overview and Review of MAGE-TAB Update and Export Specification Call in: Participant Passcode:

Introduction and Applications of Microarray Databases Chen-hsiung Chan Department of Computer Science and Information Engineering National Taiwan University.

Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.

Describing Bioinformatic Metadata at EBI James Malone

Welcome to the GrameneMart Tutorial A tool for batch data sequence retrieval 1.Select a Gramene dataset to search against. 2.Add filters to the dataset.

Describing and Annotating Experimental Data: Hands On.

ArrayExpress Ugis Sarkans EMBL - EBI

GEO (Gene Expression Omnibus) Deepak Sambhara Georgia Institute of Technology 21 June, 2006.

AdisInsight User Guide July 2015

Using ArrayExpress.

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL

How to store and visualize RNA-seq data

Functional Annotation of the Horse Genome

Gene Expression Omnibus (GEO)

Welcome to the Quantitative Trait Loci (QTL) Tutorial

Welcome - webinar instructions

Regulatory Genomics Lab

Presentation transcript:

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL

2 Services available from EBI Genomes DNA & RNA sequence Gene expression Protein sequence Protein families, motifs and domains Protein families, motifs and domains Protein structure Protein interactions Chemical entities Pathways Systems Literature and ontologies

ArrayExpress3 Talk structure  Why do we need a database for functional genomics data?  ArrayExpress database Archive Gene Expression Atlas  Database content  Query the database  Data download  Data submission

What is functional genomics? Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions. Unlike genomics and proteomics, functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. Functional genomics attempts to answer questions about the function of DNA at the levels of genes, RNA transcripts, and protein products. ArrayExpress & Atlas4

Components of a functional genomics experiment Array design information Location of each element Description of each element Hybridization protocol Quantification matrix Software specifications Sample source Sample treatments RNA extraction protocol Labelling protocol Control array elements Normalization method Image Scanning protocol Software specifications Array Normalized data Raw data Sample Sample source Sample treatments Template preparation Library Library preparation Chip Cluster amplification Sequencing and imaging Data analysis From images to sequences Quality Control Sequence alignment Assembly Specific steps depending on the application Data analysis

Why do we need a database for functional genomics data? ArrayExpress & Atlas6 E-MEXP-2451 Transcription profiling of grape berries during ripening Grape berries undergo considerable physical and biochemical changes during the ripening process. Ripening is characterized by a number of changes, including the degradation of chlorophyll, an increase in berry deformability, a rapid increase in the level of hexoses in the berry vacuole, an increase in berry volume, the catabolism of organic acids, the development of skin colour, and the formation of compounds that influence flavour, aroma, and therefore, wine quality. The aim of this work is to identify differentially expressed genes during grape ripening by microarray and real- time PCR techniques. Using a custom array of new generation, we analysed the expression of 6000 grape genes from pre-veraison to full maturity, in Vitis vinifera cultivar Muscat of Hamburg, in two different years (2006 and 2007). Five time points per year and two biological replicates per stadium were considered. To reduced intra- plant and inter-plant biological variability, for each ripening stadium we collected around hundred berries from several bunch grapes of five plants of V. vinifera cv Muscat of Hamburg. We will use the real- time PCR technique to validate microarray data.Muscat of Hamburg. We will use the real-time PCR technique to validate microarray data. E-MEXP-2451 Transcription profiling of grape berries during ripening Grape berries undergo considerable physical and biochemical changes during the ripening process. Ripening is characterized by a number of changes, including the degradation of chlorophyll, an increase in berry deformability, a rapid increase in the level of hexoses in the berry vacuole, an increase in berry volume, the catabolism of organic acids, the development of skin colour, and the formation of compounds that influence flavour, aroma, and therefore, wine quality. The aim of this work is to identify differentially expressed genes during grape ripening by microarray and real- time PCR techniques. Using a custom array of new generation, we analysed the expression of 6000 grape genes from pre-veraison to full maturity, in Vitis vinifera cultivar Muscat of Hamburg, in two different years (2006 and 2007). Five time points per year and two biological replicates per stadium were considered. To reduced intra- plant and inter-plant biological variability, for each ripening stadium we collected around hundred berries from several bunch grapes of five plants of V. vinifera cv Muscat of Hamburg. We will use the real- time PCR technique to validate microarray data.Muscat of Hamburg. We will use the real-time PCR technique to validate microarray data.

Why do we need a database for functional genomics data? ArrayExpress & Atlas7

ArrayExpress  Is a public repository for functional genomics data, mostly generated using microarray or high throughput sequencing (HTS) assays  Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ  Provides easy access to well annotated microarray data in a structured and standardized format  Facilitates the sharing of microarray designs and experimental protocols  Based on FGED standards: MIAME checklist, MAGE-TAB format and MO Ontology.  MINSEQE checklist for HTS data ( ArrayExpress8

Reporting standards Data standardization efforts focus on answering 3 major questions: 1.What information do we need to capture? 2.What syntax (or file format) should we use to exchange data? 3.What semantics (or ontology) should we use to best describe its annotation? ArrayExpress9

What information do we need to capture? MIAME checklist  Minimal Information About a Microarray Experiment  The 6 most critical elements contributing towards MIAME are: 1.Essential sample annotation including experimental factors and their values (e.g. compound and dose) 2.Experimental design including sample data relationships (e.g. which raw data file relates to which sample, ….) 3.Sufficient array annotation (e.g. gene identifiers, genomic coordinates, probe sequences or array catalog number) 4.Essential laboratory and data processing protocols (e.g. normalization method used) 5.Raw data for each hybridization (e.g. CEL or GPR files) 6.Final normalized data for the set of hybridizations in the experiment ArrayExpress10

What information do we need to capture? MINSEQE checklist  Minimal Information about a high-throughput Nucleotide SEQuencing Experiment  The proposed guidelines for MINSEQE are (still work in progress): 1.General information about the experiment 2.Essential sample annotation including experimental factors and their values (e.g. compound and dose) 3.Experimental design including sample data relationships (e.g. which raw data file relates to which sample, ….) 4.Essential experimental and data processing protocols 5.Sequence read data with quality scores, raw intensities and processing parameters for the instrument 6.Final processed data for the set of assays in the experiment ArrayExpress11

MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray experiment: IDFInvestigation Description Format file, contains top-level information about the experiment including title, description, submitter contact details and protocols. SDRFSample and Data Relationship Format file contains the relationships between samples and arrays, as well as sample properties and experimental factors, as provided by the data submitter. ADFArray Design Format file, describes the design of an array, i. e. the sequence located at each feature on the array and annotation of the sequences. Data filesRaw and processed data files. The ‘raw’ data files are the files produced by the microarray image analysis software, such as CEL files for Affymetrix or GPR files from GenePix. The processed data file is a ‘data matrix’ file containing processed values, as provided by the data submitter. 12 What syntax (or file format) should we use to exchange data?

What semantics (or ontology) should we use to best describe its annotation?  Ontology, which is a formal specification of terms in a particular subject area and the relations among them.  Its purpose is to provide a basic, stable and unambiguous description of such terms and relations in order to avoid improper and inconsistent use of the terminology pertaining to a given domain. ArrayExpress13

Experimental factor ontology (EFO)  Application focused ontology modeling experimental factors (EFs) in AE  Developed to:  Consistent curation – ensure that curators use same vocabulary  Query support (e.g, query for 'cancer' and get also ‘leukemia')  EFs are transformed into an ontological representation, forming classes and relationships between those classes  EFO terms map to multiple existing domain specific ontologies, such as the Disease Ontology and Cell Type Ontology ArrayExpress14

ArrayExpress & Atlas15 What semantics (or ontology) should we use to best describe its annotation?

ArrayExpress16 ArrayExpress – two databases

ArrayExpress17 How to query AE and Atlas? AE Archive Query by experiment, sample and experimental factor annotations Filter on species, array platform, molecule assayed and technology used Gene Expression Atlas Gene and/or condition queries Query across experiments and across platforms

ArrayExpress – two databases ArrayExpress18

ArrayExpress Archive ArrayExpress & Atlas19...and counting

How much data in AE Archive? ArrayExpress20

ArrayExpress21 Archive by species

ArrayExpress22 Browsing the AE Archive

Species investigated Curated title of experiment The date when the data released AE unique experiment ID ArrayExpress23 Raw sequencing data available in ENA The list of experiments retrieved can be printed, saved as Tab-delimited format or exported to Excel or as RSS feed The total number of experiments and assay retrieved The direct link to raw and processed data. An icon indicates that this type of data is available. Loaded in Atlas flag Number of assays

ArrayExpress24 Browsing the AE Archive

Experimental factor ontology (EFO) Example: ArrayExpress25 carcinoma acinar cell carcinoma adrenocortical carcinoma bladder carcinoma breast carcinoma cervical carcinoma esophageal carcinoma follicular thyroid carcinoma hepatocellular carcinoma pancreatic carcinoma papillary thyroid carcinoma renal carcinoma signet ring cell carcinoma uterine carcinoma adenocarcinoma adenoid cystic carcinoma endometrioid carcinoma gastric carcinoma hereditary leiomyomatosis and renal cell cancer hypopharyngeal carcinoma acinar cell carcinoma adrenocortical carcinoma bladder carcinoma breast carcinoma cervical carcinoma esophageal carcinoma follicular thyroid carcinoma hepatocellular carcinoma pancreatic carcinoma papillary thyroid carcinoma renal carcinoma signet ring cell carcinoma uterine carcinoma adenocarcinoma adenoid cystic carcinoma endometrioid carcinoma gastric carcinoma hereditary leiomyomatosis and renal cell cancer hypopharyngeal carcinoma

Searching AE Archive Simple query - EFO ArrayExpress26 Matches to exact terms are highlighted in yellow Matches to child terms in the EFO are highlighted in pink

Searching AE Archive Simple query  Search across all fields: AE accession number e.g. E-MEXP-568 Secondary accession numbers e.g. GEO series accession GSE5389 Experiment name Submitter's experiment description Sample attributes, experimental factor and values, including species (e.g. GeneticModification, Mus musculus, DREB2C over-expression) Publication title, authors and journal name, PubMed ID  Synonyms for terms are always included in searches e.g. 'human' and 'Homo sapiens’ ArrayExpress27

AE Archive query output Matches to exact terms are highlighted in yellow

AE Archive – experiment view ArrayExpress29 MIAME or MINSEQE scores show how much the experiment is standard compliant Experimental factor(s) and its values Link to files available. This varies between sequencing and microarray data. For microarray experiments you also have array design file. Link to files available. This varies between sequencing and microarray data. For microarray experiments you also have array design file.

AE Archive – SDRF file ArrayExpress30

SDRF file – sample & data relationship ArrayExpress31

AE Archive – ADF file ArrayExpress32

AE Archive – all files ArrayExpress33

ArrayExpress34 AE Archive – all files

Searching AE Archive Advanced query  Combine search terms Enter two or more keywords in the search box with the operators AND, OR or NOT. AND is the default search term; a search for kidney cancer will return hits with a match to ‘kidney' AND ‘cancer’ Search terms of more than one word must be entered inside quotes otherwise only the first word will be searched for. E.g. “kidney cancer”  Specify fields for searches Particular fields for searching can also be specified in the format of fieldname:value ArrayExpress35

Searching AE Archive Advanced query - fieldnames ArrayExpress36 Field nameSearchesExample accession Experiment primary or secondary accessionaccession:E-MEXP-568 array Array design accession or namearray:AFFY-2 OR array:Agilent* ef Experimental factor, the name of the main variables in an experiment. ef:celltype OR ef:compound efv Experimental factor value. Has EFO expansion.efv:fibroblast expdesign Experiment design typeexpdesign:”dose response” exptype Experiment type. Has EFO expansion.exptype:RNA-seq gxa Presence in the Gene Expression Atlas. Only value is gxa:true. ef:compound AND gxa:true pmid PubMed identifierpmid: sa Sample attribute values. Has EFO expansion.sa:wild_type species Species of the samples. Has EFO expansion.species:”homo sapiens” AND ef:cellline

Searching AE Archive Advanced query ArrayExpress37 FilterWhat is filtered assaycount:[x TO y]filter on the number of of assays where x <= y and both values are between 0 and 99,999 (inclusive). To count excluding the values given use curly brackets e.g. assaycount:{1 TO 5} will find experiments with 2-4 assays. Single numbers may also be given e.g. assaycount:10 will find experiments with 10 assays. efcount:[x TO y]filter on the number of experimental factors samplecount:[x TO y]filter on the number of samples sacount:[x TO y]filter on the number of sample attribute categories rawcount:[x TO y]filter on the number of raw files fgemcount:[x TO y]filter on the number of final gene expression matrix (processed data) files miamescore:[x TO y]filter on the MIAME compliance score (maximum score is 5) date:yyyy-mm-ddfilter by release date date: will search for experiments released on 1st of Dec 2009 date:2009* - will search for experiments released in 2009 date:[ ] - will search for experiments released between 1st of Jan and end of May 2008  Filtering experiments by counts of a particular attribute Experiments fulfilling certain count criteria can also be searched for e.g. having more than 10 assays (hybridizations)

ArrayExpress & Atlas38 Searching AE Archive Advanced query – Examples

Exercise 1 ArrayExpress39

ArrayExpress40 ArrayExpress – two databases

ArrayExpress41  The criteria we use for selecting experiments for inclusion in the Atlas are as follows: Array designs relating to experiment must be provided to enable re-annotation using Ensembl or Uniprot (or have the potential for this to be done) High MIAME scores Experiment must have 6 or more hybridizations Sufficient replication and large sample size EF and EFV must be well annotated Adequate sample annotation must be provided Processed data must be provided or raw data which can be renormalized must be available Gene Expression Atlas Experiment selection criteria

ArrayExpress42  New meta-analytical tool for searching gene expression profiles across experiments in AE  Data is taken as normalized by the submitter  Gene-wise linear models (limma) and t-statistics are applied to calculate the strength of genes’ differential expression across conditions across experiments  The result is a two-dimensional matrix where rows correspond to genes and columns correspond to conditions, rather than samples.  The matrix entries are p-values together with a sign, indicating the significance and direction of differential expression Gene Expression Atlas Atlas construction

ArrayExpress43 Gene Expression Atlas Atlas construction

ArrayExpress45 Gene Expression Atlas

ArrayExpress46 Atlas home page Query for genes Query for conditions Restrict query by direction of differential expression The ‘advanced query’ option allows building more complex queries

ArrayExpress47 Atlas home page The ‘Genes’ search box

ArrayExpress48 Atlas home page The ‘Conditions’ search box

ArrayExpress49 Atlas home page A single gene query

Atlas gene summary page ArrayExpress50

ArrayExpress & Atlas51 Atlas home page A ‘Conditions’ only query

ArrayExpress52 Atlas heatmap view

Atlas list view Click the ‘expression profile’ link to view the experiment page ArrayExpress53

Atlas data download ArrayExpress54

ArrayExpress55 Atlas experiment page Box plot showing differential expression of Mt2,Agxt2l1 and Insig2 across the growth conditions studied. Data is available for 3 separate array probes. Search for a specific gene, experimental factor or expression level (up/down)

ArrayExpress & Atlas56 Atlas experiment page – HTS data

ArrayExpress57 Atlas gene-condition query

ArrayExpress58 Atlas query refining

ArrayExpress59 Atlas gene-condition query

ArrayExpress60 Atlas query refining

ArrayExpress61 Atlas query refining

ArrayExpress62 Exercises 2 & 3

ArrayExpress63 Data submission to AE

ArrayExpress64 Data submission to AE

MIAMExpress- Overview MIAMExpress is a web based tool for submitting microarray data and microarray designs to the ArrayExpress database MIAMExpress can be used for experiments up to 50 hybridizations in size ArrayExpress & Atlas65 Array design (custom) Experiment Protocol

MIAMExpress- Getting Started ArrayExpress & Atlas66

MIAMExpress- Protocol Submission ArrayExpress & Atlas67 Change the automatically generated protocol name to something meaningful like Sanger Lab Growth Protocol

MIAMExpress- Experiment Submission ArrayExpress & Atlas68 Experiment description SamplesExtracts Labeled extracts Hybs Upload raw and norm data Upload combined data files Batchloader interface where you fill in your experiment information in a spreadsheet-like format Web page interface in which a series of web forms are filled out.

MIAMExpress-Batch Upload Tool ArrayExpress & Atlas69 Experiment description SamplesExtracts Labeled extracts Hybs Upload raw and norm data Upload combined data files HELP

MIAMExpress-Batch Upload Tool ArrayExpress & Atlas70 Experiment description SamplesExtracts Labeled extracts Hybs Upload raw and norm data Upload combined data files Click multiply to create as many rows as you need. Hint: If you complete all the values in the first row prior to this it will save you time Toolbar Cross document functionality

MIAMExpress-Batch Upload Tool ArrayExpress & Atlas71 Experiment description SamplesExtracts Labeled extracts Hybs Upload raw and norm data Upload combined data files Double click and select one or more samples per extract

MIAMExpress-Batch Upload Tool ArrayExpress & Atlas72 Experiment description SamplesExtracts Labeled extracts Hybs Upload raw and norm data Upload combined data files Use the folder icon to select the required data files Affymetrix users submit the.CEL file here (and.EXP if available). For all other submissions the raw data must be a single.txt or.gpr file.

MIAMExpress-Batch Upload Tool ArrayExpress & Atlas73 Experiment description SamplesExtracts Labeled extracts Hybs Upload raw and norm data Upload combined data files If you are supplying a Final Gene Expression Data File you must supply a transformation protocol

MIAMExpress-Data File Formats for Submission Submitted normalized data files must contain data from a single hybridization only. If your normalization procedure creates a file containing data from all your hybs then you can submit this as a final gene expression matrix (FGEM). A normalization protocol should be submitted along with your normalized data files-be precise when describing how the data was calculated A final gene expression matrix (FGEM) or combined data file is a file containing data from several hybridizations. The creation of your FGEM must be described in your transformation protocol. The format of the FGEM is as follows: each line corresponds to an array element each column corresponds to your calculated value ArrayExpress & Atlas74

Array Submission- Background An array design describes how a microarray was manufactured, what was printed/synthesized at each position on the array and what biological sequences these represent. If the array design has already been submitted to ArrayExpress, you do not need to re-submit it. If you have used a custom array design you will need to submit the array design as an Array Design Format (ADF) file After submitting an array design you can immediately continue with your experiment submission The ADF file can be created in any spreadsheet application but must be saved as a tab delimited text file. ArrayExpress & Atlas75

MIAMExpress-Array Submission ArrayExpress & Atlas76 Enter some general information about your array Checker- This tool will check for errors in the format and content of an ADF file.

MIAMExpress-Array Submission ArrayExpress & Atlas77 Location and nameLocation and name of each reporter (probe) on the array AnnotationAnnotation such as the nucleotide sequence or database accession associated with the reporter e.g. RefSeq Type of reporterType of reporter (cDNA, oligonucleotide, RNA, DNA etc) that is present Group of reporterGroup of reporter - which are the control and which are the experimental reporters Optional additional informationOptional additional information - this can be used to show that several reporters are associated with the same gene for example, or add a comment about a reporter. control_biosequence - for example a spike control_buffer - buffer spotted on the array control_empty - nothing spotted on the array control_genomic_DNA - e.g. salmon sperm DNA control_label - landing lights control_reporter_size - size standard control_spike_calibration - spike at varying concentrations control_unknown_type control_biosequence - for example a spike control_buffer - buffer spotted on the array control_empty - nothing spotted on the array control_genomic_DNA - e.g. salmon sperm DNA control_label - landing lights control_reporter_size - size standard control_spike_calibration - spike at varying concentrations control_unknown_type

MAGE-TAB Example: IDF ArrayExpress & Atlas78

MAGE-TAB Example: SDRF ArrayExpress & Atlas79

MAGE-TAB Example: SDRF ArrayExpress & Atlas80

MAGE-TAB Submission ArrayExpress & Atlas81 Submitter is directed to either submit an experiment or download a template to the desktop Ontology Link Indicate submission type

MAGE-TAB Submission ArrayExpress & Atlas82

What happens after submission? confirmation Curation The curation team will review your submission and will you with any questions. Possible reopening for editing We will send you an accession number when all the required information has been provided. We will load your experiment into ArrayExpress and provide you with a reviewer login for viewing the data before it is made public. ArrayExpress & Atlas83

Submission of High-throughput sequencing gene expression data ArrayExpress & Atlas84

MAGE-TAB Example: SDRF

ArrayExpress & Atlas86 MAGE-TAB Example: SDRF

What needs to be included in the Spreadsheet? Include Assay Name and Technology Type columns Raw files must go in the Array Data File column A sequencing protocol must be provided. The sequencing protocol should have a performer- this is used as the run center name. This protocol must have a Protocol Hardware value saying which sequencing instrument was used (e.g. 454 GS, Illumina Genome Analyzer, AB SOLiD System Reference this in the Protocol REF column before the Assay Name column. These 4 extra Comment[] columns should be added after Extract Name to provide information about how the library was prepared 1. Comment[LIBRARY_LAYOUT] - Specifies whether to expect single or paired reads. The column should contain one of the following: SINGLE or PAIRED 2. Comment[LIBRARY_SOURCE] - Specifies the type of source material that is being sequenced. The column should contain one of the following: TRANSCRIPTOMIC, METAGENOMIC, SYNTHETIC, VIRAL RNA, OTHER * 3. Comment[LIBRARY_STRATEGY] - Sequencing technique intended for the library. The column should contain one of the following: WGS, WCS, WXS, CLONE, CLONEEND, POOLCLONE, FINISHING, AMPLICON, RNA-Seq, EST, FL-cDNA, CTS, ChIP-Seq, MNase-Seq, DNase-Hypersensitivity, Bisulfite-Seq, MRE-Seq, MeDIP-Seq, MBD-Seq, OTHER * 4. Comment[LIBRARY_SELECTION] - Method was used to select and/or enrich the material being sequenced. The column should contain one of the following: RANDOM, PCR, RANDOM PCR, RT-PCR, cDNA, CAGE, RACE, ChIP, MNase DNAse, HMPR, MF, MSLL, 5-methylcytidine antibody, MBD2 protein methyl-CpG binding domain, Hybrid Selection, Reduced Representation, Restriction Digest, size fractionation, CF-S, CF-M, CF-H, CF-T, other, unspecified * ArrayExpress & Atlas87

Submissions- Key Point Don’t be afraid to ask ArrayExpress & Atlas88

Summary: ArrayExpress Search by experiment Search by keyword Link to sample properties and experiment design View experiment Search by gene across experiments Browse results summary Search by gene name, species and experimental condition Search by gene name, species and experimental condition View expression under different conditions and profiles

ArrayExpress90 Acknowledgements ArrayExpress/Atlas training is funded by the European Commission under SLING, grant agreement number (Integrating Activity) within Research Infrastructures Action of the FP7 Capacities Specific Programme.