The MGED Society Facilitating Data Sharing and Integration with Standards CTSA Omics Data Standards Working Group Chris Stoeckert Dept. of Genetics and Penn Center for Bioinformatics, U. Penn School of Medicine Philadelphia, PA

Goal: Developing integrated data repositories (e.g. genomics, transcriptomics, etc.) along with clinical data.
Integration requires standards:
–For efficient loading and access
–For data sharing
Your data repository / Some other public repository
–MAGE-TAB for microarray, UHTS data
–OBI for describing biomedical (including clinical) data

The MGED Society Mission
The MGED Society is an international organization of biologists, computer scientists, and data analysts that aims to facilitate biological and biomedical discovery through data integration. Our approach is to promote the sharing of large data sets generated by high throughput functional genomics technologies. Historically, MGED began with a focus on microarrays and gene expression data. However, the scope of MGED now includes data generated using any technology when applied to genome-scale studies of gene expression, binding, modification and other related applications. Members of MGED work to establish standards for data quality, management, annotation and exchange; facilitate the creation of tools that leverage these standards; work with other standards organizations; and promote the sharing of high quality, well annotated data within the life sciences and biomedical communities.

MGED Standards
What information is needed for a microarray experiment?
–MIAME: Minimum Information About a Microarray Experiment. Brazma et al., Nature Genetics 2001
How do you “code up” microarray data?
–MAGE-OM: MicroArray Gene Expression Object Model. Spellman et al., Genome Biology 2002
–MAGE-TAB. Rayner et al., BMC Bioinformatics 2006
What words do you use to describe a microarray experiment?
–MO: MGED Ontology. Whetzel et al., Bioinformatics 2006

New MGED-Related Activities
The MGED Society mission includes facilitating deposition of functional genomics datasets (e.g. microarray studies) in public archives. In addition to addressing what data are deposited and how, we are very much concerned with seeing that authors adhere to journal requirements for data deposition. Unfortunately, the requirement for data deposition is not being sufficiently met, and important datasets are not accessible (see for example Ochsner et al., Nature Methods 2008). Therefore, we ask investigators who cannot obtain microarray or UHTS functional genomics datasets from studies published in journals requiring deposition to contact us. We will then contact the authors on your behalf and inform the journal where the study was published. We will document the results on the MGED web site to assist others seeking the same dataset and to aid reviewers of related publications and grants. http://www.mged.org/wiki/index.php/Published_Dataset_Availability

New MGED-Related Activities UHTS submission to repositories –Both ArrayExpress and NCBI GEO accept functional genomic experiment submissions generated by ultra-high-throughput sequencing (UHTS) technologies. ArrayExpress and GEO have entered into a metadata exchange agreement, meaning that UHTS sequence experiments will appear in both databases regardless of where they were submitted. This complements the exchange of underlying raw data between the short read archives, SRA and ERA. Raw sequencing data submitted to ArrayExpress or GEO will be sent to ERA or SRA respectively. You do not need to submit to the sequence repositories separately. –See Helen Parkinson (ArrayExpress) and Tanya Barrett (GEO) for details.

New MGED-Related Activities UHTS Quality Working Group –Marc Salit (NIST) –Best practices for RNA-Seq Illumina (Solexa) Ambion (ABI SOLiD)

New Directions for MGED Standards
What information is needed for a UHTS experiment?
–MINSEQE: Minimum Information about a high throughput SEQuencing Experiment.
–http://www.mged.org/minseqe/
How do you annotate microarray and gene expression data?
–Annotare: Tool to create MAGE-TAB.
–http://code.google.com/p/annotare/
What words do you use to describe an investigation?
–OBI: Ontology for Biomedical Investigations.
–http://obi-ontology.org/

A draft proposal for the required Minimum Information about a high-throughput Nucleotide SeQuencing Experiment – MINSEQE (April 1, 2008)
–The description of the biological system and the particular states that are studied
–The sequence read data for each assay
–The 'final' processed (or summary) data for the set of assays in the study
–The experiment design including sample data relationships
–General information about the experiment
–Essential experimental and data processing protocols
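The six MINSEQE elements above lend themselves to a simple completeness check before submission. The following is a minimal sketch only: the dictionary keys and the function name are hypothetical labels chosen for illustration, since MINSEQE defines the required information, not a file format or field names.

```python
# Hypothetical keys for the six MINSEQE elements; the descriptions follow
# the draft proposal, the key names are invented for this sketch.
MINSEQE_ELEMENTS = {
    "biological_system": "Biological system and the particular states studied",
    "sequence_reads": "Sequence read data for each assay",
    "processed_data": "Final processed (or summary) data for the set of assays",
    "experiment_design": "Experiment design, including sample data relationships",
    "general_information": "General information about the experiment",
    "protocols": "Essential experimental and data processing protocols",
}


def missing_minseqe_elements(submission):
    """Return the keys of checklist elements that are absent or empty."""
    return [key for key in MINSEQE_ELEMENTS if not submission.get(key)]


if __name__ == "__main__":
    # An invented, incomplete submission description.
    draft = {
        "biological_system": "Mouse liver, fed vs. fasted",
        "sequence_reads": ["run1.fastq", "run2.fastq"],
        "experiment_design": "2 conditions x 3 replicates",
    }
    for key in missing_minseqe_elements(draft):
        print("Missing:", MINSEQE_ELEMENTS[key])
```

A check like this only confirms that each element is present; whether the content is adequate still needs review by a curator.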

Annotare - An open source standalone MAGE-TAB editor

MAGE-TAB Format
What’s MAGE-TAB?
MAGE-TAB is a simple spreadsheet view which has two files, plus the data files they point to:
–IDF: describes the experiment design, contact details, variables and protocols
–SDRF: a spreadsheet with columns that describe samples, annotations, protocol references, hybridizations and data
–Linked data files, e.g. CEL files, which are referenced by the SDRF
For single channel data, one row in the SDRF = 1 hybridization; for two channel data, one row = 1 channel.
MAGE-TAB can also be used to annotate Next Gen Sequencing data.
Where can I get MAGE-TAB from?
–~10,000 MAGE-TAB files are available for download from ArrayExpress (GEO derived and ArrayExpress data).
–caArray also provides MAGE-TAB files for download.
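Because both files are tab-delimited text, they can be read with a few lines of standard-library Python. The sketch below is illustrative and is not the Limpopo parser or any MGED tool: the sample IDF and SDRF content is invented (it is not E-TABM-34), and the function names are hypothetical. It assumes single channel data, so each SDRF row is one hybridization.

```python
import csv
import io
from collections import defaultdict

# Invented example content, for illustration only.
IDF_TEXT = (
    "Investigation Title\tExample study\n"
    "Experimental Design\tdisease_state_design\n"
    "Person Last Name\tDoe\tRoe\n"
    "Protocol Name\tP-EXTRACT\tP-HYB\n"
    "SDRF File\texample.sdrf.txt\n"
)
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\t"
    "Hybridization Name\tArray Data File\n"
    "sample1\tHomo sapiens\tP-HYB\thyb1\thyb1.CEL\n"
    "sample2\tHomo sapiens\tP-HYB\thyb2\thyb2.CEL\n"
)


def parse_idf(text):
    """IDF is row-oriented: the first cell is a tag, the rest are its values."""
    idf = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row and row[0].strip():
            idf[row[0].strip()] = [cell for cell in row[1:] if cell.strip()]
    return idf


def parse_sdrf(text):
    """SDRF is column-oriented: a header row, then one row per hybridization."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    return header, rows


def data_files_by_hybridization(header, rows):
    """Map each hybridization name to the data files its rows reference."""
    hyb_col = header.index("Hybridization Name")
    file_col = header.index("Array Data File")
    files = defaultdict(list)
    for row in rows:
        files[row[hyb_col]].append(row[file_col])
    return dict(files)


if __name__ == "__main__":
    idf = parse_idf(IDF_TEXT)
    header, rows = parse_sdrf(SDRF_TEXT)
    print(idf["Protocol Name"])
    print(data_files_by_hybridization(header, rows))
```

The SDRF is kept as a header list plus row lists rather than as dictionaries because column headings such as Protocol REF may appear more than once in a real file, and a dictionary keyed by heading would silently drop the repeats.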

IDF file for E-TABM-34

SDRF file for E-TABM-34

Annotare - an open source MAGE-TAB Editor
Annotare is an annotation tool for high throughput gene expression experiments in MAGE-TAB format. Biologists can describe their investigations with the investigators’ contact details, experimental design, protocols that were employed, references to publications, details of biological samples, arrays, and experimental data produced in the investigation.
Annotare Features
–Intuitive graphical user interface forms for editing
–Ontology support: an inbuilt ontology and web services connectivity to BioPortal
–Searchable standard templates
–Design wizard
–Validation module for syntactic and semantic checking
–Mac and Windows support

Annotare Features - Templates Search, choose, and save templates

Annotare Features – Design Wizard Define species, common array designs and protocols can be pre-loaded

Ontology Support Autocomplete using preloaded EFO, or ontology term lookup at BioPortal

Excel like, or form driven annotation

Validation
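To give a sense of what syntactic checking of an SDRF can involve, here is a minimal sketch. It is not Annotare's validation module, whose rules are not described in these slides. The function name is hypothetical, and the three checks are assumptions chosen for illustration: a Source Name column is present, every row has as many cells as the header, and every Protocol REF value names a protocol declared in the IDF.

```python
def validate_sdrf(header, rows, declared_protocols):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Assumed rule: every SDRF starts from a Source Name column.
    if "Source Name" not in header:
        problems.append("missing required column: Source Name")
    # Protocol REF may occur several times, so collect every position.
    protocol_cols = [i for i, name in enumerate(header) if name == "Protocol REF"]
    for line_no, row in enumerate(rows, start=2):  # the header is line 1
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )
            continue  # cell positions are unreliable on a ragged row
        for i in protocol_cols:
            if row[i] and row[i] not in declared_protocols:
                problems.append(f"line {line_no}: undeclared protocol {row[i]!r}")
    return problems


if __name__ == "__main__":
    header = ["Source Name", "Protocol REF", "Hybridization Name"]
    rows = [["sample1", "P-HYB", "hyb1"], ["sample2", "P-UNKNOWN", "hyb2"]]
    for problem in validate_sdrf(header, rows, {"P-EXTRACT", "P-HYB"}):
        print(problem)
```

Semantic checking, such as confirming that an annotation value is a valid ontology term, would need an ontology lookup and is outside the scope of this sketch.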

Supporting Applications
–caArray upload
–ArrayExpress submissions
–SOFT-MAGE-TAB converter (for GEO)
–Similarity Search: AnnotCompute (/www.cbil.upenn.edu/RAD/php/annotCompute/)
–MeV data upload
–MAGE-TAB Bioconductor Import
–Generic limpopo parser for MAGE-TAB

Links
–Code and documentation: code.google.com/p/annotare
–Limpopo parser: sourceforge.net/projects/limpopo/

Annotare Acknowledgements Annotare: Catherine A. Ball, Tony Burdett, Junmin Liu, Emma K. Hastings, Michael Miller, Sarita Nair, Helen Parkinson, Ravi Shankar, Rashmi Srinivasa, Joseph White NHGRI grant P41 HG003619

OBI – Ontology for Biomedical Investigations MGED is one of many communities contributing to OBI Whereas the MGED Ontology is primarily a controlled vocabulary for use with MAGE, OBI is a well-founded ontology with logical definitions and restrictions to be used for multiple purposes (e.g., database models, text mining, file annotation)

OBI and IAO (Information Artifact Ontology) classes are shown in blue. Classes imported from other external ontologies are shown in red. Some example subclasses, such as PCR product and cell culture are included to illustrate the use of the class processed material. Partial high level structure of OBI classes

OBI – Ontology for Biomedical Investigations
OBI intends to be part of the OBO Foundry
Interoperable with Gene Ontology, ChEBI, Phenotypic qualities (PATO), Cell Type (CL)…
Learn more at
–http://purl.obolibrary.org/obo/obi
OBI is available through browsers like the NCBO BioPortal
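One practical consequence of using OBO ontologies is that terms from OBI, GO, ChEBI and the others share a common identifier scheme: a compact identifier such as OBI:0000070 (the OBI class "assay") corresponds to a permanent URL under http://purl.obolibrary.org/obo/. The helper below is a minimal sketch of that mapping; the function name is hypothetical and the code does not check that a term actually exists.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie):
    """Turn a compact identifier such as 'OBI:0000070' into its OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a compact identifier: {curie!r}")
    return f"{OBO_BASE}{prefix}_{local_id}"


if __name__ == "__main__":
    print(curie_to_purl("OBI:0000070"))
```

Storing the full URL alongside a term label in an annotation file keeps the annotation unambiguous even if the label is later reworded.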

Measuring the glucose concentration in blood From The OBI Consortium, The Ontology for Biomedical Investigations, under revision

An OBI representation of a MAGE-TAB file Focus on where MO terms were used in E-TABM-34

Utility of these standards for CTSA?
Integration requires standards:
–Use Annotare to generate MAGE-TAB
–Use OBI when possible as a source of controlled terms and for modeling protocols, assays, investigations
Your data repository / Some other public repository
–MAGE-TAB for microarray, UHTS data
–OBI for describing biomedical (including clinical) data

For more information see http://www.mged.org

more about standards at http://biostandards.info/ follow us on twitter @MGED_Society

MGED Meetings It’s about the science! Keeping up with the latest advances Making connections with potential collaborators

Thank you! Questions?
