Support for MAGE-TAB in caArray 2.0: Overview and Feedback
MAGE-TAB Workshop, January 24, 2008
Agenda
- Brief overview of caArray 2.0
- caArray 2.0 and MAGE-TAB
- MAGE-TAB feedback
What is caArray?
- caArray is a caBIG™-compliant microarray data repository at the NCICB
- Developed to support a federated model of microarray data sharing
- Developed in line with MIAME and MAGE guidelines
(Figure: caArray 1.6 and caArray 2.0)
Goals of caArray 2.0
- Address adopter feedback gained from our 1.x experience
- Improve the user experience of storing and retrieving data
- Simplify data access through the API and grid service, and improve its performance, for analytical applications
- Harmonize with the caBIG™ tissue repository (caTissue) and annotation repository (caBIO)
- Support additional array platforms, including SNP arrays
- Organize the application around the workflow between investigators and the labs that serve them
- Use an agile software development approach that allows more frequent feature additions and better responsiveness to the user community
Features of caArray 2.0
- Store array data associated with experiment and sample annotations
- Data entry through the graphical user interface or MAGE-TAB files
- Parse Affymetrix, Illumina and GenePix formats for expression and SNP arrays
- Role-based permissions for data access
- Programmatic access via a Java API and grid service
- Manage protocols and controlled vocabularies; the MGED Ontology comes pre-loaded
- Basic browse and search functionality
caArray 2.0 annotations
Captures information for:
- Experiment information
  - Contacts
  - Publications
- Sample annotations
  - Source
  - Sample
  - Extract
  - Labeled Extracts
  - Hybridizations
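The Source → Sample → Extract → Labeled Extract → Hybridization chain above is essentially a graph in which each object is derived from the one before it. The sketch below models it with hypothetical Python dataclasses (the class and field names are illustrative only, not caArray's domain model) and walks a hybridization back to its biological sources.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical classes illustrating the annotation chain; not caArray's real domain model.

@dataclass
class Source:
    name: str
    characteristics: Dict[str, str] = field(default_factory=dict)

@dataclass
class Sample:
    name: str
    sources: List[Source] = field(default_factory=list)

@dataclass
class Extract:
    name: str
    samples: List[Sample] = field(default_factory=list)

@dataclass
class LabeledExtract:
    name: str
    label: str
    extracts: List[Extract] = field(default_factory=list)

@dataclass
class Hybridization:
    name: str
    labeled_extracts: List[LabeledExtract] = field(default_factory=list)

def sources_of(hyb: Hybridization) -> List[str]:
    """Walk back from a hybridization to the distinct names of its sources, in order."""
    names: List[str] = []
    for labeled in hyb.labeled_extracts:
        for extract in labeled.extracts:
            for sample in extract.samples:
                for source in sample.sources:
                    if source.name not in names:
                        names.append(source.name)
    return names

# Example: one mouse liver source hybridized to one array.
liver = Source("mouse_liver_1", {"OrganismPart": "liver"})
rna = Extract("rna_1", [Sample("sample_1", [liver])])
hyb = Hybridization("hyb_1", [LabeledExtract("labeled_1", "biotin", [rna])])
print(sources_of(hyb))  # ['mouse_liver_1']
```

Keeping each step as its own object is what lets one source feed several samples, or one extract be labeled twice (for example Cy3 and Cy5), without duplicating annotation.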
caArray 2.0 supported formats
Parsable file formats
- Annotation: MAGE-TAB (ADF, IDF, SDRF)
- Array data (parsed):
  - Affymetrix expression and SNP: .CDF, .CEL, .CHP
  - Illumina expression and SNP: .CSV
  - GenePix: .GAL, .GPR
Unparsed formats
- Affymetrix: .dat, .exp, .rpt, .txt
- Illumina: .txt, .idat
- Agilent: .txt, .tsv
- ImaGene: .txt, .tiv
- Nimblegen: .txt, .gff
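Because several platforms share the .txt extension, a file's extension alone does not say how it will be handled; the platform also has to be known. A minimal sketch, assuming a hypothetical `classify` helper and transcribing only the array data formats from the list above (MAGE-TAB files are identified by document type rather than extension, so they are left out):

```python
import os
from typing import Dict, Set

# Array data formats transcribed from the slide, keyed by platform.
PARSED: Dict[str, Set[str]] = {
    "Affymetrix": {".cdf", ".cel", ".chp"},
    "Illumina": {".csv"},
    "GenePix": {".gal", ".gpr"},
}
UNPARSED: Dict[str, Set[str]] = {
    "Affymetrix": {".dat", ".exp", ".rpt", ".txt"},
    "Illumina": {".txt", ".idat"},
    "Agilent": {".txt", ".tsv"},
    "ImaGene": {".txt", ".tiv"},
    "Nimblegen": {".txt", ".gff"},
}

def classify(filename: str, platform: str) -> str:
    """Return 'parsed', 'unparsed' or 'unsupported' for a platform's data file."""
    ext = os.path.splitext(filename)[1].lower()  # case-insensitive comparison
    if ext in PARSED.get(platform, set()):
        return "parsed"
    if ext in UNPARSED.get(platform, set()):
        return "unparsed"
    return "unsupported"

print(classify("chip1.CEL", "Affymetrix"))  # parsed
print(classify("scan1.txt", "Agilent"))     # unparsed
```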
caArray 2.0 permissions
- Role-based permissions for each installation:
  - Anonymous user
  - System Administrator
  - Principal Investigator / Biostatistician / Lab Administrator / Lab Scientist
- Data is private until made public
  - Experiment title, PI and number of samples are visible to the anonymous user, but experiment content is not
- Collaboration groups can be managed by the PI for collaboration before the data are made public
- Experiment-level and sample-level security via CSM 4.0
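The visibility rule above (anonymous users see the title, PI and sample count of a private experiment, but not its content) can be expressed as a small check. This is a sketch under simplifying assumptions: the `Experiment` fields and the `view` helper are hypothetical, and it ignores administrator roles and sample-level security, which caArray delegates to CSM.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Experiment:
    title: str
    pi: str
    sample_count: int
    public: bool = False
    # Hypothetical PI-managed collaboration group (user names).
    collaborators: Set[str] = field(default_factory=set)

def view(exp: Experiment, user: Optional[str]) -> dict:
    """Return what `user` (None means anonymous) may see of an experiment."""
    summary = {"title": exp.title, "pi": exp.pi, "samples": exp.sample_count}
    allowed = exp.public or (
        user is not None and (user == exp.pi or user in exp.collaborators)
    )
    summary["content_visible"] = allowed
    return summary

exp = Experiment("Liver time course", "jsmith", 12, collaborators={"akumar"})
print(view(exp, None)["content_visible"])      # False
print(view(exp, "akumar")["content_visible"])  # True
```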
caArray 2.0 API and grid service
- Supports the MAGE-TAB level of annotation through a simplified implementation of MAGE
- The API provides a data service and analytical services
  - The data service lets users issue CQL queries that traverse the domain model
  - The analytical services provide convenience methods for data access
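CQL queries are written against the domain model and can follow associations from one object to another. The sketch below only mimics that idea in memory: a hypothetical `query` helper follows a dotted association path (for example from an experiment through its samples to their organism) and keeps the targets whose path ends in a given value. It does not use the caArray Java API, the grid service or CQL syntax.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List

def resolve(obj: Any, path: str) -> List[Any]:
    """Follow a dotted attribute path from obj, flattening collections along the way."""
    current: List[Any] = [obj]
    for attr in path.split("."):
        following: List[Any] = []
        for item in current:
            value = getattr(item, attr, None)
            if value is None:
                continue  # association not set on this object
            if isinstance(value, (list, tuple, set)):
                following.extend(value)
            else:
                following.append(value)
        current = following
    return current

def query(targets: Iterable[Any], path: str, value: Any) -> List[Any]:
    """Keep the targets from which some object reached via `path` equals `value`."""
    return [t for t in targets if value in resolve(t, path)]

# Hypothetical in-memory objects standing in for experiments, samples and organisms.
mouse = SimpleNamespace(name="Mus musculus")
human = SimpleNamespace(name="Homo sapiens")
experiments = [
    SimpleNamespace(title="Liver time course", samples=[SimpleNamespace(organism=mouse)]),
    SimpleNamespace(title="Tumor SNP survey", samples=[SimpleNamespace(organism=human)]),
]
print([e.title for e in query(experiments, "samples.organism.name", "Mus musculus")])
# ['Liver time course']
```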
caArray 2.0 browse and search
- Browse by:
  - Experiments
  - Organism
  - Provider
  - Array design
- Search by specifying:
  - Keyword
  - Category
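Browsing amounts to grouping experiments by one facet, and searching to matching a keyword within a chosen category. A minimal sketch over plain dictionaries, in which the facet and category keys (`organism`, `provider`, `title`) and the example records are hypothetical:

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]

def browse(experiments: List[Record], facet: str) -> Counter:
    """Count experiments per value of a browse facet such as 'organism'."""
    return Counter(e[facet] for e in experiments if facet in e)

def search(experiments: List[Record], keyword: str, category: str) -> List[Record]:
    """Case-insensitive substring match of `keyword` within one category field."""
    needle = keyword.lower()
    return [e for e in experiments if needle in e.get(category, "").lower()]

experiments = [
    {"title": "Liver time course", "organism": "Mus musculus", "provider": "Lab A"},
    {"title": "Tumor SNP survey", "organism": "Homo sapiens", "provider": "Lab B"},
    {"title": "Liver injury study", "organism": "Homo sapiens", "provider": "Lab A"},
]
print(browse(experiments, "organism"))
print([e["title"] for e in search(experiments, "liver", "title")])
# ['Liver time course', 'Liver injury study']
```

Restricting the match to one category keeps a keyword such as "liver" from also matching, say, an organism-part annotation when the user only meant titles.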
MAGE-TAB in caArray 2.0
- Supports MAGE-TAB v1.0: ADF, IDF, SDRF
- Term Source providers and their associated terms are captured as controlled vocabularies (Manage Vocabularies)
- Protocols are imported and viewable in Manage Protocols
- Characteristics are displayed on the relevant detail pages
- Original files are stored in association with the experiment
  - Edits made to this information in the UI are not reflected in these files
- Future feature: MAGE-TAB export based on current database values
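In an IDF, each row starts with a field name and its values follow in tab-separated columns; related rows such as Term Source Name / Term Source File or Protocol Name / Protocol Type are read in parallel, column by column. The sketch below shows how such rows could be turned into vocabulary and protocol records on import. The helper names and example values are hypothetical, and the parser ignores many details of the MAGE-TAB specification.

```python
import csv
import io
from typing import Dict, List

def read_idf(text: str) -> Dict[str, List[str]]:
    """Map each IDF field name to the list of values that follow it on its row."""
    fields: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        fields[row[0].strip()] = [v.strip() for v in row[1:]]
    return fields

def records(idf: Dict[str, List[str]], *names: str) -> List[Dict[str, str]]:
    """Zip parallel IDF rows (e.g. Term Source Name/File) into one record per column."""
    width = max((len(idf.get(n, [])) for n in names), default=0)
    out = []
    for i in range(width):
        rec = {n: (idf.get(n, [])[i] if i < len(idf.get(n, [])) else "") for n in names}
        if any(rec.values()):
            out.append(rec)
    return out

idf_text = (
    "Investigation Title\tLiver time course\n"
    "Term Source Name\tMO\tLocalTerms\n"
    "Term Source File\tmo_terms.txt\t\n"
    "Protocol Name\tP-EXTRACT-1\tP-LABEL-1\n"
    "Protocol Type\tnucleic_acid_extraction\tlabeling\n"
)
idf = read_idf(idf_text)
vocabularies = records(idf, "Term Source Name", "Term Source File")
protocols = records(idf, "Protocol Name", "Protocol Type")
print([v["Term Source Name"] for v in vocabularies])  # ['MO', 'LocalTerms']
```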
MAGE-TAB for data migration
- caArray 1.6 >> caArray 2.0
  - Experiments in caArray 1.6 being migrated to 2.0 are exported in MAGE-TAB format along with the associated native array data files
  - Challenges included the MAGE-OM >> MAGE-TAB mapping
  - Most challenges came from validating that all data “made it” over (not really a MAGE-TAB issue)
  - Manual checking is still needed
- Jackson Labs internal MAD database >> caArray 2.0
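Part of checking that everything “made it” over can be automated: compare the data files an SDRF references with the files that were actually delivered. A minimal sketch with hypothetical helper names, looking only at the Array Data File and Derived Array Data File columns:

```python
import csv
import io
from typing import Iterable, Set, Tuple

FILE_COLUMNS = {"Array Data File", "Derived Array Data File"}

def referenced_files(sdrf_text: str) -> Set[str]:
    """Collect every file name that appears in the SDRF's data-file columns."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = [h.strip() for h in next(reader)]
    positions = [i for i, h in enumerate(header) if h in FILE_COLUMNS]
    names: Set[str] = set()
    for row in reader:
        for i in positions:
            if i < len(row) and row[i].strip():
                names.add(row[i].strip())
    return names

def reconcile(sdrf_text: str, delivered: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (referenced but not delivered, delivered but not referenced)."""
    referenced = referenced_files(sdrf_text)
    received = set(delivered)
    return referenced - received, received - referenced

sdrf_text = (
    "Source Name\tHybridization Name\tArray Data File\tDerived Array Data File\n"
    "src_1\thyb_1\thyb_1.CEL\thyb_1.CHP\n"
    "src_2\thyb_2\thyb_2.CEL\thyb_2.CHP\n"
)
missing, extra = reconcile(sdrf_text, ["hyb_1.CEL", "hyb_1.CHP", "hyb_2.CEL", "notes.txt"])
print(sorted(missing), sorted(extra))  # ['hyb_2.CHP'] ['notes.txt']
```

A check like this only confirms that files are present; it says nothing about whether their contents or the annotations were carried over correctly, which is why manual checking remains necessary.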
MAGE-TAB feedback
- Initial experience with end-user customers is that there is a learning curve to using the SDRF, especially in applying controlled vocabularies
  - Tools are needed to make this easier
- Source vs. Sample vs. Extract vs. Labeled Extract
  - Often confusion over “what goes where”
- From Jackson Labs:
  - The documentation is good for a biologist end user, but a software engineer would like more detail
  - More real-life examples would be helpful
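One SDRF convention that answers much of the “what goes where” question is that attribute columns such as Characteristics[...] or Label describe the nearest node column to their left. The sketch below applies that rule to one row and flags node columns that appear out of the Source → Sample → Extract → Labeled Extract → Hybridization order. The helpers and example values are hypothetical, and the sketch simplifies by treating every non-node column, including Protocol REF, as an attribute of the node on its left.

```python
from typing import Dict, List

NODE_ORDER = ["Source Name", "Sample Name", "Extract Name",
              "Labeled Extract Name", "Hybridization Name"]

def describe_row(header: List[str], row: List[str]) -> Dict[str, Dict[str, str]]:
    """Attach each non-node column to the nearest node column on its left."""
    nodes: Dict[str, Dict[str, str]] = {}
    current = None
    for column, value in zip(header, row):
        if column in NODE_ORDER:
            current = f"{column}: {value}"
            nodes[current] = {}
        elif current is not None:
            nodes[current][column] = value
    return nodes

def order_problems(header: List[str]) -> List[str]:
    """Report adjacent node columns whose order contradicts the chain order."""
    seen = [c for c in header if c in NODE_ORDER]
    return [f"{seen[i]} appears after {seen[i - 1]}"
            for i in range(1, len(seen))
            if NODE_ORDER.index(seen[i]) < NODE_ORDER.index(seen[i - 1])]

header = ["Source Name", "Characteristics[OrganismPart]", "Sample Name",
          "Extract Name", "Labeled Extract Name", "Label", "Hybridization Name"]
row = ["mouse_1", "liver", "mouse_1_s", "mouse_1_rna", "mouse_1_le", "biotin", "hyb_1"]
for node, attributes in describe_row(header, row).items():
    print(node, attributes)
print(order_problems(["Sample Name", "Source Name"]))
# ['Source Name appears after Sample Name']
```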
Specific requests to consider
- A way to specify required fields for particular implementations
  - The caArray UI has certain required fields; these need to be specifiable in a MAGE-TAB template
- Associate “supplemental” files with an experiment
- In the IDF, add a field to specify the type of array experiment (gene expression, SNP, aCGH, etc.)
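An implementation-specific template could be as simple as a list of required IDF fields and SDRF columns, checked before upload. The profile below is hypothetical (the slide does not list caArray's actual required fields), and the IDF is assumed to be already parsed into a mapping from field name to values.

```python
from typing import Dict, List

# Hypothetical implementation profile; not caArray's actual list of required fields.
PROFILE: Dict[str, List[str]] = {
    "idf": ["Investigation Title", "Person Last Name", "Experimental Design"],
    "sdrf": ["Source Name", "Characteristics[Organism]", "Hybridization Name", "Array Data File"],
}

def missing_required(idf: Dict[str, List[str]], sdrf_header: List[str],
                     profile: Dict[str, List[str]] = PROFILE) -> List[str]:
    """List required IDF fields that are absent or empty, then missing SDRF columns."""
    problems = []
    for name in profile["idf"]:
        if not any(v.strip() for v in idf.get(name, [])):
            problems.append(f"IDF: {name} is missing or empty")
    for column in profile["sdrf"]:
        if column not in sdrf_header:
            problems.append(f"SDRF: column {column} is missing")
    return problems

idf = {"Investigation Title": ["Liver time course"], "Person Last Name": [""]}
for problem in missing_required(idf, ["Source Name", "Hybridization Name", "Array Data File"]):
    print(problem)
```

Keeping the profile as data rather than code is what would let each installation publish its own requirements alongside a MAGE-TAB template.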