Published by Collin Ramsey. Modified over 8 years ago.
1
MESA: A Simple Microarray Data Management Server
2
General
MESA is a prototype web-based database solution for the massive amounts of initial data generated by microarray analysis (currently stored in piles of DVDs).
It uses open source software and is easy to install and maintain.
3
Designed to follow the initial workflow of microarray analysis:
Manage raw data and images
Enable searching and slicing of data
Provide initial data filtering and normalization
Export data in predefined formats, for use in existing analysis software tools
4
Terminology
The design was based on NCBI’s Gene Expression Omnibus (GEO) software, and we have adopted some of its terminology: http://www.ncbi.nlm.nih.gov/geo/
Platform: A list of elements (target IDs) that may be detected and quantified in a microarray experiment (e.g., cDNAs, oligonucleotide probesets). Currently the software is compatible with some Illumina platforms (e.g., Sentrix Human-6 BeadChip).
Sample: A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it. Each Sample record is assigned a unique ID. A Sample must reference exactly one Platform.
Series: A Series ID links together a group of related Samples.
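The relationships between these three terms can be sketched as a small in-memory data model. This is only an illustration of the terminology, not MESA's actual code: the class names, field names, and the IDs `HUMAN6`, `S1`, `S2`, and `SER1` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Platform:
    """A list of target IDs that an array can measure."""
    platform_id: str
    target_ids: List[str]


@dataclass
class Sample:
    """One hybridized sample; it references exactly one Platform."""
    sample_id: str
    platform: Platform
    series_id: Optional[str] = None
    signals: Dict[str, float] = field(default_factory=dict)

    def set_signal(self, target_id: str, value: float) -> None:
        # Only targets defined by the sample's platform are accepted.
        if target_id not in self.platform.target_ids:
            raise KeyError(f"{target_id} is not on platform {self.platform.platform_id}")
        self.signals[target_id] = value


def group_by_series(samples: List[Sample]) -> Dict[str, List[str]]:
    """Map each Series ID to the IDs of the Samples it links together."""
    groups: Dict[str, List[str]] = {}
    for s in samples:
        if s.series_id is not None:
            groups.setdefault(s.series_id, []).append(s.sample_id)
    return groups


# Hypothetical IDs, for illustration only.
human6 = Platform("HUMAN6", ["GI_10047089-S", "GI_10047091-S"])
s1 = Sample("S1", human6, series_id="SER1")
s2 = Sample("S2", human6, series_id="SER1")
s1.set_signal("GI_10047089-S", 71.0)
series = group_by_series([s1, s2])
```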
5
A Sample Record Contains:
Annotation Data (e.g., ID, series, platform, description, date, contact name): used for searching and retrieving data.
Raw Data: A table containing rows of target IDs with signal and quality values. (The currently supported platform contains 47,000 rows of data.) It will be exported in predefined formats.

TargetID        AVG_Signal  Detection-value
GI_10047089-S   71.0        0.11865524
GI_10047091-S   5957.3      1.00000000
GI_10047093-S   581.4       0.99670402
GI_10047099-S   351.1       0.99077126
GI_10047103-S   2012.8      1.00000000
GI_10047105-S   141.2       0.88595913
GI_10047121-S   82.0        0.34541859
…

Data Files (e.g., cell/microarray images): need to be archived for future reference.
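As a rough sketch of how a raw data table like the one above could be read into memory, the following assumes a whitespace-separated text file with a single header line. The names `ProbeRow` and `parse_raw_data` are hypothetical and not part of MESA.

```python
from typing import List, NamedTuple


class ProbeRow(NamedTuple):
    target_id: str
    avg_signal: float
    detection: float


def parse_raw_data(text: str) -> List[ProbeRow]:
    """Parse rows of TargetID, AVG_Signal and Detection value."""
    rows = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        rows.append(ProbeRow(fields[0], float(fields[1]), float(fields[2])))
    return rows


# First three rows of the example table from the slide.
SAMPLE_TEXT = """\
TargetID AVG_Signal Detection-value
GI_10047089-S 71.0 0.11865524
GI_10047091-S 5957.3 1.00000000
GI_10047093-S 581.4 0.99670402
"""

rows = parse_raw_data(SAMPLE_TEXT)
```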
6
Exporting Data
Currently the system exports data in raw or GCT (Gene Cluster Text file) format. The GCT format is extremely useful and can be used as input for leading analysis software:
GSEA
GenePattern
Mike Eisen’s Clustering Software
MATISSE (Integrated Analysis of Functional Modules and High-Throughput Data)
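For readers unfamiliar with GCT, a minimal sketch of writing several samples into one GCT version 1.2 file follows. The layout is a `#1.2` version line, a line with the row and column counts, a header line starting with `Name` and `Description`, then one tab-separated row per target ID. The function name `write_gct`, the sample IDs `S1` and `S2`, the `na` description placeholder, and the `S2` values are assumptions made for illustration. This is not MESA's export code.

```python
import io
from typing import Dict, List, TextIO


def write_gct(out: TextIO, target_ids: List[str],
              samples: Dict[str, Dict[str, float]]) -> None:
    """Write one GCT v1.2 table: one row per target ID, one column per sample."""
    sample_ids = list(samples)
    out.write("#1.2\n")                                    # format version line
    out.write(f"{len(target_ids)}\t{len(sample_ids)}\n")   # data rows, data columns
    out.write("\t".join(["Name", "Description"] + sample_ids) + "\n")
    for tid in target_ids:
        values = [str(samples[sid][tid]) for sid in sample_ids]
        # "na" is a placeholder for the description column.
        out.write("\t".join([tid, "na"] + values) + "\n")


target_ids = ["GI_10047089-S", "GI_10047091-S"]
samples = {
    "S1": {"GI_10047089-S": 71.0, "GI_10047091-S": 5957.3},  # values from the slide
    "S2": {"GI_10047089-S": 80.5, "GI_10047091-S": 6010.0},  # made-up values
}

buf = io.StringIO()
write_gct(buf, target_ids, samples)
gct_lines = buf.getvalue().splitlines()
```

Writing to any file-like object keeps the sketch usable both for a file on disk and for a response streamed to the browser.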
7
Implementation
Uses Metadot, a leading open source portal software that can be installed in one click and is very easy to customize.
MySQL database backend.
Apache server.
Perl plug-in scripts.
Can run on a Windows or Linux machine.
8
Database Implementation
The MySQL database has three tables:
Sample Annotation Table: Searchable fields describing each sample.
Platform Table: Each platform has a BLOB of target IDs. (A BLOB is a binary large object that can hold a variable amount of data; our current platforms have about 47,000 target IDs.)
Chip Data Table: Each sample has a BLOB of data containing signal and p-values for each target ID. The BLOB is retrieved and parsed after a user requests the data.
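The three-table layout can be sketched as follows. This uses Python's built-in `sqlite3` module as a stand-in for MySQL so that the example is self-contained. The table names, column names, and the newline/tab text encoding inside the BLOBs are all assumptions, since the slides do not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL backend
conn.executescript("""
CREATE TABLE platform (
    platform_id TEXT PRIMARY KEY,
    target_ids  BLOB NOT NULL          -- all target IDs in one blob
);
CREATE TABLE sample_annotation (
    sample_id   TEXT PRIMARY KEY,
    series_id   TEXT,
    platform_id TEXT NOT NULL REFERENCES platform(platform_id),
    description TEXT
);
CREATE TABLE chip_data (
    sample_id TEXT PRIMARY KEY REFERENCES sample_annotation(sample_id),
    data      BLOB NOT NULL            -- signal and p-value per target ID
);
""")


def encode_chip_data(rows):
    """Pack (signal, p-value) pairs into one blob, one pair per line."""
    return "\n".join(f"{s}\t{p}" for s, p in rows).encode("utf-8")


def decode_chip_data(blob):
    """Parse the blob back into a list of (signal, p-value) tuples."""
    return [tuple(float(x) for x in line.split("\t"))
            for line in blob.decode("utf-8").splitlines()]


target_ids = ["GI_10047089-S", "GI_10047091-S"]
conn.execute("INSERT INTO platform VALUES (?, ?)",
             ("HUMAN6", "\n".join(target_ids).encode("utf-8")))
conn.execute("INSERT INTO sample_annotation (sample_id, platform_id) VALUES (?, ?)",
             ("S1", "HUMAN6"))
conn.execute("INSERT INTO chip_data VALUES (?, ?)",
             ("S1", encode_chip_data([(71.0, 0.11865524), (5957.3, 1.0)])))

# On request: fetch both blobs, parse them, and line them up by position.
data_blob, ids_blob = conn.execute("""
    SELECT c.data, p.target_ids
    FROM chip_data c
    JOIN sample_annotation a ON a.sample_id = c.sample_id
    JOIN platform p ON p.platform_id = a.platform_id
    WHERE c.sample_id = ?""", ("S1",)).fetchone()
parsed = dict(zip(ids_blob.decode("utf-8").splitlines(), decode_chip_data(data_blob)))
```

Storing one blob per sample keeps the chip data table at one row per sample rather than roughly 47,000 rows per sample, at the cost of not being able to query individual target IDs in SQL.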
9
Installation and Access MESA can be installed on a local server in a microarray laboratory, or on a central server. The server is accessed via any web browser using personal login accounts with administrated access levels.
10
Installation Process:
Install the Metadot package: one click installs Apache, MySQL and the Metadot content management system.
Install the Perl script: copy it to the Gizmo directory, restart the server, and use the ‘Manage’ menu to add the gizmo to the system.
Start adding data into the system.
11
Workflow
Search for Subset of Samples
Select Samples
Filter/Normalize/Export Data
Insert Data into DB
12
Inserting Data Into the Database
Annotation data and file uploads or updates for each sample may be entered manually using a dedicated form.
Raw data in the BeadStudio format may be uploaded in batches using the ‘Upload’ menu.
Data may NOT be deleted from the database by a regular user.
13
Access the Server using a Web Browser
14
Searching for Data
Data may be retrieved by searching on zero or more fields of the annotation data. When no field is selected, all data is retrieved.
Free-text fields are matched with the SQL LIKE operator, so data can be queried without typing the exact phrase. This is also a good way to retrieve mistyped data.
Future versions should add more user-defined search functionality.
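A search over zero or more annotation fields can be sketched as a query builder that adds one LIKE clause per filled-in field. As before, `sqlite3` stands in for MySQL, and the table name, column names, function name `build_search`, and example records are hypothetical.

```python
import sqlite3

# Annotation columns a user is allowed to search on (hypothetical names).
SEARCHABLE = ("sample_id", "series_id", "description")


def build_search(criteria):
    """Build a parameterized query from {field: search text} pairs."""
    clauses, params = [], []
    for field_name, text in criteria.items():
        if field_name not in SEARCHABLE:
            raise ValueError(f"unknown search field: {field_name}")
        clauses.append(f"{field_name} LIKE ?")
        params.append(f"%{text}%")  # match the text anywhere in the field
    sql = "SELECT sample_id FROM sample_annotation"
    if clauses:  # with no criteria, every sample is returned
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY sample_id", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_annotation (sample_id TEXT, series_id TEXT, description TEXT)")
conn.executemany("INSERT INTO sample_annotation VALUES (?, ?, ?)", [
    ("S1", "SER1", "liver control"),
    ("S2", "SER1", "liver treated"),
    ("S3", "SER2", "kidney contrl"),  # deliberately mistyped "control"
])

sql, params = build_search({})
all_ids = [row[0] for row in conn.execute(sql, params)]

sql, params = build_search({"description": "contr"})
hits = [row[0] for row in conn.execute(sql, params)]
```

Searching for the fragment `contr` finds both the correctly spelled and the mistyped record, which is the behavior the slide describes. Field names are checked against a fixed list and values are passed as parameters, so user input never lands in the SQL text.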
15
Sample Search Screen
16
Search Results Screen: You Can Click on a Sample ID
17
Sample Data Screen
18
Search Results Screen: Select 1 Sample for Annotation Update
19
Update Annotation Screen
20
Search Results Screen: Select Samples to Export Raw Data
21
Raw Data Files May be Downloaded
22
Search Results Screen: Select Samples to Export into one GCT file
23
GCT File For Client Download
24
Conclusions and Further Work
The system is still in its prototype phase and has not been used intensively.
Filtering and normalization functions need to be added.
The data input format should be standardized.
Allowing users to define their own annotation fields would be very useful.
25
Thanks to Alexey and Roy Williams for the guidance and ideas.