GigaDB – revolutionizing data dissemination, organization and use

SiZhe Xiao, GigaScience 2013. POSTER, Open Access.

Xiao Si Zhe¹, Chris Hunter, Tam P. Sneddon, Scott C. Edmunds, Alexandra T. Basford, Peter Li, and Laurie Goodman.

Abstract

GigaScience, the online open-access open-data journal, has recently developed GigaDB, a new integrated database of ‘big-data’ studies from the life and biomedical sciences. The initial goals of GigaDB are to assign DOIs to datasets so that they can be tracked and cited, and to provide a user-friendly web interface giving easy access to selected GigaDB datasets and files.

We will be working with authors to make the raw data, computational tools and data processing pipelines described in GigaScience papers available and, where possible, executable on an informatics platform. We hope that by making both the data and the processes involved in their analysis freely accessible, this novel form of publication will help articles published in GigaScience to have a much higher impact in the scientific literature, and maximize their reuse within the community.

GigaDB currently accepts submissions in Excel format. Example submission and template files can be found on the website. To date, GigaDB comprises over 56 datasets and includes Genomic, Transcriptomic, Epigenomic and Metagenomic dataset types, but we accept many other dataset types, including proteomic and neuroimaging studies.

Future goals include integration with the BGI Cloud, and with the Galaxy software tools to enable users to directly upload files to Galaxy for further analysis. We are also working with ISA-Tab and other scientific standards groups to support and extend the usability and interoperability model.
Keywords: DOI, Galaxy, big-data, database, informatics platform, GigaScience

Background

Growing replication gap:
- 10/18 microarray papers cannot be reproduced
- Ioannidis: “Most Published Research Findings Are False”
- >15X increase in retracted papers in last decade
- Lack of incentives to make data/methods available
- Poor metadata quality and lack of interoperability

GigaSolution: deconstructing the paper

Combine and integrate (via citable DOIs):
- Open-access journal
- Data Publishing Platform: gigadb.org
- Data Analysis Platform: galaxy.cbiit.cuhk.edu.hk

GigaDB

[Figures: Home page; Datasets public in GigaDB]
Aspera data transfer: faster download speeds.

Linking papers to data and analyses

[Diagram: Open-Paper, linked by DOI to Open-Data (data sets and files in GigaDB, 78GB CC0 data, DOI: /100038) and linked to analyses (Open-Pipelines and Open-Workflows, DOI: /100044)]

GigaDB Submission Workflow

1. Submitter logs in to the GigaDB website and uploads the Excel submission.
2. Validation checks are run on the submission.
   Fail: submitter is provided with an error report.
   Pass: dataset is uploaded to GigaDB.
3. Curator reviews the Excel submission file.
4. DOI is assigned.
5. Curator contacts submitter with the DOI citation and to arrange file transfer (and resolve any other questions/issues).
6. Submitter provides files by ftp or Aspera.
7. XML is generated and registered with DataCite.
8. Curator makes the dataset public (can be set as a future date if required).

[Figures: DataCite XML file; Public GigaDB dataset]
Example: DOI /100003, Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011)

Acknowledgements

Thanks to: Laurie Goodman, Chris Hunter, Scott Edmunds, Tam Sneddon (GigaScience), Shaoguang Liang (BGI-SZ), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), Rob Davidson and Mark Viant (Birmingham Uni), Marco Galardini (Unifi).

Cite this poster as: GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis. Scott C. Edmunds, Peter Li, Huayan Gao, Chris Hunter, Si Zhe Zhao, Ruibang Luo, Dennis Chan, Alex Wong, Zhang Yong, Tin-Lap Lee, ISA-TAB team. figshare. doi: /m9.figshare.xxxxx

Financial support from:

Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of:
- No space constraints, and unlimited data and workflow hosting in GigaDB and GigaGalaxy
- Article processing charges for all submissions in 2013 covered by BGI
- Open access, open data and highly visible work freely available for distribution
- Inclusion in PubMed and Google Scholar

Correspondence:
1. BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong SAR, China.
2. BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China.
3. School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
4. CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
5. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pok Fu Lam, Hong Kong.
6. Oxford e-Research Centre, University of Oxford, Oxford, UK.

© 2013 Edmunds et al. This is an Open Access poster distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
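
The submission workflow on the poster (validation checks, a pass/fail branch with an error report, then metadata XML generation once a DOI is assigned) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the required field names, the function names, and the simplified XML layout are hypothetical and do not reproduce the actual GigaDB Excel template or the full DataCite schema. The DOI string in the usage example is a placeholder, not a real GigaDB identifier.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; the real GigaDB Excel template may differ.
REQUIRED_FIELDS = ("title", "creators", "dataset_type", "publication_year")


def validate_submission(record):
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "" or value == []:
            errors.append(f"missing required field: {field}")
    year = record.get("publication_year")
    if year not in (None, "") and not str(year).isdigit():
        errors.append("publication_year must be a number")
    return errors


def build_metadata_xml(record, doi):
    """Build a simplified, DataCite-like XML string (not the full schema)."""
    resource = ET.Element("resource")
    identifier = ET.SubElement(resource, "identifier", identifierType="DOI")
    identifier.text = doi
    creators = ET.SubElement(resource, "creators")
    for name in record["creators"]:
        creator = ET.SubElement(creators, "creator")
        ET.SubElement(creator, "creatorName").text = name
    titles = ET.SubElement(resource, "titles")
    ET.SubElement(titles, "title").text = record["title"]
    ET.SubElement(resource, "publicationYear").text = str(record["publication_year"])
    return ET.tostring(resource, encoding="unicode")


def process_submission(record, doi):
    """Mirror the pass/fail branch: error report on failure, XML on success."""
    errors = validate_submission(record)
    if errors:
        return {"status": "fail", "error_report": errors}
    return {"status": "pass", "xml": build_metadata_xml(record, doi)}


if __name__ == "__main__":
    record = {
        "title": "Genomic data from the crab-eating macaque/cynomolgus monkey "
                 "(Macaca fascicularis)",
        "creators": ["Example Author"],  # placeholder, not from the poster
        "dataset_type": "Genomic",
        "publication_year": 2011,
    }
    # "10.0000/placeholder" is a made-up DOI used only for illustration.
    print(process_submission(record, "10.0000/placeholder"))
    print(process_submission({"title": ""}, "10.0000/placeholder"))
```

In this sketch the curator review, file transfer and DataCite registration steps are deliberately left out, since they involve people and external services; only the automated validation and XML generation steps are modelled.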

