Introduction
An open-source, open-data international collaboration, based entirely on the internet.
Started following a CECAM meeting in Zaragoza: 010/
Motivated by three key ideas:
  Scientific data (and ideally codes too) should be "open".
  A standard data model/format is a Very Good Thing.
  Universally accessible, open databases of the results of calculations are scientifically highly valuable.
Create a useful infrastructure and consolidate the model around the tools: the "If you build it, they will come" approach.
Ideas
Standard data model:
  Different codes can interoperate to create complex workflows.
  Tools (e.g. GUIs) can operate on the input and output of any code supporting the format.
  If a semantic model underlies the format, data can easily be validated.
Open results databases:
  Allow codes to be easily validated and benchmarked.
  Are essential for the development of new methods.
  Avoid costly duplication of results.
  Provide a valuable resource for data mining.
  Offer an easy, automated way of generating and archiving supporting information for publications.
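The validation point can be illustrated with a minimal sketch. Because CML gives atoms and bonds an explicit meaning, a generic tool can check a file without knowing which code produced it. The function name `check_cml` and the sample water molecule (with approximate coordinates) are illustrative assumptions, not part of any project tool; only the Python standard library is used, and the checks shown are a small subset of what a real schema validator would cover.

```python
import xml.etree.ElementTree as ET

# Namespace used by CML documents.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written sample for illustration only (water, approximate coordinates).
SAMPLE = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="S"/>
    <bond atomRefs2="a1 a3" order="S"/>
  </bondArray>
</molecule>"""


def check_cml(text):
    """Return a list of problems found in a CML molecule (empty if none)."""
    root = ET.fromstring(text)
    problems = []
    atom_ids = set()

    # Every atom needs a unique id, an element type and numeric coordinates.
    for atom in root.findall(".//cml:atom", CML_NS):
        atom_id = atom.get("id")
        if atom_id is None:
            problems.append("atom without id")
            continue
        if atom_id in atom_ids:
            problems.append(f"duplicate atom id {atom_id}")
        atom_ids.add(atom_id)
        if not atom.get("elementType"):
            problems.append(f"atom {atom_id} has no elementType")
        for axis in ("x3", "y3", "z3"):
            value = atom.get(axis)
            if value is None:
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"atom {atom_id} has non-numeric {axis}")

    # Every bond must reference exactly two atoms that exist.
    for bond in root.findall(".//cml:bond", CML_NS):
        refs = (bond.get("atomRefs2") or "").split()
        if len(refs) != 2:
            problems.append("bond does not reference exactly two atoms")
        for ref in refs:
            if ref not in atom_ids:
                problems.append(f"bond references unknown atom {ref}")

    return problems


if __name__ == "__main__":
    print(check_cml(SAMPLE))
```

The same function works on output from any code that writes this part of CML, which is the practical benefit of a shared semantic model: the check is written once, against the format, not once per code.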
Approach
Modular tools, so that the same technology serves both for creating community databases and for indexing local files on a desktop (personal use without forcing data openness).
Where possible, use existing tools, protocols and technologies, and collaborate with other open source projects.
CML as the data format.
JUMBO-Converters (Java) convert legacy output formats (e.g. Gaussian, NWChem log files) to CML and perform other transformations (looking at ANTLR as a complementary approach).
Lensfield scours filestores, converts and organises files.
RESTful system for uploading and aggregation.
EMMA embargo system can control what is published from local to external repositories.
Repositories expose Atom feeds for aggregation, indexing and status updates.
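How an aggregator might consume a repository's Atom feed can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the feed content, the `repository.example.org` URLs, the use of an `enclosure` link to point at the CML file, and the function name `harvest` are all hypothetical. Only the Atom element names and namespace come from the Atom standard, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom syndication format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed, hand-written for illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example repository</title>
  <id>urn:example:repository</id>
  <updated>2011-01-02T00:00:00Z</updated>
  <entry>
    <title>Hypothetical calculation 1</title>
    <id>urn:example:entry:1</id>
    <updated>2011-01-01T12:00:00Z</updated>
    <link rel="enclosure" type="chemical/x-cml"
          href="http://repository.example.org/entries/1.cml"/>
  </entry>
  <entry>
    <title>Hypothetical calculation 2</title>
    <id>urn:example:entry:2</id>
    <updated>2011-01-02T00:00:00Z</updated>
    <link rel="enclosure" type="chemical/x-cml"
          href="http://repository.example.org/entries/2.cml"/>
  </entry>
</feed>"""


def harvest(feed_text, index):
    """Record new or changed feed entries in index; return the ids that changed.

    index maps an entry id to a dict with its title, timestamp and file link.
    """
    root = ET.fromstring(feed_text)
    changed = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)

        # Assumption: the data file is advertised as an enclosure link.
        href = None
        for link in entry.findall("atom:link", ATOM_NS):
            if link.get("rel") == "enclosure":
                href = link.get("href")

        # Skip entries already indexed with the same timestamp.
        known = index.get(entry_id)
        if known is not None and known["updated"] == updated:
            continue
        index[entry_id] = {"title": title, "updated": updated, "href": href}
        changed.append(entry_id)
    return changed


if __name__ == "__main__":
    index = {}
    print(harvest(SAMPLE_FEED, index))  # first pass: both entries are new
    print(harvest(SAMPLE_FEED, index))  # second pass: nothing has changed
```

Polling a feed in this way lets an aggregator or a desktop indexer pick up only what has changed since its last visit, using a protocol that existing feed tools already understand.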
Status and Future plans
CML already supports a wide range of chemical data; currently working to extend it to e.g. basis sets.
Converters to (a subset of) CML exist for Gaussian, NWChem, GAMESS-US and GAMESS-UK.
Files can be uploaded, and the tools to index and search the data are in place.
Aim to grow the community and continue developing the tools and data model to support a wider range of codes and data.
Develop interfaces with codes such as Avogadro and work closely with related tools such as Open Babel.
Ensure the tools are as user-friendly as possible.
Work with journal publishers to integrate the tools into the publishing workflow.
Help specific communities develop databases of calculations of interest to them.