Large-scale (meta)Data Aggregators & Infrastructure Requirements: the case of agriculture
Nikos Manouselis Agro-Know Technologies & ARIADNE 2012, Dubai, 13/12/12
(agricultural) research data
Publications, theses, reports, other grey literature
Educational material and content, courseware
Primary data:
– Structured, e.g. datasets as tables
– Digitized: images, videos, etc.
Secondary data (elaborations, e.g. a dendrogram)
Provenance information, incl. authors, their organizations and projects
Experimental protocols & methods
Social data, tags, ratings, etc.
stats gene banks gis data blogs, journals open archives raw data technologies learning objects ……….. educators view
stats gene banks gis data blogs, journals open archives raw data technologies learning objects ……….. researchers view
stats gene banks gis data blogs, journals open archives raw data technologies learning objects ……….. practitioners view
data infrastructure for agriculture
aim is: promoting data sharing and consumption related to any research activity aimed at improving productivity and quality of crops
ICT for computing, connectivity, storage, instrumentation
we actually share metadata: Publisher, Date, Catalog, Subject, ID, Author, Title
e.g. an educational resource
…metadata reflect the context
…sometimes, data also included
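A minimal sketch of what such a shared record might look like, assuming a flat record built from the fields named on the slide. The class name, the `context` keys and all sample values are hypothetical and chosen only for illustration; real aggregators typically use a standard schema rather than an ad hoc class.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetadataRecord:
    # Field names follow the slide; the class itself is illustrative.
    id: str
    title: str
    author: str
    publisher: str
    date: str
    catalog: str
    subject: List[str] = field(default_factory=list)
    # Context in which the resource is used (hypothetical keys).
    context: Dict[str, str] = field(default_factory=dict)
    # Optional payload, for the cases where data travels with the metadata.
    data: Optional[bytes] = None


# Made-up educational resource, used only to show the shape of a record.
lesson = MetadataRecord(
    id="example:lesson-001",
    title="Composting basics",
    author="A. Teacher",
    publisher="Example Agricultural School",
    date="2012-12-13",
    catalog="example-courseware",
    subject=["soil", "compost"],
    context={"audience": "secondary school", "language": "en"},
)

# A plain dict is convenient for serialising or exchanging the record.
record_as_dict = asdict(lesson)
```

Here `data` stays `None`, which corresponds to the common case where only the description is shared and the resource itself remains at the provider.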
metadata aggregations
concern viewing merged collections of metadata records from different sources
useful when access is needed to specific supersets or subsets of networked collections:
– records actually stored at the aggregator
– or queries distributed over virtually aggregated collections
typically look like this (Ternier et al., 2010)
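The stored-records flavour of aggregation can be sketched roughly as follows, assuming each source delivers plain dictionaries with an `id` key. The function names, the "first source wins" merge rule and the sample records are assumptions made for this illustration, not the architecture described by Ternier et al.

```python
from typing import Dict, Iterable, List


def aggregate(sources: Dict[str, Iterable[dict]]) -> Dict[str, dict]:
    """Merge records from several sources into one collection keyed by id."""
    merged: Dict[str, dict] = {}
    for source_name, records in sources.items():
        for record in records:
            entry = merged.setdefault(record["id"], {"sources": []})
            for key, value in record.items():
                # Assumed merge rule: the first source to supply a field wins.
                entry.setdefault(key, value)
            # Keep track of where each record came from.
            entry["sources"].append(source_name)
    return merged


def subset_by_subject(merged: Dict[str, dict], subject: str) -> List[dict]:
    """Return the subset of the merged collection tagged with a subject."""
    return [r for r in merged.values() if subject in r.get("subject", [])]


# Made-up records from two hypothetical repositories.
repo_a = [{"id": "rec-1", "title": "Maize trials", "subject": ["maize"]}]
repo_b = [
    {"id": "rec-1", "title": "Maize trials 2011", "publisher": "Repo B"},
    {"id": "rec-2", "title": "Wheat rust survey", "subject": ["wheat"]},
]

collection = aggregate({"repo_a": repo_a, "repo_b": repo_b})
wheat_only = subset_by_subject(collection, "wheat")
```

The merged collection is the superset and `subset_by_subject` carves a subset out of it. A virtually aggregated collection would instead forward the query to each source at request time.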
typical problem: computing
typical problem: hosting
an ideal scenario
Data provider in need of hosting & storage of small-scale CMS: sets up own CMS instance
Data provider in need of large-scale hosting & replication CMS: requests space/accounts in large-scale CMS
Data provider hosting CMS at own or external/commercial infrastructure, interested to expose (meta)data to e-infrastructure: registers as data source
hosted over cloud, computed over grid
shares (meta)data, e.g. through OAI-PMH; indexed & available through CIARD RING
(META)DATA AGGREGATOR: supported by scientific gateway, computed & hosted over agINFRA grid/cloud
computed over grid & hosted over cloud …
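To make the "shares (meta)data through OAI-PMH" step concrete, here is a rough sketch of how an aggregator might build a `ListRecords` request and read the reply. The verb, the `metadataPrefix`, `set` and `resumptionToken` arguments and the XML namespaces come from the OAI-PMH 2.0 protocol; the base URL, the helper names and the sample response are made up for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for a data provider."""
    if resumption_token:
        # A resumption token must be sent without the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Extract (identifier, title) pairs and the next resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hand-written sample reply, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Soil sampling protocol</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
  <resumptionToken>page-2</resumptionToken>
 </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repo.example.org/oai")
records, next_token = parse_list_records(SAMPLE)
```

A harvester would keep requesting with the returned token until none comes back, which is part of why harvesting takes time and resources on large collections.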
it's all about efficient metadata management
storage issues: where components are hosted, how metadata aggregations & their versions are handled/stored, scaling up
computing issues: harvesting takes time/resources and needs to be invoked often; automatic tagging tasks are demanding
overall need: often recurring, similar workflows are needed (validate, transform, harvest, auto-tag, index)
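The recurring workflow named above can be sketched as a chain of small steps. This is only an illustration under several assumptions: harvesting is placed first so that the other steps have records to work on, records are plain dictionaries, and auto-tagging is reduced to matching a small vocabulary against titles. All function names and sample records are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]


def harvest(source: Iterable[Record]) -> List[Record]:
    # Stand-in for a real harvest (e.g. over OAI-PMH): copy what the source offers.
    return [dict(r) for r in source]


def validate(records: List[Record]) -> List[Record]:
    # Assumed rule: a record needs a non-empty id and title to be kept.
    return [r for r in records if r.get("id") and r.get("title")]


def transform(records: List[Record]) -> List[Record]:
    # Normalise titles; a real transform would map between metadata schemas.
    return [{**r, "title": str(r["title"]).strip()} for r in records]


def auto_tag(records: List[Record], vocabulary: Set[str]) -> List[Record]:
    # Naive tagging: attach every vocabulary term that occurs in the title.
    tagged = []
    for r in records:
        title = str(r["title"]).lower()
        tagged.append({**r, "tags": sorted(t for t in vocabulary if t in title)})
    return tagged


def index(records: List[Record]) -> Dict[str, List[str]]:
    # Inverted index from tag to the ids of the records carrying it.
    inverted: Dict[str, List[str]] = {}
    for r in records:
        for tag in r["tags"]:
            inverted.setdefault(tag, []).append(str(r["id"]))
    return inverted


def run_workflow(source: Iterable[Record], vocabulary: Set[str]) -> Dict[str, List[str]]:
    return index(auto_tag(transform(validate(harvest(source))), vocabulary))


source = [
    {"id": "a", "title": " Maize yield trials "},
    {"id": "", "title": "record without an id"},
    {"id": "b", "title": "Wheat and maize rotation"},
]
result = run_workflow(source, {"maize", "wheat"})
```

Because each step takes and returns a list of records, the same chain can be rerun per data source or scheduled as separate jobs, which is where grid or cloud resources would come in.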
why should you care?
promoting course descriptions
push course information to various syndication/aggregation sites to allow users to discover them
– OCW search engine
– Moodle Hub concept
including relevant content
allow course creator/author to find relevant material and resources to enrich course
– Europeana ingestion widget ( pe_2012)
suggest to learners additional courses and material relevant to what they access
– Eummenas Moodle Widget
developing more end-user services
Web portals to support user communities (e.g. thematic, geographical, social, cultural)
– MACE portal
– Photodentro Greek school collections portal
– VOA3R social platform for researchers
wrap up
considerations
easily replicated cloud-hosted software applications (e.g. DSpace instances)
portal/service owners and software developers to use the infrastructure as a basis
power up existing data & service networks
interesting: TERENA OER pilot interconnecting open educational resource repositories of NRENs
interesting: GLOBE Global Learning Objects Brokering Exchange Alliance
thank you!