Hemera Kick-Off, October 5th, 2010
Working Group B5: Efficient management of very large volumes of information for data-intensive applications
Gabriel Antoniu, Jean-Marc Pierson
Challenges
Tremendous volumes of data (up to petabytes), increasing every year
  – Cloud infrastructures reinforce this trend
A wide range of diverse applications
Different modalities of data: images, text, video, raw values
Data that is distributed, heterogeneous, structured or unstructured, semantically rich or enriched, confidential
Stored in distributed file systems (DFS), distributed databases (DDB), cloud storage services, data warehouses
Aim of the WG
Explore research issues related to high-level information management services (search, mining, visualisation, processing)
  – for large volumes of distributed data
  – taking into account:
    – security, efficiency and heterogeneity
    – application requirements
    – the execution infrastructure (grids, clouds)
Issues to be addressed
Low-level:
  – fault tolerance, caching, transport, security (encryption, confidentiality), consistency, location transparency
Intermediate-level:
  – interoperability among storage systems
  – data indexing
High-level:
  – data mining, data classification, data assimilation, knowledge extraction, data visualisation
  – metadata management
Communities involved
Distributed applications
Distributed systems
  – clusters, grids, P2P, clouds
Fault-tolerant systems
Databases, data mining
Security
Numerical algorithms
Roadmap
Identify research teams
  – active in the area of the WG
  – with experience in data-intensive applications on Aladdin-G5K
  – and newcomers…
Organize workshops, and possibly schools, to share and disseminate experience and knowledge