Published by Adrian Jenkins. Modified over 9 years ago.
1
Making Linked Data Diachronic
Vassilis Christophides
University of Crete & FORTH-ICS, Heraklion, Crete
2
Data as an asset!
One of the most significant changes of the past decade has been the widespread recognition of data as an asset
– Data is the new “raw material of business” – Economist
Data Products
3
Emerging Data Ecosystem Big Data has blurred the distinction between public and private Public Volunteered Data Curated Data Observed Data
4
Emerging Data Subjects
data marketers, data brokers, data aggregators
http://www.ftc.gov/bcp/workshops/privacyroundtables/personalDataEcosystem.pdf
A series of data stewards, custodians, and curators are producing, consuming and brokering data products, forming a far more complex value-making chain than in traditional enterprise or scientific contexts
5
What to Do with this Data?
Search:
– Find structured data when it’s relevant to search queries
Visualize, enhance, communicate to relevant audiences
– Support communities [biodiversity, climate, water, …]
Relate data across sources
Fuse data from multiple sources
– Data integration!
Microsoft’s Approach to Big Data
6
Emerging Data Life-cycle http://www.ipsr.ku.edu/naddi/about.shtml
7
Data as a Service (DaaS)
Service stack: Data as a Service, Software as a Service, Platform as a Service, Infrastructure as a Service
© www.emc.com/collateral/software/white-papers/h10839-big-data-as-a-service-perspt.pdf
DaaS promises that data products can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer
DaaS brings the notion that data-related services (aggregating, quality checking, cleansing and enriching data) can happen in a centralized place, with the result offered to different systems, applications or mobile users, irrespective of where they are
– Virtualized
– On-demand
– Self-service
– Scalable
– Pay as you go
8
Data Marketplaces
Services that make it easy to find data from a range of secondary data sources, then consume the data in a usable and unified format
– Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers (DataMarket.com)
Diagram labels: Data Aggregation and Curation Layer; Data Connection Layer; Data Visualization and Analysis Layer; Data Hosted by Third Party; Data Hosted by Data Provider; Data Hosted in Marketplace; Data as a Service; Preservation Service
9
Vertical Data Markets
François Bancilhon, Data Publica, “de data rerum”, WOD Tutorials 2013, Paris

Vertical                        Example            Size (M€)
Financial                       Reuters            300
Press                           Press Index        250
Legal                           Francis Lefebvre   240
Solvability                     Altarès            160
Scientific Technical Medical    Meteo France       160
Image                           Sipa               60
Economy                         Société.com        55
Marketing                       Acxiom             55
Patents                         Reuters            25
10
Only a Small Portion of Big Data! idgknowledgehub.com/idc-releases-first-worldwide-big-data-technology-and-services-market-forecast-shows-big-data-as-the-next-essential-capability-and-a-foundation-for-the-intelligent-economy/2012/05/07/
11
Data Hub for Market Intelligence
Source: Hjalmar Gislason, DataMarket, Inc., “Emerging DaaS business models: A case study”, European Data Forum (EDF), Dublin 2013
12
hortonworks.com/blog/7-key-drivers-for-the-big-data-market
13
Potential Benefits of Linked Data for Data Marketplaces
Abstraction layer for virtualized data access across sources
– Basis for automating dataset discovery, linking & fusion
Flexible data representation model (RDF) and global identifiers for all objects (URI)
– Eases incremental data integration, interactive exploration and ad hoc analysis of data
Interlinked datasets
– Newly added data can be integrated with existing data in the marketplace
– Network effects
Data marketplace interoperability
– Data from different marketplaces can be easily federated
Derived knowledge / facts
– RDF inference of additional implicit facts
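To make the last two benefits concrete, here is a minimal sketch, not taken from the slides, of how triples from two marketplaces can be merged by set union and how a single RDFS-style rule can derive implicit facts. All `ex:` identifiers and the `infer_types` helper are hypothetical; a real deployment would more likely rely on an RDF store and its reasoner.

```python
# Triples are plain (subject, predicate, object) tuples.
# Every "ex:" name below is a made-up identifier used only for illustration.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Apply one RDFS-style rule until no new triple appears:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for s, p, o in closed if p == SUBCLASS_OF}
        typed = {(s, o) for s, p, o in closed if p == RDF_TYPE}
        for x, c in typed:
            for c2, d in subclass:
                if c == c2 and (x, RDF_TYPE, d) not in closed:
                    closed.add((x, RDF_TYPE, d))
                    changed = True
    return closed


# Two hypothetical marketplaces publish overlapping vocabularies.
marketplace_a = {
    ("ex:acme", RDF_TYPE, "ex:Broker"),
    ("ex:Broker", SUBCLASS_OF, "ex:Steward"),
}
marketplace_b = {
    ("ex:Steward", SUBCLASS_OF, "ex:Agent"),
}

# Because both sides use the same global identifiers, federation is a set union.
merged = marketplace_a | marketplace_b
# Facts that were stated nowhere but follow from the merged data.
inferred = infer_types(merged) - merged
```

In this sketch the implicit facts only become derivable after the merge, which is one way to read the "network effects" bullet: each newly linked dataset can unlock inferences over the data already present.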
14
Web Data of Increasing Standardization
Not all linked data is open and not all open data is linked!
★ Available on the web (in whatever format) under an open license, to qualify as Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ As (3), plus using open standards from W3C (RDF and SPARQL) to identify things through dereferenceable HTTP URIs, to ensure effective access
★★★★★ As all the above, plus establishing links between data of different sources

File format recommendations (on a scale of 0-5)
Format   Stars
csv      ★★★
xls      ★
pdf      ★
doc      ★
xml      ★★★★
rdf      ★★★★★
shp      ★★★
ods      ★★
tiff     ★
jpeg     ★
json     ★★★
txt      ★
html     ★★
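The star scheme above is cumulative: each level presupposes all the levels below it. A minimal sketch of that rule follows; the `Dataset` fields and the `star_rating` function are hypothetical names introduced here, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset, one flag per star level."""
    open_license: bool             # 1 star: on the web under an open license
    machine_readable: bool         # 2 stars: structured, machine-readable data
    non_proprietary_format: bool   # 3 stars: e.g. CSV instead of Excel
    uses_rdf_and_uris: bool        # 4 stars: W3C standards, dereferenceable URIs
    linked_to_other_sources: bool  # 5 stars: links to data from other sources


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        d.open_license,
        d.machine_readable,
        d.non_proprietary_format,
        d.uses_rdf_and_uris,
        d.linked_to_other_sources,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV file: structured and non-proprietary, but not RDF.
open_csv = Dataset(True, True, True, False, False)
# RDF data without an open license earns no stars under the cumulative reading.
closed_rdf = Dataset(False, True, True, True, True)
```

The second example illustrates the slide's opening remark: linked data that is not openly licensed does not qualify as open data, however well structured it is.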
15
Key Players Offers Classification Data Cube +
16
DIACHRON Objectives & Approach
Lifecycle stages (diagram): Appraising, Integrating, Archiving, Producing, Publishing, Cleaning
Preserve (semi-)structured, interrelated, evolving data by keeping them constantly accessible & reusable from an open framework such as the Data Web
Calls for effective & efficient techniques to manage the lifecycle of web data involving data producers, curators, brokers and consumers
– Pay-as-you-go data preservation spreading costs among key players in a community of interest
Diachronic Data: Enhance data with temporal and provenance annotations as data products are re-used through complex value-making chains
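One possible reading of "temporal and provenance annotations" is sketched below. It is an illustration under assumptions, not the DIACHRON data model: the `AnnotatedFact` class, its fields and the `reuse` helper are all hypothetical. Each fact carries a validity interval plus an append-only chain recording who handled it and when.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotatedFact:
    """A statement enriched with temporal and provenance annotations (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    valid_from: date                      # temporal annotation: start of validity
    valid_to: Optional[date] = None       # None means "still valid"
    provenance: Tuple[Tuple[str, date], ...] = ()  # chain of (actor, date) entries


def reuse(fact: AnnotatedFact, actor: str, on: date) -> AnnotatedFact:
    """Return a copy of the fact as re-used by `actor`, extending its provenance chain.
    The original fact is left untouched, so earlier states remain available."""
    return replace(fact, provenance=fact.provenance + ((actor, on),))


# A producer publishes a fact; a curator and then a broker re-use it.
original = AnnotatedFact(
    "ex:station42", "ex:meanTemp", "14.2",
    valid_from=date(2013, 1, 1),
    provenance=(("ex:producer", date(2013, 1, 15)),),
)
curated = reuse(original, "ex:curator", date(2013, 3, 1))
brokered = reuse(curated, "ex:broker", date(2013, 6, 1))
```

Keeping the records immutable is a deliberate choice in this sketch: every step along the value-making chain yields a new annotated copy instead of overwriting the previous one.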
17
DIACHRON Research Agenda
How can we assess the quality of harvested datasets in order to decide which versions (the data quality dimensions problem) and how many of them deserve to be preserved for future use (the appraisal problem)?
How can we understand the dependencies of datasets (the provenance problem), and how can metadata (temporal, spatial, thematic) be smoothly represented along with the data (the annotation problem)?
How can we monitor changes of third-party datasets (the evolution tracking problem), and how can local/remote data imperfections (e.g., due to change propagation) be repaired (the curation problem)?
How do we cite particular versions of a dataset (the citation problem), and how will we be able to retrieve them when looking up a reference (the long-term accessibility problem)?
How do we maintain the consistency of multiple versions of dependent datasets (the archiving problem), and how will we access the datasets along their evolution history (the longitudinal querying problem)?
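Three of these problems (evolution tracking, citation and longitudinal querying) can be illustrated with a toy archive of dataset versions. The sketch below is only an assumption-laden illustration, not the DIACHRON design: `diff`, `VersionArchive` and the version labels are hypothetical, and a realistic archive would store deltas and persistent identifiers instead of full in-memory snapshots.

```python
def diff(old, new):
    """Evolution tracking: which triples were added or deleted between two versions."""
    return {"added": set(new) - set(old), "deleted": set(old) - set(new)}


class VersionArchive:
    """Toy archive that keeps every version as a full snapshot, in commit order."""

    def __init__(self):
        self._versions = []  # list of (label, frozenset of triples)

    def commit(self, label, triples):
        self._versions.append((label, frozenset(triples)))

    def get(self, label):
        """Citation lookup: resolve a cited version label to its exact content."""
        for known_label, triples in self._versions:
            if known_label == label:
                return triples
        raise KeyError(label)

    def history(self, triple):
        """Longitudinal query: labels of all versions in which the triple holds."""
        return [label for label, triples in self._versions if triple in triples]


archive = VersionArchive()
archive.commit("v1", {("ex:acme", "ex:hq", "Paris")})
archive.commit("v2", {("ex:acme", "ex:hq", "Paris"), ("ex:acme", "ex:staff", "120")})
archive.commit("v3", {("ex:acme", "ex:hq", "Dublin"), ("ex:acme", "ex:staff", "120")})

changes = diff(archive.get("v2"), archive.get("v3"))
```

Here `changes` reports the headquarters move between `v2` and `v3`, `archive.get("v1")` returns exactly what a citation of `v1` referred to, and `archive.history(...)` answers in which versions a given fact held.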
18
DIACHRON Data Services & Work Plan
[Diagram: work packages WP2, WP3, WP4, WP5, WP6, WP7, WP8, WP9]
19
Diachronic Data Services Lifecycle
Data Repurposing, Data Archiving, Data Evolution, Data Appraisal, Data Citation
20
Concluding Remarks
The integrated DIACHRON platform and services aim to support long-term usability of open and/or linked data published on the Web and within enterprise intranets
The concept of diachronic data intends to foster self-preserving data embedding an understanding of their evolving semantics, use contexts, and interpretations
DIACHRON is expected to:
– Improve our understanding of how linked/open data evolves
– Reduce the maintenance costs when integrating linked/open data
– Foster data accountability and transparency in open dynamic data spaces
– Address sustainability issues for preserving Big Data
Fix Overall Data Preservation Effort
22
Business Models for Linked Data Publishers http://chiefmartec.com/2010/03/business-models-for-linked-data-and-web-30
23
Business Webs as Types of Value Creation
Agora: Open electronic marketplaces with regard to pricing and offered products (e.g. Android marketplace)
Aggregation: Closed, controlled electronic marketplaces (e.g. Apple App Store)
Distributed Network: Value Network
Value Chain: ICT-enabled Value Chains
Alliance: Loosely cooperating market players (e.g. Open Source projects)
24
Data-Driven Business Models
Source: Michalis Vafopoulos