On The Evolution of Terms
“appellavitque Adam nominibus suis cuncta animantia et universa volatilia caeli et omnes bestias terrae...” Genesis, 2:20
Orri Erling - Program Manager, Virtuoso
Yrjänä Rankka - Developer, Virtuoso
© 2008 OpenLink Software, All rights reserved.
“Adam called all things by their names”
  We are not the first to try
  The history of the "search for the perfect language" favors natural languages
  What gets used becomes fit for the task
  Language building from scratch has generally not been successful
  Forcing nature to fit preconceived, idealized taxonomies has generally failed
Usage for Linked Data
  New layer to the document web
  Use de-referenceable HTTP URIs
  Use #this to distinguish subject matter from document
  Reuse terms where you can
  Human readable URIs are best
  From HTML to XML, most formats are somewhat human readable; the same holds for the data web
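As a minimal sketch of the #this convention: an HTTP client never sends the fragment to the server, so a URI ending in #this names the subject matter while the same URI without the fragment names the document that describes it. The example.org URI and both function names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urldefrag


def document_for(entity_uri: str) -> str:
    """Return the URI of the document a client would fetch for this entity URI."""
    document_uri, _fragment = urldefrag(entity_uri)
    return document_uri


def names_a_thing(uri: str) -> bool:
    """True when the URI carries a fragment, i.e. it names subject matter
    rather than the document itself (under the #this convention)."""
    return urldefrag(uri).fragment != ""


# Hypothetical entity URI, not taken from any real data set.
entity = "http://example.org/people/alice#this"
print(document_for(entity))
print(names_a_thing(entity))
print(names_a_thing(document_for(entity)))
```

Descriptions of the person and descriptions of the page about the person can then be kept apart simply by which of the two URIs appears as the subject.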
Universal “Data Language”?
  RDF will do for grammar
  Useful vocabulary can only evolve in a community of practice
  Distributed evolution of vocabulary and diversity of names for things is a given
  Application determines the data structure and vocabulary
Emerging Hubs
  SIOC/FOAF for Web Data Spaces
  DBpedia/UMBEL/OpenCyc for names of real world entities
  BFO for epistemology of structures and processes
Can’t Put Genie Back in the Bottle
  Once data is out, there it stays
  Single identifiers for entities are possible only in closed, application-specific data warehouses
  Heterogeneity of names, overlap of descriptions, taxonomies, etc. is a given
  Meaning exists only in context, so make this explicit
Infrastructure Scenarios
  Application-specific warehouse or mapped RDBMS
  General warehouse with lots of graphs, a la search engine or Billion Triples Challenge
  Query driven harvesting a la OpenLink Sponger
  On-line discovery and federated SPARQL
Implications for Query
  Make it explicit
  Report what data sets and SameAs's and graphs went into producing an answer
  Allow the app to explicitly choose what graphs, SameAs's, taxonomies, etc. are considered
  Search and discoverability will drive vocabulary convergence
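The two query requirements above (the application chooses which graphs are considered, and the answer reports which graphs contributed) can be sketched with a toy in-memory quad list. This is an illustration only, not how Virtuoso or any SPARQL engine implements it; the Quad type, the query function, the graph URIs and the data are all hypothetical.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    graph: str
    subject: str
    predicate: str
    obj: str


# Hypothetical data: three graphs disagreeing about one name.
QUADS = [
    Quad("http://example.org/graphs/a", "ex:alice", "foaf:name", "Alice"),
    Quad("http://example.org/graphs/b", "ex:alice", "foaf:name", "Alice A."),
    Quad("http://example.org/graphs/c", "ex:alice", "foaf:name", "Mallory"),
]


def query(quads, subject, predicate, allowed_graphs):
    """Return (values, graphs_used).

    Only quads from graphs the application explicitly allowed are considered,
    and the graphs that actually contributed are reported with the answer.
    """
    values = []
    graphs_used = set()
    for q in quads:
        if (q.graph in allowed_graphs
                and q.subject == subject
                and q.predicate == predicate):
            values.append(q.obj)
            graphs_used.add(q.graph)
    return values, sorted(graphs_used)


allowed = {"http://example.org/graphs/a", "http://example.org/graphs/b"}
values, used = query(QUADS, "ex:alice", "foaf:name", allowed)
print(values)
print(used)
```

Returning the provenance alongside the values lets the caller decide afterwards whether a contributing graph should be dropped from the next query.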
Sameness
  SameAs and equivalent are necessary and permanent features
  What can be considered identical depends on context
  Universal agreement will not happen, so let people choose whose SameAs they trust
  SameAs adds query cost and must be resolved at time of query
  SameAs cannot be forward chained at web scale because which of them are relevant is not fixed
  Malicious/SPAM SameAs is inevitable
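A minimal sketch of resolving SameAs at query time against a caller-chosen trust set, rather than forward chaining every statement in advance. The publishers, the names and the equivalents function are hypothetical; a real store would do this inside the query engine.

```python
from collections import defaultdict, deque

# Hypothetical SameAs statements as (publisher, left name, right name).
SAME_AS = [
    ("publisher-a", "a:Paris", "b:Paris"),
    ("publisher-b", "b:Paris", "c:Paris"),
    ("spammer", "a:Paris", "spam:Watches"),
]


def equivalents(name, same_as, trusted):
    """Names reachable from `name` through SameAs statements whose publisher
    is in `trusted`. Computed per query, so the trust set can differ each time."""
    neighbours = defaultdict(set)
    for publisher, left, right in same_as:
        if publisher in trusted:
            # SameAs is symmetric, so record both directions.
            neighbours[left].add(right)
            neighbours[right].add(left)
    seen = {name}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for other in neighbours[current] - seen:
            seen.add(other)
            queue.append(other)
    return seen


print(sorted(equivalents("a:Paris", SAME_AS, {"publisher-a", "publisher-b"})))
print(sorted(equivalents("a:Paris", SAME_AS, {"publisher-a"})))
```

Because the closure is built only from trusted statements, the spammer's SameAs never enters the result unless the caller opts in, and the traversal is the extra query cost the slide refers to.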
Implications For Publishers
  SPARQL and data self description practices are urgently needed
  Describe what names are used and what other data meshes with yours
  Explicit license
Implications for Entity NS
  DNS is good because of distributed, resilient storage and admin
  Convergence cannot be forced but should be encouraged
  Make administration compartmentalized a la DNS, for no SPAM and no censorship
  People can say things in their own spaces
  Offer classification, e.g. UMBEL
  In searching for terms, rank most reused the highest
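One way to read "rank most reused the highest" is sketched below. It assumes reuse is measured as the number of distinct data sets that use a term, which is an interpretation and not something the slide specifies; the data and the rank_terms function are hypothetical.

```python
from collections import Counter


def rank_terms(usages, candidates):
    """Order candidate terms by how many distinct data sets use them.

    `usages` is an iterable of (dataset, term) pairs. Repeated use inside one
    data set counts once. Ties are broken alphabetically for a stable order.
    """
    counts = Counter()
    for _dataset, term in set(usages):
        counts[term] += 1
    return sorted(candidates, key=lambda term: (-counts[term], term))


# Hypothetical usage observations.
USAGES = [
    ("dataset-1", "foaf:name"),
    ("dataset-1", "foaf:name"),
    ("dataset-2", "foaf:name"),
    ("dataset-3", "foaf:name"),
    ("dataset-1", "ex:fullName"),
    ("dataset-2", "ex:label"),
    ("dataset-3", "ex:label"),
]

print(rank_terms(USAGES, ["ex:fullName", "ex:label", "foaf:name"]))
```

Counting distinct data sets rather than raw occurrences keeps a single large publisher from promoting its own terms by volume alone.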
Conclusion
  Since uniformity is impossible, make diversity of identifiers explicit
  Build alongside the document web, de-referenceably and with #this
  Encourage reuse but allow innovation
  Needs of communities will differ according to stage of development
Thank You!
http://www.openlinksw.com