Linked Open Data: a new resource for eResearch Dr Anne Cregan eResearch Analyst, Intersect and ANDS
What this talk will cover Open data The web of data RDF triples RDF graphs The Linked Open Data project Publishing to the web of data Consuming the web of data
Open data The philosophy and practice of making data freely available to everyone, without restrictions from copyright, patents or other mechanisms of control.
Why make data open? Public money was used to fund the work, so it should be available to the public. Facts cannot legally be copyrighted. Sponsors of research do not get full value for money unless the resulting data are made freely available. In scientific research, the rate of discovery is accelerated by better access to data. Source: How to Make the Dream Come True: The Astronomers Data Manifesto (Norris, 2007)
How to make open data useful… Principles Make it easy to find Make it available to everyone Separate it from the applications that use it Interlink it with related datasets in a meaningful way Make it machine processable
The web of data The web of data = a naming model + a data model on the web It’s a web of interlinked data that machines can read (whereas the web is a web of interlinked documents for people to read) Also known as the “Semantic Web” because of its formal semantics for reasoning and its relationship to meaning
The web of data It is an initiative of the World Wide Web Consortium (W3C) and is a collaborative effort of many parties. It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange. Like the web, anyone can publish to it: anyone can say anything about anything. However, they need to say it in RDF, not HTML. And anything they want to talk about has to be a URI.
URI = Uniform Resource Identifier The naming model for the web of data A URI is a unique name that identifies a resource A resource is anything to which we can attach identity A resource can be an information object, like a document or a webpage, but it can also be a real-world object, like a person. It can be anything at all. A URL is a kind of URI that names the resource and also indicates a means of acting upon or obtaining it via its primary access mechanism, e.g. http or ftp. For example: URL: rg/People/Berners-Lee/ URL: TR/rdf-concepts/
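The distinction between a name and an access mechanism can be sketched in a few lines of Python. This is only an illustration: the two example.org identifiers and the helper access_mechanism are hypothetical and do not come from the talk.

```python
from urllib.parse import urlsplit

# Hypothetical identifiers, used only for illustration.
person_uri = "http://example.org/people/alice"         # names a real-world thing
document_url = "http://example.org/docs/report.html"   # names a document

def access_mechanism(uri: str) -> str:
    """Return the scheme, i.e. the access mechanism a URL advertises."""
    return urlsplit(uri).scheme

for uri in (person_uri, document_url):
    parts = urlsplit(uri)
    # Both identifiers share the same syntax: scheme, authority, path.
    print(parts.scheme, parts.netloc, parts.path)
```

Both strings are syntactically alike; whether a URI names a person or a document is a matter of what the publisher says about it, not of its spelling.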
RDF = Resource Description Framework A framework for describing and linking resources on the web Allows URIs to be connected into a directed graph Based on the idea of triples: Subject, Predicate, Object
RDF = Resource Description Framework A framework for describing and linking resources on the web Allows URIs to be connected into a directed graph Based on the idea of triples, e.g. Subject: intersect.org.au/intersect-team/AnneCregan Predicate: doac:organization Object: intersect.org.au
RDF = Resource Description Framework Putting triples together creates a graph, e.g. intersect.org.au/intersect-team/AnneCregan doac:organization intersect.org.au; intersect.org.au/intersect-team/AnneCregan doac:organization ands.org.au
RDF = Resource Description Framework Putting triples together creates a graph Nodes of the graph are URIs and literals, e.g. intersect.org.au/intersect-team/AnneCregan foaf:firstName “Anne”
RDF = Resource Description Framework Has a schema to describe relationships between things, called RDF Schema
RDF = Resource Description Framework Is a World Wide Web Consortium (W3C) Recommendation Is part of the Semantic Web “stack”
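As a rough sketch of the data model, the example graph from these slides can be held in plain Python as a set of (subject, predicate, object) tuples. This is not an RDF library: the Literal class and the objects helper are made up for the illustration, the subject URI is read from the slide as intersect.org.au/intersect-team/AnneCregan, and the prefixed names (doac:, foaf:) are left unexpanded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A plain value such as a name, as opposed to a URI node."""
    value: str

# Subject URI as read from the slide (assumed spelling).
ANNE = "intersect.org.au/intersect-team/AnneCregan"

# A graph is simply a set of (subject, predicate, object) triples.
graph = {
    (ANNE, "doac:organization", "intersect.org.au"),
    (ANNE, "doac:organization", "ands.org.au"),
    (ANNE, "foaf:firstName", Literal("Anne")),
}

def objects(graph, subject, predicate):
    """All objects reachable from `subject` along arcs labelled `predicate`."""
    found = [o for s, p, o in graph if s == subject and p == predicate]
    return sorted(found, key=str)

print(objects(graph, ANNE, "doac:organization"))
# ['ands.org.au', 'intersect.org.au']
print(objects(graph, ANNE, "foaf:firstName"))
```

Each tuple is one arc of the directed graph: the subject and object are nodes, and the predicate labels the arc between them.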
Semantic Web Technology Stack The Semantic Web standards build on each other URI is the naming mechanism RDF, RDF-Schema and OWL are the languages for describing resources and relationships between them SPARQL is a query language for querying RDF graphs
RDF Graphs Putting triples together creates a directed graph
RDF Graphs Graphs can be interconnected by referring to URIs in other graphs
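A small sketch of how two separately published graphs become one once they mention the same URI. All ex: names below are hypothetical, and the reachable helper is just a breadth-first walk written for this illustration.

```python
from collections import deque

# Two graphs published independently (hypothetical ex: identifiers).
graph_a = {("ex:alice", "ex:worksFor", "ex:acme")}
graph_b = {("ex:acme", "ex:locatedIn", "ex:sydney")}

# Merging is a set union; the shared URI ex:acme joins the two graphs.
merged = graph_a | graph_b

def reachable(graph, start):
    """Nodes that can be reached from `start` by following arcs forwards."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, _p, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen - {start}

print(reachable(graph_a, "ex:alice"))  # only what graph_a knows
print(reachable(merged, "ex:alice"))   # now also reaches ex:sydney
```

No identifier mapping is needed for the merge: because both publishers used the same URI for the same thing, the union is already connected.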
Linking Open Data Project Community project of the W3C Semantic Web Education and Outreach (SWEO) Interest Group Started in 2007 Has grown rapidly as members of the community have added open datasets Has created the largest existing RDF graph – over 18 billion triples!
Linking Open Data Project October 2007
Linking Open Data Project September 2008
Linking Open Data Project July 2009
Linking Open Data Project April 2010
Linking Open Data Project As at May 2009, the project had created a linked open data cloud of 4.7 billion RDF triples; in April 2010, Linked Open Numbers added another 14 billion triples Datasets include: – DBpedia – linked data version of Wikipedia – US Census – 2000 US Census data set – Gene Ontology – annotations from the Gene Ontology database – DrugBank – information about FDA-approved drugs – UniProt – life sciences data set – Lots of bio/life sciences data sets – the BIO2RDF cloud More info at cts/LinkingOpenData/DataSets
Publishing to the Linked Open Data Cloud – Principles 1. Use URIs to name things 2. Use HTTP URIs so you can look up those things on the web 3. When someone looks up a URI, provide useful information (the URI is “dereferenceable”) 4. Include RDF statements that link to other URIs so that they can discover related things These principles are from Tim Berners-Lee's 2007 note:
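One way to picture what a publisher might serve when a URI is looked up is a handful of statements in the line-based N-Triples syntax. The sketch below is an illustration only: the example.org and example.com URIs are hypothetical, the helper names are invented, and a real publisher would normally use an RDF library rather than string formatting. The FOAF and OWL namespace URIs are the standard ones.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
OWL = "http://www.w3.org/2002/07/owl#"

def term(value, is_literal=False):
    """Format one node: <...> for a URI, "..." for a plain literal."""
    if is_literal:
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        return f'"{escaped}"'
    return f"<{value}>"

def to_ntriples(statements):
    """Serialise (subject, predicate, object, object_is_literal) tuples."""
    return "\n".join(
        f"{term(s)} {term(p)} {term(o, lit)} ." for s, p, o, lit in statements
    )

me = "http://example.org/people/alice"  # principles 1 and 2: an HTTP URI as a name
statements = [
    (me, FOAF + "name", "Alice", True),  # principle 3: useful information
    # principle 4: a link to someone else's URI for the same thing
    (me, OWL + "sameAs", "http://example.com/staff/alice", False),
]
print(to_ntriples(statements))
```

The last statement is what makes the data linked: it points out of the publisher's own dataset, so a client that looks up the URI can discover related data elsewhere.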
Consuming linked open data Browsing linked data is easy You need an RDF browser such as Tabulator, Disco, Zitgist, Marbles or OpenLink Let’s go for a ride on Disco: berlin.de/rdf_browser/ Start here: berlin.de/rdf_browser/ We can travel through the linked open data cloud between URIs linked using RDF
Consuming linked open data eResearch example: Enabling drug discovery Data sets published to the data cloud: – Linked CT: linked clinical trials; 60,000 trials in 158 countries – DrugBank: FDA-approved drugs; 5,000 small molecule and biotech drugs – Diseasome: disorders and disease genes; 4,300 disorders, disease genes and associations – DailyMed: chemical structures of marketed drugs; 124,000 triples and 29,600 links – SWAN: Alzheimer’s hypothesis browser knowledgebase
Consuming linked open data Using an RDF browser: See all drugs in trials for Alzheimer’s disease in Linked CT, including a Phase III trial for Varenicline. Follow a link to data from DailyMed showing that Varenicline is already on the market for nicotine addiction. The typical dose is 1 mg twice daily and the Linked CT trial used no higher dose than that, so there are no new safety issues. Link to DrugBank to find that Varenicline is an alpha-4 beta-2 neuronal nicotinic acetylcholine receptor agonist. Diseasome indicates that the corresponding genes are only important in nicotine addiction, not Alzheimer’s. But the SWAN Knowledgebase shows there are hypotheses relating Alzheimer’s to nicotinic receptors through amyloid beta.
Consuming linked open data Using the linked open data cloud with an RDF browser, researchers are able to: Browse data relating to companies, clinical trials, drugs, diseases and genetic variation See when extra data is available Gain access to data without needing to map identifiers and synonyms – interlinking has already been done Gain additional insights about interesting questions to ask Jentzsch et al., “Enabling Tailored Therapeutics with Linked Data” events.linkeddata.org/ldow2009/papers/ldow2009_paper9.pdf
Consuming linked open data Querying using SPARQL Queries A SPARQL endpoint enables users (human or other) to query a knowledge base via the SPARQL language. Results are typically returned in one or more machine-processable formats. Examples:
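As a hedged sketch of what querying an endpoint involves, the snippet below builds the request URL for a SPARQL query without sending it. The endpoint address and the queried resource are assumptions chosen for illustration (they are not taken from the slides); the only protocol detail relied on is that a query can be passed in a URL parameter named query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint and resource, for illustration only.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
SELECT ?property ?value
WHERE { <http://dbpedia.org/resource/Sydney> ?property ?value }
LIMIT 10
"""

def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_request_url(ENDPOINT, QUERY)
print(url)
# A client would fetch this URL and ask for a machine-processable result
# format, for example via an Accept header; no request is made here.
```

The triple pattern in the WHERE clause has the same subject, predicate, object shape as the data itself, with variables (?property, ?value) standing in for the parts to be retrieved.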
Types of Queries Selection and extraction queries retrieve parts of the data based on its content, structure, or position Reduction queries specify which parts of the data not to include in the answer Restructuring queries restructure data into other possible formats/serialisations Aggregation queries aggregate several data items into one new data item Combination and inference queries combine information that is not explicitly connected
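The query types above can be mimicked over a small in-memory list of triples. This is a plain-Python analogy rather than SPARQL, and every ex: identifier in it is invented for the illustration.

```python
from collections import Counter, defaultdict

# Hypothetical data: trials test drugs, drugs target receptors.
triples = [
    ("ex:trial1", "ex:tests", "ex:drugA"),
    ("ex:trial2", "ex:tests", "ex:drugA"),
    ("ex:trial3", "ex:tests", "ex:drugB"),
    ("ex:drugA", "ex:targets", "ex:receptorX"),
]

# Selection: retrieve the trials that test drugA.
trials_of_a = [s for s, p, o in triples if p == "ex:tests" and o == "ex:drugA"]

# Reduction: say what to leave out (here, every ex:targets statement).
without_targets = [t for t in triples if t[1] != "ex:targets"]

# Restructuring: reshape the flat list into a subject-keyed mapping.
by_subject = defaultdict(list)
for s, p, o in triples:
    by_subject[s].append((p, o))

# Aggregation: collapse several statements into one count per drug.
trials_per_drug = Counter(o for s, p, o in triples if p == "ex:tests")

# Combination: join trial -> drug with drug -> receptor, a connection
# that no single triple states explicitly.
trial_targets = [
    (trial, receptor)
    for trial, p1, drug in triples if p1 == "ex:tests"
    for drug2, p2, receptor in triples if p2 == "ex:targets" and drug2 == drug
]

print(trials_of_a, dict(trials_per_drug), trial_targets)
```

The combination step is the one that benefits most from interlinked data: the join only works because both statements use the same identifier for the drug.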
Summary Open data The web of data RDF triples RDF graphs The Linked Open Data project Publishing to the web of data Consuming the web of data
Thank you More details are at – http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData – Questions and comments may be emailed to