https://tinyurl.com/WikidataRepo2


1

2 https://tinyurl.com/WikidataRepo2
Wikidata: the free and open knowledge base. Part 2: Consuming the data. Repo Fringe 2017. Ewan McAndrew, Navino Evans.

3 Wikipedia is only one of approximately 12 projects that Wikimedia, the charitable foundation, supports. Wikidata is the newest project, created in 2012 and coming up for only its 5th birthday in October. Yet it is generating excitement because of the advantages it has over Wikipedia.

4 What is Wikidata? Wikidata is a free linked database of secondary data that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others. The data is bibliographic, biographic, biomedical, geographic, taxonomic, authority file data, and more besides. We’ll start with a short introduction: what is Wikidata?

5 What form does this data take?
Data on Wikidata is organised into triples. Each item of data, like David Bowie here, has a unique identifier (a Q number), and "David Bowie" is the English label for that identifier. Within the item is a series of statements. Each statement consists of a property (identified by a unique P number) and a value for that property, so item, property and value together form one triple.
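The item / property / value structure described above can be sketched in a few lines of Python. This is only an illustration of the shape of the data, not how Wikidata stores it: the class names are hypothetical, and the identifiers used for David Bowie (Q5383), occupation (P106), singer (Q177220) and date of birth (P569) are assumptions that should be checked against the live item.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One statement on an item: a property (P number) and its value."""
    property_id: str
    value: str


@dataclass
class Item:
    """A Wikidata-style item: unique Q number, labels per language, statements."""
    qid: str
    labels: dict
    statements: list = field(default_factory=list)

    def triples(self):
        # Each statement becomes one (item, property, value) triple.
        return [(self.qid, s.property_id, s.value) for s in self.statements]


# Illustrative values only; the identifiers are assumptions to verify.
bowie = Item(
    qid="Q5383",
    labels={"en": "David Bowie"},
    statements=[
        Statement("P106", "Q177220"),     # occupation: singer
        Statement("P569", "1947-01-08"),  # date of birth
    ],
)

for triple in bowie.triples():
    print(triple)
```

Note that the label is stored per language, so the same Q number can carry a different label in each language without changing any statement.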

6 Here’s how that information is organised on a typical Wikidata page.
This information can also be used to create articles on Wikipedia: adding a few simple words turns the structured data into sentences, which stand in until an editor expands the article. This acts as an article placeholder for smaller language Wikipedias. The power of structured data like this is that it can be queried (i.e. we can ask questions of the data).

7 SECTION 1 SHOWCASING USES OF WIKIDATA QUERIES
So how can we consume the data in Wikidata? What use is the data being put to?

8 National Library of Wales
5000 artworks were added to Wikidata by a Wikidata Visiting Scholar at the National Library of Wales. Items were created for artworks in the library’s collections, and statements were added to these items for genre, type of materials and place depicted. Because Wikidata is a linked database, these connections are richer than they would be on their own. E.g. William Crane on his own doesn’t tell us much, but Wikidata has linked data about William Crane too. The collection also gets extra exposure in all kinds of third party tools. Demonstrate the timeline and bubble chart links: timeline of NLW collection works, link to Crotos, Sum of All Paintings project, lists.

9 Sharing open knowledge about Voltaire’s histories
Link to Histropedia Wikidata Timeline Viewer. Blog article by Martin Poulter. To raise awareness of Voltaire as a historian, Martin Poulter used three tools:
Wikipedia
Histropedia: a free tool for creating engaging, interactive visualisations
Wikidata: a free database and sister site of Wikipedia that drives Histropedia and other visualisations

10 Panama Papers P106: occupation P793: significant event
Q : Panama Papers Using Wikidata to discover facts that would otherwise have taken years to uncover.

11 MPs’ occupations and place of education.
Link to Wikidata query - occupation. Link to Wikidata query - education. Image of Ken Clarke by Chris McAndrew (CC-BY) MySociety is a global family of projects using open data to connect people to democracy. They’re keen to use Wikidata, but have been held back so far by lack of representation of politicians in Wikidata. A task for the democratic institutions (Scottish Parliament, Westminster, the European Parliament and councils) and citizens is to collaborate on an open-data model of these institutions, the offices and office-holders, and how they relate to issues and locations. EveryPolitician is one of the MySociety projects and has certainly been using WD for a while:

12 Doctoral Thesis Metadata
Oxford Research Archive has 3237 Oxford doctoral theses on open access for anyone to download and read. ORA are sharing their doctoral thesis metadata with Wikidata. Query showing all doctoral theses on Wikidata. New property: P Dissertation submitted to. How Wikidata links the Oxford theses: query result, and the query itself. Now the standard is set, anyone with bibliographic data about theses can share it openly, with links back to the full records and scanned files in official repositories. At the moment, Wikidata has more data about Oxford doctoral theses than all other theses put together, but more open data from more institutions will mean more useful queries and greater ease of finding the theses and linking them from Wikipedia.

13 Scholia - 2.3 million scientific articles
The Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and for research topics. Among several display formats available are lists of publications for individual researchers and organizations, publications per year, employment timelines, co-author networks and citation graphs. Example, Blog article + Video presentation Paper on arxiv.org by Finn Årup Nielsen. Scholia is a tool to handle scientific bibliographic information in Wikidata. A profile page for any researcher or any organization could be made on the fly. And now it is no longer just authors and organizations where there is a profile page, but also works, venues (journals or proceedings), series, publishers, sponsors (funders) and awards. We have also “topics” and individual pages showing specialized information about chemicals, proteins, diseases and biological pathways. A rudimentary search interface is implemented.

14 Uta Frith co-author graph. Location of Turing Award recipients
Slide from Dario Taraborelli at WikiCite 2017.

15

16 WikiCite - 3 million citations in Wikidata
WikiCite project started in 2016 Building a universal repository of sources in Wikidata. 500,000+ PMID references in Wikidata. Slide from Dario Taraborelli at WikiCite 2017.

17

18 The Zika Corpus The ZikaCorpus timeline
The Zika Corpus project on Wikidata In February 2016, the World Health Organization declared a public health emergency over the Zika virus outbreak and its links (then suspected, by now confirmed) to microcephaly and Guillain-Barré syndrome. By that time, around 150 scholarly articles had been published about the virus since its discovery in 1947, and the majority of these articles had already been assigned Wikidata items. Since then, the literature on the topic has grown about tenfold (see timeline), and the Wikidata coverage has mostly kept pace, with a typical time lag of less than a week. While not complete, this corpus covers most PubMed-indexed English-language articles reporting or reviewing original research about the Zika virus and the infections it can cause in mosquitoes, humans and animal models, as well as about approaches to prevention, diagnostics, therapy, or surveillance. The Zika corpus served as a nucleus for creating a citation graph on Wikidata and for exploring co-author networks and similar information on Wikidata. It is now slowly expanding to encompass literature about related subjects, e.g., flaviviridae and mosquito-borne diseases more broadly, epidemiological modeling or data sharing in public health emergencies.

19 Other notable examples of use cases
YLE - The Finnish Broadcasting Company, Yle, has since April 1st 2016 tagged online news and feature articles with concepts from Wikidata.
Inventaire - Create an inventory of your books with Wikidata at inventaire.io
WikiGenomes - A freely open, editable, and centralized model organism database for the biological research community. WikiGenomes (wikigenomes.org) is a web application that facilitates the consumption and curation of genomic data by the entire scientific community. Paper on WikiGenomes at Biorxiv.org
Quora - Links to Quora topics will be available through the Wikidata entities and also from Quora topic pages to Wikidata entities.
Crotos - Search and display engine for visual artworks powered by Wikidata.
And much more besides.

20 SECTION 2 QUERYING WIKIDATA PRACTICAL

21 Federated queries
Run queries that combine data from Wikidata and other selected data sources on the web. List of 3rd party services supported for federated queries.
Simple example federated query: works by Lope de Vega, retrieved from the BVMC digital library.
Lope de Vega’s unique BVMC id is determined from Wikidata.
This id is then used to retrieve works by Lope de Vega from the BVMC digital library.
Explain the principle of federated queries. Only a selection of sources is available.
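The two steps of the Lope de Vega example can be sketched as a small Python helper that assembles the SPARQL text. This is a sketch under stated assumptions, not the query shown in the session: the function name is hypothetical, and the item id (Q165257), the BVMC id property (P2799), the BVMC endpoint and URI prefix, and the remote predicate linking an author to works are all assumptions to check against the query service's example list before use.

```python
def build_federated_query(author_qid, id_property, uri_prefix, endpoint, work_predicate):
    """Return SPARQL text that looks up an external id on Wikidata, then
    uses it inside a SERVICE block that runs on another endpoint."""
    return f"""SELECT ?workLabel WHERE {{
  # Step 1 (Wikidata): find the author's identifier in the external catalogue.
  wd:{author_qid} wdt:{id_property} ?id .
  # Turn the bare identifier into the URI form used by the remote source.
  BIND(URI(CONCAT("{uri_prefix}", ?id)) AS ?remoteAuthor)
  # Step 2 (remote source): this part is sent to the other endpoint.
  SERVICE <{endpoint}> {{
    ?remoteAuthor <{work_predicate}> ?work .
    ?work rdfs:label ?workLabel .
  }}
}}"""


# All values below are assumptions for illustration; verify before running.
query = build_federated_query(
    author_qid="Q165257",  # assumed: Lope de Vega
    id_property="P2799",   # assumed: BVMC person ID
    uri_prefix="http://data.cervantesvirtual.com/person/",
    endpoint="http://data.cervantesvirtual.com/openrdf-sesame/repositories/data",
    work_predicate="http://rdaregistry.info/Elements/a/otherPFCManifestationOf",
)
print(query)
```

The printed text can be pasted into the query service. The SERVICE block only works for endpoints on the supported list, which is why only a selection of sources is available.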

22 Getting data out of Wikidata
API: for getting data about individual Wikidata items (or groups of up to 50).
SPARQL Endpoint: run advanced queries and get back data for up to around 200k items.
Data Dump: download all available data for large scale local processing of any size.
Run through the options. Also explain that every Wikidata item has a linked data URL for direct access. Read more about Wikidata data access.
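As a rough illustration of the access routes above, the following Python sketch builds the request URLs without sending anything over the network. The function names are hypothetical; the URLs follow the commonly documented patterns for the `wbgetentities` API action, the query service endpoint and `Special:EntityData`, and the batch size of 50 comes from the slide.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"
SPARQL_URL = "https://query.wikidata.org/sparql"
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
MAX_IDS_PER_REQUEST = 50  # "groups of up to 50" per API call


def api_urls(qids):
    """Split a list of Q numbers into batches of 50 and build one API URL per batch."""
    urls = []
    for start in range(0, len(qids), MAX_IDS_PER_REQUEST):
        batch = qids[start:start + MAX_IDS_PER_REQUEST]
        params = {"action": "wbgetentities", "ids": "|".join(batch), "format": "json"}
        urls.append(API_URL + "?" + urlencode(params))
    return urls


def sparql_url(query):
    """Build a GET URL that asks the SPARQL endpoint for JSON results."""
    return SPARQL_URL + "?" + urlencode({"query": query, "format": "json"})


def linked_data_url(qid):
    """Direct linked-data URL for a single item, in JSON form."""
    return ENTITY_DATA_URL.format(qid=qid)


print(linked_data_url("Q42"))
print(len(api_urls([f"Q{n}" for n in range(1, 121)])), "API requests for 120 items")
```

For anything beyond a few thousand items, the data dump is likely the better route than looping over API batches.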

23 Go to https://query.wikidata.org/ Try loading some example queries
SPARQL TRAINING: Go to https://query.wikidata.org/ and try loading some example queries. This is a chance for attendees to play around while I show the first basics of the interface (points below). Demo points: Examples section, hovering, download, visualisation options, short URL links / embed links.

24 Practical - Editing a query
Step 1: Load the sample query.
Step 2: Modify the query to find a different set of results by changing values, changing properties, or removing lines.
Step 3: Share your query on Twitter and/or add it to the etherpad!
DEMO: Load the sample query and explain the basic sections. Explain that OPTIONAL just means get the data, IF it’s there. Explain the 2-step data retrieval for coordinates (item -> country -> coordinates). Show the CTRL + SPACE trick. Change the occupation. Change the country. Remove the gender. Show the language change. Give a reminder of the sharing options.
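The demo edits above (change occupation, change country, remove gender, change language) can be mimicked with a small Python function that rebuilds the query text. The sample query used in the session is not reproduced in these notes, so this is a stand-in sketch: the function name and default values are hypothetical, and the ids used (P31/Q5 human, P106 occupation, P27 country of citizenship, P21 sex or gender, P569 date of birth, P625 coordinate location, Q36180 writer, Q145 United Kingdom, Q6581072 female) are assumptions to confirm with the CTRL + SPACE lookup.

```python
def build_query(occupation="Q36180", country="Q145", gender="Q6581072",
                language="en", limit=100):
    """Assemble a SPARQL query for people with a given occupation and country.
    Pass gender=None to drop the gender line, as in the 'remove gender' step."""
    lines = [
        "SELECT ?person ?personLabel ?birthDate ?coords WHERE {",
        f"  VALUES ?country {{ wd:{country} }}",
        "  ?person wdt:P31 wd:Q5 .",
        f"  ?person wdt:P106 wd:{occupation} .",
        # Two-step retrieval: item -> country -> coordinates.
        "  ?person wdt:P27 ?country .",
        "  ?country wdt:P625 ?coords .",
    ]
    if gender is not None:
        lines.append(f"  ?person wdt:P21 wd:{gender} .")
    lines += [
        # OPTIONAL: return the birth date if it is there, keep the row if not.
        "  OPTIONAL { ?person wdt:P569 ?birthDate . }",
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}',
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


print(build_query())                            # the starting query
print(build_query(gender=None, language="fr"))  # gender removed, French labels
```

Each argument corresponds to one of the demo steps, so attendees can see that editing a query mostly means swapping a Q number, swapping a P number, or deleting a line.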

25 Anything is possible! CC-0 licensed data so you can build what you want from it. Possibilities are limitless. Ways to take it forward. Thanks for listening!

26 Why contribute to another repository?
This slide and the next come from Martin Poulter, Wikimedian in Residence at the University of Oxford.

27 Enrich both repositories by combining datasets.

28 Edinburgh - data capital of Europe
What data can we share?

29 For Edinburgh to function as a “data city” would mean linking up the data created and curated by the City of Edinburgh. For instance, all these representations of aspects of RLS (Robert Louis Stevenson) need to be connected, and Wikidata is the ideal platform for this.

30 Multiple gauntlets thrown down at once!
What can you share? Bibliographical data, biographical data, biomedical data, geographical data, taxonomical data, authority file data. The sharing of simple facts and statements costs nothing and benefits us all. Image taken by Robin Jay via Flickr CC-BY-SA

31 The free and open knowledge base
Wikidata The free and open knowledge base Thank you! Repo Fringe 2017 Welcome

