Published by Buddy Bailey. Modified over 8 years ago.
1
Getting More from Your VIVO Mike Conlon, UF Melissa Haendel, OHSU Kristi Holmes, Northwestern
2
VIVO Data VIVO represents data in “triples” – subject, predicate, object The ontology is a semantic model that describes the world of scholarship For example (not actual VIVO-ISF) – mike isA faculty-member – mike wrote paper – paper hasTitle “The Long and Winding Road” – paper publishedIn “Journal of Irreproducible Results”
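As a rough illustration (this is not VIVO code), the example triples on this slide can be held as Python tuples, and a pattern with wildcards then plays the role of a query. The `match` helper and the plain-string names are assumptions made only for this sketch.

```python
# Each triple is a (subject, predicate, object) tuple, mirroring the slide's example.
triples = [
    ("mike", "isA", "faculty-member"),
    ("mike", "wrote", "paper"),
    ("paper", "hasTitle", "The Long and Winding Road"),
    ("paper", "publishedIn", "Journal of Irreproducible Results"),
]


def match(pattern, data):
    """Return the triples that agree with the pattern; None acts as a wildcard."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Everything stated about "paper" (hypothetical helper, illustrative data).
about_paper = match(("paper", None, None), triples)
```

Asking for `("paper", None, None)` returns the title and journal triples, which is the same idea a SPARQL triple pattern with variables expresses.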
3
All Things have a URI All things in VIVO have a Uniform Resource Identifier (URI) In VIVO, URIs usually look like http://yourplace/individual/nxxxxxx Mike Conlon’s URI at UF VIVO is: – http://vivo.ufl.edu/individual/n25562
4
A real triple Mike is a faculty member We have seen that “Mike” has the URI – http://vivo.ufl.edu/individual/n25562 – We will put this in angle brackets to tell the computer “the stuff in brackets is a URI” So perhaps: <http://vivo.ufl.edu/individual/n25562> is a faculty member So far so good. We have specified the subject.
5
And now for the predicate We want to say “is a,” as in Mike is a faculty member On the semantic web, we say “Mike hasType Faculty Member” And the way we say “has type” is to use the rdf ontology. The rdf ontology is fundamental to the semantic web. VIVO uses a few rdf predicates. The most common is rdf:type.
6
More than one ontology rdf:type – Ontology is rdf – Predicate is type So we say <http://vivo.ufl.edu/individual/n25562> rdf:type Faculty Member You might suspect that “Faculty Member” here is English, not a precise statement in an ontology
7
On to the object VIVO-ISF defines a class FacultyMember We write vivo:FacultyMember to specify that class So to say, “Mike is a Faculty Member” we write <http://vivo.ufl.edu/individual/n25562> rdf:type vivo:FacultyMember.
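A prefixed name such as vivo:FacultyMember is shorthand for a full URI. The sketch below shows that expansion in Python, using the standard rdf and rdfs namespaces and the VIVO core namespace. The `expand` helper and the `PREFIXES` table are illustrative names, not part of VIVO.

```python
# Namespace URIs for the prefixes that appear in these slides.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "vivo": "http://vivoweb.org/ontology/core#",
}


def expand(prefixed_name):
    """Turn 'prefix:local' into the full URI it abbreviates (hypothetical helper)."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


full_type = expand("rdf:type")
full_class = expand("vivo:FacultyMember")
```

The PREFIX lines at the top of a SPARQL query do the same job: they let the query body use the short names.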
8
Two Addenda 1. “rdf:type” can be abbreviated “a” So we write: <http://vivo.ufl.edu/individual/n25562> a vivo:FacultyMember 2. When we write triples, we always end them with a period: <http://vivo.ufl.edu/individual/n25562> a vivo:FacultyMember.
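The writing rules so far (angle brackets around a URI, “a” in place of rdf:type, a period at the end) can be gathered into one small Python sketch. The `to_turtle` function is a hypothetical helper for illustration only. It handles just this simple case, not the full triple syntax.

```python
def to_turtle(subject_uri, predicate, obj):
    """Format one triple: bracket the subject URI, shorten rdf:type to 'a', end with a period."""
    if predicate == "rdf:type":
        predicate = "a"
    return f"<{subject_uri}> {predicate} {obj} ."


# Mike's URI comes from the slides; the function name is an assumption of this sketch.
line = to_turtle(
    "http://vivo.ufl.edu/individual/n25562",
    "rdf:type",
    "vivo:FacultyMember",
)
```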
9
Voila! Everything that follows has to do with specifying the triples you want, and what you want to do with them
10
We want to use VIVO Data “Use” – Make lists – Count things – Make reports – Get data out of VIVO for use in Excel Or statistics software (R, SAS, SPSS, Stata, …) Or reporting software (Crystal Reports,...) Or visualization software (R, …)
11
We will write SPARQL Queries SPARQL is a “query language” for asking for data from a set of triples SPARQL is fun and easy. But like any computer language, it is precise: you will get exactly what you ask for.
12
We will use University of Florida Data Why? – Because we can – Because alternatives are lacking (mid to large scale real-world data on scholarship, accessible for training purposes)
13
About UF
14
Get Signed In Each table will share an account and sign in to an account at UF VIVO Navigate to “Site Admin”, then “SPARQL Query”
15
Prefixes for the names of ontologies Comments describe your query An overly complex sample query Choices regarding output formats Beyond what we will cover today DON’T PUSH THIS BUTTON!
16
Our First Query SELECT ?x WHERE { ?x a vivo:FacultyMember. }
17
Running This Query Delete the query in the window. Keep the prefixes Type the query into the window Look at it carefully Select an output format (RS-TEXT is fine for now) Push Submit
18
Query #2 SELECT ?x ?label WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. }
19
What happened? The good: – We got faculty members and their names – We got the columns in the order on the SELECT statement Not so good: – Some people have more than one name – The names are not in order
20
Query #3 SELECT ?x ?label WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. } GROUP BY ?x
21
What happened? We did not get labels! When we use a GROUP BY, we need to say what we want to have happen to the potentially multiple values of label that might be in the data It’s going to look complex, but you’ll get used to it
22
Query #4 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. } GROUP BY ?x ORDER BY ?label
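To see what GROUP BY with MIN does to multiple labels, here is a minimal Python imitation. The rows are made-up stand-ins for query results, and Python's string comparison is only an approximation of how a SPARQL engine orders labels.

```python
# Hypothetical query rows: (uri, label). One person has two labels.
rows = [
    ("n2", "Smith, Jane"),
    ("n1", "Conlon, Mike"),
    ("n1", "Conlon, Michael"),
]

# GROUP BY ?x with MIN(?label): keep the smallest label seen for each URI.
min_label = {}
for uri, label in rows:
    if uri not in min_label or label < min_label[uri]:
        min_label[uri] = label

# ORDER BY ?label: sort the grouped rows by the label that was kept.
result = sorted(min_label.items(), key=lambda pair: pair[1])
```

Each URI now appears once, with a single label, and the rows come back in label order.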
23
Query #5 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } GROUP BY ?x ORDER BY ?label
24
Let’s get a spreadsheet Select CSV Rerun the query A CSV file downloads Open it in spreadsheet software (Excel or other)
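The same CSV can also be read by a script. The sketch below assumes the download has a header row holding the variable names (`x` and `label`), as SPARQL CSV results normally do. The sample text stands in for the real file, and the label shown is invented.

```python
import csv
import io

# Stand-in for the downloaded file; a real run would open the saved CSV instead.
sample_csv = 'x,label\nhttp://vivo.ufl.edu/individual/n25562,"Conlon, Mike"\n'

# DictReader uses the header row (the SELECT variable names) as keys.
reader = csv.DictReader(io.StringIO(sample_csv))
people = [(row["x"], row["label"]) for row in reader]
```

Quoted fields matter here: labels such as "Last, First" contain commas, and the `csv` module keeps them in one column.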
25
We’ve got data! Time to rejoice! Everything is a triple We can write SPARQL queries to select triples and get data
26
A second look at Query #5 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } GROUP BY ?x ORDER BY ?label
27
Variables in Query #5 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } GROUP BY ?x ORDER BY ?label
28
Ontology prefixes Query #5 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } GROUP BY ?x ORDER BY ?label
29
Predicates in Query #5 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } GROUP BY ?x ORDER BY ?label
30
Object Classes in Query #5 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } GROUP BY ?x ORDER BY ?label
31
Statement order in Query #5 SELECT ?x (MIN(?label) AS ?label) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } GROUP BY ?x ORDER BY ?label ?x must be a FacultyMember AND ?x must have at least one label AND ?x must be a UFCurrentEntity
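The three statements in the WHERE clause are joined by AND, so they behave like a set intersection. A small Python sketch with made-up URI sets shows that reordering the statements leaves the answer unchanged (a real engine may still run one ordering faster than another).

```python
# Hypothetical sets of URIs, one per triple pattern in the WHERE clause.
faculty = {"n1", "n2", "n3"}      # ?x a vivo:FacultyMember
has_label = {"n1", "n2", "n4"}    # ?x rdfs:label ?label
current = {"n2", "n3", "n4"}      # ?x a ufVivo:UFCurrentEntity

# All three patterns must hold for the same ?x.
in_order = faculty & has_label & current
reordered = current & faculty & has_label
```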
32
How to count To count things in SPARQL, use the COUNT aggregation Let’s count the faculty
33
Query #6 # Count the UF faculty SELECT (COUNT(DISTINCT ?x) AS ?nfaculty) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. }
34
Inspecting Query #6 # Count the UF faculty SELECT (COUNT(DISTINCT ?x) AS ?nfaculty) WHERE { ?x a vivo:FacultyMember. ?x rdfs:label ?label. ?x a ufVivo:UFCurrentEntity. } Comments begin with # Comments are a good idea COUNT the URI Use DISTINCT to be sure you are counting values of “?x”, not rows in a query result Same triple patterns as before No GROUP BY or ORDER BY
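Why DISTINCT matters: a person with two labels produces two result rows. The Python sketch below, using invented rows, contrasts counting rows with counting distinct URIs.

```python
# Hypothetical query rows: (uri, label). The person n1 has two labels.
label_rows = [
    ("n1", "Conlon, Mike"),
    ("n1", "Conlon, Michael"),
    ("n2", "Smith, Jane"),
]

# COUNT(?x) counts rows; COUNT(DISTINCT ?x) counts different URIs.
row_count = len(label_rows)
distinct_count = len({uri for uri, _ in label_rows})
```

Here the row count is 3 but only 2 people are present, so the distinct count is the one that answers "how many faculty?".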
35
Break Time!
36
Where were we? Triples: Subject Predicate Object – URI, Ontology prefixes, predicates, class names SPARQL Queries – SELECT, WHERE, GROUP BY, ORDER BY – MIN, COUNT DISTINCT – Triple patterns with variables ending with periods – Comments Thinking about data – Unique URI – Multiple labels
37
How do we know what’s in the ontologies? Several methods: 1.Look at pictures 2.Read the files 3.Search the web 4.Ask friends 5.Reverse engineering This is kind of important. We need to pin this up somewhere
38
Relationship diagrams can be found in the VIVO wiki
39
The diagram tells us: What articles are called What journals are called and how articles and journals are connected What dates are called and how dates and articles are connected How articles are connected to people through Authorships How contact information is arranged And much more It’s a treasure map! Much of what we need to know to work with publications is on display Similar relationship diagrams exist for grants, positions, education background, advising, membership, courses
40
Query #7: Inspect a pub # Get all the triples for a specified subject URI SELECT ?p ?o WHERE { <subject-URI> ?p ?o. }
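The inspection pattern, a fixed subject URI with variables for predicate and object, is easy to generate for any entity. The `inspect_query` function below is a hypothetical helper. The URI passed in is Mike's from the earlier slides, used only as an example subject rather than the publication this slide inspects.

```python
def inspect_query(subject_uri):
    """Build a query listing every predicate and object for one subject URI."""
    # Doubled braces produce literal { } inside an f-string.
    return f"SELECT ?p ?o WHERE {{ <{subject_uri}> ?p ?o . }}"


query = inspect_query("http://vivo.ufl.edu/individual/n25562")
```

The resulting text can be pasted into the SPARQL Query window. No prefixes are needed because the query uses only a full URI and variables.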
41
The SPARQL output tells us: The article has nine type assertions UF adds data regarding date of harvest and harvester name, grants cited Also has a datetime, a doi, publication venue, start and end pages, pmid, pmcid, nihmsid, title, volume Has 2 subject area assertions Has 5 “relatedBy” assertions (why?) Has obo:ARG_2000028 (google it) (why?)
42
Now we are ready for recipes We can inspect entities using “reverse engineering” to inspect the triples for any subject We can review the diagrams to understand the relationships between things
43
http://mconlon17.github.io/sparql 28 SPARQL queries tested with VIVO 1.6 on UF VIVO Each is marked with SPARQL level of difficulty Reports, Data Management, People, Papers, Grants, Organizations Real-world examples. Most are the result of questions asked by stakeholders
44
Alpha List of People who have Opted In to VIVO (Difficulty: Easy) UF has an extension to tag people who have opted in to VIVO. These people are always excluded from various removal processes Query looks like the list of Faculty Members
45
List the Triples with a specified subject URI We’ve done this one already Incredibly useful for discovering the ontology What type of entity is this example inspecting?
46
Count papers by Concept Takes a minute to run Processes 54,000 academic articles
47
Find all the papers with “guideline” in the title New things – We are using InformationResource rather than AcademicArticle. Guideline might be in a report or other type of information resource – We are returning the date of publication Publication has a dateTimeValue dateTimeValue has a dateTime – We are using a FILTER command to select resources that have “guideline” in the title – We are using the DESC modifier on ORDER BY to get the results in descending order by date
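The FILTER and ORDER BY DESC steps can be imitated in Python. The titles and dates below are invented, and the case-insensitive match is an assumption of this sketch (the slide does not say whether the recipe ignores case).

```python
import re

# Hypothetical (title, publication date) pairs.
papers = [
    ("Clinical Guidelines for X", "2012-03-01"),
    ("A Study of Y", "2014-06-15"),
    ("Updating the guideline for Z", "2014-01-10"),
]

# FILTER: keep titles containing "guideline", ignoring case.
hits = [p for p in papers if re.search("guideline", p[0], re.IGNORECASE)]

# ORDER BY DESC(date): newest first (ISO dates sort correctly as strings).
hits.sort(key=lambda p: p[1], reverse=True)
```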
48
Find 2014 papers in top journals New features – Date selection – Article has journal, journal has name – FILTER on issn list
49
List papers in college by number of collaborating departments New features – Paper relatedBy Authorship – Authorship relates person – UNION – acts as an “or” Person has home department in college of medicine “OR” Person is in a unit that is a part of the college of medicine Unit has name
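A rough Python picture of this recipe: UNION merges the people found by either branch, authorships link papers to people, and the departments are then counted per paper. All names and data are invented, and sorting with the largest count first is an assumption of this sketch.

```python
# Hypothetical results of the two UNION branches: person -> department.
home_dept_in_college = {"p1": "Surgery", "p2": "Pediatrics"}
unit_part_of_college = {"p2": "Pediatrics", "p3": "Neurology"}

# UNION acts as "or": a person qualifies if either branch matches.
dept_of = {**home_dept_in_college, **unit_part_of_college}

# Hypothetical authorships linking papers to people (p4 is outside the college).
authorships = [("paperA", "p1"), ("paperA", "p3"), ("paperB", "p2"), ("paperB", "p4")]

# Collect the distinct college departments behind each paper.
depts_per_paper = {}
for paper, person in authorships:
    if person in dept_of:
        depts_per_paper.setdefault(paper, set()).add(dept_of[person])

# Rank papers by number of collaborating departments, largest first.
ranking = sorted(
    ((paper, len(depts)) for paper, depts in depts_per_paper.items()),
    key=lambda item: item[1],
    reverse=True,
)
```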
50
What’s next Continue to explore recipes Can highlight new features Questions
51
Thank you!