Using the R Programming Language for VIVO Application Programming

Fetch VIVO Data via HTTP

To use R for VIVO application programming, you will want to get and install the XML package [7]. This package provides all the tools you will need to fetch pages and extract values for processing and display in R. To fetch a VIVO page from a URL, execute the one line below:

    my.rdf <- readLines(url(myurl))

The variable named my.rdf is created and contains the RDF/XML text from the remote VIVO page, one line per element, as shown previously.

The R Programming Language

R is an open source, open development computing environment and language for statistical computing and graphics [1]. R is popular in biostatistics, bioinformatics, financial market analysis, social network analysis and geospatial modeling. As a programming language, R is expressive and compact, with a large collection of powerful functions, tools and operators for data representation, analysis and display. Online tutorials are available for learning both basic and advanced R programming [2].

Some simple examples:

    x <- 5             # create an object x and assign it the numeric value 5
    myurl <- "..."     # assign a text string (the URL of a VIVO page) to myurl
    v <- rnorm(1000)   # generate 1000 random normal variates and assign to v
    hist(v)            # draw a histogram of v

References

[1] R Project Home Page
[2] Resources to help you learn and use R
[3] Resource Description Framework (RDF)
[4] Dean Allemang and Jim Hendler (2008) Semantic Web for the Working Ontologist, Morgan Kaufmann, 352 pp.
[5] VIVO Ontology OWL http://sourceforge.net/projects/vivo/files/Ontology/vivo-core-1.1.owl
[6] RDF Vocabulary Description Language 1.0: RDF Schema
[7] Lang, Duncan Temple. Tools for parsing and generating XML in R
[8] XML Path Language (Version 1.0)
[9] Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris (2003) Software Tools for the Statistical Modeling of Network Data. Version [...] Project home page at URL [...]

VIVO Data is RDF

VIVO represents all its data using Resource Description Framework (RDF) [3].
RDF represents all data as "triples" of the form subject predicate object. The kinds of subjects and objects, and the predicates that relate them, are defined in an ontology. See a standard text for descriptions of RDF and ontologies [4]. The VIVO ontology describes people participating in research activity, as well as elements common to these people and their activities: grants, events, projects, publications and more [5].

Mike Conlon, PhD, COO, UF Clinical and Translational Science Institute, Gainesville, Florida, and the VIVO Collaboration*

RDF as XML

RDF Schema [6] (RDFS) is a vocabulary description language for RDF. RDF itself can be serialized as Extensible Markup Language (XML), a form known as RDF/XML, which is readily processed by application programs. VIVO can present all its data either as Hypertext Markup Language (HTML), for human reading through a browser, or as XML, for application programs and tools. The XML produced by VIVO is RDF/XML. For example, the poster shows the URL from the R sample above both as rendered HTML (left) and as RDF/XML (right).

*VIVO Collaboration:

Cornell University: Dean Krafft (Cornell PI), Manolo Bevia, Jim Blake, Nick Cappadona, Brian Caruso, Jon Corson-Rikert, Elly Cramer, Medha Devare, Elizabeth Hines, Huda Khan, Brian Lowe, Joseph McEnerney, Holly Mistlebauer, Stella Mitchell, Anup Sawant, Christopher Westling, Tim Worrall, Rebecca Younes.

University of Florida: Mike Conlon (VIVO and UF PI), Chris Barnes, Cecilia Botero, Kerry Britt, Erin Brooks, Amy Buhler, Ellie Bushhousen, Linda Butson, Chris Case, Christine Cogar, Valrie Davis, Mary Edwards, Nita Ferree, George Hack, Chris Haines, Sara Henning, Rae Jesano, Margeaux Johnson, Meghan Latorre, Yang Li, Paula Markes, Hannah Norton, Narayan Raum, Alexander Rockwell, Sara Russell Gonzalez, Nancy Schaefer, Dale Scheppler, Nicholas Skaggs, Matthew Tedder, Michele R. Tennant, Alicia Turner, Stephen Williams.
Indiana University: Katy Borner (IU PI), Kavitha Chandrasekar, Bin Chen, Shanshan Chen, Jeni Coffey, Suresh Deivasigamani, Ying Ding, Russell Duhon, Jon Dunn, Poornima Gopinath, Julie Hardesty, Brian Keese, Namrata Lele, Micah Linnemeier, Nianli Ma, Robert H. McDonald, Asik Pradhan Gongaju, Mark Price, Yuyin Sun, Chintan Tank, Alan Walsh, Brian Wheeler, Feng Wu, Angela Zoss.

Ponce School of Medicine: Richard J. Noel, Jr. (Ponce PI), Ricardo Espada Colon, Damaris Torres Cruz, Michael Vega Negrón.

The Scripps Research Institute: Gerald Joyce (Scripps PI), Catherine Dunn, Brant Kelley, Paula King, Angela Murrell, Barbara Noble, Cary Thomas, Michaeleen Trimarchi.

Washington University School of Medicine in St. Louis: Rakesh Nagarajan (WUSTL PI), Kristi L. Holmes, Caerie Houchins, George Joseph, Sunita B. Koul, Leslie D. McIntosh.

Weill Cornell Medical College: Curtis Cole (Weill PI), Paul Albert, Victor Brodsky, Mark Bronnimann, Adam Cheriff, Oscar Cruz, Dan Dickinson, Richard Hu, Chris Huang, Itay Klaz, Kenneth Lee, Peter Michelini, Grace Migliorisi, John Ruffing, Jason Specland, Tru Tran, Vinay Varughese, Virgil Wong.

This project is funded by the National Institutes of Health, U24 RR029822, "VIVO: Enabling National Networking of Scientists".

VIVO Applications

VIVO applications are software systems using VIVO data. Existing systems such as Drupal or Sakai can be extended to use VIVO data. Here we show simple R programs which consume and display VIVO data. VIVO applications can be written in any computer language capable of accessing web pages and processing RDF. We use R because of its simplicity and display capabilities.

VIVO applications "read" VIVO data by fetching it via HTTP. There is no "application programming interface" (API) nor special VIVO software routines to learn. The format of the VIVO data is published via its ontology [5]. This makes VIVO data far easier to consume in applications and repurpose than systems requiring the use of proprietary APIs.
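As a minimal sketch of the fetch-over-HTTP idea described above: the helper name fetch_lines and the sample document are illustrative assumptions, not part of VIVO or of the poster's code. So that it runs without network access, the demonstration reads a small local file through a file:// URL; a real application would pass the http:// address of a VIVO page instead.

```r
# Hypothetical helper: read the text behind a URL, one line per element.
fetch_lines <- function(u) {
  con <- url(u)            # base R connection; also accepts http:// URLs
  on.exit(close(con))      # release the connection when the function exits
  readLines(con, warn = FALSE)
}

# Stand-in for a VIVO page: a tiny, made-up RDF/XML document in a temp file.
sample_rdf <- c(
  '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
  '  <rdf:Description rdf:about="http://vivo.example.edu/individual/n1"/>',
  '</rdf:RDF>'
)
path <- tempfile(fileext = ".rdf")
writeLines(sample_rdf, path)

# Fetch it back through a URL, exactly as one would fetch a remote page.
my.rdf <- fetch_lines(paste0("file://", normalizePath(path, winslash = "/")))
length(my.rdf)               # number of lines fetched
cat(my.rdf, sep = "\n")      # show the raw RDF/XML text
```

Nothing here is specific to VIVO, which is the point the poster makes: any tool that can read a URL can read the data.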
VIVO data is open and accessible to all via a simple web page fetch.

Create an XML Parse Tree

The resulting RDF can be parsed into a tree for further processing. Many objects in VIVO have parent-child relationships.

    my.tree <- xmlParse(myurl)

The variable my.tree is created by fetching the remote page and parsing the XML found there.

Use XPath to Extract Data Values

A tree can be searched for values satisfying an XPath [8] query.

    my.nodes <- getNodeSet(my.tree, "//j.2:workPhone")

The matching node(s) are then stripped to get values:

    my.workphone <- xmlValue(my.nodes[[1]])

The variable my.workphone now contains the value " ".

Single and Multiple Values

VIVO RDF contains single-valued elements and multi-valued elements. The R code shown above handles a single-valued response. For a multi-valued element, getNodeSet returns all matching nodes in an R list for further processing.

Crawling RDF

In some cases, the objects returned by VIVO are RDF URIs for other objects. This is the basis of the semantic web: interlinked references to objects expressed as RDF. Resolving such references can be called "crawling" or "dereferencing."

Consider the organizational structure of a university. Each "org" may have subOrganizations, each of which is itself an "org". A URI for the University of Florida in VIVO returns its subOrganizations. Each is an RDF URI for a subOrganization (a college, institute or department). Using R, we can access each organization and recursively process its subOrganizations to generate a complete tree structure for the university as a whole. The code below does just that. processOrg returns the entire organizational structure of the university (or of any other university with a VIVO URI). getURI is a helper function for creating URIs from RDF XML attributes.
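The parse, XPath and multi-value steps described above can be tried offline. The following sketch is illustrative only: the inline document, the ex: namespace, the example.edu URIs and the helper rdf_url_for are all assumptions made for this example, not VIVO's actual vocabulary and not the poster's getURI, which is not shown. In particular, rdf_url_for assumes an individual's RDF is served at the individual's URI followed by /<id>.rdf; check how the target VIVO site publishes RDF before relying on that pattern.

```r
library(XML)

# A small, made-up RDF/XML document standing in for a fetched VIVO page.
rdf_text <- '<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:ex="http://example.org/vocab#">
  <rdf:Description rdf:about="http://vivo.example.edu/individual/org1">
    <rdfs:label>Example University</rdfs:label>
    <ex:workPhone>555-0100</ex:workPhone>
    <ex:hasSubOrganization rdf:resource="http://vivo.example.edu/individual/org2"/>
    <ex:hasSubOrganization rdf:resource="http://vivo.example.edu/individual/org3"/>
  </rdf:Description>
</rdf:RDF>'

doc <- xmlParse(rdf_text, asText = TRUE)    # parse from a string, not a URL
ns  <- c(ex = "http://example.org/vocab#")  # prefix-to-namespace map for XPath

# Single-valued element: take the first match, guarding against no match.
phone_nodes <- getNodeSet(doc, "//ex:workPhone", namespaces = ns)
phone <- if (length(phone_nodes) > 0) xmlValue(phone_nodes[[1]]) else NA_character_

# Multi-valued element: getNodeSet returns a list; pull one attribute per node.
sub_nodes <- getNodeSet(doc, "//ex:hasSubOrganization", namespaces = ns)
sub_uris  <- vapply(sub_nodes,
                    function(node) unname(xmlAttrs(node)["resource"]),
                    character(1))

# Hypothetical stand-in for a getURI-style helper (URL pattern is an assumption).
rdf_url_for <- function(uri) {
  id <- sub(".*/", "", uri)        # last path segment, e.g. "org2"
  paste0(uri, "/", id, ".rdf")
}
sub_urls <- rdf_url_for(sub_uris)

print(phone)
print(sub_urls)
```

Passing an explicit namespaces argument avoids depending on whatever prefixes (such as j.1 or j.2) the server happens to generate for a given page.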
    processOrg <- function(uri) {
      x <- xmlParse(uri)     # fetch and parse the RDF/XML for this organization
      u <- NULL              # accumulates the results for each subOrganization
      name <- xmlValue(getNodeSet(x, "//rdfs:label")[[1]])
      subs <- getNodeSet(x, "//j.1:hasSubOrganization")
      if (length(subs) == 0)
        list(name = name, subs = NULL)    # leaf: no subOrganizations
      else {
        for (i in 1:length(subs)) {
          # build the URI of the subOrganization, then recurse into it
          sub.uri <- getURI(xmlAttrs(subs[[i]])["resource"])
          u <- c(u, processOrg(sub.uri))
        }
        list(name = name, subs = u)
      }
    }

Displaying Results Using statnet [9]

statnet is an open source suite of packages for R used for network analysis. The organizational structure of the University of Florida is displayed as a directed graph below. The root node is in the center. Directed edges point to subOrganizations.

Next Steps

If you are new to programming, you will find R a bit difficult. Experienced programmers will find R to be relaxing and powerful. Writing R functions involves a bit of research to find the best functions for the task at hand. The compactness of R makes it easy to read for the experienced R programmer. If you are not an experienced programmer, you may wish to team with someone who is.

R is particularly well suited for extracting, tabulating, reporting and displaying data. The statnet community is adding social network analysis tools. R is less well suited for interactive applications. Such applications might be written with Web 2.0 front-end tools, while using R for back-end data extraction, processing and graphics generation.

The R programming language, augmented by the XML tools for data extraction and the statnet tools for social network display and analysis, provides a powerful and ready-made toolbox for VIVO application programming.

Obtaining R, Packages and Code Examples

Download installers for R for Windows, Mac or Linux from the R Home Page [1]. The installer does the rest.
To install the XML and statnet packages, execute the R commands:

    install.packages("XML", repos = "...")
    library(XML)
    install.packages("statnet")
    library(statnet)

All code displayed and used on this poster is available at vivo.sourceforge.net.

In the organization graph, large clusters represent the College of Medicine, the Institute of Food and Agricultural Sciences, the extension offices, and the College of Liberal Arts and Sciences. The figure was produced using the processOrg code above, followed by transformation to a statnet edgelist, then to a network object named uf.g. The network object was plotted with the single R function call plot(uf.g).
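The transformation from processOrg output to an edgelist is not shown on the poster. A rough sketch of one way to do it in base R follows; the function name org_edges and the toy organization are assumptions for illustration. It relies on the shape that processOrg produces as written, where subs is a flat list of alternating name and subs entries (a consequence of combining results with c()).

```r
# Hypothetical helper: walk a processOrg-style result and collect
# one (parent, child) row per parent-to-subOrganization link.
org_edges <- function(org) {
  edges <- matrix(character(0), ncol = 2)
  subs <- org$subs
  if (length(subs) == 0) return(edges)
  # subs alternates: name, subs, name, subs, ...
  for (i in seq(1, length(subs), by = 2)) {
    child <- list(name = subs[[i]], subs = subs[[i + 1]])
    edges <- rbind(edges, c(org$name, child$name), org_edges(child))
  }
  edges
}

# Toy data built the same way processOrg combines results (with c()).
leaf <- function(name) list(name = name, subs = NULL)
medicine <- list(name = "College of Medicine",
                 subs = c(leaf("Dept A"), leaf("Dept B")))
uf <- list(name = "University", subs = c(medicine, leaf("Libraries")))

el <- org_edges(uf)
colnames(el) <- c("parent", "child")
print(el)
```

The resulting two-column matrix of parent and child names is the kind of input from which a statnet network object such as uf.g could then be built.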