Linked Data for the Rest of Us Tim Finin, Varish Mulwad and Lushan Han University of Maryland, Baltimore County 12 January 2012

Presentation transcript:

Data sharing Data is important for We can do more if we share, reuse and build on data from others Sharing data also fosters collaboration Access to data is essential for scientific integrity and reproducibility

Barriers to Sharing Data There are many barriers: competition, legal concerns, IP concerns, provenance, trust, privacy, security … I will focus on this one problem… You can’t fully use shared data unless you understand its meaning and can integrate it with your own data The Semantic Web offers solutions based on shared semantic models, Web standards and protocols

Semantic web technologies allow machines to share data and knowledge using common Web languages and protocols. ~ 1997 Semantic Web Uses Semantic Web Technology to publish shared data & knowledge Semantic Web beginning

2007 Semantic Web => Linked Open Data Uses Semantic Web Technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge LOD beginning

2008 Semantic Web => Linked Open Data Uses Semantic Web Technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge LOD growing

2009 Semantic Web => Linked Open Data Uses Semantic Web Technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge … and growing

Linked Open Data 2010 LOD is the new Cyc: a common source of background knowledge Uses Semantic Web Technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge …growing faster

Linked Open Data 2011: 31B facts in 295 datasets interlinked by 504M assertions on ckan.net LOD is the new Cyc: a common source of background knowledge Uses Semantic Web Technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge

Exploiting LOD not (yet) Easy Publishing or using LOD data has inherent difficulties for the average scientist It’s difficult to explore LOD data and to query it for answers It’s challenging to publish data using appropriate LOD vocabularies & link it to existing data I’ll describe two ongoing research projects that are addressing these problems

Generating Linked Data by Inferring the Semantics of Tables Research with Varish Mulwad

Goal: Table => LOD* [Figure: a table annotated with LOD terms; surviving labels: …nalBasketballAssociationTeams, Player height in meters, dbprop:team] * DBpedia

Goal: Table => LOD* RDF Linked Data, all this in a completely automated way: […] is rdfs:label of dbpedia-owl:BasketballPlayer. […] is rdfs:label of yago:NationalBasketballAssociationTeams. "Michael […] is rdfs:label of dbpedia:Michael Jordan. dbpedia:Michael Jordan a dbpedia-owl:BasketballPlayer. "Chicago […] is rdfs:label of dbpedia:Chicago Bulls. dbpedia:Chicago Bulls a yago:NationalBasketballAssociationTeams. * DBpedia
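
As a rough illustration of the kind of output this slide describes, the Python sketch below turns one already-interpreted table row into Turtle-style triples. The function names `local_name` and `row_to_triples`, the underscore-joined local names and the `@en` language tag are assumptions made for this example; it is not the T2LD implementation, and it writes labels in the usual subject-first Turtle order rather than the "is rdfs:label of" form on the slide.

```python
def local_name(label: str) -> str:
    """Turn a cell string such as 'Michael Jordan' into a DBpedia-style local name."""
    return label.strip().replace(" ", "_")


def row_to_triples(cells, column_classes):
    """Emit one rdfs:label triple and one type triple per linked cell.

    cells          -- cell strings of one table row
    column_classes -- class chosen for each column (None leaves the cell unlinked)
    """
    triples = []
    for cell, cls in zip(cells, column_classes):
        if cls is None:
            continue
        entity = "dbpedia:" + local_name(cell)
        triples.append(f'{entity} rdfs:label "{cell}"@en .')
        triples.append(f"{entity} a {cls} .")
    return triples


# Example row and column classes taken from the slide.
row = ["Michael Jordan", "Chicago Bulls"]
classes = ["dbpedia-owl:BasketballPlayer", "yago:NationalBasketballAssociationTeams"]
triples = row_to_triples(row, classes)
print("\n".join(triples))
```

The sketch assumes the hard part, choosing the class for each column and the entity for each cell, has already been done; the slides that follow are about exactly that step.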

Tables are everywhere !! … yet … The web – 154 million high quality relational tables

Evidence–based medicine Figure: Evidence-Based Medicine - the Essential Role of Systematic Reviews, and the Need for Automated Text Mining Tools, IHI 2010 Evidence-based medicine judges the efficacy of treatments or tests by meta-analyses of clinical trials. Key information is often found in tables in articles However, the rate at which meta-analyses are published remains very low … hampers effective health care treatment … # of Clinical trials published in 2008 # of meta analysis published in 2008

~ 400,000 datasets ~ < 1 % in RDF

2010 Preliminary System T2LD Framework: Predict Class for Columns, Linking the table cells, Identify and Discover relations Class prediction for column: 77% Entity Linking for table cells: 66% Examples of class label prediction results: Column – Nationality Prediction – MilitaryConflict Column – Birth Place Prediction – PopulatedPlace

Sources of Errors The sequential approach let errors percolate from one phase to the next The system was biased toward predicting overly general classes over more appropriate specific ones Heuristics largely drive the system Although we consider multiple sources of evidence, we did not perform joint assignment

Current Query Mechanism Table row: Michael Jordan | Chicago Bulls | Shooting Guard | 1.98 For the cell Chicago Bulls in column Team, the query returns possible types {dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, …} and possible entities Chicago Bulls, Chicago, Judy Chicago, …

Domain Independent Framework [Diagram: Query Linked Data, Generate a set of classes/entities (x,y,z; a,b,c; m,n,o), Joint Assignment/Inference Model, KB] Domain Knowledge – Linked Data Cloud / Medical Domain / Open Govt. Domain

A graphical model for tables Joint inference over evidence in a table C1 C2 C3 R11 R12 R13 R21 R22 R23 R31 R32 R33 Team Chicago Philadelphia Houston San Antonio Class Instance

Parameterized graphical model C1 C2 C3 R11 R12 R13 R21 R22 R23 R31 R32 R33 Variable Nodes: Column header, Row value Factor Nodes: Function that captures the affinity between the column headers and row values; Captures interaction between column headers; Captures interaction between row values
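
The idea of scoring column labels jointly rather than one at a time can be sketched in a few lines of Python. This is a brute-force search over all label combinations, used here as a stand-in for real graphical-model inference; the function name `best_joint_assignment`, the score dictionaries and all numbers are made up for illustration and are not taken from the authors' system.

```python
from itertools import product


def best_joint_assignment(candidates, unary, pairwise):
    """Pick one class per column so that the total score is maximal.

    candidates -- list of candidate class lists, one list per column
    unary      -- {(column_index, label): score} affinity of a label with the column's cells
    pairwise   -- {(label_i, label_j): score} compatibility of labels of two columns (i < j)
    """
    best, best_score = None, float("-inf")
    for labels in product(*candidates):
        score = sum(unary.get((i, lab), 0.0) for i, lab in enumerate(labels))
        score += sum(
            pairwise.get((labels[i], labels[j]), 0.0)
            for i in range(len(labels))
            for j in range(i + 1, len(labels))
        )
        if score > best_score:
            best, best_score = labels, score
    return best, best_score


# Hypothetical scores: taken alone, each column prefers the more general class,
# but the pairwise factor rewards the player/team combination.
candidates = [["Person", "BasketballPlayer"], ["City", "BasketballTeam"]]
unary = {
    (0, "Person"): 0.6, (0, "BasketballPlayer"): 0.5,
    (1, "City"): 0.6, (1, "BasketballTeam"): 0.5,
}
pairwise = {("BasketballPlayer", "BasketballTeam"): 0.9, ("Person", "City"): 0.2}
labels, score = best_joint_assignment(candidates, unary, pairwise)
print(labels, round(score, 2))
```

Exhaustive search grows exponentially with the number of columns, so it only works for toy inputs; it is meant to show why interaction factors can overturn a per-column decision, not how inference would be run at scale.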

Challenge: Interpreting Literals [Figure: numeric table columns with candidate interpretations: Population? Profit in $K? Age in years? Percent?] Many columns have literals, e.g., numbers Predict properties based on cell values Cyc had hand coded rules: humans don’t live past 120 We extract value distributions from LOD resources Differ for subclasses, e.g., age of people vs. political leaders vs. professional athletes Represent as measurements: value + units Metric: possibility/probability of values given distribution
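
One way to picture "probability of values given distribution" is the sketch below, which fits a normal distribution to the base-10 logarithm of known values for each candidate property and picks the property under which a column is most likely. The modelling choice (log-normal fit), the function names and the reference numbers are all assumptions for this example; the slides do not say which distributions or metric the authors used.

```python
import math
from statistics import NormalDist


def fit_log_normal(values):
    """Fit a normal distribution to log10 of known (positive) property values."""
    return NormalDist.from_samples([math.log10(v) for v in values])


def avg_log_likelihood(dist, column):
    """Average log-likelihood of a column's values under one fitted distribution."""
    total = sum(math.log(dist.pdf(math.log10(v)) + 1e-300) for v in column)
    return total / len(column)


def predict_property(column, reference_values):
    """Return the candidate property whose value distribution best explains the column."""
    models = {prop: fit_log_normal(vals) for prop, vals in reference_values.items()}
    return max(models, key=lambda prop: avg_log_likelihood(models[prop], column))


# Made-up reference samples standing in for value distributions extracted from LOD.
reference_values = {
    "age in years": [23, 35, 41, 58, 67, 80],
    "population": [120_000, 405_000, 617_000, 645_000, 2_700_000],
}
print(predict_property([690_000, 345_000, 510_000, 120_000], reference_values))
print(predict_property([34, 29, 51], reference_values))
```

Separate reference samples per subclass (people, political leaders, professional athletes) would slot in as additional dictionary keys.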

Other Challenges Using table captions and other text in associated documents to provide context Size of some data.gov tables (> 400K rows!) makes using full graphical model impractical – Sample table and run model on the subset Achieving acceptable accuracy may require human input – 100% accuracy unattainable automatically – How best to let humans offer advice and/or correct interpretations?

GoRelations: Intuitive Query System for Linked Data Research with Lushan Han

DBpedia is the Stereotypical LOD DBpedia is an important example of Linked Open Data – Extracts structured data from Infoboxes in Wikipedia – Stores in RDF using custom ontologies and Yago terms The major integration point for the entire LOD cloud Explorable as HTML, but harder to query in SPARQL

Browsing DBpedia’s Mark Twain

Querying LOD is Much Harder Querying DBpedia requires a lot of a user – Understand the RDF model – Master SPARQL, a formal query language – Understand ontology terms: 320 classes & 1600 properties ! – Know instance URIs (>1M entities !) – Cope with term heterogeneity (Place vs. PopulatedPlace) Querying large LOD sets is overwhelming Natural language query systems are still a research goal

Goal Allow a user with a basic understanding of RDF to query DBpedia and ultimately distributed LOD collections – To explore what data is in the system – To get answers to questions – To create SPARQL queries for reuse or adaptation Desiderata – Easy to learn and to use – Good accuracy (e.g., precision and recall) – Fast

Key Idea Reduce problem complexity by: – Having user enter a simple graph, and – Annotating the nodes and arcs with words and phrases

Semantic Graph Interface Nodes denote entities and links binary relations Entities described by two unrestricted terms: name or value and type or concept Result entities marked with ? and those not with * A compromise between a natural language Q&A system and SPARQL – Users provide compositional structure of the question – Free to use their own terms in annotating the structure

Translation – Step One: finding semantically similar ontology terms For each concept or relation in the graph, generate the k most semantically similar candidate ontology classes or properties The similarity metric is based on distributional similarity, LSA, and WordNet.
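
The candidate-generation step can be sketched as a top-k selection over ontology terms. In the Python below the similarity function is a plug-in argument, and `toy_similarity` is a hand-written lookup table with invented scores that merely stands in for the distributional, LSA and WordNet-based measure named on the slide; `top_k_candidates` is likewise a hypothetical name.

```python
import heapq


def top_k_candidates(user_term, ontology_terms, similarity, k=3):
    """Return the k ontology terms most similar to the user's term, best first."""
    scored = ((similarity(user_term, term), term) for term in ontology_terms)
    return [term for _, term in heapq.nlargest(k, scored)]


# Invented similarity scores, for illustration only.
toy_scores = {
    ("author", "Writer"): 0.82,
    ("author", "Person"): 0.41,
    ("author", "Book"): 0.35,
    ("author", "Place"): 0.05,
}


def toy_similarity(user_term, ontology_term):
    """Look up a made-up score; unknown pairs count as unrelated."""
    return toy_scores.get((user_term.lower(), ontology_term), 0.0)


classes = ["Place", "Book", "Person", "Writer"]
print(top_k_candidates("Author", classes, toy_similarity, k=2))
```

Keeping several candidates per term, rather than only the best one, is what lets the later disambiguation step choose a combination that fits together.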

Another Example Football players who were born in the same place as their team’s president

Translation – Step Two: disambiguation algorithm To assemble the best interpretation we rely on statistics of the data Primary measure is pointwise mutual information (PMI) between RDF terms in the LOD collection This measures the degree to which two terms occur together in the knowledge base In a reasonable interpretation, ontology terms associate in the way that their corresponding user terms connect in the semantic graph
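
For reference, pointwise mutual information is commonly defined as PMI(x, y) = log(p(x, y) / (p(x) p(y))), where the probabilities are estimated from occurrence counts. The sketch below computes it from raw counts; the natural logarithm, the function signature and the counts are choices made for this example, and how co-occurrence of two RDF terms is counted in the authors' system is not specified in the slides.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information of terms x and y from occurrence counts.

    count_xy -- number of observations in which x and y occur together
    count_x  -- number of observations containing x
    count_y  -- number of observations containing y
    total    -- total number of observations
    """
    if count_xy == 0:
        return float("-inf")  # the terms never co-occur
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))


# Hypothetical counts: one pair co-occurs far more often than chance, one exactly at chance.
strong = pmi(count_xy=50, count_x=100, count_y=200, total=10_000)
chance = pmi(count_xy=2, count_x=100, count_y=200, total=10_000)
print(round(strong, 3), round(chance, 3))
```

A positive value means the two terms appear together more often than independence would predict, which is the signal used to prefer one candidate interpretation over another.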

Translation – Step Two: disambiguation algorithm Three aspects are combined to derive an overall goodness measure for each candidate interpretation: joint disambiguation, resolving direction, and link reasonableness

Example of Translation result Concepts: Place => Place, Author => Writer, Book => Book Properties: born in => birthPlace, wrote => author (inverse direction)

SPARQL Generation The translation of a semantic graph query to SPARQL is straightforward given the mappings Concepts: Place => Place, Author => Writer, Book => Book Relations: born in => birthPlace, wrote => author
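
To make "straightforward given the mappings" concrete, here is a minimal Python sketch that assembles a SPARQL query string from the mapped concepts and relations of the example. The function `to_sparql`, its argument layout, the variable names and the choice of result variables are assumptions for illustration; the query text actually produced by GoRelations is not shown in the slides.

```python
PREFIX = "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>"


def to_sparql(nodes, edges, select):
    """Build a SPARQL query from a mapped semantic graph.

    nodes  -- {variable: ontology class} after mapping user concepts
    edges  -- list of (subject_var, property, object_var, inverse); when inverse
              is True the property points from the object to the subject
    select -- variables to return
    """
    lines = [PREFIX, "SELECT DISTINCT " + " ".join(f"?{v}" for v in select), "WHERE {"]
    for var, cls in nodes.items():
        lines.append(f"  ?{var} a dbpedia-owl:{cls} .")
    for subj, prop, obj, inverse in edges:
        if inverse:
            subj, obj = obj, subj
        lines.append(f"  ?{subj} dbpedia-owl:{prop} ?{obj} .")
    lines.append("}")
    return "\n".join(lines)


# Mappings from the slide: Author => Writer, born in => birthPlace,
# wrote => author (inverse direction).
nodes = {"place": "Place", "author": "Writer", "book": "Book"}
edges = [
    ("author", "birthPlace", "place", False),
    ("author", "author", "book", True),
]
query = to_sparql(nodes, edges, select=["author", "book"])
print(query)
```

The inverse flag covers the case where the user's relation ("wrote") runs from author to book while the ontology property (author) runs from book to writer.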

Preliminary Evaluation 33 test questions from the 2011 Workshop on Question Answering over Linked Data, answerable using DBpedia Three human subjects unfamiliar with DBpedia translated the test questions into semantic graph queries Compared with two top natural language QA systems: PowerAqua and True Knowledge

Challenges Baseline system works well for DBpedia Current challenges we are addressing are – Developing a better Web interface – Allowing user feedback and advice – Adding direct entity matching – Extending to a distributed LOD collection See for more information, Try our alpha version at

Final Conclusions Linked Data is an emerging paradigm for sharing structured and semi-structured data – Backed by machine-understandable semantics – Based on successful Web languages and protocols Generating and exploring Linked Data resources can be challenging – Schemas are large, too many URIs New tools for mapping tables to Linked Data and translating structured natural language queries help reduce the barriers