Ameet N Chitnis, Abir Qasem and Jeff Heflin 11 November 2007.


Talk Organization
- Motivation (a.k.a. why yet another benchmark?) and Influences
- The Workload: domain ontologies, map ontologies, data sources, queries
- The Metrics
- How do we generate things?
  - Domain ontology generation
  - Map ontology generation
    - Parameters & relationships
    - Map generator algorithm
  - Data source generation
  - Query generation
- Sample Workload
- Conclusion & Future Work

Motivation
As the Semantic Web matures:
- OWL ontologies and data from various organizations will gain commercial value
- Alignment of different ontologies, and integration of the data that commit to them, will be a viable business enterprise
- Quite possibly we will have post-development alignments between ontologies (alignment tools, third parties, etc.)
- Currently DBpedia and Hawkeye provide some form of third-party alignments (non-commercial)
We wanted to develop a benchmark that reflects this reality.

Influences
- Lehigh University Benchmark (LUBM) by Y. Guo, Z. Pan, and J. Heflin (ISWC 2004)
- Extended LUBM (can support both OWL Lite and OWL DL) by L. Ma, Y. Yang, Z. Qiu, G. Xie, and Y. Pan (ESWC 2006)
- Statistical analysis of the available Semantic Web ontologies by C. Tempich and R. Volz (ISWC 2003)
- Benchmarking DL systems by I. Horrocks and P. Patel-Schneider (DL Workshop 1998)
- Internet topology generator by J. Winick and S. Jamin (University of Michigan)

The Workload (1)
- Domain ontologies: “simple” ontologies. We can control the number of classes, the number of properties, and the branching factor of the hierarchies.
- Data sources: we can control the number of data sources that commit to a given ontology, the number of classes that will have individuals, the number of properties that will connect those individuals, and the number of triples.
- Queries: extensional queries in SPARQL. We can control the mix of classes, properties, and individuals, and we can control selectivity.

The Workload (2)
Map ontologies: the main focus of this work
- In our work a map ontology consists solely of “mapping” axioms that establish an alignment between two domain ontologies
- This is just for convenience of generation and analysis; semantically, map ontologies are not much different from domain ontologies
- Macro level: we generate a directed acyclic graph of domain ontologies, in which every edge represents a map ontology
- Micro level: we can control the types of axioms used to map two domain ontologies

Metrics
- Initialization Time
  - Systems with centralized approach: time taken to load the knowledge base
  - Systems with distributed approach: time taken to read the index (e.g. meta-data)
- Query Response Time
  - Systems with centralized approach: reasoning time
  - Systems with distributed approach: load time + reasoning time
- Query Completeness
  - Relative completeness of queries against a reference set, considering queries that entail at least one answer
- Repository Size
  - Systems with centralized approach: number of triples
  - Systems with distributed approach: N/A

Domain Ontology Generation
- Simple taxonomy
- The number of terms to generate follows a normal distribution with a user-supplied mean
- Given a branching factor and a number of terms, we generate a balanced tree
- Complex axioms are left for map ontologies
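
The taxonomy step can be illustrated with a minimal sketch. It is not the benchmark's actual generator: the function names, the `C0`, `C1`, ... class labels, and the `stddev` parameter are assumptions made for illustration (the slide only mentions a user-supplied mean). The tree is filled level by level, so it is as balanced as the term count allows.

```python
import random


def draw_term_count(mean, stddev, rng):
    """Draw the number of terms from a normal distribution (at least 1).

    `stddev` is an assumed parameter; the slide only names the mean.
    """
    return max(1, round(rng.gauss(mean, stddev)))


def balanced_taxonomy(n_terms, branching):
    """Return {child: parent} for a tree filled level by level.

    Terms are numbered 0..n_terms-1 and term i hangs under term
    (i - 1) // branching, which is the usual array layout of a
    complete k-ary tree. C0 is the root and has no parent entry.
    """
    if n_terms < 1 or branching < 1:
        raise ValueError("n_terms and branching must be positive")
    return {f"C{i}": f"C{(i - 1) // branching}" for i in range(1, n_terms)}


rng = random.Random(42)
tree = balanced_taxonomy(draw_term_count(mean=20, stddev=3, rng=rng), branching=3)
```

For example, `balanced_taxonomy(7, 2)` places `C1` and `C2` under `C0`, `C3` and `C4` under `C1`, and `C5` and `C6` under `C2`.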

Map Ontology Generation
Inputs:
- Number of ontologies we want in the workload (onts)
- Average out-degree (referred to as out below)
- Diameter
The number of maps created is approximately
  maps ≈ (total onts − terminal onts) × out
However, we do not have terminal onts as a parameter. A reasonable approximation is
  terminal onts ≈ (onts × out) / (diameter + out)
Thus we have
  maps ≈ (onts × out × diameter) / (diameter + out)
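
As a quick check of the arithmetic, the sketch below evaluates both approximations and confirms that substituting the terminal estimate into the first formula gives the closed form. The function name and the sample values (100 ontologies, out-degree 2, diameter 6) are illustrative assumptions, not figures from the talk.

```python
def estimate_maps(onts, out, diameter):
    """Return (estimated terminal ontologies, estimated number of maps)."""
    terminal = onts * out / (diameter + out)
    maps = (onts - terminal) * out
    return terminal, maps


terminal, maps = estimate_maps(onts=100, out=2, diameter=6)
closed_form = 100 * 2 * 6 / (6 + 2)
print(terminal, maps, closed_form)  # 25.0 150.0 150.0
```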

Map Generator Algorithm
1. Determine and mark the number of terminal nodes
2. Create a path of diameter length
3. Choose targets for every non-terminal ontology. Constraints:
   a. No cycles
   b. No path greater than diameter
   c. Non-terminal nodes should not become terminal
   Create the corresponding map ontologies by generating mapping axioms
4. Update the parameters of the source and the target
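
One way to satisfy the three constraints is sketched below. It is not the authors' algorithm: it swaps in a level-assignment technique, where every ontology gets a depth between 0 and the diameter and edges only run from a lower level to a strictly higher one. That rules out cycles and paths longer than the diameter by construction. The function name, the fixed per-node out-degree (the slides speak of an average), and the integer node ids are all assumptions.

```python
import random


def generate_map_graph(n_onts, out_degree, diameter, seed=0):
    """Return (edges, terminal_nodes) for a DAG of domain ontologies.

    Assumes diameter >= 1 and out_degree >= 1. Each edge stands for
    one map ontology.
    """
    rng = random.Random(seed)

    # Step 1: estimate and mark the terminal nodes (the highest ids).
    n_terminal = max(1, round(n_onts * out_degree / (diameter + out_degree)))
    n_internal = n_onts - n_terminal
    if n_internal < diameter:
        raise ValueError("need at least `diameter` non-terminal ontologies")
    internal = list(range(n_internal))
    terminal = list(range(n_internal, n_onts))

    # Step 2: a spine of exactly `diameter` edges ending in a terminal node.
    spine = internal[:diameter] + [terminal[0]]
    level = {node: depth for depth, node in enumerate(spine)}
    edges = set(zip(spine, spine[1:]))

    # Remaining nodes get random levels; non-terminals stay below `diameter`
    # so that the spine's terminal node is always a legal target for them.
    for node in internal[diameter:]:
        level[node] = rng.randrange(0, diameter)
    for node in terminal[1:]:
        level[node] = rng.randrange(1, diameter + 1)

    # Step 3: choose targets; edges only go to strictly higher levels.
    for src in internal:
        already = sum(1 for s, _ in edges if s == src)
        pool = [t for t in range(n_onts)
                if level[t] > level[src] and (src, t) not in edges]
        wanted = max(0, out_degree - already)
        for tgt in rng.sample(pool, min(wanted, len(pool))):
            edges.add((src, tgt))

    return sorted(edges), set(terminal)


edges, terminals = generate_map_graph(n_onts=20, out_degree=2, diameter=4, seed=1)
```

Because every non-terminal node sits below the spine's terminal node, it always keeps at least one outgoing edge, which covers constraint (c). Generating the mapping axioms for each edge and updating per-ontology parameters are left out of this sketch.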

Mapping axioms
- Given two domain ontologies and a desired distribution of OWL constructors and restrictions, we choose terms from the domain ontologies and create an axiom that connects them
- We can generate fairly complex axioms, e.g. O1:A ⊔ O1:B ⊑ ∃O2:P.O2:C ⊓ ∀O2:Q.O2:D
- Currently the algorithm is restricted to generating axioms that keep the ontology within OWLII (a subset of OWL used by OBII; Qasem et al. 2007, ISWC NFR workshop)
- But this is NOT a limitation of our approach
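
The idea of drawing constructors from a desired distribution can be sketched as follows. This is only an illustration under assumptions: the function name, the three constructor kinds (`some`, `all`, `class`), and the choice of one or two operands per side are invented here, and the sketch does not enforce the OWLII restriction mentioned above. It emits the axiom as a description-logic string rather than as OWL.

```python
import random


def generate_axiom(o1_classes, o2_classes, o2_properties, weights, seed=0):
    """Build one mapping axiom 'O1 union ⊑ O2 intersection' as a DL string.

    `weights` maps a constructor kind ('some', 'all', 'class') to its
    relative frequency; these kinds are assumptions for illustration.
    """
    rng = random.Random(seed)

    # Left-hand side: a union of one or two classes from the source ontology.
    n_left = rng.randint(1, min(2, len(o1_classes)))
    left = " ⊔ ".join(f"O1:{c}" for c in rng.sample(o1_classes, n_left))

    # Right-hand side: an intersection of one or two expressions from the
    # target ontology, each drawn according to the requested distribution.
    kinds = list(weights)
    parts = []
    for _ in range(rng.randint(1, 2)):
        kind = rng.choices(kinds, weights=[weights[k] for k in kinds])[0]
        cls = rng.choice(o2_classes)
        if kind == "some":
            parts.append(f"∃O2:{rng.choice(o2_properties)}.O2:{cls}")
        elif kind == "all":
            parts.append(f"∀O2:{rng.choice(o2_properties)}.O2:{cls}")
        else:
            parts.append(f"O2:{cls}")

    return f"{left} ⊑ {' ⊓ '.join(parts)}"


axiom = generate_axiom(["A", "B"], ["C", "D"], ["P", "Q"],
                       {"some": 2, "all": 1, "class": 1}, seed=3)
```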

Source Generation
- Choose an ontology
- Choose the number of classes for which to create individuals
- Generate triples: we can either generate random individuals or use the domain and range information to connect the individuals with properties
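
A minimal sketch of these steps for a single, already chosen ontology follows. The function name, the `ind0`, `ind1`, ... naming scheme, the tuple representation of triples, and the 50/50 split between creating a new individual and linking two existing ones are assumptions, not details given in the talk.

```python
import random


def generate_source(classes, properties, n_classes, n_triples, seed=0):
    """Generate `n_triples` distinct (subject, predicate, object) triples.

    `properties` maps a property name to its (domain, range) classes.
    Assumes n_triples >= n_classes.
    """
    rng = random.Random(seed)
    chosen = rng.sample(classes, n_classes)
    members = {c: [] for c in chosen}
    triples, seen = [], set()

    def add(triple):
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)

    def new_individual(cls):
        name = f"ind{sum(len(m) for m in members.values())}"
        members[cls].append(name)
        add((name, "rdf:type", cls))

    # Every chosen class gets at least one individual.
    for cls in chosen:
        new_individual(cls)

    # Only properties whose domain and range were both chosen are usable.
    usable = [(p, d, r) for p, (d, r) in properties.items()
              if d in members and r in members]

    while len(triples) < n_triples:
        if usable and rng.random() < 0.5:
            # Connect existing individuals, respecting domain and range.
            p, d, r = rng.choice(usable)
            add((rng.choice(members[d]), p, rng.choice(members[r])))
        else:
            new_individual(rng.choice(chosen))

    return triples


props = {"P": ("A", "B"), "Q": ("B", "C")}
triples = generate_source(["A", "B", "C"], props, n_classes=3, n_triples=75, seed=2)
```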

Query Generation
SPARQL queries (SELECT)
1. Choose the first predicate from the classes of an ontology.
2. We bias the next predicate with a 75% chance of being one of the properties from the ontology.
3. We make use of shared variables in order to implement “joins”. A shared variable is equally likely to be in the subject position or the object position.
4. For single-predicate queries, all the variables are distinguished. For others, on average two-thirds of the variables are distinguished and the rest are non-distinguished.
5. There is a 10% chance of a constant.
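
The rules above can be sketched roughly as follows. The function name and the variable naming are hypothetical, and two readings are assumptions: the 10% chance of a constant is applied to the non-shared position of each property pattern, and at least one variable is always kept in the SELECT clause. Classes, properties, and individuals are expected as full IRIs in angle brackets so that no PREFIX declarations are needed; `a` is the standard SPARQL shorthand for `rdf:type`.

```python
import random


def generate_query(classes, properties, individuals, n_predicates, seed=0):
    """Build a SPARQL SELECT query string with `n_predicates` triple patterns."""
    rng = random.Random(seed)
    variables = ["?v0"]
    # Rule 1: the first predicate is a class membership pattern.
    patterns = [f"?v0 a {rng.choice(classes)} ."]

    def fresh():
        variables.append(f"?v{len(variables)}")
        return variables[-1]

    for _ in range(n_predicates - 1):
        shared = rng.choice(variables)  # Rule 3: join on an existing variable.
        if rng.random() < 0.75:         # Rule 2: 75% chance of a property.
            prop = rng.choice(properties)
            # Rule 5 (assumed reading): 10% chance the other end is a constant.
            other = rng.choice(individuals) if rng.random() < 0.10 else fresh()
            # Rule 3: shared variable equally likely as subject or object.
            if rng.random() < 0.5:
                subj, obj = shared, other
            else:
                subj, obj = other, shared
            patterns.append(f"{subj} {prop} {obj} .")
        else:
            patterns.append(f"{shared} a {rng.choice(classes)} .")

    # Rule 4: all variables distinguished for single-predicate queries,
    # otherwise each one is kept with probability 2/3.
    if n_predicates == 1:
        selected = list(variables)
    else:
        selected = [v for v in variables if rng.random() < 2 / 3] or [variables[0]]

    return f"SELECT {' '.join(selected)} WHERE {{ {' '.join(patterns)} }}"


EX = "http://example.org/onto#"
query = generate_query([f"<{EX}A>", f"<{EX}B>"], [f"<{EX}p>", f"<{EX}q>"],
                       [f"<{EX}ind1>"], n_predicates=4, seed=7)
```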

A Sample Workload
- We used the benchmark to evaluate OBII, a distributed query answering system
- We compared it with a “baseline” system, which was essentially a KAON2 wrapper
- Some characteristics of the workload:
  - 50% of classes had individuals
  - On average we generated 75 triples per source
  - We generated configurations as large as 100 domain ontologies with about 1000 data sources

Conclusion and Future Work
- A focus on workloads that account for post-development alignments
  - Micro level: controlling mapping axioms
  - Macro level: controlling how ontologies are mapped
- Domain ontology synthesis can be expanded to support complex axioms
- Experiment with different characteristics
  - Hubs and authorities (different in-degree / out-degree patterns)