Selectivity Estimation Example

Selectivity Estimation Example Mohammad Farhan Husain

Example Data

Table columns: Subject, Predicate, Object
Table cells as extracted: R1 P1 L1 R2 L2 R3 R4 R5 R6 L3 R7

R1, R2, ..., R8 are resources, i.e. URIs.
P1 and P2 are predicates, also URIs.
L1, L2, ..., L5 are literals.

R = total number of unique resources = 8
T = total number of triples = 8
TP1 = total number of triples having predicate P1 = 5
TP2 = total number of triples having predicate P2 = 3

For any query:
Selectivity of a bound subject s = sel(s) = 1 / R = 1 / 8 = 0.125
Selectivity of predicate P1 = sel(P1) = TP1 / T = 5 / 8 = 0.625
Selectivity of predicate P2 = sel(P2) = TP2 / T = 3 / 8 = 0.375
Selectivity of an unbound subject, predicate, or object = 1.0
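The basic selectivities on this slide can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the slide, and the function and variable names are hypothetical, chosen for illustration.

```python
# Counts copied from the example data slide.
R = 8                      # unique resources in the store
T = 8                      # total number of triples
TP = {"P1": 5, "P2": 3}    # number of triples per predicate


def sel_bound_subject(num_resources):
    """Selectivity of a bound subject: 1 / R."""
    return 1.0 / num_resources


def sel_predicate(pred, triples_per_pred, total_triples):
    """Selectivity of a bound predicate: Tp / T."""
    return triples_per_pred[pred] / total_triples


sel_s = sel_bound_subject(R)
sel_p1 = sel_predicate("P1", TP, T)
sel_p2 = sel_predicate("P2", TP, T)
print(sel_s, sel_p1, sel_p2)
```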

Example Histogram for P1

Suppose a hash function assigns the object values of the triples having predicate P1 to two bins as follows:

Bin 1 contains: L1, L2 and R2 (height 3)
Bin 2 contains: R4 and L3 (height 2)

Example Histogram for P2

Suppose the same hash function assigns the object values of the triples having predicate P2 to two bins as follows:

Bin 1 contains: L5 (height 1)
Bin 2 contains: L4 and R1 (height 2)
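A minimal Python sketch of how the two histograms could be built. The slides do not specify the hash function, so an explicit lookup table (`BIN_OF`, a hypothetical name) stands in for it and simply encodes the bin assignments listed on the two histogram slides.

```python
from collections import Counter

# Stand-in for the unspecified hash function: object value -> bin number.
# The assignments are the ones listed on the histogram slides.
BIN_OF = {
    "L1": 1, "L2": 1, "R2": 1, "R4": 2, "L3": 2,   # objects seen with P1
    "L5": 1, "L4": 2, "R1": 2,                     # objects seen with P2
}

# Object values of the triples for each predicate.
OBJECTS = {
    "P1": ["L1", "L2", "R2", "R4", "L3"],
    "P2": ["L5", "L4", "R1"],
}


def build_histogram(objects):
    """Count how many object values fall into each bin (the bin heights)."""
    return Counter(BIN_OF[o] for o in objects)


hist = {pred: build_histogram(objs) for pred, objs in OBJECTS.items()}
print(dict(hist["P1"]), dict(hist["P2"]))
```

The bin heights, not the bin contents, are what the estimator needs, so only the counts are kept.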

Estimation Approach

Equation: sel(t) = sel(s) * sel(p) * sel(o)
Notes: t refers to a triple pattern.

Equation: sel(s) = 1/R
Notes: R is the number of unique resources in the knowledge store.

Equation: sel(p) = Tp/T
Notes: T is the total number of triples; Tp is the number of triples matching predicate p.

Equation: sel(o) = hc(p,o)/Tp
Notes: hc(p,o) is the height of the bin, in the histogram for predicate p, into which object o falls.

Equation: sel(?a) = 1
Notes: applies when ?a is an unbound subject, predicate, or object.
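The equations above can be combined into one small estimator. The following Python sketch assumes the statistics are already available as dictionaries; all names (`HIST`, `BIN_OF`, `estimate`) are hypothetical, `BIN_OF` stands in for the unspecified hash function, and a leading `?` is assumed to mark an unbound term. For an unbound predicate with a bound object, the object term is summed over all predicates, as in the unbound-predicate example in this presentation.

```python
import math

# Statistics from the example slides, assumed to be precomputed.
R = 8                                               # unique resources
T = 8                                               # total triples
TP = {"P1": 5, "P2": 3}                             # triples per predicate
HIST = {"P1": {1: 3, 2: 2}, "P2": {1: 1, 2: 2}}     # bin heights per predicate
# Stand-in for the unspecified hash function: object value -> bin number.
BIN_OF = {"L1": 1, "L2": 1, "R2": 1, "R4": 2, "L3": 2,
          "L5": 1, "L4": 2, "R1": 2}


def is_unbound(term):
    """Variables are written with a leading '?', e.g. '?s'."""
    return term.startswith("?")


def sel_object(pred, obj):
    """hc(p, o) / Tp for a single predicate p."""
    return HIST[pred][BIN_OF[obj]] / TP[pred]


def estimate(s, p, o):
    """sel(t) = sel(s) * sel(p) * sel(o) for the triple pattern (s, p, o)."""
    sel_s = 1.0 if is_unbound(s) else 1.0 / R
    sel_p = 1.0 if is_unbound(p) else TP[p] / T
    if is_unbound(o):
        sel_o = 1.0
    elif is_unbound(p):
        # Unbound predicate: sum the object term over every predicate.
        sel_o = sum(sel_object(pi, o) for pi in TP)
    else:
        sel_o = sel_object(p, o)
    return sel_s * sel_p * sel_o


print(estimate("?s", "P1", "L2"))
print(estimate("?s", "P1", "?o"))
```

Keeping the hash function behind a lookup makes the sketch independent of any particular hashing scheme; a real system would replace `BIN_OF` with the function used when the histograms were built.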

Selectivity Estimation for Triple Pattern: Example with Bound Predicate

Triple Pattern: ?s P1 L2

Estimated selectivity
= sel(?s) x sel(P1) x sel(L2)
= 1.0 x 0.625 x sel(P1, L2)
= 1.0 x 0.625 x (h1(P1, L2) / TP1)
= 1.0 x 0.625 x (Height of Bin 1 / TP1)
= 1.0 x 0.625 x (3 / 5)
= 0.375

Here, h1(P1, L2) denotes the height of the bin (Bin 1) of the histogram for predicate P1 into which the hash function puts L2.

Selectivity Estimation for Triple Pattern: Example with Unbound Predicate

Triple Pattern: ?s ?p L2

Estimated selectivity
= sel(?s) x sel(?p) x sel(L2)
= 1.0 x 1.0 x {sum over Pi ∈ P of sel(Pi, L2)}
= 1.0 x 1.0 x {sel(P1, L2) + sel(P2, L2)}
= 1.0 x 1.0 x {h1(P1, L2) / TP1 + h1(P2, L2) / TP2}
= 1.0 x 1.0 x {Height of Bin 1 of P1 Histogram / TP1 + Height of Bin 1 of P2 Histogram / TP2}
= 1.0 x 1.0 x {3 / 5 + 1 / 3}
= 0.933

Note that the hash function always puts the value L2 into Bin 1. That is why we pick the height of Bin 1 of the histogram for P2, even though no triple with predicate P2 has L2 as its object.
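The sum in this example can be checked with exact rational arithmetic. This is only an illustrative check using Python's standard `fractions` module; the variable names are hypothetical and the values are the Bin 1 heights and triple counts from the slides.

```python
from fractions import Fraction

# Height of Bin 1 divided by the triple count, for each predicate.
terms = {
    "P1": Fraction(3, 5),   # Bin 1 of the P1 histogram has height 3, TP1 = 5
    "P2": Fraction(1, 3),   # Bin 1 of the P2 histogram has height 1, TP2 = 3
}

sel_o = sum(terms.values())     # sum over all predicates Pi
estimate = 1 * 1 * sel_o        # sel(?s) = sel(?p) = 1 for unbound terms
print(estimate, round(float(estimate), 3))
```

The exact value is 14/15, which rounds to the 0.933 shown on the slide. The P2 term contributes 1/3 even though L2 never occurs with P2, because the histogram only records bin heights, not the individual values in each bin.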

Selectivity Estimation for Triple Pattern: Example with Unbound Object

Triple Pattern: ?s P1 ?o

Estimated selectivity
= sel(?s) x sel(P1) x sel(?o)
= 1.0 x 0.625 x 1.0
= 0.625