1
Research Meeting 2009-10-22 Jaeseok Myung
2
Copyright 2009 by CEBT
Summary
TA
  DB: project 3, midterm (24 students took the exam)
  WEC: report, project (Android), classroom, lecture (Director Jeong Jae-mok (정재목))
Research
  DESWeb 2010: 1st International Workshop on Data Engineering meets the Semantic Web, in conjunction with ICDE 2010
    Submission: Nov 15th, 6 pages
  Drafting the paper outline
  LUBM conversion, complex query selection
Center for E-Business Technology
3
SPARQL Basic Graph Pattern Processing with Iterative MapReduce
Abstract
In this paper, we propose an iterative MapReduce (MR) algorithm for processing SPARQL Basic Graph Patterns (BGPs). A BGP typically contains many self-joins, but MR's shared-nothing architecture makes such join operations difficult to process within the MR framework. In other words, an expensive MR iteration is needed each time two graph patterns are joined on a shared key. For this reason, we propose an algorithm that reduces the number of MR iterations, and we evaluate it with the Lehigh University Benchmark (LUBM). Our experiments run on physically separated RDF storage and a parallel data processing framework, and the results show that the algorithm provides scalable access to large RDF data.
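As an illustration of why joining two graph patterns on a shared key costs one MR iteration, the following is a minimal in-memory sketch of a reduce-side join. It is not the algorithm from the paper: the toy triples, the two hard-coded patterns (`?s ub:advisor ?x` and `?x ub:worksFor ?d`), and all function names are assumptions made for this example, and the map, shuffle, and reduce phases are simulated in plain Python rather than run on Hadoop.

```python
from collections import defaultdict
from itertools import product

# Toy triples using LUBM-style predicates; the data itself is made up.
TRIPLES = [
    ("alice", "ub:advisor", "bob"),
    ("carol", "ub:advisor", "bob"),
    ("bob", "ub:worksFor", "dept1"),
    ("dave", "ub:worksFor", "dept2"),
]


def map_phase(triples):
    """Emit (value of ?x, tagged partial binding) for the two patterns
    ?s ub:advisor ?x  and  ?x ub:worksFor ?d."""
    for s, p, o in triples:
        if p == "ub:advisor":
            yield o, ("left", {"?s": s})
        elif p == "ub:worksFor":
            yield s, ("right", {"?d": o})


def shuffle(pairs):
    """Group map output by key, as the MR framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each ?x value, combine every left binding with every right binding."""
    results = []
    for x, values in sorted(groups.items()):
        left = [b for tag, b in values if tag == "left"]
        right = [b for tag, b in values if tag == "right"]
        for l, r in product(left, right):
            results.append({"?x": x, **l, **r})
    return results


def join_iteration(triples):
    """One simulated MR iteration: a join on the single shared variable ?x."""
    return reduce_phase(shuffle(map_phase(triples)))


for row in join_iteration(TRIPLES):
    print(row)
```

Only the shuffle step brings bindings with the same `?x` value together, so in this sketch a join on a second, different variable would need another full pass over the intermediate result.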
4
Outline
Introduction
Related Work
BGP Processing with MR
  MR Iteration (why joins trigger MR iterations, N-Triple storage structure)
  Naïve Approach (Single-Random)
Our Approach
  Multi-Greedy Algorithm
  Discussion (edge preserving, performance by type, key selection)
Experiments
  Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter)
  SPARQL Processing Results (varying number of nodes, varying data size)
  Dealing with Intermediate Result (intermediate file I/O is expensive, CGL-MR)
Conclusion (research is needed on storage structures and indexing that are richer than N-Triple and compressible)
Reference
5
Outline2
Introduction
Related Work
BGP Processing with MR
  MR Iteration (why joins trigger MR iterations, N-Triple storage structure)
  Naïve Approach (Single Point, Random Selection)
  Multi-point Greedy Selection Algorithm
Experiments
  Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter)
  SPARQL Processing Results (varying number of nodes, varying data size)
Discussion
  Discussion (edge preserving, performance by type, key selection)
  Dealing with Intermediate Result (intermediate file I/O is expensive, CGL-MR)
Conclusion (research is needed on storage structures and indexing that are richer than N-Triple and compressible)
Reference
6
Introduction (1/2)
SPARQL is a W3C recommendation for querying RDF data
  Explain that SPARQL is important for making use of RDF, and that the BGP is the basis of SPARQL pattern matching
SPARQL BGP processing is difficult, because a BGP may contain a significant number of self-joins, which are expensive
Much research has been conducted from the perspective of a single-machine triplestore
However, for some tasks, we may need multiple machines and federated query processing techniques
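To make the self-join point concrete, here is a minimal single-machine sketch, not taken from the slides. It evaluates a BGP by scanning the same triple list once per triple pattern, so every additional pattern is one more self-join over the triple table. The toy data, the three-pattern BGP, and the helper names `match_pattern` and `evaluate_bgp` are all assumptions for illustration.

```python
# Toy triples using LUBM-style terms; the data itself is made up.
TRIPLES = [
    ("s1", "rdf:type", "ub:GraduateStudent"),
    ("s1", "ub:advisor", "p1"),
    ("p1", "ub:worksFor", "d1"),
    ("s2", "ub:advisor", "p2"),
]

# A BGP is a list of triple patterns; terms starting with "?" are variables.
BGP = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:advisor", "?y"),
    ("?y", "ub:worksFor", "?d"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`.
    Returns the extended binding, or None if a term or variable conflicts."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return extended


def evaluate_bgp(patterns, triples):
    """Nested-loop evaluation: each pattern joins the current bindings
    against the full triple list again (a self-join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


print(evaluate_bgp(BGP, TRIPLES))
```

With three patterns the triple list is joined with itself twice, which is the cost that grows as BGPs become more complex.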
7
Introduction (2/2)
MR is a distributed and parallel data processing framework that is well suited to large-scale data analysis
Unfortunately, MR has not been considered the best option for join operations, which are inherent in graph pattern matching algorithms
  Because it is heterogeneous and shared-nothing
Some researchers have employed iterative MR, but the iterations are expensive
In this paper, we propose an algorithm that reduces the number of MR iterations for BGP processing
The rest of the paper is organized as follows
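The slides do not spell out the proposed algorithm, so the sketch below is only one hypothetical reading of what "reducing the number of MR iterations" could mean. It models each triple pattern as the set of variables it contains, and compares joining on a single key per iteration with greedily joining on several keys in the same iteration, as long as the joins touch disjoint groups of patterns. The cost model, the selection rule, and every name here are assumptions, not the authors' method.

```python
from collections import Counter


def shared_vars(relations):
    """Variables occurring in at least two relations, most frequent first."""
    counts = Counter(v for rel in relations for v in rel)
    candidates = [v for v, c in counts.items() if c >= 2]
    return sorted(candidates, key=lambda v: (-counts[v], v))


def join_on(relations, keys):
    """Simulate one iteration: for each key, merge all relations that contain it
    and have not already been consumed by an earlier key in this iteration."""
    remaining = list(relations)
    merged = []
    for key in keys:
        group = [r for r in remaining if key in r]
        if len(group) < 2:
            continue  # nothing left to join on this key in this iteration
        remaining = [r for r in remaining if key not in r]
        merged.append(frozenset().union(*group))
    return remaining + merged


def count_iterations(patterns, multi_key):
    """Count simulated iterations until no two relations share a variable."""
    relations = [frozenset(p) for p in patterns]
    iterations = 0
    while True:
        candidates = shared_vars(relations)
        if not candidates:
            return iterations
        keys = candidates if multi_key else candidates[:1]
        relations = join_on(relations, keys)
        iterations += 1


# A chain-shaped BGP: each pattern shares one variable with the next.
CHAIN = [{"?a", "?b"}, {"?b", "?c"}, {"?c", "?d"}, {"?d", "?e"}]

print("single key per iteration:", count_iterations(CHAIN, multi_key=False))
print("multiple keys per iteration:", count_iterations(CHAIN, multi_key=True))
```

Under this toy model the chain needs fewer iterations when independent joins share an iteration. Whether the paper's multi-point selection works this way cannot be confirmed from the slides.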
8
Related Work
SPARQL Processing
  BGP, Join (single machine), Triplestore
Data Processing with MR
  Google, Hadoop, Hive, Pig
  PDBMS vs. MR
Federated SPARQL Processing
  DARQ, YARS2, Virtuoso, …
SPARQL processing with MR is a new approach, but it builds on the research above
9
An Example of BGPs
[Figure: BGP query graph. Variable nodes: ?x, ?y, ?j1, ?j3, ?d1, ?m3, ?o1, ?l1, ?n1, ?p1. Classes attached via rdf:type: ub:Faculty, ub:Course, ub:Chair, ub:Department, ub:Person, ub:GraduateStudent, ub:Lecturer. Edge predicates: ub:advisor, ub:takesCourse, ub:teacherOf, ub:publicationAuthor, ub:worksFor, ub:subOrganizationOf, ub:hasAlumnus, ub:memberOf.]
10
Reference
1. M. Stocker et al., SPARQL Basic Graph Pattern Optimization Using Selectivity Estimation, WWW 2008
2. C. Weiss et al., Hexastore: Sextuple Indexing for Semantic Web Data Management, VLDB 2008
3. D. J. Abadi et al., SW-Store: A Vertically Partitioned DBMS for Semantic Web Data Management, VLDB Journal 2009
4. T. Neumann et al., Scalable Join Processing on Very Large RDF Graphs, SIGMOD 2009
5. H. Yang et al., Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters, SIGMOD 2007
6. A. Pavlo et al., A Comparison of Approaches to Large-Scale Data Analysis, SIGMOD 2009
7. A. Abouzeid et al., HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, VLDB 2009
8. C. Olston et al., Pig Latin: A Not-So-Foreign Language for Data Processing, SIGMOD 2008
9. J. Ekanayake et al., MapReduce for Data Intensive Scientific Analyses, ESCIENCE 2008
10. J. Cohen, Graph Twiddling in a MapReduce World, CISE 2009
11. B. Quilitz et al., Querying Distributed RDF Data Sources with SPARQL, ESWC 2008
12. A. Harth et al., YARS2: A Federated Repository for Querying Graph Structured Data from the Web, ISWC 2007