Published by Eugene Neal. Modified over 8 years ago.
1
Efficient Processing of Semantic Information on the Web
Georg Lausen
Technische Fakultät, Universität Freiburg
2
Processing of Semantic Information on the Web
The amount of information available on the Web is still increasing rapidly.
(Semi-)automatic data extraction.
Resource Description Framework (RDF).
SPARQL is the standard query language for RDF.
Efficiency and scalability of query processing.
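To make the role of SPARQL over RDF concrete, here is a minimal sketch in Python of how a basic graph pattern can be matched against a set of triples. It is an illustration only, not how any particular RDF store works; the data, the prefixes, and the helper names `match_pattern` and `evaluate_bgp` are assumptions made for this example.

```python
# Toy RDF graph: a set of (subject, predicate, object) triples.
# All identifiers below are made-up example data.
TRIPLES = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
}


def is_var(term):
    """Variables are written SPARQL-style with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate_bgp(patterns, triples):
    """Evaluate a list of triple patterns by naive nested matching."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Roughly: SELECT ?name WHERE { ex:alice foaf:knows ?x . ?x foaf:name ?name }
query = [("ex:alice", "foaf:knows", "?x"), ("?x", "foaf:name", "?name")]
results = evaluate_bgp(query, TRIPLES)
names = sorted(r["?name"] for r in results)
print(names)
```

The nested matching scans every triple for every partial solution, which is exactly the kind of cost that motivates the efficiency and scalability question on this slide.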
3
Efficiency and Scalability: A Variety of Approaches
Single-machine RDF stores
Parallel database approach: Vertica and others
Approaches based on Hadoop (MapReduce paradigm)
– Hadoop
– Hadoop++
– Integration of databases: HadoopDB
– Language translation
  Mapping SPARQL to Hadoop/HBase directly
  Mapping SPARQL to Pig Latin
Non-Hadoop clusters
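As a rough illustration of the MapReduce paradigm applied to SPARQL, the sketch below joins two triple patterns with a reduce-side join, simulated in plain Python. It is not the translation used by Hadoop, HadoopDB, or any of the systems listed; the data and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions for this example.

```python
from collections import defaultdict

# Made-up example triples.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:name", '"Carol"'),
]


def map_phase(triple):
    """Emit (join_key, tagged_value) pairs for the patterns
    ?a foaf:knows ?x  and  ?x foaf:name ?name, joined on ?x."""
    s, p, o = triple
    if p == "foaf:knows":
        yield o, ("knows", s)  # ?x is the object of this pattern
    elif p == "foaf:name":
        yield s, ("name", o)   # ?x is the subject of this pattern


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine both sides of the join for one value of ?x."""
    left = [v for tag, v in values if tag == "knows"]
    right = [v for tag, v in values if tag == "name"]
    for a in left:
        for name in right:
            yield {"?a": a, "?x": key, "?name": name}


mapped = [pair for t in TRIPLES for pair in map_phase(t)]
grouped = shuffle(mapped)
joined = sorted(
    (row for key, values in grouped.items() for row in reduce_phase(key, values)),
    key=lambda row: row["?a"],
)
for row in joined:
    print(row)
```

On a real cluster the map and reduce functions would run in parallel on partitions of the data; here the three phases run sequentially only to show the data flow.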
4
Cluster-based Parallelism vs. Parallel Database/Single-Machine RDF Store
Each technology has its own advantages and problems.
Rough characterization:

                                                 Querying   Loading
Parallel Database / Single-Machine RDF Store        +          -
Cluster-based Parallelism                           -          +

Loading in the context of Web research: Extract-Transform-Load schema.
SPARQL provides a declarative way for specifying the transformation and querying.
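The idea of specifying a transformation declaratively can be sketched in the spirit of a SPARQL CONSTRUCT query: a pattern selects triples and a template says what to emit. The Python below is a simplified sketch limited to a single triple pattern; the data, the predicate `ex:authorOf`, and the helper `construct` are assumptions for this example and not part of the original slides.

```python
# Made-up extracted triples, as they might look after the Extract step.
RAW = {
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:creator", "ex:bob"),
    ("ex:doc1", "dc:title", '"Intro"'),
}


def construct(triples, where, template):
    """Match one triple pattern and instantiate the template once per match."""
    out = set()
    for triple in triples:
        binding = {}
        matched = True
        for pat, val in zip(where, triple):
            if pat.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if binding.setdefault(pat, val) != val:
                    matched = False
                    break
            elif pat != val:
                matched = False
                break
        if matched:
            # Replace variables in the template; constants stay as they are.
            out.add(tuple(binding.get(term, term) for term in template))
    return out


# Roughly: CONSTRUCT { ?p ex:authorOf ?doc } WHERE { ?doc dc:creator ?p }
transformed = construct(
    RAW,
    where=("?doc", "dc:creator", "?p"),
    template=("?p", "ex:authorOf", "?doc"),
)
print(sorted(transformed))
```

The transformation is described by the pattern and the template alone, so the same rule can be executed by whichever engine handles the data.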
5
ETL and Querying in the Context of Web Research
[Diagram: Web documents, Initial RDF graph, RDF store; stage labels E, L, T; annotations: Efficient loading, Efficient querying, SPARQL]
PigSPARQL: Mapping SPARQL to Pig Latin; to appear at Semantic Web Information Management (SWIM 2011)