Ilias Tachmazidis 1,2, Grigoris Antoniou 1,2,3, Giorgos Flouris 2, Spyros Kotoulas 4 1 University of Crete 2 Foundation for Research and Technology, Hellas.

Slides:



Advertisements
Similar presentations
The map and reduce functions in MapReduce are easy to test in isolation, which is a consequence of their functional style. For known inputs, they produce.
Advertisements

MAP REDUCE PROGRAMMING Dr G Sudha Sadasivam. Map - reduce sort/merge based distributed processing Best for batch- oriented processing Sort/merge is primitive.
HDFS & MapReduce Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer.
EHarmony in Cloud Subtitle Brian Ko. eHarmony Online subscription-based matchmaking service Available in United States, Canada, Australia and United Kingdom.
Store RDF Triples In A Scalable Way Liu Long & Liu Chunqiu.
Tutorial at WWW 2011, Distributed reasoning: because size matters Andreas Harth, Aidan Hogan, Spyros Kotoulas,
The International RuleML Symposium on Rule Interchange and Applications Local and Distributed Defeasible Reasoning in Multi-Context Systems Antonis Bikakis,
Spark: Cluster Computing with Working Sets
Tutorial at ISWC 2011, Distributed reasoning: because size matters Andreas Harth, Aidan Hogan, Spyros Kotoulas,
Proof System HY-566. Proof layer Next layer of SW is logic and proof layers. – allow the user to state any logical principles, – computer can to infer.
7/14/2015EECS 584, Fall MapReduce: Simplied Data Processing on Large Clusters Yunxing Dai, Huan Feng.
L22: SC Report, Map Reduce November 23, Map Reduce What is MapReduce? Example computing environment How it works Fault Tolerance Debugging Performance.
HADOOP ADMIN: Session -2
Project By: Anuj Shetye Vinay Boddula. Introduction Motivation HBase Our work Evaluation Related work. Future work and conclusion.
Hadoop Ida Mele. Parallel programming Parallel programming is used to improve performance and efficiency In a parallel program, the processing is broken.
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
Venkatram Ramanathan 1. Motivation Evolution of Multi-Core Machines and the challenges Background: MapReduce and FREERIDE Co-clustering on FREERIDE Experimental.
Interpreting the data: Parallel analysis with Sawzall LIN Wenbin 25 Mar 2014.
Venkatram Ramanathan 1. Motivation Evolution of Multi-Core Machines and the challenges Summary of Contributions Background: MapReduce and FREERIDE Wavelet.
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
MapReduce – An overview Medha Atre (May 7, 2008) Dept of Computer Science Rensselaer Polytechnic Institute.
MapReduce: Hadoop Implementation. Outline MapReduce overview Applications of MapReduce Hadoop overview.
Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Cloud Distributed Computing Platform 2 Content of this lecture is primarily from the book “Hadoop, The Definite Guide 2/e)
Data Intensive Query Processing for Large RDF Graphs Using Cloud Computing Tools Mohammad Farhan Husain, Latifur Khan, Murat Kantarcioglu and Bhavani Thuraisingham.
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, Bhavani Thuraisingham University.
MARISSA: MApReduce Implementation for Streaming Science Applications 作者 : Fadika, Z. ; Hartog, J. ; Govindaraju, M. ; Ramakrishnan, L. ; Gunter, D. ; Canon,
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters Hung-chih Yang(Yahoo!), Ali Dasdan(Yahoo!), Ruey-Lung Hsiao(UCLA), D. Stott Parker(UCLA)
An Architecture for Distributed High Performance Video Processing in the Cloud 作者 :Pereira, R.; Azambuja, M.; Breitman, K.; Endler, M. 出處 :2010 IEEE 3rd.
MapReduce Kristof Bamps Wouter Deroey. Outline Problem overview MapReduce o overview o implementation o refinements o conclusion.
Scalable Distributed Reasoning Using MapReduce Jacopo Urbani, Spyros Kotoulas, Eyal Oren, and Frank van Harmelen Department of Computer Science, Vrije.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
IBM Research ® © 2007 IBM Corporation Introduction to Map-Reduce and Join Processing.
Massive Semantic Web data compression with MapReduce Jacopo Urbani, Jason Maassen, Henri Bal Vrije Universiteit, Amsterdam HPDC ( High Performance Distributed.
Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies
The International RuleML Symposium on Rule Interchange and Applications Visualization of Proofs in Defeasible Logic Ioannis Avguleas 1, Katerina Gkirtzou.
An argument-based framework to model an agent's beliefs in a dynamic environment Marcela Capobianco Carlos I. Chesñevar Guillermo R. Simari Dept. of Computer.
Distributed Process Discovery From Large Event Logs Sergio Hernández de Mesa {
LARGE-SCALE SEMANTIC WEB REASONING Grigoris Antoniou and Ilias Tachmazidis University of Huddersfield.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
| presented by Vasileios Zois CS at USC 09/20/2013 Introducing Scalability into Smart Grid 1.
”Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters” Published In SIGMOD '07 By Yahoo! Senthil Nathan N IIT Bombay.
Big Data is a Big Deal!.
MapReduce Compiler RHadoop
Sushant Ahuja, Cassio Cristovao, Sameep Mohta
Hadoop Aakash Kag What Why How 1.
SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data - Aditi Thuse.
Spark.
Distributed Programming in “Big Data” Systems Pramod Bhatotia wp
Edinburgh Napier University
Map Reduce.
Enabling Scalable and HA Ingestion and Real-Time Big Data Insights for the Enterprise OCJUG, 2014.
Abstract Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for.
MapReduce Computing Paradigm Basics Fall 2013 Elke A. Rundensteiner
Ministry of Higher Education
Very VERY large scale knowledge representation In collaboration with:
Cloud Distributed Computing Environment Hadoop
Big Data - in Performance Engineering
湖南大学-信息科学与工程学院-计算机与科学系
On Spatial Joins in MapReduce
Cse 344 May 4th – Map/Reduce.
CS110: Discussion about Spark
Ch 4. The Evolution of Analytic Scalability
Hadoop Technopoints.
Distributed System Gang Wu Spring,2018.
CS639: Data Management for Data Science
MapReduce: Simplified Data Processing on Large Clusters
CS639: Data Management for Data Science
Presentation transcript:

Ilias Tachmazidis 1,2, Grigoris Antoniou 1,2,3, Giorgos Flouris 2, Spyros Kotoulas 4 1 University of Crete 2 Foundation for Research and Technology, Hellas (FORTH) 3 University of Huddersfield 4 Smarter Cities Technology Centre, IBM Research, Ireland

 Motivation  Background ◦ Defeasible Logic ◦ MapReduce Framework  Multi-Argument Implementation over RDF  Experimental Evaluation  Future Work

 Linked Datasets ◦ Huge ◦ Doubtable quality data, difficult to predict consequences of inference  Defeasible logic ◦ Intuitive  is suitable for encoding commonsense knowledge and reasoning  avoids triviality of inference due to low-quality data ◦ Low complexity  The consequences of a defeasible theory D can be computed in O(N) time, where N is the number of symbols in D

 State-of-the-art ◦ Defeasible logic has been implemented for in- memory reasoning, however, it was not applicable for huge data sets  Solution: scalability/parallelization using the MapReduce framework

 Facts ◦ e.g. bird(eagle)  Strict Rules ◦ e.g. bird(X)  animal(X)  Defeasible Rules ◦ e.g. bird(X)  flies(X)

 Defeaters ◦ e.g. brokenWing(X) ↝ ¬ flies(X)  Priority Relation (acyclic relation on the set of rules) ◦ e.g.r: bird(X)  flies(X) r’: brokenWing(X)  ¬ flies(X) r’ > r

 Inspired by similar primitives in LISP and other functional languages  Operates exclusively on pairs  Input and Output types of a MapReduce job: ◦ Input: ◦ Map(k1,v1) → list(k2,v2) ◦ Reduce(k2, list (v2)) → list(k3,v3) ◦ Output: list(k3,v3)

 Provides an infrastructure that takes care of ◦ distribution of data ◦ management of fault tolerance ◦ results collection  For a specific problem ◦ developer writes a few routines which are following the general interface

 Rule sets can be divided into two categories: ◦ Stratified ◦ Non-stratified  Predicate Dependency Graph

 Consider the following rule set: ◦ r1: X sentApplication A, A completeFor D  X acceptedBy D. ◦ r2: X hasCerticate C, C notValidFor D  X ¬acceptedBy D. ◦ r3: X acceptedBy D, D subOrganizationOf U  X studentOfUniversity U. ◦ r1 > r2.  Both acceptedBy and ¬ acceptedBy are represented by acceptedBy  Superiority relation is not part of the graph

 Initial pass: ◦ Transform facts into pairs  No reasoning needs to be performed for the lowest stratum (stratum 0)  For each stratum from 1 to N ◦ Pass1: Calculate fired rules ◦ Pass2: Perform defeasible reasoning

INPUT Literals in multiple files File File > MAP phase Input

Grouping/Sorting <App, <(John,sentApplication,+Δ,+  ), (Dep,completeFor,+Δ,+  )>> <Cert,<(John,hasCerticate,+Δ,+  ), (Dep,notValidFor,+Δ,+  )>> Reduce phase Input <matchingArgValue, List (Non-MatchingArgValue, Predicate, knowledge)> MAP phase Output <matchingArgValue, (Non-MatchingArgValue, Predicate, knowledge)>

Reduce phase Output (Final Output)

INPUT Literals in multiple files File File MAP phase Input >

MAP phase Output Grouping/Sorting > Reduce phase Input

Reduce phase Output (Final Output) No output

 LUBM (up to 1B)  Custom defeasible ruleset  IBM Hadoop Cluster v1.3 (Apache Hadoop )  40-core server  XIV storage SAN

Challenges of Non-Stratified Rule Sets  An efficient mechanism is need for –Δ and -  ◦ all the available information for the literal must be processed by a single node causing:  main memory insufficiency  skewed load balancing  Storing conclusions for +/–Δ and +/-  is not feasible ◦ Consider the cartesian product of X, Y, Z for X Predicate1 Y, Y Predicate2 Z.

 Run extensive experiments to test the efficiency of multi-argument defeasible logic  Applications on real datasets, with low- quality data  More complex knowledge representation methods such as: ◦ Answer-Set programming ◦ Ontology evolution, diagnosis and repair  AI Planning