Authors: Ahmed Eldawy, Mohamed F. Mokbel, Christopher Jonathan

Presentation transcript:

HadoopViz: A MapReduce Framework for Extensible Visualization of Big Spatial Data. Authors: Ahmed Eldawy, Mohamed F. Mokbel, Christopher Jonathan. Presented by Yuanlai Liu.

Outline: Introduction, Related Work, Single-Level Visualization, Multilevel Visualization, Visualization Abstraction, Case Study, Experiments.

Introduction: An explosion in the amount of spatial data. Space telescopes: 150 GB weekly. Medical devices: 50 PB yearly. NASA satellite images: 25 GB daily. Geotagged tweets: 10 million daily.

Introduction: The need to visualize big spatial data. Visualization provides a bird's-eye view of the data and allows users to quickly spot interesting patterns.

Introduction: HadoopViz. (1) It applies a smoothing technique that fuses nearby records together, e.g., Figure 1(b), where missing values are smoothed out. (2) It employs a partition-plot-merge approach to scale up to giga-pixel images, e.g., it takes only 90 seconds to visualize the image in Figure 1(b). (3) It proposes a novel visualization abstraction to support dozens of image types, e.g., scatter plots, road networks, or brain neurons.

Introduction HadoopViz

Related Work: Big Data Visualization. Ermac, M4, and bin-summarise-smooth; none of these techniques applies to spatial data visualization. Big Spatial Data. Work on specific problems (range query, spatial join, kNN join) and on building systems (Hadoop-GIS, SciDB, SpatialHadoop); none of these systems provides efficient visualization techniques for big spatial data.

Related Work SpatialHadoop

Related Work: Spatial Data Visualization. Single-machine solutions focus on what the generated image should look like and are not scalable to big data. Distributed solutions: EarthDB and 3D visualization; SHAHED relies on a heavy preprocessing phase. No giga-pixel images, no extensibility.

Related Work: Big Spatial Data Visualization. HadoopViz generates giga-pixel images, is extensible to new visualization types, and supports both single-level and multilevel visualization.

Single-Level Visualization: A three-phase partition-plot-merge approach. The partitioning phase splits the input into m partitions. The plotting phase plots a partial image for each partition. The merging phase combines the partial images into one final image.
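The three phases can be illustrated with a small single-process sketch. This is not HadoopViz code: the round-robin `partition`, the sparse `Counter` used as a canvas, and all function names are assumptions chosen only to show how the phases hand data to each other.

```python
from collections import Counter
from functools import reduce


def partition(records, m):
    """Partitioning phase: split the input into m partitions.
    Round-robin is a placeholder for the real HDFS or spatial split."""
    return [records[i::m] for i in range(m)]


def plot(part):
    """Plotting phase: turn one partition into a partial image.
    The partial image is a sparse map from pixel to record count."""
    return Counter(part)


def merge(a, b):
    """Merging phase: combine two partial images into one."""
    return a + b


def visualize(records, m):
    partials = [plot(p) for p in partition(records, m)]
    return reduce(merge, partials, Counter())


# Records are already expressed as (x, y) pixel coordinates here.
points = [(0, 0), (1, 2), (0, 0), (3, 1), (1, 2), (0, 0)]
image = visualize(points, m=3)
```

Because `merge` is associative, the partial images can be combined in any order, which is what lets the merging phase run in parallel.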

Single-Level Visualization: Two algorithms use this three-phase approach: default-Hadoop partitioning and spatial partitioning.

Single-Level Visualization: Default-Hadoop partitioning. Partitioning: the default HDFS splitting (128 MB blocks). Plotting: each mapper generates a partial image Ci for each partition Pi. Merging: all intermediate matrices Ci are merged, in parallel, into one final matrix Cf, which is written as the output image.
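With default partitioning, records from anywhere in space can land in any partition, so every partial matrix Ci spans the whole image and merging has to overlay them pixel by pixel. A minimal sketch follows, assuming the canvases are lists of lists and that overlaying means a pixel-wise sum (the actual combination rule depends on the image type):

```python
def overlay(canvases):
    """Overlay full-size partial canvases Ci into one final canvas Cf.
    Assumption: pixels are combined by addition."""
    height, width = len(canvases[0]), len(canvases[0][0])
    final = [[0] * width for _ in range(height)]
    for canvas in canvases:
        for y in range(height):
            for x in range(width):
                final[y][x] += canvas[y][x]
    return final


c1 = [[1, 0, 0],
      [0, 0, 2]]
c2 = [[0, 3, 0],
      [0, 0, 1]]
cf = overlay([c1, c2])
```

The cost is proportional to the number of partitions times the full image size, which is why this merge step becomes expensive for large images.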

Single-Level Visualization: Spatial partitioning. Partitioning: spatial partitioning. Plotting: each reducer generates one partial image Ci. Merging: the intermediate matrices Ci are merged into one big matrix by stitching them together.
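With spatial partitioning, each partial image covers its own disjoint region, so merging reduces to copying each one into place. A minimal sketch, assuming each partial image arrives with the (x0, y0) pixel offset of its top-left corner (the tuple layout and names are illustrative only):

```python
def stitch(tiles, width, height):
    """Stitch disjoint partial images into one final canvas.
    tiles: iterable of ((x0, y0), rows), where rows is a list of pixel rows."""
    final = [[0] * width for _ in range(height)]
    for (x0, y0), rows in tiles:
        for dy, row in enumerate(rows):
            # Each pixel is written exactly once: no pixel-wise combination.
            final[y0 + dy][x0:x0 + len(row)] = row
    return final


tiles = [
    ((0, 0), [[1, 2], [3, 4]]),  # left half of a 4x2 image
    ((2, 0), [[5, 6], [7, 8]]),  # right half
]
stitched = stitch(tiles, width=4, height=2)
```

Every output pixel is touched once regardless of the number of partitions, which is the efficiency advantage over overlaying.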

Single-Level Visualization Default-Hadoop Partitioning VS Spatial Partitioning

Single-Level Visualization: Default-Hadoop Partitioning vs. Spatial Partitioning. When a smooth image is needed, use spatial partitioning. Otherwise there is a tradeoff between the partitioning and merging phases. Default-Hadoop partitioning has a zero-overhead partitioning phase but an expensive overlay merging phase. Spatial partitioning pays an overhead for spatial partitioning but uses a more efficient stitching technique in the merging phase.

Single-Level Visualization Default-Hadoop Partitioning VS Spatial Partitioning

Multilevel Visualization: partition-plot-merge. Goal: generate giga-pixel multilevel images in which users can zoom in and out to see more or less detail. E.g., if z = 10: pixels at level 10 = 4^10 × (256 × 256) / 2^30 = 64 giga-pixels.
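The size estimate for level 10 can be reproduced with a few lines of arithmetic. The sketch assumes, as the slide's formula implies, that level z holds 4^z tiles of 256 × 256 pixels each; the function names are illustrative.

```python
TILE_SIDE = 256  # pixels per tile side, as on the slide


def tiles_at_level(z):
    """Each level quadruples the tile count: level 0 is a single tile."""
    return 4 ** z


def gigapixels_at_level(z):
    """Pixels at level z, expressed in units of 2^30 pixels."""
    return tiles_at_level(z) * TILE_SIDE * TILE_SIDE / 2 ** 30


level_10 = gigapixels_at_level(10)  # 2^20 tiles * 2^16 pixels / 2^30 = 64.0
```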

Multilevel Visualization: Two algorithms use this three-phase approach: default-Hadoop partitioning and coarse-grained pyramid partitioning.

Multilevel Visualization: Default-Hadoop partitioning. Partitioning: the default HDFS splitting (128 MB blocks). Plotting: each mapper plots every record in its assigned partition Pi to all overlapping tiles in the pyramid. Merging: the reducers merge the partial pyramids into a final pyramid.
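The plot and merge steps for a pyramid can be sketched as follows. The sketch assumes point records with coordinates normalized to the unit square, tiles identified by hypothetical (level, column, row) triples, and a per-tile record count standing in for the tile image. A point overlaps exactly one tile per level; a record with an extent could overlap several.

```python
from collections import Counter


def overlapping_tiles(x, y, num_levels):
    """Tiles (z, col, row) that contain point (x, y), one per level.
    Level z is a 2^z by 2^z grid over the unit square."""
    return [(z, int(x * 2 ** z), int(y * 2 ** z)) for z in range(num_levels)]


def plot_partition(records, num_levels):
    """Mapper side: build a partial pyramid for one partition."""
    pyramid = Counter()
    for x, y in records:
        for tile in overlapping_tiles(x, y, num_levels):
            pyramid[tile] += 1
    return pyramid


def merge_pyramids(partials):
    """Reducer side: merge partial pyramids tile by tile."""
    final = Counter()
    for partial in partials:
        final.update(partial)
    return final


partitions = [[(0.6, 0.3), (0.1, 0.1)], [(0.7, 0.2)]]
final = merge_pyramids(plot_partition(p, 3) for p in partitions)
```

The same tile can appear in many partial pyramids, so the merge work grows with the number of tiles, which is small only near the top of the pyramid.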

Multilevel Visualization: Coarse-grained pyramid partitioning. Partitioning: the mapper assigns each record p to selected tiles; a parameter k reduces the overhead by creating partitions only for tiles in levels that are multiples of k. Plotting: plot an image for each tile. Merging: nothing to do.
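The effect of k on replication can be sketched as below. The sketch assumes point records in the unit square and hypothetical (level, column, row) tile identifiers; it also assumes that a partition rooted at level z is responsible for plotting levels z through z + k - 1, which is one plausible reading of "coarse-grained" rather than something stated on the slide.

```python
def partition_tiles(x, y, num_levels, k):
    """Tiles a point record is assigned to: one per level that is a
    multiple of k, instead of one per level."""
    return [(z, int(x * 2 ** z), int(y * 2 ** z))
            for z in range(0, num_levels, k)]


def levels_plotted_by(partition_level, num_levels, k):
    """Assumption: a partition at level z plots levels z .. z+k-1."""
    return list(range(partition_level, min(partition_level + k, num_levels)))


coarse = partition_tiles(0.6, 0.3, num_levels=6, k=2)  # 3 copies of the record
fine = partition_tiles(0.6, 0.3, num_levels=6, k=1)    # 6 copies of the record
```

Larger k means fewer copies of each record are shuffled, at the price of more plotting work per partition.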

Multilevel Visualization: Default-Hadoop Partitioning vs. Coarse-grained Pyramid Partitioning. Default-Hadoop partitioning avoids the overhead of partitioning; when the pyramid is small, its plot and merge overhead is minimal, so it is used to generate the top levels. Coarse-grained pyramid partitioning has lower plot overhead and no merge overhead, so it is used to generate the remaining deeper levels.

Multilevel Visualization Default-Hadoop Partitioning VS Coarse-grained Pyramid Partitioning

Visualization Abstraction: HadoopViz is an extensible framework that supports a wide range of visualizations for various image types. The user needs to define five abstract functions: smooth, create-canvas, plot, merge, and write.

Visualization Abstraction Overview

Visualization Abstraction: The Smooth abstract function is optional. HadoopViz tests for the existence of this function to decide whether to use spatial or default partitioning.

Visualization Abstraction: The Create-Canvas abstract function creates and initializes an in-memory data structure that will be used to create the requested image; it is used in both the plotting and merging phases. The Plot abstract function is called by the plotting phase for each record in the partition to draw the partial images; it can call any third-party visualization package, e.g., VisIt and ImageMagick.

Visualization Abstraction: The Merge abstract function is called successively by the merging phase on a set of layers to merge them into one. The Write abstract function writes the final canvas to the output in a standard image format (e.g., PNG or SVG).
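The abstraction can be pictured as an interface with four required functions and one optional one. The sketch below is written in Python for brevity and is not the HadoopViz API: the class names, method signatures, and the text-based `write` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Visualization(ABC):
    """Hypothetical interface; `smooth` is optional and therefore not declared."""

    @abstractmethod
    def create_canvas(self, width, height): ...

    @abstractmethod
    def plot(self, canvas, record): ...

    @abstractmethod
    def merge(self, canvas, other): ...

    @abstractmethod
    def write(self, canvas): ...


class ScatterPlot(Visualization):
    def create_canvas(self, width, height):
        return [[0] * width for _ in range(height)]

    def plot(self, canvas, record):
        x, y = record
        canvas[y][x] = 1

    def merge(self, canvas, other):
        for y, row in enumerate(other):
            for x, value in enumerate(row):
                canvas[y][x] = max(canvas[y][x], value)
        return canvas

    def write(self, canvas):
        # Text output stands in for a real image format such as PNG.
        return "\n".join("".join("#" if v else "." for v in row) for row in canvas)


def choose_partitioning(viz):
    """Spatial partitioning only if the user supplied a smooth function."""
    return "spatial" if callable(getattr(viz, "smooth", None)) else "default"


def render(viz, partitions, width, height):
    partials = []
    for part in partitions:  # plotting phase
        canvas = viz.create_canvas(width, height)
        for record in part:
            viz.plot(canvas, record)
        partials.append(canvas)
    final = viz.create_canvas(width, height)  # merging phase
    for partial in partials:
        final = viz.merge(final, partial)
    return viz.write(final)


output = render(ScatterPlot(), [[(0, 0)], [(2, 1)]], width=3, height=2)
```

The driver never inspects the canvas itself, so a new image type only has to supply these functions to reuse the same partition-plot-merge machinery.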

Case Studies: Six case studies. Case studies I and II: non-aggregate visualization, with and without smoothing. Case studies III and IV: aggregate-based visualization. Case study V: generating a vector image with a smoothing function. Case study VI: reusing and scaling out an existing package (ImageMagick).

Experiments: Deployed on an Amazon EC2 cluster of 20 nodes. Intel(R) Xeon E5472 processor with 4 cores @ 3 GHz, 8 GB of memory, 250 GB hard disk. The baseline is a single machine with 1 TB of RAM. Real datasets: OpenStreetMap (OSM), up to 1.7 billion points; NASA, 14 billion points. The experiments measure the end-to-end time for generating the image.

Experiments Single-Level Visualization

Experiments Multilevel Visualization

Thanks & Questions
