PARALLEL ANALYSIS OF EEG DATA WITH HADOOP ON FUTUREGRID Project Member: Rewati Ovalekar. Project Guides: Gregor von Laszewski, Lizhe Wang.

BACKGROUND Importance of EEG data: EEG is used in detecting and diagnosing brain-related disorders. The EEMD (ensemble empirical mode decomposition) algorithm was developed to analyze the signals.

Drawbacks of EEG data: EEG signals are complex in nature. Analysis of EEG signals is highly data-intensive and compute-intensive. The basic EEMD algorithm is not time-efficient.

PARALLEL EEMD FOR EEG ANALYSIS The EEMD algorithm was modified to analyze data points in parallel at multiple levels: epoch level, trial level, and data channel level.

Epoch level: a single data point is considered and processed at each level. The output of one instance is not consumed by another, so the instances can run independently.

Trial level: each epoch can be split into a number of trials. Decomposition of each trial is performed independently. All trials for a particular epoch are then combined to produce the output for that epoch.

Data channel level: data is parallelized at each channel, then the output is combined for its corresponding trial. The grain of parallelization is coarse at this level.
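The trial-level scheme described above can be sketched with a thread pool. This is only an illustration under stated assumptions: the slides do not say how trials are formed or combined, so the sketch assumes trials are noise-perturbed copies of the epoch (as in standard EEMD) and that they are combined by an element-wise mean. The names `decompose_trial`, `make_trials`, and `process_epoch` are hypothetical, and `decompose_trial` is a stand-in that returns its input unchanged.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def decompose_trial(samples):
    """Hypothetical stand-in for decomposing one trial (returns it unchanged)."""
    return list(samples)


def make_trials(epoch, n_trials, noise_std, seed=0):
    """Assumption: each trial is the epoch plus independent Gaussian noise."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_std) for v in epoch]
            for _ in range(n_trials)]


def process_epoch(epoch, n_trials=4, noise_std=0.1, workers=4):
    """Decompose every trial independently, then combine them per epoch."""
    trials = make_trials(epoch, n_trials, noise_std)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each trial is handled by its own task; no task reads another's output.
        results = list(pool.map(decompose_trial, trials))
    # Assumption: combine by averaging the trials sample by sample.
    return [sum(column) / n_trials for column in zip(*results)]


epoch_output = process_epoch([1.0, 2.0, 3.0, 2.0, 1.0])
```

Because the trials share no state, the same structure carries over to processes or cluster tasks; only the executor changes.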

MULTI-THREAD DESIGN Each thread processes an EEG data point at the epoch level. Local extrema are calculated at each level. All local maxima and all local minima are then connected using cubic splines.
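The extrema and spline steps can be illustrated in plain Python. This is a minimal sketch, not the project's code: it assumes a natural cubic spline (zero second derivative at both ends) and pins each envelope to the first and last sample, since the slides do not specify the boundary handling. The function names are hypothetical.

```python
from bisect import bisect_right


def local_extrema(signal):
    """Return (maxima, minima) as index lists of strict interior extrema."""
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima


def natural_cubic_spline(xs, ys):
    """Natural cubic spline through (xs, ys); xs must be strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the per-segment coefficients b, c, d.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def envelope(signal, extrema_idx):
    """Spline through the given extrema, evaluated at every sample index."""
    # Assumption: the end samples are added as knots to anchor the spline.
    knots = sorted(set([0] + list(extrema_idx) + [len(signal) - 1]))
    spline = natural_cubic_spline(knots, [signal[k] for k in knots])
    return [spline(t) for t in range(len(signal))]


signal = [0, 2, 1, 3, 0, 4, 1, 2, 0]
maxima, minima = local_extrema(signal)
upper = envelope(signal, maxima)  # spline through the local maxima
lower = envelope(signal, minima)  # spline through the local minima
```

In a real pipeline a library routine would normally replace the hand-written spline; it is spelled out here only to keep the sketch self-contained.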

MULTI-THREAD DESIGN [figure]

LIMITATIONS OF MULTI-THREADED DESIGN The multi-threaded design cannot process huge data sets because a local machine has limited resources. SOLUTION: Develop a parallel EEMD algorithm using MapReduce on Hadoop.

Why Hadoop? Hadoop provides a distributed framework for running applications on large clusters. MapReduce is used to implement the parallel EEMD algorithm.

MAPREDUCE DESIGN (EPOCH LEVEL PARALLELIZATION) Epoch Mapper: each map function takes a single data point as input, calculates the local extrema at the epoch level, connects the minima and the maxima by cubic splines, and generates points that are combined in the Epoch Reducer.

MAPREDUCE DESIGN (EPOCH LEVEL PARALLELIZATION) Epoch Reducer: each reduce function combines the points that belong to the same EEG data point and generates the output for that individual EEG data point: 8 IMFs and one residual (left-over) signal.
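The mapper and reducer roles can be mimicked in a few lines of plain Python that simulate the map, shuffle, and reduce phases in memory. This is a sketch under assumptions, not the Hadoop job from the project: the record layout `(epoch_id, samples)`, the function names, and in particular `toy_decompose` are hypothetical. `toy_decompose` is a placeholder that is not EEMD; it only produces 8 components plus a remainder that sum back to the input, so the data flow can be followed.

```python
from collections import defaultdict

N_IMFS = 8  # the slides report 8 IMFs plus one residual per EEG data point


def toy_decompose(samples, n_imfs=N_IMFS):
    """Placeholder, NOT real EEMD: peel off half of what remains n_imfs times."""
    remaining = list(samples)
    components = []
    for _ in range(n_imfs):
        part = [v / 2 for v in remaining]
        components.append(part)
        remaining = [v - p for v, p in zip(remaining, part)]
    return components, remaining


def epoch_mapper(record):
    """Emit (epoch_id, value) pairs, one per component of the decomposition."""
    epoch_id, samples = record
    imfs, residual = toy_decompose(samples)
    for k, imf in enumerate(imfs):
        yield epoch_id, ("imf", k, imf)
    yield epoch_id, ("residual", 0, residual)


def epoch_reducer(epoch_id, values):
    """Combine all values that share an epoch_id into one result record."""
    imfs, residual = {}, None
    for kind, k, series in values:
        if kind == "imf":
            imfs[k] = series
        else:
            residual = series
    return epoch_id, {"imfs": [imfs[k] for k in sorted(imfs)], "residual": residual}


def run_job(records):
    """In-memory stand-in for the map, shuffle, and reduce phases."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in epoch_mapper(record):
            grouped[key].append(value)  # shuffle: group values by key
    return dict(epoch_reducer(key, values) for key, values in grouped.items())


result = run_job([("epoch-0", [1.0, 2.0, 4.0]), ("epoch-1", [8.0, 16.0])])
```

Keying every emitted value by the epoch identifier is what lets the framework route all pieces of one epoch to the same reduce call.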

MAPREDUCE DESIGN (EPOCH LEVEL PARALLELIZATION) [figure]

PERFORMANCE ANALYSIS OF ORIGINAL ALGORITHM [figure]

PERFORMANCE ANALYSIS OF EEMD ALGORITHM ON HADOOP Analyzed the same data set while varying the number of nodes in the cluster.

PERFORMANCE ANALYSIS OF EEMD ALGORITHM ON HADOOP Analyzed a huge data set while keeping the number of nodes constant. Analyzed the data set while varying the number of epochs processed at a time.

CONCLUSION: The new Hadoop EEMD performs better than the original algorithm when analyzing huge data sets. For better results on a huge data set, set the number of mappers, i.e. the number of epochs processed at a time, to approximately double the number of nodes available in the cluster.
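The sizing rule in the conclusion amounts to simple arithmetic, sketched here. The function name `plan_waves` and the idea of counting processing "waves" are illustrative assumptions; only the factor of roughly two mappers per node comes from the slides.

```python
import math


def plan_waves(n_epochs, n_nodes, factor=2):
    """Return (epochs processed at a time, number of waves needed)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    # Rule of thumb from the slides: about twice as many mappers as nodes.
    per_wave = max(1, min(n_epochs, factor * n_nodes))
    waves = math.ceil(n_epochs / per_wave)
    return per_wave, waves


per_wave, waves = plan_waves(n_epochs=100, n_nodes=8)  # 16 epochs at a time, 7 waves
```

With 100 epochs on 8 nodes, this gives 16 epochs at a time and 7 waves, the last of which is only partly filled.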

THANK YOU!