PARALLEL ANALYSIS OF EEG DATA WITH HADOOP ON FUTUREGRID
Project Member: Rewati Ovalekar
Project Guides: Gregor von Laszewski, Lizhe Wang
BACKGROUND
Importance of EEG data:
- Used in detecting and diagnosing brain-related disorders.
- The EEMD (Ensemble Empirical Mode Decomposition) algorithm was developed to analyze the signals.
Drawbacks of EEG data:
- EEG signals are complex in nature.
- Analysis of EEG signals is highly data-intensive and compute-intensive.
- The basic EEMD algorithm is not time-efficient.
PARALLEL EEMD FOR EEG ANALYSIS
- The EEMD algorithm was modified to analyze data points in parallel.
- Parallelism is applied at multiple levels: epoch level, trial level, and data channel level.
Epoch Level
- A single data point is considered and processed at each level.
- The output of one instance is not consumed by another, so instances can be processed independently.
Trial Level
- Each epoch can be split into a number of trials.
- Decomposition of each trial is performed independently.
- All trials for a particular epoch are combined to produce the output for that epoch.
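The split, decompose, combine pattern at the trial level can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the slides do not say how trials are formed or combined, so the sketch assumes the usual EEMD convention (each trial is the epoch plus white noise, and trials are combined by averaging). `decompose` is a hypothetical stand-in that splits a signal into a trend and a detail part; it is not an EMD implementation. All function names and parameters are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def decompose(signal):
    """Hypothetical stand-in for the per-trial decomposition (NOT real EMD).

    Returns two components: detail (signal minus trend) and a
    3-point moving-average trend.
    """
    n = len(signal)
    trend = []
    for i in range(n):
        window = signal[max(0, i - 1):min(n, i + 2)]
        trend.append(sum(window) / len(window))
    detail = [s - t for s, t in zip(signal, trend)]
    return [detail, trend]


def make_trials(epoch, n_trials, noise_std, seed=0):
    """Assumption: each trial is the epoch plus Gaussian white noise."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, noise_std) for x in epoch]
            for _ in range(n_trials)]


def combine_trials(results):
    """Assumption: combine by averaging component k across all trials."""
    n_trials = len(results)
    combined = []
    for k in range(len(results[0])):
        columns = zip(*(r[k] for r in results))
        combined.append([sum(col) / n_trials for col in columns])
    return combined


def process_epoch(epoch, n_trials=4, noise_std=0.1):
    """Decompose every trial independently, then combine per epoch."""
    trials = make_trials(epoch, n_trials, noise_std)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(decompose, trials))
    return combine_trials(results)
```

Because no trial depends on another, the `pool.map` call could be swapped for any other parallel executor without changing `combine_trials`.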
Data Channel Level
- Data is parallelized at each channel, and the output is then combined for its corresponding trial.
- The grain of parallelization is coarse at this level.
MULTI-THREAD DESIGN
- Each thread processes the EEG data point for a particular epoch level.
- Local extrema are calculated at each level.
- All local maxima and minima are connected using a cubic spline.
[Figure: multi-thread design]
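The first step of the multi-thread design, finding local extrema for each epoch in its own thread, might look like the sketch below. It is illustrative only: `local_extrema` and `extrema_per_epoch` are hypothetical names, epochs are assumed to be plain lists of samples, and the cubic-spline step is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def local_extrema(signal):
    """Return (maxima, minima): index lists of strict interior extrema."""
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima


def extrema_per_epoch(epochs, max_workers=4):
    """Process each epoch in a worker thread; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(local_extrema, epochs))


if __name__ == "__main__":
    print(extrema_per_epoch([[0, 2, 1, 3, 0], [1, 0, 1]]))
```

In CPython, threads running pure-Python loops share one interpreter lock, so this shows the structure of the design rather than a real speed-up. The maxima and minima found here are the points a cubic spline would then connect into upper and lower envelopes.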
LIMITATIONS OF MULTI-THREADED DESIGN
- Cannot process huge data sets because of the limited resources available on a local machine.
SOLUTION: Develop a parallel EEMD algorithm using MapReduce on Hadoop.
Why Hadoop?
- Hadoop provides a distributed framework for running applications on a large cluster.
- MapReduce is used to implement the parallel EEMD algorithm.
MAPREDUCE DESIGN (EPOCH LEVEL PARALLELIZATION)
Epoch Mapper:
- Each map function takes a single data point as input.
- Calculates local extrema at each epoch level.
- Connects minima and maxima with a cubic spline.
- Generates points that will be combined in the Epoch Reducer.
MAPREDUCE DESIGN (EPOCH LEVEL PARALLELIZATION)
Epoch Reducer:
- Each reduce function combines the points that belong to the same EEG data point.
- Generates the output data points for an individual EEG data point: 8 IMFs (intrinsic mode functions) and one residual ("left data").
[Figure: MapReduce design, epoch level parallelization]
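The data flow between the Epoch Mapper and Epoch Reducer can be illustrated with a small in-memory simulation. This is only a sketch of the keying and grouping pattern, under several assumptions: no Hadoop API is used, the mapper stops at extrema detection (no spline), and the reducer merely collects the points per epoch instead of producing IMFs. The names `epoch_mapper`, `shuffle`, `epoch_reducer`, and `run_job` are hypothetical.

```python
from collections import defaultdict


def epoch_mapper(epoch_id, signal):
    """Emit (epoch_id, (index, value, kind)) for every local extremum."""
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if prev < cur > nxt:
            yield epoch_id, (i, cur, "max")
        elif prev > cur < nxt:
            yield epoch_id, (i, cur, "min")


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def epoch_reducer(epoch_id, points):
    """Combine all points that share the same epoch key."""
    ordered = sorted(points)
    return epoch_id, {
        "maxima": [(i, v) for i, v, kind in ordered if kind == "max"],
        "minima": [(i, v) for i, v, kind in ordered if kind == "min"],
    }


def run_job(epochs):
    """Run map, shuffle, and reduce over a dict of {epoch_id: signal}."""
    mapped = [pair for eid, sig in epochs.items()
              for pair in epoch_mapper(eid, sig)]
    return dict(epoch_reducer(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job({"e1": [0, 2, 1, 3, 0], "e2": [1, 0, 1]}))
```

Keying every emitted record by its epoch identifier is what lets the reduce phase gather all points for one EEG data point in a single call, regardless of which mapper produced them.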
PERFORMANCE ANALYSIS OF ORIGINAL ALGORITHM
[Figure: performance of the original algorithm]
PERFORMANCE ANALYSIS OF EEMD ALGORITHM ON HADOOP
- Analyzed the same data set while varying the number of nodes in the cluster.
PERFORMANCE ANALYSIS OF EEMD ALGORITHM ON HADOOP
- Analyzed a huge data set while keeping the number of nodes constant.
- Analyzed the data set while varying the number of epochs processed at a time.
CONCLUSION
- The new Hadoop EEMD performs better than the original algorithm when analyzing huge data sets.
- For better results on a huge data set, set the number of mappers (i.e., the number of epochs processed at a time) to approximately twice the number of nodes available in the cluster.
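As a worked example of the sizing rule above, the sketch below groups epochs into batches of about twice the node count. The helper names and the `factor` parameter are hypothetical; how the mapper count is actually configured in the Hadoop job is not shown in the slides.

```python
def mappers_for_cluster(n_nodes, factor=2):
    """Rule of thumb from the conclusion: about `factor` mappers per node."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return factor * n_nodes


def epoch_batches(epoch_ids, n_nodes):
    """Split epochs into batches sized to the suggested mapper count."""
    size = mappers_for_cluster(n_nodes)
    return [epoch_ids[i:i + size] for i in range(0, len(epoch_ids), size)]


if __name__ == "__main__":
    # 10 epochs on a 2-node cluster: batches of 4, 4, and 2 epochs.
    print(epoch_batches(list(range(10)), n_nodes=2))
```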
THANK YOU!