FREERIDE: A Framework for Rapid Implementation of Datamining Engines


1 FREERIDE: A Framework for Rapid Implementation of Datamining Engines
Gagan Agrawal
Department of Computer and Information Sciences
Ohio State University

2 Motivation
Large-scale data is being created everywhere
Converting data to useful knowledge is challenging
Algorithms used depend upon the application and the datasets available
Need to keep trying different algorithms and parameters on available datasets
Time required for implementing different algorithms and running them with different parameters on large datasets slows down the mining process

3 FREERIDE offers:
The ability to rapidly prototype a high-performance mining implementation
Distributed memory parallelization
Shared memory parallelization
Ability to process large and disk-resident datasets
Minimal modifications to a sequential implementation for the above three

4 FREERIDE Experience
Used for association mining, clustering, decision tree construction (prediction models), and nearest neighbor searches
Excellent speedups in both shared memory and distributed memory environments
Excellent scaleups to large datasets

5 Using FREERIDE
Divide computation into local and global reductions
Create an index for the data
This can be done easily for a number of mining algorithms
A user manual and distribution are being prepared
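The split into local and global reductions can be illustrated with a minimal sketch. This is not FREERIDE's API; the function names (`local_reduction`, `global_reduction`, `chunked`) and the toy transactions are hypothetical, chosen to show the pattern on the first pass of association mining (counting item occurrences). Each chunk of data is reduced independently, and the partial results are then merged.

```python
from collections import Counter
from functools import reduce


def chunked(data, size):
    """Yield consecutive slices of `data`; stands in for disk-resident chunks."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def local_reduction(chunk):
    """Accumulate item counts over one chunk of transactions (hypothetical helper)."""
    counts = Counter()
    for transaction in chunk:
        # Count each item at most once per transaction.
        counts.update(set(transaction))
    return counts


def global_reduction(partials):
    """Merge the per-chunk reduction objects into a single result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Toy data, assumed for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "eggs"],
    ["milk", "eggs", "bread"],
    ["milk"],
]

# Each local reduction touches only its own chunk, so the chunks could be
# processed by separate threads or nodes before the merge.
partials = [local_reduction(chunk) for chunk in chunked(transactions, 2)]
item_counts = global_reduction(partials)
```

Because the merge is associative and commutative, the result does not depend on how the data is chunked or on the order in which chunks are processed, which is what makes the same code usable for shared memory, distributed memory, and out-of-core execution.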

6 Proposed Plan for PET Year 2
Understand the data mining needs of DOD users
Select 1-2 demonstration problems
Obtain relevant datasets
Demonstrate the use of the FREERIDE framework
Develop a presentation interface suitable for user needs
Train DOD users in using the framework

7 PET Year 1 – ET-005
Apply data-intensive computing tools to the liquid molding problem
Post-process the data generated from simulation of the molding phenomenon
Analyze the gradient of the output with respect to different parameters, to avoid simulating every possible combination of parameters
Use ADR; also see if mining the simulation datasets is useful (not part of the deliverables)

