Published by Polly Gaines. Modified over 6 years ago.
1
Data Mining Algorithms for Large-Scale Distributed Systems
Presenter: Ran Wolff
Joint work with Assaf Schuster
2003
2
What is Data Mining?
Data mining problems all deal with the automatic analysis of large databases.
The outcome of a data mining algorithm is a model that uncovers the nature of the data.
3
Main Data Mining Problems
Association rules: Source IP begins with packets per connection > 1000
Classification: Source IP begins with and TTL < 5 will be dropped
Clustering: There are three types of packets coming from : Simple, Heavy load, and Malicious.
In data mining the answer precedes the question.
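To make the association-rule example above concrete, here is a minimal sketch of how the support and confidence of one such rule could be measured. The records, the field names (`src_prefix`, `packets`, `ttl`), and the prefix labels "A" and "B" are hypothetical placeholders, not data from the talk.

```python
# Hypothetical connection records; fields and values are illustrative only.
records = [
    {"src_prefix": "A", "packets": 1500, "ttl": 3},
    {"src_prefix": "A", "packets": 1200, "ttl": 64},
    {"src_prefix": "A", "packets": 400, "ttl": 64},
    {"src_prefix": "B", "packets": 2000, "ttl": 64},
    {"src_prefix": "B", "packets": 100, "ttl": 4},
]


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    n_ante = sum(1 for r in rows if antecedent(r))
    n_both = sum(1 for r in rows if antecedent(r) and consequent(r))
    # Support: fraction of all rows where both sides of the rule hold.
    support = n_both / len(rows) if rows else 0.0
    # Confidence: fraction of antecedent rows where the consequent also holds.
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Rule: "source prefix is A  =>  packets per connection > 1000"
support, confidence = rule_stats(
    records,
    antecedent=lambda r: r["src_prefix"] == "A",
    consequent=lambda r: r["packets"] > 1000,
)
print(support, confidence)
```

A rule is usually reported only when both numbers clear user-chosen thresholds; the thresholds themselves are not given in the slides.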
4
Why Data Mine an LSD System?
Data mining is good: when properly used, it yields money.
It is otherwise difficult to monitor an LSD system: there is a lot of data, it is spread across the system, and it is impossible to collect.
Many interesting phenomena are inherently distributed (e.g., DDoS), so it is not enough to monitor just a few nodes.
5
Our Work
We developed an association rule mining algorithm that works well in LSD systems:
Local and therefore scalable
Asynchronous and therefore fast
Dynamic and therefore incremental and robust
Accurate – you get what you expect
Anytime – you get early results fast
6
In a Tea Spoon
A distributed data mining algorithm can be described as a series of distributed decisions.
Those decisions are reduced to a majority vote.
We developed a majority voting protocol which has all those good qualities.
The outcome is an LSD association rule mining algorithm (still to come: classification).
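The reduction of a mining decision to a vote can be sketched as follows. This is not the authors' distributed protocol: it is a centralized tally, under the assumption that each node holds a local support count and a local database size, showing only why "is this itemset frequent?" is a threshold vote. The function name and the input format are hypothetical.

```python
from fractions import Fraction


def is_frequent(local_counts, min_freq):
    """Decide whether an itemset is globally frequent.

    local_counts: list of (support_count, db_size) pairs, one per node.
    min_freq: required global frequency as a Fraction, e.g. Fraction(1, 2).
    """
    # Each node contributes its surplus over the threshold. The itemset is
    # globally frequent exactly when the total surplus is non-negative,
    # because sum(c_i) / sum(n_i) >= f  <=>  sum(c_i - f * n_i) >= 0.
    surplus = sum(Fraction(c) - min_freq * n for c, n in local_counts)
    return surplus >= 0


# Three hypothetical nodes: (transactions containing the itemset, total transactions)
nodes = [(30, 100), (5, 50), (40, 50)]

print(is_frequent(nodes, Fraction(1, 2)))  # 75 of 200 is below one half
print(is_frequent(nodes, Fraction(1, 3)))  # 75 of 200 is above one third
```

With `min_freq` equal to one half this is a literal majority vote in which every transaction casts one ballot; other thresholds generalize the same test. A distributed protocol would have to reach this decision without gathering all the pairs in one place.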
8
Main Results
By the time the database is scanned once, in parallel, the average node has discovered 95% of the rules and has fewer than 10% false rules.
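One way to read the two percentages above is as recall and a false-rule rate measured on a node's interim rule set. The sketch below assumes those definitions (the slides do not spell them out), and the rule strings are made-up placeholders.

```python
def rule_quality(found, true_rules):
    """Return (recall, false_rate) for a node's interim rule set.

    Assumed definitions, not taken from the slides:
      recall     = share of the true rules the node has already found
      false_rate = share of the node's rules that are not true rules
    """
    found, true_rules = set(found), set(true_rules)
    recall = len(found & true_rules) / len(true_rules) if true_rules else 1.0
    false_rate = len(found - true_rules) / len(found) if found else 0.0
    return recall, false_rate


# Hypothetical rule sets, for illustration only.
true_rules = {"a=>b", "a=>c", "b=>c", "c=>d"}
found = {"a=>b", "a=>c", "b=>c", "x=>y"}

recall, false_rate = rule_quality(found, true_rules)
print(recall, false_rate)
```

Averaging these two numbers over all nodes at a given point in the scan would give figures comparable in form to the ones quoted on the slide.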