Published by Charleen Lynch. Modified over 9 years ago.
1 Databases & Data Mining Joint Specialization Project: „Data Mining Classification Tool”
By Mateusz Żochowski & Jakub Strzemżalski
2 Agenda
- General description of the problem
- Functionality
- Data mining aspects
- Algorithm and optimisation
- Database aspects
- General entities scheme
3 General Description
- Universal tool
- Different kinds of objects (e.g. preprocessed photos, hospital patient data)
- Finding similar objects
- Decision problems
4 Functionality
- Independent, user-operated system
- Works on data sets already provided, or on newly uploaded types of data
- The user can influence the way data is processed
- Can be used in bigger systems as a processing engine
- Can serve as an additional helper module in more complex systems
5 General Use Case
6 Data Mining
- General ideas
  - Description of an object
  - Definition of a distance
- K-NN algorithm
  - Brief explanation of the algorithm
- Optimisation
  - Problem of comparing a large number of objects
  - Optimised solution – using the grouping idea
7 Definitions – Objects
8 K-NN
- K Nearest Neighbors
- The idea behind k-NN
  - Aim – find the k objects most similar to the one being analyzed and then assign it an appropriate decision
  - Method – calculate the distance from the analyzed object to the other objects in the database and pick the closest ones
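The aim and method above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function names, the plain Euclidean distance, the majority vote used to pick the decision, and the sample data are all assumptions.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Plain Euclidean distance; a stand-in for the tool's own distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_decide(query, objects, decisions, k, distance=euclidean):
    """Return (majority decision, indices of the k nearest objects)."""
    # Distance from the analyzed object to every object in the database.
    ranked = sorted(range(len(objects)),
                    key=lambda i: distance(query, objects[i]))
    nearest = ranked[:k]
    # Assumed decision rule: majority vote among the k nearest objects.
    votes = Counter(decisions[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest

# Hypothetical data: two numeric attributes per object.
objects = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.5, 4.5), (0.9, 1.1)]
decisions = ["healthy", "healthy", "ill", "ill", "healthy"]
decision, nearest = knn_decide((1.1, 0.95), objects, decisions, k=3)
print(decision, nearest)
```

Every query costs one distance computation per stored object, which is the cost the optimisation slides address.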
9 K-NN Graphical representation
10 Definitions – Distance
- Calculations in multidimensional space
- Coefficients
  - α (alpha)
  - w_i – weights underlining the importance of particular attributes
  - n – number of all the attributes
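The distance formula itself is not preserved in this transcript. As an illustration only, one distance that uses the same three coefficients is a weighted Minkowski-style distance, in which α is assumed to be the exponent. The function name and this reading of α are assumptions, not the authors' definition.

```python
def weighted_distance(a, b, weights, alpha=2.0):
    """Assumed form: (sum_i w_i * |a_i - b_i| ** alpha) ** (1 / alpha).

    a, b    -- two objects, each described by n numeric attributes
    weights -- the n weights w_i (larger weight = more important attribute)
    alpha   -- exponent (an assumption); 2 gives a weighted Euclidean distance
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must all have n entries")
    total = sum(w * abs(x - y) ** alpha for w, x, y in zip(weights, a, b))
    return total ** (1.0 / alpha)

print(weighted_distance((0, 0), (3, 4), (1, 1)))           # weighted Euclidean
print(weighted_distance((0, 0), (3, 4), (2, 0), alpha=1))  # second attribute ignored
```

Setting a weight to zero removes that attribute from the comparison, which is one way a user could influence how the data is processed.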
11 Optimisation
- The reason – the cost of computing multidimensional distances from one object to all the others
- Solution – an improved k-NN
- Result – better efficiency: fewer distance computations, because the set of possibly similar objects is narrowed down
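A minimal sketch of how grouping can narrow the candidate set, assuming each group is represented by a single center point and only the groups with the closest centers are searched. The function name, the `n_groups` parameter and the sample data are hypothetical; the authors' exact procedure may differ.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def grouped_knn(query, groups, centers, k, n_groups=1, distance=euclidean):
    """Search only the n_groups groups whose centers are closest to query.

    groups  -- list of groups, each a list of objects
    centers -- one representative point per group
    Returns (k nearest candidates, number of distance computations).
    """
    # Compare the query with the group centers first (cheap: one per group).
    order = sorted(range(len(centers)),
                   key=lambda g: distance(query, centers[g]))
    # Only members of the closest groups remain as candidates.
    candidates = [obj for g in order[:n_groups] for obj in groups[g]]
    candidates.sort(key=lambda obj: distance(query, obj))
    computations = len(centers) + len(candidates)
    return candidates[:k], computations

groups = [[(0, 0), (1, 0), (0, 1)],
          [(10, 10), (11, 10), (10, 11)],
          [(20, 0), (21, 0), (20, 1)]]
centers = [(0.33, 0.33), (10.33, 10.33), (20.33, 0.33)]
neighbors, computations = grouped_knn((0.4, 0.2), groups, centers, k=2)
print(neighbors, computations)  # 6 computations instead of 9 for a full scan
```

The saving comes at a price: a true nearest neighbor that sits just across a group boundary is missed unless `n_groups` is raised, so the result is in general an approximation of plain k-NN.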
12 Step 1 - Group-oriented plane division
13 Step 2 – New object appears
14 Step 3
15 Step 4
16 Step 5
17 Grouping problem
- The problem – assigning objects to appropriate groups according to the chosen distance definition
- Solution – a clustering algorithm
- Brief example – the k-means algorithm
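For reference, a compact sketch of k-means (Lloyd's algorithm) in plain Python. The initialisation from the first k points and the fixed iteration cap are simplifications chosen to keep the example deterministic, and the sample points are hypothetical. Note that the mean-based update step matches the Euclidean distance; with another distance definition a different center update would be needed.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, iterations=20, distance=euclidean):
    """Return (centers, assignment), where assignment[i] is the group of points[i]."""
    centers = [tuple(p) for p in points[:k]]  # simplistic initialisation
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins the group with the closest center.
        assignment = [min(range(k), key=lambda g: distance(p, centers[g]))
                      for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for g in range(k):
            members = [p for p, a in zip(points, assignment) if a == g]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[g])  # empty group keeps its center
        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers
    return centers, assignment

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, assignment = k_means(points, k=2)
print(centers, assignment)
```

The resulting centers can serve as the group representatives that a grouped nearest-neighbor search compares against first.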
18 Database – entities
19 Database
- The general structure of the database results from optimisation issues
- Because the system is universal, the database may contain many different tables of objects
- System tables are needed for defining experiments
- Group Member as a temporary table?
20 Summary There is still a lot of work to do...