Published by Robert Rodgers. Modified over 8 years ago.
1
SZRZ6014 Research Methodology
Study of High-Dimensional Data for Data Integration
Prepared by: Aminat Adebola Adeyemo (813636)
2
INTRODUCTION
- What is data integration?
- Where has it been applied?
- Why do we use data integration?
- Problems in data integration
3
What is data integration?
Data integration can be defined as a means of combining data held in different sources and presenting them to users in a unified form.
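As a minimal illustration of this definition, the Python sketch below combines two hypothetical sources that describe the same entities under different field names into one unified view keyed by identifier. The source names, the fields and the `integrate` helper are assumptions made for illustration; they are not part of the study.

```python
# Minimal sketch of data integration (hypothetical sources and fields):
# two sources use different field names for the same entity identifier,
# and both are mapped onto one unified schema keyed by that identifier.

def integrate(source_a, source_b):
    """Combine records from two sources into one unified view keyed by id."""
    unified = {}
    # Source A stores the identifier as "id" and contributes "name".
    for rec in source_a:
        unified.setdefault(rec["id"], {})["name"] = rec["name"]
    # Source B stores the identifier as "patient_id" and contributes "city".
    for rec in source_b:
        unified.setdefault(rec["patient_id"], {})["city"] = rec["city"]
    return unified


hospital = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Ben"}]
registry = [{"patient_id": 1, "city": "Kuala Lumpur"},
            {"patient_id": 3, "city": "Lagos"}]
print(integrate(hospital, registry))
```

Entity 1 appears in both sources and is merged into a single record; entities 2 and 3 appear in only one source each, which is where the incomplete-data problem discussed later begins.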
4
Where has it been applied?
Data integration has been applied in many domains, such as healthcare, education, government agencies, websites, communications, location-based services and social networks.
5
Why do we use data integration?
The need for data integration grows as the demand for available data, and the need to share it, increase rapidly. Data integration is therefore used to provide fast, current and clean data that meets users' demands.
6
Problems with data integration
Several types of problem are associated with data integration, including:
- Overlapping data
- High-dimensional data
- Incomplete data
- Inconsistent data
High-dimensional data is the problem of merging dissimilar data sources under a single query interface (Cohen, 2000). This presentation discusses that problem.
7
Research Question or Hypothesis
- Can match and cluster techniques achieve high accuracy with a low error rate when applied to high-dimensional data?
- Is the technique more efficient than other methods for high-dimensional data sets?
8
Research Objectives
- To identify the methods used to handle high-dimensional data.
- To implement the matching and clustering technique and identify its advantages over other methods.
- To evaluate the method in the developed system by examining various approaches in which matching and clustering have been used.
9
Identifying the methods used for high-dimensional data sets
- Fast correlation-based filter (FCBF) approach
- Classification of high-dimensional data
- Rough set theory
- Match & cluster method
Of these, the match & cluster method was chosen because it is more efficient than the other methods.
10
Implementation of Match and Cluster in Solving High-Dimensional Data Sets
Match and cluster methods can be used together to address the high-dimensional data problem: clustering first groups data with related attributes across many thousands of dimensions, and the clustered data are then matched to extract relevant information.
Advantages of using Match & Cluster:
- It can capture non-linear correlations in the data.
- It gives a stable view of the interface.
- The interpretation of the data is clear and easy to communicate to analysts.
- It increases computational efficiency.
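The cluster-then-match idea can be sketched in Python under stated assumptions. The slides name neither a similarity measure nor thresholds, so this sketch assumes difflib's `SequenceMatcher.ratio()` as the similarity, a simple greedy (leader-style) clustering with a hypothetical threshold of 0.5, and a stricter hypothetical threshold of 0.8 for matching. The records are invented short strings rather than a genuinely high-dimensional data set; the point is only that matching compares pairs inside each cluster instead of across the whole data set.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster(records, threshold=0.5):
    """Greedy leader clustering: a record joins the first cluster whose
    first member is at least `threshold` similar, else starts a new one."""
    clusters = []
    for rec in records:
        for group in clusters:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:  # no sufficiently similar cluster was found
            clusters.append([rec])
    return clusters


def match(clusters, threshold=0.8):
    """Compare pairs only within each cluster and keep the close ones."""
    matches = []
    for group in clusters:
        for a, b in combinations(group, 2):
            if similarity(a, b) >= threshold:
                matches.append((a, b))
    return matches


records = ["Jon Smith, Kuala Lumpur", "John Smith, Kuala Lumpur",
           "Mary Tan, Penang", "Marie Tan, Penang"]
groups = cluster(records)
print(match(groups))
```

With these four records, clustering yields two groups and the matching step compares only 2 pairs instead of all 6, pairing the two spellings of each name.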
11
Evaluation of the method
Experiments on the data sets showed that the clustering and matching method was efficient, and that it could be improved by using canopy clustering to reduce the cost of the current method, which applies a pairwise comparison function in both clustering and matching. In addition, comparing its accuracy with that of traditional clustering methods showed that the clustering error is small: accuracy was measured at close to 85%.
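Canopy clustering, the enhancement suggested above, is usually described (McCallum, Nigam and Ungar, 2000) as a cheap first pass with two distance thresholds, a loose T1 and a tight T2 with T1 > T2: each canopy collects every point within T1 of a chosen centre, and points within T2 of that centre can no longer be chosen as centres. Only pairs that share a canopy go on to the expensive comparison. The Python sketch below assumes a word-level Jaccard distance as the cheap measure and hypothetical thresholds T1 = 0.8 and T2 = 0.55; the records and thresholds are illustrative, not the study's data.

```python
from itertools import combinations


def cheap_distance(a: str, b: str) -> float:
    """Inexpensive distance (assumed): 1 - Jaccard similarity of word sets."""
    ta, tb = set(a.split()), set(b.split())
    if not (ta | tb):
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def canopy_clusters(records, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2."""
    if not t1 > t2 > 0:
        raise ValueError("canopy clustering requires t1 > t2 > 0")
    candidates = list(records)  # points that may still become centres
    canopies = []
    while candidates:
        centre = candidates[0]
        # Every record within the loose threshold joins this canopy,
        # so canopies are allowed to overlap.
        canopies.append([r for r in records if cheap_distance(centre, r) < t1])
        # Records within the tight threshold stop being candidate centres;
        # the centre itself (distance 0) is always removed, so the loop ends.
        candidates = [r for r in candidates if cheap_distance(centre, r) >= t2]
    return canopies


def candidate_pairs(canopies):
    """Only pairs that share a canopy are passed to full matching."""
    pairs = set()
    for canopy in canopies:
        for a, b in combinations(canopy, 2):
            pairs.add(tuple(sorted((a, b))))
    return pairs


records = ["jon smith kuala lumpur", "john smith kuala lumpur",
           "mary tan penang", "marie tan penang"]
canopies = canopy_clusters(records, t1=0.8, t2=0.55)
pairs = candidate_pairs(canopies)
print(len(pairs), "of", len(list(combinations(records, 2))),
      "pairs need full comparison")
```

Here the canopies are the two name groups, so only 2 of the 6 possible pairs reach the full matching step; on larger data sets this reduction in pairwise work is where the cost saving comes from.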
12
Solving high-dimensional data
1. Collect the data.
2. Convert numeric data to strings.
3. Cluster the data using a similarity measure to reduce dimensionality.
4. Match the data using "if" statements to simplify it for easy use.
5. Build a decision tree to display its accuracy.
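The five steps can be sketched end to end in Python, with simplifications that are assumptions rather than the study's method: the records, field layout and matching rules are hypothetical, step 3 groups records whose converted numeric fields are identical instead of applying a real similarity measure, and step 5 replaces the decision tree with a direct count of correctly classified record pairs (a library classifier such as scikit-learn's `DecisionTreeClassifier` could fill that role). Step 1, data collection, is represented by a hard-coded list.

```python
from collections import defaultdict
from itertools import combinations

# Step 1 (hypothetical data): (id, name, birth year, postcode). Ids that
# share a leading digit refer to the same person and act as ground truth.
RECORDS = [
    ("1a", "Jon Smith", 1980, 50450),
    ("1b", "John Smith", 1980, 50450),
    ("2a", "Mary Tan", 1975, 10200),
    ("2b", "Marie Tan", 1975, 10200),
    ("3a", "Ali Hassan", 1990, 50450),
    ("3b", "Ali Hasan", 1990, 50450),
]


def to_strings(record):
    """Step 2: change the numeric fields to strings."""
    rid, name, year, postcode = record
    return rid, name, str(year), str(postcode)


def cluster(records):
    """Step 3 (simplified): group records whose converted fields are equal."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[2], rec[3])].append(rec)
    return list(groups.values())


def is_match(a, b):
    """Step 4: rule-based matching written as plain if statements."""
    if a[1].split()[-1] != b[1].split()[-1]:  # surnames must agree exactly
        return False
    if a[1][0] != b[1][0]:  # first initials must agree
        return False
    return True


def accuracy(records):
    """Step 5 (stand-in for the decision tree): share of pairs classified correctly."""
    predicted = set()
    for group in cluster(records):
        for a, b in combinations(group, 2):
            if is_match(a, b):
                predicted.add((a[0], b[0]))
    pairs = list(combinations(records, 2))
    correct = sum(
        ((a[0], b[0]) in predicted) == (a[0][0] == b[0][0]) for a, b in pairs
    )
    return correct / len(pairs)


converted = [to_strings(r) for r in RECORDS]
print(f"accuracy: {accuracy(converted):.2f}")  # 14 of 15 pairs correct
```

The "Hassan"/"Hasan" pair is missed because the surname rule requires an exact match, so 14 of 15 pairs are classified correctly; this is the kind of error a similarity measure in step 3 or a looser rule in step 4 is meant to reduce.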