SZRZ6014 Research Methodology Prepared by: Aminat Adebola Adeyemo 813636 Study of high-dimensional data for data integration.

INTRODUCTION What is data integration? Where has it been applied? Why do we use data integration? Problems in data integration

What is data integration?  Data integration can be defined as a means of combining data that reside in different sources and presenting them to users in a unified form.
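
As a rough illustration of the definition above, the sketch below merges rows from two sources into one unified view. The sample rows, the field names and the helpers `to_unified` and `integrate` are all hypothetical and are not taken from the presentation; the sketch also assumes both sources share a reliable identifier.

```python
# Hypothetical rows from two sources that describe the same people
# under different field names (illustrative data only).
clinic_rows = [
    {"patient_id": 1, "full_name": "Ada Obi", "city": "Lagos"},
    {"patient_id": 2, "full_name": "Bola Ade", "city": "Ibadan"},
]
registry_rows = [
    {"id": 2, "name": "Bola Ade", "phone": "555-0102"},
    {"id": 3, "name": "Chi Eze", "phone": "555-0103"},
]


def to_unified(row, field_map):
    """Rename source-specific fields to the shared schema."""
    return {new: row[old] for old, new in field_map.items() if old in row}


def integrate(sources):
    """Merge rows from every source into one record per shared id."""
    unified = {}
    for rows, field_map in sources:
        for row in rows:
            record = to_unified(row, field_map)
            unified.setdefault(record["id"], {}).update(record)
    return [unified[key] for key in sorted(unified)]


view = integrate([
    (clinic_rows, {"patient_id": "id", "full_name": "name", "city": "city"}),
    (registry_rows, {"id": "id", "name": "name", "phone": "phone"}),
])
for record in view:
    print(record)
```

The person with id 2 appears in both sources, so the unified view holds a single record for them that carries both the city and the phone number.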

Where has it been applied?  Data integration has been applied in many domains, such as healthcare, the education sector, government agencies, websites, communication, location-based services and social networks.

Why do we use data integration?  The need for data integration grows quickly as the demand for available data, and the need to share it, increases. Data integration is therefore used to provide fast, current and clean data that meet users’ demands.

Problems with data integration  Several types of problem are associated with data integration, among them:  Overlapping data  High-dimensional data  Incomplete data  Inconsistent data  The high-dimensional data problem arises when dissimilar data sources are merged behind a single query interface (Cohen, 2000). This is the data integration problem discussed here.

Research Question or Hypothesis  To test whether match and cluster techniques can give high accuracy with fewer errors when applied to high-dimensional data.  To confirm whether the technique is more efficient than other methods in handling high-dimensional data sets.

Research Objectives  To identify the methods used to handle high-dimensional data.  To implement the match and cluster techniques and identify their advantages over other methods.  To evaluate the method in the developed system by reviewing the various ways in which matching and clustering have been used.

Identifying the methods used in solving high-dimensional data sets  Fast correlation-based filter approach  Classifying high-dimensional data  Rough set theory  Match and cluster method  Among these, the match and cluster method was chosen because it is considered more efficient than the other methods.

Implementation of Match and Cluster in Solving High-dimensional Data Sets  The match and cluster methods can be used together on high-dimensional data sets: clustering gathers data with related attributes from many thousands of dimensions, and the clustered data are then matched to produce relevant information.  Advantages of using Match and Cluster  It can capture non-linearly correlated data.  It gives a stable view of the interface.  The interpretation of the data is clear and easy to communicate to analysts.  It increases computational efficiency.

Evaluation of the method  Experiments on the datasets indicate that the clustering and matching method is efficient, and that it can be enhanced with canopy clustering to reduce the cost of the current method, which pairs clustering with matching. In addition, a comparison with traditional clustering methods shows that the error of the method is minimal; its accuracy is therefore high, measured at close to 85%.
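
The canopy clustering mentioned above can be sketched as follows. This is a minimal, deterministic variant, not the implementation evaluated in the presentation: the bigram-based `cheap_distance`, the sample names and the two thresholds are assumptions chosen for illustration, and the first remaining item is used as the canopy centre instead of a random one.

```python
def bigrams(text):
    """Character bigrams of a lowercased string, used as a cheap signature."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def cheap_distance(a, b):
    """Jaccard distance between bigram sets (0 = identical, 1 = disjoint)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 1.0 - len(x & y) / len(x | y)


def canopy_clusters(items, loose, tight):
    """Group items into canopies that may overlap; needs loose >= tight."""
    if loose < tight:
        raise ValueError("loose threshold must be >= tight threshold")
    remaining = list(items)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every item within the loose threshold joins this canopy.
        canopies.append([p for p in items if cheap_distance(center, p) <= loose])
        # Items within the tight threshold can no longer start a canopy.
        remaining = [p for p in remaining if cheap_distance(center, p) > tight]
    return canopies


# Hypothetical sample data.
names = ["Jon Smith", "John Smith", "Jon Smyth", "Mary Jones", "Marie Jones"]
canopies = canopy_clusters(names, loose=0.6, tight=0.45)
for canopy in canopies:
    print(canopy)
```

The cost saving comes from running the expensive matching step only on pairs that share a canopy, so most cross-group comparisons are never made.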

Solving high-dimensional data  Collect the data.  Convert numeric data to strings.  Cluster the data with a similarity measure to reduce dimensionality.  Match the data using “if” statements to simplify them for easy use.  Form a decision tree to display the accuracy.
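
The first four steps above can be sketched as below. This is an illustration under stated assumptions, not the author's system: the sample records, the 0.8 threshold, the greedy single-pass clustering and the use of `difflib.SequenceMatcher` as the similarity measure are all choices made for the example, and the decision tree step is left out because the slides do not describe how it was built.

```python
from difflib import SequenceMatcher

# Step 1: collected records (hypothetical sample data).
records = [
    {"name": "Amina Bello", "age": 34, "city": "Kano"},
    {"name": "Amina Belo", "age": 34, "city": "Kano"},
    {"name": "Tunde Okoro", "age": 51, "city": "Enugu"},
    {"name": "Tunde Okoro", "age": 51, "city": "Enugu"},
]


def to_text(record):
    """Step 2: turn every value, numeric ones included, into one string."""
    return " ".join(str(record[key]).lower() for key in sorted(record))


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(texts, threshold):
    """Step 3: each text joins the first cluster whose first member is similar enough."""
    clusters = []  # each cluster is a list of indices into texts
    for i, text in enumerate(texts):
        for members in clusters:
            if similarity(texts[members[0]], text) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def match(texts, clusters):
    """Step 4: label each pair inside a cluster with a plain if statement."""
    results = []
    for members in clusters:
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                if texts[a] == texts[b]:
                    label = "exact"
                else:
                    label = "near"
                results.append((a, b, label))
    return results


texts = [to_text(r) for r in records]
clusters = cluster(texts, threshold=0.8)
matches = match(texts, clusters)
print(clusters)
print(matches)
```

Because matching runs only inside each cluster, the number of pairwise comparisons stays small even when the collection grows.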
