Data Warehousing, Lecture 31: Supervised vs. Unsupervised Learning. Virtual University of Pakistan. Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research, National University of Computers & Emerging Sciences, Islamabad.
Data Structures in Data Mining
Data matrix
–Table or database
–n records and m attributes
–n >> m

  C1,1  C1,2  C1,3  ...  C1,m
  C2,1  C2,2  C2,3  ...  C2,m
  C3,1  C3,2  C3,3  ...  C3,m
  ...
  Cn,1  Cn,2  Cn,3  ...  Cn,m

Similarity matrix
–Symmetric square matrix
–n x n or m x m

  1     S1,2  S1,3  ...  S1,n
  S2,1  1     S2,3  ...  S2,n
  S3,1  S3,2  1     ...  S3,n
  ...
  Sn,1  Sn,2  Sn,3  ...  1
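As a minimal sketch of the two structures on this slide, the Python below builds a small data matrix (n records by m attributes) and derives an n x n similarity matrix from it. The sample values and the similarity formula 1 / (1 + Euclidean distance) are assumptions chosen for illustration; the slide does not name a similarity measure. With that choice the result is symmetric with 1 on the diagonal, matching the layout above.

```python
import math

# Hypothetical data matrix: n = 4 records (rows), m = 2 attributes (columns).
data_matrix = [
    [1.0, 2.0],
    [1.5, 1.8],
    [8.0, 8.0],
    [9.0, 9.5],
]


def similarity(a, b):
    """Assumed similarity measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def similarity_matrix(rows):
    """Return the n x n matrix of pairwise record similarities."""
    return [[similarity(a, b) for b in rows] for a in rows]


S = similarity_matrix(data_matrix)
for row in S:
    print(" ".join(f"{value:.2f}" for value in row))
```

Transposing the data matrix before calling `similarity_matrix` would give the m x m attribute-to-attribute variant instead.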
Main types of data mining
Supervised: type and number of classes are known in advance
–Bayesian modeling
–Decision trees
–Neural networks
–Etc.
Unsupervised: type and number of classes are NOT known in advance
–One-way clustering
–Two-way clustering
Clustering: Min-Max Distance
[Figure: scatter plot with axes Age and Salary, with one point marked as an outlier]
–Inter-cluster distances are maximized
–Intra-cluster distances are minimized
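The min-max idea can be checked numerically. The sketch below assumes a handful of made-up (age, salary) points that are already grouped into two clusters, and compares the mean pairwise distance inside each cluster with the distance between the cluster centroids. The data, the cluster names, and the choice of Euclidean distance are illustrative assumptions, not taken from the slide.

```python
import math
from itertools import combinations

# Hypothetical (age, salary in thousands) points, already grouped into clusters.
clusters = {
    "young_low": [(22, 20), (25, 24), (27, 22)],
    "senior_high": [(50, 90), (55, 95), (52, 88)],
}


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def mean_intra_distance(points):
    """Average Euclidean distance over all pairs of points in one cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


intra = {name: mean_intra_distance(pts) for name, pts in clusters.items()}
inter = math.dist(centroid(clusters["young_low"]), centroid(clusters["senior_high"]))

for name, value in intra.items():
    print(f"intra-cluster distance for {name}: {value:.2f}")
print(f"inter-cluster (centroid) distance: {inter:.2f}")
```

A good clustering in the sense of this slide is one where the intra-cluster values are small relative to the inter-cluster value.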
How does clustering work?
One-way clustering example
[Figure: INPUT and OUTPUT panels]
–Black spots are noise
–White spots are missing data
Data Mining: Agriculture data
[Figure: INPUT and clustered OUTPUT, with the clusters marked]
Classification
Unseen Data -> Classifier (model) -> Which class?
How does classification work?
[Figure labels: Inputs, Output, Confidence Level]
Classification Process (1): Model Construction
Training Data (observations, measurements, etc.) -> Classification Algorithms -> Classifier (Model)
Example: relationship between shopping time and items bought
IF time/items >= 6 THEN gender = 'F'
Classification Process (2): Use the Model in Prediction
Testing Data and Unseen Data -> Classifier -> Gender?
Unseen Data: (Firdous, Time = 15, Items = 1)
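The rule from the model-construction slide can be applied to the unseen record in a few lines. This is a minimal sketch: the function name `classify_gender` is hypothetical, and the `'M'` result for records that fail the test is an assumption, since the slide states only the IF branch.

```python
def classify_gender(time, items):
    """Apply the slide's rule: IF time/items >= 6 THEN gender = 'F'.

    The 'M' fallback is an assumption; the slide gives no ELSE branch.
    """
    if items <= 0:
        raise ValueError("items must be positive")
    return "F" if time / items >= 6 else "M"


# Unseen record from the slide: Firdous, Time = 15, Items = 1.
prediction = classify_gender(time=15, items=1)
print(prediction)  # prints F, because 15 / 1 = 15 >= 6
```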
Clustering vs. Cluster Detection
Clustering vs. Cluster Detection: Example
[Figure: panels A and B]
The K-Means Clustering
The K-Means Clustering: Example
[Figure: panels A, B, C, D]
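The K-means slides survive here as titles and figure panels only, so the sketch below follows the standard textbook formulation (Lloyd's algorithm) rather than anything specific to this lecture: pick k initial centroids, assign each point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignments stop changing. The function name `k_means`, the sample points, the random initialization, and the fixed seed are all assumptions made for illustration.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Standard K-means sketch; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random points
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


# Hypothetical two-dimensional points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

The result of K-means in general depends on the initial centroids, which is why the seed is fixed in this sketch.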
The K-Means Clustering: Comment