Jawad Tahsin Danish Mustafa Zaidi Kazim Zaidi Zulfiqar Hadi

 Classification  Clustering  Minkowski distance  K-Means Algorithm  Similarity  Cosine-based similarity  Eigenvalue

 Classification is a data mining (machine learning) technique used to predict group membership for data instances.  In general, in classification you have a set of predefined classes and want to know which class a new object belongs to.  For example, you may wish to use classification to predict whether the weather on a particular day will be “sunny”, “rainy” or “cloudy”. Popular classification techniques include decision trees and neural networks.

 Clustering is a data mining (machine learning) technique used to place data elements into related groups without advance knowledge of the group definitions.  Popular clustering techniques include k-means clustering and expectation maximization (EM) clustering.

 Goal: minimise the sum of the within-cluster variances.  K stands for the number of clusters.  Each data element is assigned to the closest cluster (centre).  The algorithm is iterative in nature.
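Written out explicitly (a standard formulation of this objective, stated here as an assumption rather than taken from the original slide), the quantity K-means minimises is

$$ J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2 $$

where $C_k$ is the set of instances assigned to cluster $k$ and $\mu_k$ is the centre of that cluster.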

Initially, the number of clusters must be known, or chosen; call it K. The first step is to choose a set of K instances as the centres of the clusters, often chosen such that the points are mutually “farthest apart” in some way. Next, the algorithm considers each instance and assigns it to the cluster whose centre is closest. The cluster centroids are recalculated either after each instance assignment, or after the whole cycle of re-assignments. This process is iterated.
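A minimal sketch of this procedure in Python is given below. It is illustrative only; the function and variable names are assumptions for the example, not part of the original slides, and the centres are seeded randomly rather than "farthest apart".

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """Cluster `points` (an n x d array) into k groups by iterative reassignment."""
    rng = np.random.default_rng(seed)
    # Initial step: choose k instances as the initial cluster centres.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each instance to the cluster whose centre is closest.
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recalculate the centroids after the whole cycle of re-assignments.
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break  # converged: reassignment no longer moves the centres
        centres = new_centres
    return labels, centres

# Example usage with two well-separated blobs of 2-D data
data = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centres = k_means(data, k=2)
```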

 Example

 Cosine similarity is a measure of similarity between two vectors or data points.  It is determined by measuring the cosine of the angle between the two points.  If the angle between the two points is zero, then the cosine is 1 and the two entities are perfectly similar to each other.  If the angle between the two points is 90°, then the cosine is 0 and the two entities are perfectly dissimilar to each other.

 Similar entities  Dissimilar entities

 The cosine of the angle between the two points A(x₁, y₁) and B(x₂, y₂) can be calculated by: $\cos\theta = \dfrac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$
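A small sketch of this calculation in Python (the function name is an assumption chosen for the example):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors a and b, e.g. 2-D points (x, y)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))        # (x1*x2) + (y1*y2)
    norm_a = math.sqrt(sum(ai * ai for ai in a))      # sqrt(x1^2 + y1^2)
    norm_b = math.sqrt(sum(bi * bi for bi in b))      # sqrt(x2^2 + y2^2)
    return dot / (norm_a * norm_b)

print(cosine_similarity((1, 0), (2, 0)))   # 1.0 -> angle 0, perfectly similar
print(cosine_similarity((1, 0), (0, 3)))   # 0.0 -> angle 90 degrees, dissimilar
```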

 The eigenvectors of a square matrix are the non-zero vectors that, after being multiplied by the matrix, remain parallel to the original vector.  For each eigenvector, the corresponding eigenvalue is the factor by which the eigenvector is scaled when multiplied by the matrix: Av = λv, where A is a square matrix, v is an eigenvector of A, and λ is a scalar (the eigenvalue).
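As an illustration (a sketch using NumPy, not part of the original slides), the eigenvalues and eigenvectors of a small matrix can be computed and the relation Av = λv checked numerically:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # a simple square matrix

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are the v's

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    # A @ v stays parallel to v: it equals lambda * v
    assert np.allclose(A @ v, lam * v)
    print(f"eigenvalue {lam:.1f}, eigenvector {v}")
```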