Published by Rosanna Sutton. Modified over 9 years ago.
1
Data Mining Strategies
2
Scales of Measurement
Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Four scales:
Categorical (nominal)
Ordinal (only order matters)
Interval (the difference between two values is meaningful)
Ratio (a value of 0.0 means none of the quantity is present; Kelvin is a ratio scale, but Celsius and Fahrenheit are not)
3
What to Know about the Scales
The measurement principle involved for each scale
Examples of the measurement scales
Permissible arithmetic operations for each scale
4
Categorical Scale Data
The values of the scale have no numeric meaning.
Examples: gender, ethnicity, marital status, hair color
Operations: counting (only)
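Since counting is the only permissible operation on categorical data, the natural summary is a frequency table and its mode. A minimal sketch, assuming a small made-up sample of hair colors, using Python's standard `collections.Counter`:

```python
from collections import Counter

# Hypothetical sample of a categorical (nominal) variable: hair color.
hair_colors = ["brown", "black", "blonde", "brown", "red", "brown", "black"]

# Counting is the only permissible operation: tally each category.
counts = Counter(hair_colors)

# The mode (most frequent category) is derived purely from the counts.
mode, mode_count = counts.most_common(1)[0]
print(counts)
print(mode, mode_count)
```

Note that there is no meaningful "average hair color"; any statistic beyond counts and the mode would treat the labels as numbers they are not.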
5
Ordinal Scale Data
The categories can be ordered, but the intervals between adjacent scale values are indeterminate.
Examples: movie ratings (0, 1, or 2 thumbs up); USDA beef grades (good, choice, prime); the rank order of anything
Operations: counting; greater-than and less-than comparisons
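To make the ordinal operations concrete, here is a small sketch using the beef grades from the slide, with a hypothetical sample of graded cuts. The integer ranks exist only to support comparison and sorting; they are not distances.

```python
# USDA beef grades from the slide, listed from lowest to highest.
GRADES = ["good", "choice", "prime"]
RANK = {grade: i for i, grade in enumerate(GRADES)}

def is_better(a: str, b: str) -> bool:
    """Greater-than comparison is permissible on ordinal data."""
    return RANK[a] > RANK[b]

# Hypothetical sample of graded cuts.
cuts = ["choice", "prime", "good", "choice", "choice"]

# Sorting by rank is meaningful, so the median category is well defined.
ordered = sorted(cuts, key=RANK.__getitem__)
median = ordered[len(ordered) // 2]
print(ordered, median)

# The rank codes 0, 1, 2 are NOT measurements: "prime minus choice" has no meaning.
```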
6
Interval Scale Data
Intervals between adjacent scale values are equal.
Examples: degrees Fahrenheit; most personality measures; IQ (intelligence) scores
Operations: counting; greater-than and less-than comparisons; addition and subtraction of scale values
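One way to see why addition and subtraction are allowed on an interval scale but multiplication and division are not: converting between two interval scales (Fahrenheit and Celsius) preserves equal differences but changes ratios. A sketch with hypothetical temperatures:

```python
def f_to_c(f: float) -> float:
    """Convert Fahrenheit to Celsius (both are interval scales)."""
    return (f - 32) * 5 / 9

a_f, b_f, c_f = 50.0, 68.0, 86.0  # hypothetical temperatures in °F
a_c, b_c, c_c = f_to_c(a_f), f_to_c(b_f), f_to_c(c_f)  # 10, 20, 30 °C

# Differences are meaningful: equal intervals stay equal after conversion.
print(b_f - a_f == c_f - b_f, b_c - a_c == c_c - b_c)

# Ratios are not: 20 °C looks "twice" 10 °C, yet 68 °F is not twice 50 °F.
print(b_c / a_c, b_f / a_f)
```

Because the zero point of each scale is arbitrary, a ratio of values depends on which scale you happen to use, so it says nothing about the temperatures themselves.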
7
Ratio Scale Data
The scale has a rational zero point: an absolute zero.
Examples: temperature in kelvins; annual income in dollars; length, distance, and size (cm, inches, km, kB)
Operations: all of the above, plus multiplication and division of scale values
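With a true zero, ratios survive a change of units, which is what makes multiplication and division permissible. A sketch under stated assumptions: the lengths are hypothetical, and `cm_to_inches` and `celsius_to_kelvin` are helper names introduced here for illustration.

```python
import math

CM_PER_INCH = 2.54  # exact by definition of the inch

def cm_to_inches(cm: float) -> float:
    """Convert centimeters to inches (length is a ratio scale in either unit)."""
    return cm / CM_PER_INCH

def celsius_to_kelvin(c: float) -> float:
    """Shift Celsius (interval scale) onto Kelvin (ratio scale with an absolute zero)."""
    return c + 273.15

# Hypothetical lengths: 0 cm really means "no length", so ratios are meaningful.
short_cm, long_cm = 50.0, 150.0
ratio_cm = long_cm / short_cm
ratio_in = cm_to_inches(long_cm) / cm_to_inches(short_cm)
# "Three times as long" holds in either unit (equal up to floating-point rounding).
print(ratio_cm, ratio_in, math.isclose(ratio_cm, ratio_in))

# On the Kelvin scale, 20 °C is only about 3.5% "hotter" than 10 °C, not twice as hot.
print(celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0))
```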
8
Variables
Independent variable: input x
Dependent variable: output f(x)
Example: $f(x) = 3 + 2x^2$
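The same example as code, assuming the slide's formula reads 3 + 2x² (the exponent appears to have been lost in extraction). You choose the input x freely; the output f(x) depends on it.

```python
def f(x: float) -> float:
    """Dependent variable f(x), computed from the independent input x."""
    return 3 + 2 * x ** 2

# Vary the independent variable and observe the dependent one.
for x in [0, 1, 2, 3]:
    print(x, f(x))
```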
9
Data Mining Strategies
Unsupervised (no dependent variable used): clustering, market basket analysis, information visualization
Supervised (at least one dependent variable used for training): classification, estimation, prediction
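The distinction shows up directly in what a method is given. An unsupervised method such as clustering receives only the inputs, while a supervised method also receives a known label (the dependent variable) for each training input. As an illustration, here is a one-nearest-neighbor classifier (our choice of example; the slide does not name a specific classifier) with hypothetical labeled points:

```python
import math

Point = tuple[float, float]

def nearest_neighbor_classify(train: list[tuple[Point, str]], query: Point) -> str:
    """Supervised: each training input comes paired with its known label.

    Predict the label of the training point closest to `query` (Euclidean distance).
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

# Hypothetical training data: (input, dependent-variable label) pairs.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((5.5, 4.5), "B")]
print(nearest_neighbor_classify(train, (0.9, 1.1)))
print(nearest_neighbor_classify(train, (6.0, 5.0)))
```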
10
Clustering
Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both.
Clusters capture the natural structure of the data.
Clustering allows us to think about the data at a new level of abstraction.
Cluster analysis is often the first step in a data mining project.
11
Cluster of Stars
12
Water Clusters
13
Cellular Clusters
14
Cluster Analysis
Uses information found in the data that describes the objects and their relationships.
Goal: objects within a group should be similar to one another and different from objects in other groups.
The greater the similarity within groups and the greater the difference between groups, the better the clustering.
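The goal above can be measured. A sketch using two common measures (our choice; the slides do not prescribe one): within-cluster sum of squared errors (SSE) for similarity inside groups, and the smallest distance between cluster centroids for difference between groups. The two candidate clusterings are hypothetical.

```python
import math
from itertools import combinations

Point = tuple[float, float]

def centroid(points: list[Point]) -> Point:
    """Mean position of the points in one cluster."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def within_sse(clusters: list[list[Point]]) -> float:
    """Sum of squared distances from each point to its cluster centroid (lower = tighter)."""
    return sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)

def between_separation(clusters: list[list[Point]]) -> float:
    """Smallest distance between any two cluster centroids (higher = better separated)."""
    cents = [centroid(c) for c in clusters]
    return min(math.dist(a, b) for a, b in combinations(cents, 2))

# Two hypothetical ways of splitting the same eight points into two clusters.
good = [[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
        [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]]
bad = [[(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)],
       [(1.0, 0.0), (1.0, 1.0), (11.0, 10.0), (11.0, 11.0)]]

print(within_sse(good), between_separation(good))
print(within_sse(bad), between_separation(bad))
```

The "good" split scores better on both counts: its groups are tighter and farther apart.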
15
How Many Clusters?
16
Three Clusters Identified
17
Six Clusters Identified
18
Types of Clustering
Partitional clustering
Hierarchical clustering
Exclusive clustering
Overlapping clustering
Fuzzy clustering
Complete clustering
Partial clustering
19
Partitional Clustering
A division of a set of data into non-overlapping clusters: each data point is in exactly one cluster.
(Figure: Example of Partitional Clustering)
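The "exactly one cluster" rule is easy to check mechanically. A minimal sketch, assuming a clustering is represented as a list of sets of point identifiers (a representation chosen here for illustration):

```python
from collections import Counter

def is_partitional(clusters: list[set[str]], points: set[str]) -> bool:
    """True if every point appears in exactly one cluster.

    That means no point is shared between clusters (non-overlapping)
    and no point is left out (every point is covered).
    """
    counts = Counter(p for cluster in clusters for p in cluster)
    return set(counts) == points and all(n == 1 for n in counts.values())

points = {"a", "b", "c", "d"}
print(is_partitional([{"a", "b"}, {"c", "d"}], points))       # a partition
print(is_partitional([{"a", "b", "c"}, {"c", "d"}], points))  # "c" is in two clusters
print(is_partitional([{"a"}, {"c", "d"}], points))            # "b" is in no cluster
```

The same check fails in two distinct ways, which correspond to the overlapping and partial clusterings described on later slides.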
20
Hierarchical Clustering
Permits subclusters (nested clusters within clusters).
(Figure: Example of Hierarchical Clustering)
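One standard way to build nested clusters is agglomerative clustering: start with every point as its own cluster and repeatedly merge the two closest clusters. A sketch using single linkage (distance between the closest members) on one-dimensional points. The linkage choice and the sample values are assumptions for illustration, and the values must be distinct because each cluster is stored as a `frozenset`.

```python
from itertools import combinations

def single_linkage(points: list[float]) -> list[tuple[frozenset[float], frozenset[float], float]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Returns the merge history. Each merge nests two existing clusters inside a
    larger one, so the history describes the whole hierarchy.
    """
    def linkage(a: frozenset[float], b: frozenset[float]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return merges

for a, b, dist in single_linkage([1.0, 2.0, 10.0, 11.0, 30.0]):
    print(sorted(a), "+", sorted(b), "at distance", dist)
```

Cutting the merge history at different distances yields the coarser or finer cluster levels of the hierarchy.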
21
Exclusive Clustering
Each object is assigned to a single cluster.
22
Overlapping Clustering
Non-exclusive: a data point can belong to two or more clusters simultaneously.
23
Fuzzy Clustering
Every data point belongs to every cluster with a membership weight.
Membership ranges from 0 (absolutely does not belong) to 1 (absolutely belongs).
The membership weights for each point sum to 1.
(Figure: two clusters, C1 and C2, with example points whose memberships are C1 40% / C2 60%, C1 1% / C2 99%, and C1 75% / C2 25%)
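The slide does not say how the membership weights are computed. One common choice is the fuzzy c-means membership formula, sketched below with the fuzzifier m = 2 as an assumed default. The centers and point are hypothetical. The weights fall between 0 and 1 and sum to 1, as the slide requires.

```python
import math

Point = tuple[float, float]

def fuzzy_memberships(point: Point, centers: list[Point], m: float = 2.0) -> list[float]:
    """Fuzzy c-means membership of one point in each cluster (weights sum to 1).

    m > 1 is the "fuzzifier": larger m spreads membership more evenly.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:  # point coincides with one or more centers: split membership among them
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

# Hypothetical centers for C1 and C2; the point is twice as far from C2 as from C1.
weights = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(weights, sum(weights))
```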
24
Complete Clustering
Assigns every data point to a cluster; no data point is left out of a cluster.
25
Partial Clustering
Does not assign every data point to a cluster; some data points do not belong to any cluster:
noise, outliers, uninteresting background
Example: classifying newspaper stories. Many fall into clusters such as global warming or terrorism, but some stories are unique (e.g., "Cable Tie just graduated from the CofC in CS").
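A partial clustering can be sketched by assigning each point to its nearest cluster center only when that center is close enough, and otherwise labeling the point as noise (`None`). The distance threshold, centers, and points are hypothetical.

```python
import math
from typing import Optional

Point = tuple[float, float]

def partial_assign(points: list[Point], centers: list[Point], max_dist: float) -> list[Optional[int]]:
    """Label each point with its nearest center's index, or None if it is too far from all.

    Points labeled None are left out of every cluster (noise, outliers, background).
    """
    labels: list[Optional[int]] = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best if dists[best] <= max_dist else None)
    return labels

centers = [(0.0, 0.0), (10.0, 0.0)]
labels = partial_assign([(1.0, 0.0), (9.0, 1.0), (5.0, 5.0)], centers, max_dist=2.0)
print(labels)
```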
26
K-Means
1. Select K points as initial centroids.
2. Repeat:
   1. Form K clusters by assigning each point to its closest centroid.
   2. Recompute the centroid of each cluster.
3. Until the centroids do not change.
Note (Chris Starr): a centroid is the center of a cluster.
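The steps above can be sketched in plain Python. Several details here are assumptions the slide leaves open: Euclidean distance, the centroid taken as the mean of a cluster's points, initial centroids drawn at random from the data (so K must not exceed the number of points), a `max_iter` safety cap on the repeat loop, and keeping a centroid in place if its cluster ever comes up empty.

```python
import math
import random

Point = tuple[float, float]

def centroid(cluster: list[Point]) -> Point:
    """Mean of the points in a cluster (the 'center' from the slide's note)."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def k_means(points: list[Point], k: int, seed: int = 0, max_iter: int = 100):
    """Basic K-means with Euclidean distance; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # 1. Select K of the data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters: list[list[Point]] = []
    for _ in range(max_iter):  # safety cap on the "repeat" loop
        # 2.1 Form K clusters by assigning each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # 2.2 Recompute each centroid; keep the old one if its cluster came up empty.
        new_centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        # 3. Until the centroids do not change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of three points each.
pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
cents, groups = k_means(pts, 2)
print(cents)
print(groups)
```

Because the result depends on the initial centroids, K-means is often run several times with different seeds, keeping the run with the lowest within-cluster error.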
27
In the K-means algorithm, the centroids are repositioned until they are stable.
28
Observe Your Environment
Start looking for clusters around you.
Think about how the clusters are formed:
Are they hierarchical?
Are they fuzzy clusters?
Are they complete clusters?