1 Statistics & R, TiP, 2011/12  Cluster Analysis  Technique of exploratory data analysis  Do data divide into groups or ‘clusters’  Are there ‘natural’

Slides:



Advertisements
Similar presentations
Clustering II.
Advertisements

SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
CH 27. * Data were collected on 208 boys and 206 girls. Parents reported the month of the baby’s birth and age (in weeks) at which their child first crawled.
5 Minute Check Complete on the back of your homework. Tell whether each number is divisible by 2,3,4,5,6,9,10. You can use your rules sheet
Line Efficiency     Percentage Month Today’s Date
Time Series Forecasting Outline: 1.Measuring forecast error 2.The multiplicative time series model 3.Naïve extrapolation 4.The mean forecast model 5.Moving.
Phase Shifts of Sinusoidal Graphs. We will graph sine functions of the following form: The amplitude A =  A  The period T = 22  The phase shift comes.
Market Analysis & Forecasting Trends Businesses attempt to predict the future – need to plan ahead Why?
Time-Series Forecast Models EXAMPLE Monthly Sales ( in units ) Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec Data Point or (observation) MGMT E-5070.
QuEST Mental Health Data for Improvement Network Run Charts Alison Pickering Senior Information Analyst.
Time-Series Forecast Models  A time series is a sequence of evenly time-spaced data points, such as daily shipments, weekly sales, or quarterly earnings.
PRODUCTION & OPERATIONS MANAGEMENT Module II Forecasting for operations Prof. A.Das, MIMTS.
The Birthday Paradox July Definition 2 Birthday attacks are a class of brute-force techniques that target the cryptographic hash functions. The.
Four files erik_a2_stockholm_temp Erik simulation scenario A2 until 2100 erik_a2_stockholm_prec Erik simulation scenario A2 until.
The 6 steps of data collection: 1. Make predictions 2. Write a questionnaire 3. Collect data (Data Collection Sheet) 4. Make results tables 5. Draw graphs.
Data Mining: Basic Cluster Analysis
Jan 2016 Solar Lunar Data.
Quantitative Analysis for Management

The 6 steps of data collection:
Analyzing patterns in the phenomena
Q1 Jan Feb Mar ENTER TEXT HERE Notes
Feeder cattle Seasonal Price trends

Project timeline # 3 Step # 3 is about x, y and z # 2
Average Monthly Temperature and Rainfall



Nuffield Free-Standing Mathematics Activities Music Festival
Mammoth Caves National Park, Kentucky
2017 Jan Sun Mon Tue Wed Thu Fri Sat

Gantt Chart Enter Year Here Activities Jan Feb Mar Apr May Jun Jul Aug
Jagdish Gangolly State University of New York at Albany
Q1 Q2 Q3 Q4 PRODUCT ROADMAP TITLE Roadmap Tagline MILESTONE MILESTONE
Prepared by Lee Revere and John Large
Free PPT Diagrams : ALLPPT.com


Calendar Year 2009 Insure Oklahoma Total & Projected Enrollment
MONTH CYCLE BEGINS CYCLE ENDS DUE TO FINANCE JUL /2/2015
Jan Sun Mon Tue Wed Thu Fri Sat
HOW TO DRAW CLIMATE GRAPHS
©G Dear 2008 – Not to be sold/Free to use
Electricity Cost and Use – FY 2016 and FY 2017

Unemployment in Today’s Economy

Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Q1 Q2 Q3 Q4 PRODUCT ROADMAP TITLE Roadmap Tagline MILESTONE MILESTONE
Free PPT Diagrams : ALLPPT.com

SEEM4630 Tutorial 3 – Clustering.
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Project timeline # 3 Step # 3 is about x, y and z # 2

TIMELINE NAME OF PROJECT Today 2016 Jan Feb Mar Apr May Jun

Q1 Q2 Q3 Q4 PRODUCT ROADMAP TITLE Roadmap Tagline MILESTONE MILESTONE
Pilot of revised survey
Presentation transcript:

1 Statistics & R, TiP, 2011/12  Cluster Analysis  Technique of exploratory data analysis  Do data divide into groups or ‘clusters’  Are there ‘natural’ divisions into groups & sub-groups  Some attempt to answer questions of whether observed clusters are ‘real’ or by chance --- But that needs a statistical basis  Most techniques based on intuitive ideas

2 Statistics & R, TiP, 2011/12  Steps needed  Collect data  Decide on measure of similarity Packages offer a default & alternative choices Depend on type of data:- continuous, binary, categorical (unordered) categorical (ordered), mixed  Decide on rule to ‘form’ a cluster Different rules form different types Loose/compact/robust (not influenced by outliers)  Decide on ‘suitable’ number of clusters Display dendrogram (tree diagram) & visually  (validation of solution)

3 Statistics & R, TiP, 2011/12  Similarity measures  Euclidean Ordinary measurements  Binary Presence/absence  Manhatan binary  Other specialist ones

4 Statistics & R, TiP, 2011/12  Clustering method  Single link Item joins if close to one member of cluster  Complete Joins if close to all members of cluster  Average Joins if close to average of cluster members  Ward’s Method Based on average sum of squared distances –Most ‘statistical’ –Most stable when further data available

5 Statistics & R, TiP, 2011/12  Implementation in R  Step 2 distance matrix from data collected in step 1 Function dist(…) Defaul t is euclidean  Step 3 Function hclust(.,.,.,…) Default is Ward’s method  Step 4 Decide on number of clusters Function plclust(.,.,.,…) or plot(.)

6 Statistics & R, TiP, 2011/12  Library pvcluster  Facilities for calculating p-values  By simulation  More sophisticated alternatives in library cluster  Also many ‘application-orientated’ libraries  Functions agnes(), clara(), diana(), fanny(), mona()  agglomerative nesting, large data sets, divisive analysis, monothetic……..

7 Statistics & R, TiP, 2011/12  Many good sources  R help system has examples  TRY IT & SEE WHAT HAPPENS

8 Statistics & R, TiP, 2011/12  Simple Time series  Example: Air Passengers Monthly totals of air passengers Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

9 Statistics & R, TiP, 2011/12 ts.plot(AirPassengers,lwd=3,cex=2),)

10 Statistics & R, TiP, 2011/12  Numbers are generally going up  TREND  Annual pattern with peak in mid-year  SEASONALITY  Random  Extra ‘noise’  Interest to decompose series into these components

11 Statistics & R, TiP, 2011/12

12 Statistics & R, TiP, 2011/12  Helps choose statistical model for series  For understanding ‘causes  For prediction Predict trend and seasonal components separately –(cannot predict random component)  May use estimates of random components for matching with other series Cross-matching Cross-dating Tree-rings Pollen sequences ?? Ice cores???