Aksel Thomsen Erik Sommer

Clustered data - grids

Outline: grid data, our method, result examples, potential expansions, commercial aspects.

Grid data
Grid cells are either 100 x 100 m or 1 x 1 km.
The vast majority of cells contain only a few households, so clustering is needed.

Grid cell size | No. of cells | Cells with <100 households | Percentage
100 x 100 m | 423,755 | 421,655 | 99.5%
1 x 1 km | 38,908 | 34,951 | 89.9%

Bornholm – Case study

Population in Bornholm I

Population in Bornholm II

Method - Principles
Each grid cell is assigned to a unique municipality.
Time consistency: the number of households in a grid cell is defined as the minimum over, e.g., two years.
All cells with at least K households form their own cluster.
The remaining cells are clustered by an algorithm.
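A minimal Python sketch of these principles follows. It assumes each cell record carries its municipality, projected center coordinates and household counts per year. The names `Cell`, `stable_households` and `split_cells`, the years and the threshold `k=90` are hypothetical illustrations, not Statistics Denmark's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Cell:
    cell_id: str
    municipality: str   # each cell is assigned to exactly one municipality
    x: float            # cell-center easting in metres (assumed projection)
    y: float            # cell-center northing in metres (assumed projection)
    households_by_year: dict = field(default_factory=dict)  # {year: households}


def stable_households(cell, years):
    """Time-consistent count: the minimum number of households over `years`.
    A year missing from the data is treated as 0 households (an assumption)."""
    return min(cell.households_by_year.get(year, 0) for year in years)


def split_cells(cells, years, k):
    """Cells with at least k households form their own cluster; the remaining
    cells are grouped by municipality, to be clustered by the algorithm."""
    own_clusters = []
    remaining = defaultdict(list)
    for cell in cells:
        if stable_households(cell, years) >= k:
            own_clusters.append(cell)
        else:
            remaining[cell.municipality].append(cell)
    return own_clusters, dict(remaining)


# Usage with made-up numbers for two cells on Bornholm:
cells = [
    Cell("A", "Bornholm", 0, 0, {2017: 120, 2018: 95}),
    Cell("B", "Bornholm", 100, 0, {2017: 3, 2018: 4}),
]
own, rest = split_cells(cells, years=[2017, 2018], k=90)
print([c.cell_id for c in own], {m: [c.cell_id for c in cs] for m, cs in rest.items()})
```

Cell A counts as 95 households (its minimum over the two years), so it becomes its own cluster, while cell B is passed on to the clustering algorithm.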

Method - Algorithm
1. Start in the south-western corner.
2. Combine that cell with the nearest remaining cell.
3. Calculate the new center of the cluster.
4. Add the nearest still-remaining cell to the cluster.
5. Repeat steps 3 and 4 until the cluster contains at least K households.
6. If fewer than K households remain, add them to the last cluster.
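The steps above can be sketched in Python as a greedy nearest-cell procedure. Several details are not on the slide and are assumptions here: the procedure runs separately on each municipality's remaining cells, each new cluster starts from the remaining cell nearest the south-west corner of the bounding box, the center is the unweighted mean of the cell centers, and distance is plain Euclidean distance on projected coordinates. `GridCell` and the function names are hypothetical.

```python
import math
from typing import List, NamedTuple


class GridCell(NamedTuple):
    cell_id: str
    x: float         # cell-center easting in metres (assumed)
    y: float         # cell-center northing in metres (assumed)
    households: int  # time-consistent household count (< K for these cells)


def _distance(cell, cx, cy):
    return math.hypot(cell.x - cx, cell.y - cy)


def south_west_start(cells):
    # Assumed reading of "start in the south-western corner": the cell nearest
    # to the south-west corner of the remaining cells' bounding box.
    corner_x = min(c.x for c in cells)
    corner_y = min(c.y for c in cells)
    return min(cells, key=lambda c: _distance(c, corner_x, corner_y))


def cluster_remaining_cells(cells, k):
    """Greedy nearest-cell clustering of one municipality's remaining cells."""
    remaining = list(cells)
    clusters: List[List[GridCell]] = []
    while remaining:
        start = south_west_start(remaining)           # step 1
        remaining.remove(start)
        cluster, households = [start], start.households
        while households < k and remaining:
            # Steps 3-4: recompute the (unweighted) center, then add the
            # nearest remaining cell; the first pass is step 2.
            cx = sum(c.x for c in cluster) / len(cluster)
            cy = sum(c.y for c in cluster) / len(cluster)
            nearest = min(remaining, key=lambda c: _distance(c, cx, cy))
            remaining.remove(nearest)
            cluster.append(nearest)
            households += nearest.households
        if households < k and clusters:
            clusters[-1].extend(cluster)  # step 6: leftovers join the last cluster
        else:
            clusters.append(cluster)      # undersized only if the whole area has < k
    return clusters


cells = [GridCell("A", 0, 0, 2), GridCell("B", 100, 0, 3), GridCell("C", 200, 0, 1),
         GridCell("D", 300, 0, 4), GridCell("E", 1000, 0, 1)]
print([[c.cell_id for c in cl] for cl in cluster_remaining_cells(cells, k=5)])
```

With the made-up row of cells and K = 5, A and B form the first cluster, C and D the second, and the isolated cell E (1 household, fewer than K) is added to the last cluster as in step 6.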

Method – Example

Bornholm – Result

Bornholm – Average income

Potential expansions
Modify the distance parameter.
Now: only geographical distance.
Potential: prioritize similar grid cells nearby, e.g. with the same household types, the same income or the same demographics, to avoid mixing very different households in the same cluster.
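One way to sketch this expansion is to add a weighted attribute dissimilarity to the geographical distance. The standardized attribute vector (`profile`), the weight and the name `combined_distance` are assumptions for illustration, not a method the slides specify. Such a function could replace the plain distance in a nearest-cell search.

```python
import math
from typing import NamedTuple, Tuple


class ProfiledCell(NamedTuple):
    x: float                     # cell-center easting in metres (assumed)
    y: float                     # cell-center northing in metres (assumed)
    profile: Tuple[float, ...]   # standardized attributes, e.g. income, household-type shares


def combined_distance(a, b, weight):
    """Geographic distance plus `weight` times the attribute dissimilarity.
    With weight = 0 this reduces to the current, purely geographic distance."""
    geographic = math.dist((a.x, a.y), (b.x, b.y))
    dissimilarity = math.dist(a.profile, b.profile)
    return geographic + weight * dissimilarity


center = ProfiledCell(0, 0, (0.0, 0.0))
near_but_different = ProfiledCell(100, 0, (2.0, 0.0))
farther_but_similar = ProfiledCell(200, 0, (0.0, 0.0))
candidates = [near_but_different, farther_but_similar]
for weight in (0.0, 100.0):
    chosen = min(candidates, key=lambda c: combined_distance(center, c, weight))
    print(weight, chosen.x)
```

With weight 0 the nearer but dissimilar cell (x = 100) is chosen; with weight 100 the dissimilarity penalty makes the slightly farther but similar cell (x = 200) the nearest. How to set the weight is a design choice that trades compactness against homogeneity.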

Commercial aspects (1 of 4) – actions taken by customers
Many customers actually handle the clustering themselves. Clusters made by customers/users have to meet our requirement of a minimum number of households for at least two years. The clustering done by customers can be very complex and can already include several of the potential expansions listed by Statistics Denmark.
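A check of a customer-supplied clustering against this requirement might look like the following Python sketch. The input layout, the name `failing_clusters` and counting each cell with its minimum over the years (mirroring the time-consistent definition in the method principles) are assumptions; the slides do not describe how the requirement is verified.

```python
from collections import defaultdict


def failing_clusters(cluster_of_cell, households, years, minimum):
    """Return the ids of customer clusters below `minimum` households.

    cluster_of_cell: {cell_id: cluster_id}, the customer's clustering
    households:      {(cell_id, year): number of households}
    Each cell counts with its minimum over `years` (assumed definition).
    """
    if len(years) < 2:
        raise ValueError("the requirement covers at least two years")
    totals = defaultdict(int)
    for cell_id, cluster_id in cluster_of_cell.items():
        totals[cluster_id] += min(households.get((cell_id, year), 0) for year in years)
    return sorted(cid for cid, total in totals.items() if total < minimum)


# Made-up example: cluster c1 counts 55 + 40 = 95 households, c2 counts 105.
assignment = {"A": "c1", "B": "c1", "C": "c2"}
counts = {("A", 2017): 60, ("A", 2018): 55, ("B", 2017): 50, ("B", 2018): 40,
          ("C", 2017): 120, ("C", 2018): 105}
print(failing_clusters(assignment, counts, years=[2017, 2018], minimum=100))
```

With a minimum of 100 households, only cluster c1 would be flagged.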

Commercial aspects (2 of 4) – role of Statistics Denmark
The primary role of Statistics Denmark regarding the clustering of grids is to be an alternative supplier. The main demand for our clustering has been for simple clusters that are easy to understand and easy to use: “keeping it simple”. Very often, creators of clusters seem to forget the important task of explaining and illustrating the methods used, so this is an important factor for us as a supplier.

Commercial aspects (3 of 4) – two approaches
Clusters can be created either simply, using the nearest-cell approach (as shown by Aksel), or in a more complex way that includes various factors in the algorithm to create more optimized clusters (as listed under potential expansions for Statistics Denmark and already used by existing customers). The clusters can then either be created first and fixed as static (non-dynamic) clusters, to which variables are then added, or be created by sorting on the selected variable, giving dynamic clusters that change with each variable used.

Commercial aspects (4 of 4) – two approaches
Static (non-dynamic) clusters use a minimum of 20, 50, 100 or 150 households. Micro clusters with a minimum of 5 households are used to create dynamic macro clusters with a total of at least 300 households within a municipality. The first macro cluster has the best value of the selected variable, the second cluster the next best value, and so on, for example sorted by decreasing average household income.
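The dynamic approach can be sketched in Python as sorting one municipality's micro clusters by the selected variable and grouping them consecutively until each macro cluster reaches 300 households. `MicroCluster`, `dynamic_macro_clusters` and the income figures are hypothetical, and merging a final remainder below 300 households into the last macro cluster is an assumption borrowed from the grid algorithm's rule for leftovers.

```python
from typing import List, NamedTuple


class MicroCluster(NamedTuple):
    cluster_id: str
    households: int
    value: float   # selected variable, e.g. average household income


def dynamic_macro_clusters(micro: List[MicroCluster], minimum: int = 300,
                           micro_minimum: int = 5) -> List[List[MicroCluster]]:
    """Group one municipality's micro clusters into macro clusters of at least
    `minimum` households, best value first."""
    if any(m.households < micro_minimum for m in micro):
        raise ValueError("every micro cluster needs at least micro_minimum households")
    ordered = sorted(micro, key=lambda m: m.value, reverse=True)  # best value first
    macros, current, total = [], [], 0
    for m in ordered:
        current.append(m)
        total += m.households
        if total >= minimum:
            macros.append(current)
            current, total = [], 0
    if current:
        # Assumption: a remainder below the minimum joins the last macro cluster.
        if macros:
            macros[-1].extend(current)
        else:
            macros.append(current)  # whole municipality has fewer than `minimum`
    return macros


micro = [MicroCluster("m1", 120, 410_000), MicroCluster("m2", 200, 380_000),
         MicroCluster("m3", 150, 520_000), MicroCluster("m4", 110, 300_000),
         MicroCluster("m5", 60, 450_000), MicroCluster("m6", 40, 250_000)]
print([[m.cluster_id for m in macro] for macro in dynamic_macro_clusters(micro)])
```

Here the first macro cluster (m3, m5, m1; 330 households) holds the highest average incomes, the second (m2, m4; 310 households) the next highest, and the remaining micro cluster m6 is added to the second. Choosing a different variable re-sorts the micro clusters, which is what makes these macro clusters dynamic.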