Map Generalization and Data Classification Gary Christopherson


1 Map Generalization and Data Classification, Gary Christopherson, 4/4/2019

2 Review/Preview
· Everything we talked about before the midterm
· Everything we will be talking about before the final:
  · Data classification
  · Why classify data
  · Classification rules
  · How to classify data
  · Map types
  · Map layout

3 Data Classification
· The process of sorting or arranging entities into groups or categories.
· On a map, the process of representing members of a group by the same symbol, usually defined in a legend.

4 What’s the point? When there are too many data values on a map, it can lose its power to tell a story or make a point.

5 How Many Symbols??

6 How Many Symbols?? -- 1

7 How Many Symbols?? -- 3

8 How Many Symbols?? (figure panels: 1107 symbols and 4 symbols)

9 Too Many Colors??

10 Jenks and Coulson’s Classification Rules
Classes should:
1. Encompass the full range of the data.
2. Have neither overlapping values nor vacant classes.
3. Be great enough in number to avoid sacrificing the accuracy of the data, but not so numerous as to impute a greater degree of accuracy than is warranted by the nature of the collected data.
4. Divide the data into reasonably equal groups of observations.
5. Have a logical mathematical relationship if practical.
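Rules 1 and 2 are mechanical enough to check in code. This is a minimal sketch, assuming a classification is given as a list of class upper bounds where the first class starts at the data minimum and each later class runs from the previous bound (exclusive) to its own bound (inclusive). `check_classes` is a hypothetical helper, and rules 3 to 5 still need the cartographer's judgment.

```python
def check_classes(values, upper_bounds):
    """Check a classification against Jenks and Coulson's rules 1 and 2.

    Hypothetical helper: class i covers (upper_bounds[i-1], upper_bounds[i]];
    the first class starts at min(values). Returns a list of problems found.
    """
    problems = []
    # Rule 1: the classes must encompass the full range of the data.
    if upper_bounds[-1] < max(values):
        problems.append("classes do not reach the maximum value")
    # Rule 2a: no overlapping values, so bounds must strictly increase.
    if any(b <= a for a, b in zip(upper_bounds, upper_bounds[1:])):
        problems.append("class bounds overlap or repeat")
    # Rule 2b: no vacant classes, so every class must hold at least one value.
    lowest = min(values)
    for i, upper in enumerate(upper_bounds):
        lower = lowest if i == 0 else upper_bounds[i - 1]
        if i == 0:
            count = sum(lower <= v <= upper for v in values)
        else:
            count = sum(lower < v <= upper for v in values)
        if count == 0:
            problems.append(f"class {i + 1} is vacant")
    return problems


# Sample data: the practice numbers from slide 39.
practice = [7, 1, 18, 20, 6, 14, 19, 13, 2, 1, 25, 2, 23, 1, 15]
print(check_classes(practice, [5.8, 10.6, 15.4, 20.2, 25]))  # []
print(check_classes(practice, [5, 8, 10, 20]))
# ['classes do not reach the maximum value', 'class 3 is vacant']
```

The first set of bounds is the equal-interval answer to the practice exercise, so it passes; the second stops short of the maximum (25) and leaves the class from 8 to 10 empty.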

11 Map Abstraction Process

12 Nominal Scale Data
· On a nominal scale, numbers merely establish identity.
· A phone number signifies only the unique identity of the phone jack or the cell phone.
· In a race, the numbers issued to racers to identify individuals are on a nominal scale.
· These identity numbers do not indicate any order or relative value in terms of the race outcome.

13 Nominal, or Categorical Data
Qualitative: deals with qualitative data that has no inherent order and no measurable range.
There are no absolute rules for this kind of classification, just general guidelines:
· Features in different classes or categories should be more dissimilar than similar and should be symbolized differently.
· Features in the same class or category should be more similar than dissimilar and should be symbolized similarly.

14 Nominal/Categorical Symbolization

15 Nominal/Categorical Symbolization

16 Nominal/Categorical Symbolization

17 Ordinal Scale Data
· On an ordinal scale, numbers establish order only.
· One phone number is not “more” of anything than another, so phone numbers are not ordinal.
· In the race, the finishing places of each racer (1st place, 2nd place, 3rd place) are measured on an ordinal scale.
· The numbers mean something relative to each other, but we do not know how much time difference there is between racers.

18 Ordinal Data
Quantitative: deals with quantitative data that is ordered but has no measurable range.
· Ordinal classes show relative values, not absolute values.
· In this example, 1 is less than 3, but we don’t know how much less it is.
· Using numbers to label ordinal data is often confusing, so text labels can work better, but be careful to use text that does not imply absolute values.

19 Ordinal Data

20 Interval Scale Data
· On an interval scale, the difference (interval) between numbers is meaningful, but the numbering scale does not start at zero, i.e. there is no absolute zero.
· Subtraction makes sense but division does not.
· Celsius temperature is an interval scale: 200°C is 100 degrees warmer than 100°C, but it is not twice as hot.
· It makes no sense to say that one phone number is more than another, so phone numbers are not measurements on an interval scale.
· In the race, the time of day that each racer finished is measured on an interval scale.
· If racers finished at 9:10 GMT, 9:20 GMT and 9:25 GMT, then racer one finished 10 minutes before racer two, and the difference between racers 1 and 2 is twice the difference between racers 2 and 3.
· However, a racer finishing at 9:10 GMT did not finish twice as fast as a racer finishing at 18:20 GMT, even though 18:20 is exactly double 9:10 on the clock.

21 Interval Data
Quantitative: deals with quantitative data that has no absolute zero, so subtraction works but division does not.
· Interval classes show a range of values.
· In this example, the classes show a range of low elevations for states.
· Notice the negative numbers: this is why these values are interval scale, not ratio scale.

22 Ratio Scale Data
· On a ratio scale, measurement has an absolute zero and the difference between numbers is significant, so division makes sense.
· A 50 kg person weighs half as much as a 100 kg person, so weight in kg is on a ratio scale.
· Is weight in pounds on a ratio scale? Yes: a 150 lb person weighs half as much as a 300 lb person.
· Is temperature on a ratio scale? Not in Celsius: the zero point of weight is absolute, but the zero point of the Celsius scale is not.
· In the race, the first place finisher finished in a time of 2:30, the second in 2:40 and the 450th place finisher took 5 hours.
· The 450th finisher took twice as long as the first place finisher (5 / 2.5 = 2).
· Ratio data allow direct comparison.
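Python's `datetime` module happens to enforce the same distinction the race examples on slides 20 and 22 draw. In this sketch, `datetime` values stand in for finishing times of day (interval scale: they can be subtracted but not divided) and `timedelta` values stand in for elapsed race times (ratio scale: they can be divided). The calendar date is an arbitrary placeholder, not part of the original example.

```python
from datetime import datetime, timedelta

# Times of day when racers finished (interval scale); the date is arbitrary.
racer1 = datetime(2019, 4, 4, 9, 10)
racer2 = datetime(2019, 4, 4, 9, 20)
racer3 = datetime(2019, 4, 4, 9, 25)

# Subtraction is meaningful: the gaps between finishing times are durations.
gap_12 = racer2 - racer1
gap_23 = racer3 - racer2
print(gap_12)           # 0:10:00
print(gap_12 / gap_23)  # 2.0: the first gap is twice the second

# Dividing one time of day by another is not defined.
try:
    racer1 / racer2
except TypeError:
    print("cannot divide one time of day by another")

# Elapsed race times (ratio scale) start from an absolute zero, so division works.
first_place = timedelta(hours=2, minutes=30)
place_450 = timedelta(hours=5)
print(place_450 / first_place)  # 2.0: took twice as long
```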

23 Ratio Scale Data
Quantitative data that has an absolute zero, so both subtraction and division work.
· Ratio scale classes show a range of numeric values.
· In this example, the classes show a range of population for states.
· Notice there are no negative numbers: population has an absolute zero, which is why these values are ratio and not interval scale data.

24 Data Classification
· Most maps use data that have been classified.
· The number of classes is usually between 5 and 10, more likely 5 than 10.
· Classification methods vary depending on the data and on the story you are telling.
· ArcGIS includes a number of different classification schemes.

25 Data Classification
Best carried out in the context of a histogram:
· X-axis shows data values: here, the number of farms in a county.
· Y-axis shows frequency: here, the number of counties.
· Gray bars show the number of observations: here, the number of counties for each data value.
· Blue lines divide the data into classes of aggregated data.
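The histogram view can be sketched in Python: `collections.Counter` gives the height of each gray bar (how many observations share a value), and a hypothetical `class_counts` helper aggregates the observations that fall between the blue class dividers. Class bounds are assumed to be upper limits, and the sample values are the practice numbers from slide 39, not the farm data in the slide's figure.

```python
from bisect import bisect_left
from collections import Counter


def class_counts(values, upper_bounds):
    """Count observations per class; class i covers (upper_bounds[i-1], upper_bounds[i]].

    The first class takes everything up to and including upper_bounds[0].
    """
    counts = [0] * len(upper_bounds)
    for v in values:
        i = bisect_left(upper_bounds, v)  # first class whose upper bound is >= v
        if i == len(upper_bounds):
            raise ValueError(f"{v} lies above the top class bound")
        counts[i] += 1
    return counts


practice = [7, 1, 18, 20, 6, 14, 19, 13, 2, 1, 25, 2, 23, 1, 15]

# Gray bars: frequency of each individual data value.
frequencies = Counter(practice)
print(sorted(frequencies.items()))

# Blue lines: aggregate the same observations into five classes.
print(class_counts(practice, [5.8, 10.6, 15.4, 20.2, 25]))  # [5, 2, 3, 3, 2]
```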

26 Data Classification
There are nine standard classification schemes: natural breaks, optimization, nested means, mean and standard deviation, equal interval, quantile, arithmetic, geometric, and user defined.
Creating classes based on these schemes requires summary statistics and calculations, some simple and some difficult.
We will look at equal interval, quantile, standard deviation, and natural breaks.

27 Summary Statistics
· Mean (average): the sum of all values divided by the number of values in the set.
· Mode: the value that appears with the greatest frequency.
· Median: the middle value of a set of values ordered by rank; when there are two middle values (an even number of values in the set), the mean of those two values is used.
· Standard deviation: the spread of values around their mean, calculated by summing the squared deviations from the mean, dividing by the number of values, and taking the square root. It is also known as the square root of the variance.
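These definitions map directly onto Python's standard `statistics` module. The sketch also spells out the standard deviation by hand (`population_std_dev` is a hypothetical helper) to show the order of operations: square the deviations, average them over the number of values, then take the square root. Dividing by the number of values gives the population standard deviation (`statistics.pstdev`); `statistics.stdev` divides by n - 1 instead. The sample values are the practice numbers from slide 39.

```python
import math
import statistics


def population_std_dev(values):
    """Square root of (sum of squared deviations from the mean / number of values)."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / len(values))


practice = [7, 1, 18, 20, 6, 14, 19, 13, 2, 1, 25, 2, 23, 1, 15]

print(statistics.mean(practice))        # 167 / 15, about 11.13
print(statistics.mode(practice))        # 1 (appears three times)
print(statistics.median(practice))      # 13 (8th of 15 ranked values)
print(statistics.median([1, 2, 3, 4]))  # 2.5: mean of the two middle values
print(population_std_dev(practice))     # matches statistics.pstdev(practice)
print(statistics.pstdev(practice))
```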

28 Equal Interval
· Constant interval between classes, based on values along the x-axis.
· The number of observations will be different from class to class.
· Good if you want to make direct comparisons between different choropleth maps.

29 Calculating Equal Interval
1. Subtract the minimum value from the maximum value: 1440 - 173 = 1267.
2. Divide the result by the number of classes you want: 1267 / 5 = 253.4. The result is the width of each class.
3. Start with the minimum and add the width to get the upper bound of the first class: 173 + 253.4 ≈ 426.
4. Continue adding the width to the upper bound of the previous class until all classes have been created: 426 + 253.4 ≈ 679, and so on.
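The same steps in a short Python sketch; `equal_interval_breaks` is a hypothetical helper that returns the upper bound of each class. It pins the top bound to the maximum rather than adding the width one last time, so floating-point rounding can never leave the largest value outside the last class.

```python
def equal_interval_breaks(minimum, maximum, num_classes):
    """Upper bound of each class in an equal-interval classification."""
    width = (maximum - minimum) / num_classes  # steps 1 and 2
    # Steps 3 and 4: minimum + width, minimum + 2 * width, ...
    bounds = [minimum + width * i for i in range(1, num_classes)]
    bounds.append(maximum)  # pin the top bound so the maximum is always classified
    return bounds


print(equal_interval_breaks(173, 1440, 5))  # about 426.4, 679.8, 933.2, 1186.6, 1440
print(equal_interval_breaks(170, 1440, 5))  # [424.0, 678.0, 932.0, 1186.0, 1440]
print(equal_interval_breaks(1, 25, 5))      # about 5.8, 10.6, 15.4, 20.2, 25
```

Computing each bound as `minimum + width * i` also avoids accumulating the rounding from repeated additions that the slide's hand calculation carries forward.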

30 Quantile
· Equal number of observations per class.
· Because the number of observations is the same from class to class, the interval between classes will be different.
· A good classification scheme to use if certain statistical tests require equal numbers of observations.

31 Calculating Quantiles
1. Divide the count of observations/features by the number of classes you want: 92 / 5 = 18.4. This gives the number of features for each class; since a feature cannot be split, each class gets 18 or 19.
2. Arrange your features from least to greatest value.
3. Divide them into classes so that the number of features in each class matches the result of your division.
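A minimal Python sketch of these steps; `quantile_classes` is a hypothetical helper. When the count does not divide evenly (as with 92 / 5 = 18.4), integer division decides which classes get the extra feature. That is one reasonable convention, not necessarily the one ArcGIS uses.

```python
def quantile_classes(values, num_classes):
    """Split values, ranked least to greatest, into classes of (nearly) equal count."""
    ordered = sorted(values)  # step 2: arrange from least to greatest
    n = len(ordered)
    classes = []
    for k in range(num_classes):
        # Integer division picks where each class starts and ends, so when
        # n / num_classes is fractional the class sizes differ by at most one.
        start = k * n // num_classes
        end = (k + 1) * n // num_classes
        classes.append(ordered[start:end])
    return classes


practice = [7, 1, 18, 20, 6, 14, 19, 13, 2, 1, 25, 2, 23, 1, 15]
print(quantile_classes(practice, 5))
# [[1, 1, 1], [2, 2, 6], [7, 13, 14], [15, 18, 19], [20, 23, 25]]
print([len(c) for c in quantile_classes(range(92), 5)])
# [18, 18, 19, 18, 19]
```

Splitting purely by count can place identical values in neighboring classes, which would break the “no overlapping values” rule; GIS packages may adjust breaks at ties, so treat this as a sketch.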

32 Calculating Quantiles (same steps as slide 31)

33 Jenks – Natural Breaks
· Minimizes variance within a class by dividing classes in areas where there are large breaks in the data.
· Classes have different sizes and different numbers of features.
· Often the best choice for conveying information accurately to map readers.
· Cannot be used to make direct comparisons between maps.

34 Calculating Jenks Don’t worry about this one
Jenks natural breaks is a method of statistical data classification that partitions data into classes using an algorithm that calculates groupings of data values based on the data distribution. Jenks' optimization seeks to reduce variance within groups and maximize variance between groups.
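For the curious, here is a minimal Python sketch of the optimization, written as an exact dynamic program (Fisher's formulation of optimal grouping) rather than any particular GIS package's code; ArcGIS's own implementation may differ in details, so results may not match it exactly. It minimizes the total sum of squared deviations within classes. Because the total sum of squares is fixed, that is the same as maximizing the variance between classes. `jenks_breaks` is a hypothetical name, and the O(k·n²) loop is only suitable for small datasets.

```python
def jenks_breaks(values, num_classes):
    """Upper bound of each class in an optimal (minimum within-class variance) split."""
    x = sorted(values)
    n = len(x)
    if not 1 <= num_classes <= n:
        raise ValueError("need 1 <= num_classes <= number of values")

    # Prefix sums let us compute any class's squared deviations in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations from the mean for x[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest total within-class SSD for x[:j] split into c classes.
    cost = [[inf] * (n + 1) for _ in range(num_classes + 1)]
    start = [[0] * (n + 1) for _ in range(num_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, num_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is x[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the stored split points to recover each upper bound.
    breaks = []
    j = n
    for c in range(num_classes, 0, -1):
        breaks.append(x[j - 1])
        j = start[c][j]
    return breaks[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [3, 12, 22]
```

The breaks land at the two large gaps in the data (between 3 and 10, and between 12 and 20), which is exactly the behavior slide 33 describes.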

35 Mean and Standard Deviations
· Classes are determined by the mean and deviations from the mean.
· Best if the data display a normal distribution.
· Usually symbolized using a diverging color scheme.

36 Classifying by Mean and Std. Dev.
1. Calculate the mean of your data.
2. Calculate the standard deviation of your data.
3. Arrange your first class so that it straddles the mean (for example, from half a standard deviation below the mean to half a standard deviation above it).
4. Then add classes at intervals of one standard deviation both above and below the mean class.
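The four steps can be sketched in Python, assuming (as in the worked example on slide 41) that the class straddling the mean spans half a standard deviation on each side. `std_dev_breaks` is a hypothetical helper; in practice the mean and standard deviation would come from something like `statistics.mean` and `statistics.pstdev`.

```python
def std_dev_breaks(mean, std_dev, classes_below, classes_above):
    """Ascending class boundaries for a mean-and-standard-deviation scheme.

    The central class runs from mean - std_dev/2 to mean + std_dev/2; every
    other class is one standard deviation wide.
    """
    low = mean - std_dev / 2   # step 3: the central class straddles the mean
    high = mean + std_dev / 2
    # Step 4: add one-standard-deviation classes below and above.
    below = [low - std_dev * i for i in range(classes_below, 0, -1)]
    above = [high + std_dev * i for i in range(1, classes_above + 1)]
    return below + [low, high] + above


# Mean 630 and standard deviation 250, one class below and two above the mean:
print(std_dev_breaks(630, 250, 1, 2))  # [255.0, 505.0, 755.0, 1005.0, 1255.0]
```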

37 Same Data, Different Classification Schemes

38 Jenks and Coulson’s Rules (repeat of slide 10)

39 Practice
Put the following numbers into classes using quantile (five classes) and equal interval (five classes):
7, 1, 18, 20, 6, 14, 19, 13, 2, 1, 25, 2, 23, 1, 15
· Quantile: 15 values / 5 classes = 3 values per class: (1, 1, 1) (2, 2, 6) (7, 13, 14) (15, 18, 19) (20, 23, 25)
· Equal interval: (25 - 1) / 5 = 4.8, so the class upper bounds are 5.8, 10.6, 15.4, 20.2 and 25: (1, 1, 1, 2, 2) (6, 7) (13, 14, 15) (18, 19, 20) (23, 25)

40 Equal Interval
1440 - 170 = 1270; 1270 / 5 = 254
170 + 254 = 424; 424 + 254 = 678; 678 + 254 = 932; 932 + 254 = 1186; 1186 + 254 = 1440
Class upper bounds: 424, 678, 932, 1186, 1440

41 Mean and Standard Deviation
Straddle the mean (mean = 630, standard deviation = 250):
630 - 125 = 505; 630 + 125 = 755
505 - 250 = 255; 755 + 250 = 1005; 1005 + 250 = 1255
Class breaks: 255, 505, 755, 1005, 1255

