
1 Cases and controls A case is an individual with a disease, whose location can be represented by a point on the map (red dot). In this table we examine male death rates for Scotland in 1999 and standardize by age relative to male death rates in the whole of the United Kingdom for the same period. The data are drawn from the World Health Organization Statistical Information System

2 Cases and controls A case is an individual with a disease, whose location can be represented by a point on the map (red dot). A control is a similar individual free from disease, whose location can also be plotted on the map (blue dot). Controls might be, for example, children born in the same year as the cases, taken from a birth register.

3 Which test to use? Some tests are used when data are only available for cases. Examples include Ripley’s K and the variance-mean ratio.

4 Which test to use? Some tests are used when data are only available for cases. Examples include Ripley’s K and the variance-mean ratio. Other tests are used when data are available for both cases and controls, such as Cuzick and Edwards’ test.

5 Nearest neighbours In a set of points, every point has a nearest neighbour. Nearest neighbours are indicated for these points by the black arrows.

6 Cuzick and Edwards’ test counts the number of cases that have other cases (not controls) as their nearest neighbours. In this example, cases are shown in red and controls in blue. Case a has a control as its nearest neighbour (blue, scores 0). Cases b and c have other cases as nearest neighbours (red, score 1 each). Cuzick and Edwards’ test result is: a (0) + b (1) + c (1) = 2.
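The counting step on this slide can be sketched in a few lines of Python. This is only an illustration of the nearest-neighbour count, not a full implementation of the test: the coordinates are hypothetical, chosen so that case a sits next to a control while cases b and c sit next to each other, and the function name is an assumption.

```python
import math

def case_nearest_neighbour_count(points, is_case):
    """Count the cases whose single nearest neighbour is also a case."""
    total = 0
    for i in range(len(points)):
        if not is_case[i]:
            continue  # only cases contribute to the statistic
        # Index of the closest other point (case or control).
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        total += 1 if is_case[nearest] else 0
    return total

# Hypothetical layout: a, control, b, c, control.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (10.0, 0.0)]
is_case = [True, False, True, True, False]

statistic = case_nearest_neighbour_count(points, is_case)
print(statistic)
```

With this layout, case a scores 0 and cases b and c score 1 each, matching the total of 2 on the slide.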

7 For clustered cases the Cuzick and Edwards’ statistic is high, for example a cluster around a pollution source causing respiratory problems. For dispersed cases the test statistic is low, for example smokers distributed amongst non-smokers. If the Cuzick and Edwards’ statistic is higher than that of a randomly distributed data set, then a degree of clustering is present.
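One common way to make the comparison with "a randomly distributed data set" concrete is random labelling: keep the locations fixed, shuffle which points are called cases, and see how often the shuffled statistic reaches the observed one. The sketch below assumes that approach; the slides do not say how the random comparison is generated. The point layout, function names, number of shuffles and seed are all hypothetical.

```python
import math
import random

def case_nn_count(points, labels):
    """Number of cases whose nearest neighbour is also a case."""
    total = 0
    for i, is_case in enumerate(labels):
        if not is_case:
            continue
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        total += int(labels[nearest])
    return total

def random_labelling_p_value(points, labels, n_shuffles=999, seed=0):
    """Share of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = case_nn_count(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if case_nn_count(points, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the arrangements.
    return observed, (hits + 1) / (n_shuffles + 1)

# Hypothetical data: 10 tightly packed cases, 20 controls spread elsewhere.
cases = [(0.1 * i, 0.0) for i in range(10)]
controls = [(5.0 + i, 5.0) for i in range(20)]
points = cases + controls
labels = [True] * 10 + [False] * 20

observed, p_value = random_labelling_p_value(points, labels)
print(observed, p_value)
```

A small p-value means the observed statistic is rarely matched when case labels are assigned at random, which is the sense in which "a degree of clustering is present".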

8 One problem with these clustering tests is scale: clusters may come in different sizes. Cluster A is relatively small; Cluster B is larger.

9 Cuzick and Edwards’ test (and other nearest neighbour analyses) can be expanded to consider more points than just the nearest neighbour. For example, the nearest two points to each case (the nearest and the 2nd nearest) might be considered. We could continue and consider ever-increasing numbers of neighbouring points (the three nearest, four nearest, etc.), enabling us to detect clusters of different sizes and addressing the scale problem.
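A minimal sketch of this extension, assuming the statistic is simply the count of cases among each case's k nearest neighbours, summed over all cases. The coordinates and the function name are hypothetical.

```python
import math

def case_k_nearest_count(points, is_case, k=1):
    """Sum, over all cases, of how many of the k nearest neighbours are cases."""
    total = 0
    for i in range(len(points)):
        if not is_case[i]:
            continue
        # Other points ordered from closest to farthest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        total += sum(1 for j in others[:k] if is_case[j])
    return total

# Hypothetical layout: case, control, case, case, control.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (10.0, 0.0)]
is_case = [True, False, True, True, False]

for k in (1, 2):
    print(k, case_k_nearest_count(points, is_case, k))
```

Running the count for several values of k gives one statistic per neighbourhood size, which is what lets the analysis pick up clusters at more than one scale.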

10 Testing for clustering in case data only A quadrat test can be used to test for clustering in point data of cases. This test is often used in ecology, rather than with health data. In a quadrat test, a grid is superimposed on the study area and the number of points in each grid square is calculated. In this example, the four grid squares contain 6, 2, 3 and 1 points.

11 The mean count per grid square is then calculated, here for grid counts of 6, 2, 3 and 1: (6 + 2 + 3 + 1)/4 = 3. As is the variance: ((6 − 3)² + (2 − 3)² + (3 − 3)² + (1 − 3)²)/(4 − 1) = 14/3 = 4.7 (1 decimal place). Standard deviation = √(14/3) = 2.2 (1 decimal place). The ratio of standard deviation to mean = 2.2/3 = 0.7.
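The arithmetic on this slide can be checked with Python's standard library. The sketch assumes the sample (n − 1) form of the variance, which is the form that reproduces the slide's standard deviation of 2.2; the function name is hypothetical.

```python
import statistics

def quadrat_summary(counts):
    """Mean, sample variance, sample SD and SD-to-mean ratio of grid counts."""
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # divides by n - 1
    sd = statistics.stdev(counts)           # square root of the sample variance
    return mean, variance, sd, sd / mean

# Grid counts from the slide.
mean, variance, sd, ratio = quadrat_summary([6, 2, 3, 1])
print(mean, round(variance, 1), round(sd, 1), round(ratio, 1))
```

The same function can be applied to any other set of grid counts to compare how evenly two populations are spread.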

12 For the blue population: Mean = 3. Standard deviation = 0.8. Ratio SD:mean = 0.8/3 = 0.3. The blue population is more evenly distributed, and this is reflected in its low quadrat ratio.

13 Ripley’s K works by counting the number of nearby cases lying within a certain distance of each case. Each point is considered in turn and a running total is kept of the number of nearby cases. The process continues until neighbouring cases have been identified for all points.

14 Ripley’s K statistic is calculated by adding up the total number of neighbours found in all circles (let’s assume here this is 1600). This total is divided by the product of the squared point density and the study area. In this case, we might have 2 points per km² and a study site with an area of 100 km². K = (total no. of neighbours) / (point density² × study area) = 1600 / (2² × 100) = 1600 / 400 = 4. Ripley’s K statistic reflects the number of points you would expect to find within a given radius of an arbitrarily chosen point on the map, scaled by the point density. In this example, each of the 200 points has on average 1600/200 = 8 neighbours within our chosen radius, and dividing by the density of 2 points per km² gives K = 4.
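As a rough sketch, the simplified calculation described on this slide might look as follows in Python. It deliberately omits any edge correction, so it is not the formula used in practice, and the function names and the four-point example are hypothetical.

```python
import math

def k_from_totals(total_neighbours, density, area):
    """Simplified K: neighbour total over (density squared times area)."""
    return total_neighbours / (density ** 2 * area)

def ripley_k_naive(points, radius, area):
    """Count ordered pairs within `radius`, then scale as on the slide."""
    n = len(points)
    density = n / area
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                total += 1  # point j lies inside the circle around point i
    return k_from_totals(total, density, area)

# The slide's numbers: 1600 neighbours, 2 points per km², 100 km².
print(k_from_totals(1600, 2, 100))

# Hypothetical toy pattern: four points on a unit square in a 4 km² site.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(ripley_k_naive(square, 1.1, 4.0))
```

The double loop visits every ordered pair of points, which is fine for a teaching example but slow for large data sets.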

15 A scaled version of this statistic, L, is used to test for clustering. L is calculated by dividing K by pi and taking the square root: L = √(K/π) = √(4/3.1) = 1.1. For a random point pattern, L equals the radius of the circles; it is greater than the radius for a clustered pattern and less than the radius for a regular pattern. With a radius of 1, as implied here, L is 1 for a random pattern, so our data (L = 1.1) appear clustered. The formula actually used for K is slightly more complicated than described here, because it accounts for points at the edge of the map.
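The scaling step is short enough to check directly. The sketch below applies L = √(K/π) to the K value of 4 from the earlier slide and also illustrates why L is compared with the radius: for a completely random pattern K equals π times the radius squared, so the transform returns the radius itself. The function name is hypothetical.

```python
import math

def ripley_l(k):
    """Scaled version of Ripley's K: divide by pi, then take the square root."""
    return math.sqrt(k / math.pi)

# K = 4 from the worked example.
l_value = ripley_l(4)
print(round(l_value, 1))

# For a completely random pattern K = pi * r**2, so L comes back as r.
radius = 2.0
l_random = ripley_l(math.pi * radius ** 2)
print(l_random)
```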

16 The calculation can be repeated for larger and larger distances to detect clusters of different sizes.

17 We can plot out the value of the Ripley’s K statistic for different sizes of distance radius and compare the graph to what might be expected if cases were located at random. [Graph: L (scaled version of Ripley’s K) on the y-axis against inter-point distance (radius of circles in earlier slides) on the x-axis.]

18 This line (y = x) is what we would expect if the points were randomly distributed. A line with Y > X would indicate that the points are clustered together. A line with X > Y would indicate that the points are evenly distributed. [Graph: L (scaled version of Ripley’s K) on the y-axis against inter-point distance (radius of circles in earlier slides) on the x-axis.]

19 We can test for significance by generating many random point patterns, then calculating and plotting K for these random patterns. In this example, K is high, but not outside the range that could be expected by chance. [Graph: observed values for K plotted against the min and max K values from many random simulations. X = inter-point distance (radius of circles in earlier slides); Y = value of scaled version of Ripley’s K.]
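The simulation idea can be sketched as below, assuming a rectangular study area, uniformly random points as the "random pattern", and no edge correction. All names, the number of simulations, the seeds and the clustered example data are hypothetical. Unlike the example on the slide, the observed pattern here is made tightly clustered on purpose so that it falls outside the simulated range.

```python
import math
import random

def ripley_l(points, radius, area):
    """Scaled Ripley's K at one radius (simplified, no edge correction)."""
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= radius
    )
    k = pairs * area / (n * n)  # same as pairs / (density**2 * area)
    return math.sqrt(k / math.pi)

def random_pattern(n, width, height, rng):
    """n points placed uniformly at random in a width-by-height rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def simulation_envelope(n, width, height, radius, n_sims=99, seed=0):
    """Min and max L over many simulated random patterns."""
    rng = random.Random(seed)
    values = [
        ripley_l(random_pattern(n, width, height, rng), radius, width * height)
        for _ in range(n_sims)
    ]
    return min(values), max(values)

# Hypothetical observed data: 30 points packed into a 1 km square
# at the centre of a 10 km by 10 km study area.
rng = random.Random(1)
observed = [
    (5 + rng.uniform(-0.5, 0.5), 5 + rng.uniform(-0.5, 0.5)) for _ in range(30)
]

l_observed = ripley_l(observed, 1.0, 100.0)
low, high = simulation_envelope(30, 10.0, 10.0, 1.0)
print(l_observed, low, high, l_observed > high)
```

Repeating the comparison over a range of radii would give the observed curve and the min and max curves described on this slide.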

20 In summary, we can divide our global clustering tests into those that address just case data and those that address both case and control data. There are many examples of each, from which we have examined just three.
Global clustering tests
Case data: Ripley’s K, variance-mean ratio, …many other tests
Case and control data: Cuzick and Edwards’ test, …many other tests

