A comparison of some simple methods to identify geographical areas with excess incidence of a rare disease such as childhood leukaemia.
Wray NR., Alexander FE., Muirhead CR., Pukkala E., Schmidtmann I., Stiller C.
Six statistics are compared in a simulation study for their ability to identify geographical areas with a known excess incidence of a rare disease. The statistics are the standardized incidence ratio, the empirical Bayes method of Clayton and Kaldor, Poisson probability, a statistic based on the 'Breslow T' test (BT) and two statistics based on the 'Potthoff-Whittinghill' test (PW) for extra-Poisson variance. Two alternative processes of clustering are simulated in which high-risk locations could be caused by environmental sources or could be sites of microepidemics of an infectious agent contributing to a rare disease such as childhood leukaemia. The simulation processes use two parameters (proportion of cases found in clusters and mean cluster size) which are varied to embrace a variety of situations. Real and artificial data sets of small area populations are considered. The most extreme of the artificial sets has all areas of equal population size. The other data sets use the small census areas (municipalities) in Finland, since these have an extremely heterogeneous population size distribution. Subset selection allows examination of this variability. Receiver operating characteristic (ROC) curve methodology is used to compare the efficacy of the statistics in identifying the cluster areas; statistics are compared for the proportion of true high-risk areas identified in the top 1 per cent and 10 per cent of ranked areas. One of the PW statistics performed consistently well under all circumstances, although the results for the BT statistic were marginally better when only the top 1 per cent of ranked areas was considered. The standardized incidence ratio performed consistently worst.
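As an illustration only, the following minimal Python sketch shows how two of the six statistics (the standardized incidence ratio and the Poisson probability) might be computed per area, and how the proportion of true high-risk areas found in a top-ranked fraction could be evaluated. It is not the authors' implementation. The function names, the choice of the upper-tail probability P(X >= observed) as the Poisson statistic, and the example counts are all assumptions made for this sketch.

```python
import math

def sir(observed, expected):
    """Standardized incidence ratio: observed cases / expected cases."""
    return observed / expected

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected) (assumed form of the statistic)."""
    if observed == 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def proportion_found(scores, is_high_risk, top_fraction):
    """Share of the true high-risk areas that appear among the top-ranked areas.

    Higher scores are treated as more extreme.
    """
    n_top = max(1, round(top_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(is_high_risk[i] for i in ranked[:n_top])
    return hits / sum(is_high_risk)

# Hypothetical data: five areas with very different expected counts.
observed = [1, 14, 0, 2, 5]
expected = [0.05, 5.0, 0.2, 2.0, 5.0]
is_high_risk = [False, True, False, False, False]  # assumed "true" cluster area

sir_scores = [sir(o, e) for o, e in zip(observed, expected)]
# A smaller tail probability is more extreme, so negate it for ranking.
poisson_scores = [-poisson_upper_tail(o, e) for o, e in zip(observed, expected)]

print(proportion_found(sir_scores, is_high_risk, 0.2))
print(proportion_found(poisson_scores, is_high_risk, 0.2))
```

In this made-up example the two rankings disagree: the area with a single case and a very small expected count has the largest ratio, while the area with 14 cases against 5 expected has the smallest tail probability. The numbers are purely illustrative and say nothing about the results of the simulation study itself.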