Six statistics are compared in a simulation study for their ability to identify geographical areas with a known excess incidence of a rare disease. The statistics are the standardized incidence ratio, the empirical Bayes method of Clayton and Kaldor, Poisson probability, a statistic based on the 'Breslow T' test (BT) and two statistics based on the 'Potthoff-Whittinghill' test (PW) for extra-Poisson variance. Two alternative clustering processes are simulated, in which high-risk locations could be caused by environmental sources or could be sites of microepidemics of an infectious agent contributing to a rare disease such as childhood leukaemia. The simulation processes use two parameters (the proportion of cases found in clusters and the mean cluster size), which are varied to cover a range of situations. Real and artificial data sets of small area populations are considered. The most extreme of the artificial sets has all areas of equal population size. The other data sets use the small census areas (municipalities) in Finland, since these have an extremely heterogeneous population size distribution. Subset selection allows examination of this variability. Receiver operating characteristic (ROC) curve methodology is used to compare the efficacy of the statistics in identifying the cluster areas; the statistics are compared on the proportion of true high-risk areas identified in the top 1 per cent and the top 10 per cent of ranked areas. One of the PW statistics performed consistently well under all circumstances, although the results for the BT statistic were marginally better when only the top 1 per cent of ranked areas was considered. The standardized incidence ratio performed consistently worst.
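As a rough illustration of the ranking comparison described above, the following minimal sketch computes two of the six statistics (the standardized incidence ratio and the Poisson probability) for a handful of areas and reports the proportion of true high-risk areas that land in the top-ranked fraction. It is not the study's code: the function names, the four-area toy data and the choice of the upper-tail probability P(X >= observed) as the "Poisson probability" are all assumptions made for the example, and the numbers are not results from the paper.

```python
import math

def sir(observed, expected):
    """Standardized incidence ratio: observed cases divided by expected cases."""
    return observed / expected

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected) (assumed form of the statistic)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def proportion_detected(scores, is_high_risk, top_fraction):
    """Share of true high-risk areas found in the top-ranked fraction of areas.

    Higher scores are ranked first.
    """
    n_top = max(1, round(top_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_high_risk[i])
    total = sum(is_high_risk)
    return hits / total if total else 0.0

# Hypothetical toy data: four areas with very different expected counts.
observed = [1, 3, 10, 12]
expected = [0.2, 1.0, 8.0, 10.0]
is_high_risk = [False, True, False, False]  # assumed "true" cluster area

sir_scores = [sir(o, e) for o, e in zip(observed, expected)]
# A smaller tail probability means stronger evidence, so negate it for ranking.
poisson_scores = [-poisson_upper_tail(o, e) for o, e in zip(observed, expected)]

sir_detected = proportion_detected(sir_scores, is_high_risk, 0.25)
poisson_detected = proportion_detected(poisson_scores, is_high_risk, 0.25)
print(sir_detected, poisson_detected)
```

In this toy case the area with the smallest expected count gets the largest ratio from a single case, so ranking by the standardized incidence ratio puts it first, whereas the Poisson probability ranks the assumed high-risk area first. This only illustrates how heterogeneous population sizes can affect a ratio-based ranking; it says nothing about how the statistics behave in the simulations reported here.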

Original publication

Type: Journal article

Journal: Stat Med

Pages: 1501 - 1516

Keywords: Bayes Theorem; Child; Child, Preschool; Cluster Analysis; Computer Simulation; Female; Finland; Humans; Incidence; Infant; Male; Poisson Distribution; Precursor Cell Lymphoblastic Leukemia-Lymphoma; ROC Curve