Data-driven methods distort optimal cutoffs and accuracy estimates of depression screening tools: a simulation study using individual participant data
Bhandari PM., Levis B., Neupane D., Patten SB., Shrier I., Thombs BD., Benedetti A., Sun Y., He C., Rice DB., Krishnan A., Wu Y., Azar M., Sanchez TA., Chiovitti MJ., Saadat N., Riehm KE., Imran M., Negeri Z., Boruff JT., Cuijpers P., Gilbody S., Ioannidis JPA., Kloda LA., Ziegelstein RC., Comeau L., Mitchell ND., Tonelli M., Vigod SN., Aceti F., Alvarado R., Alvarado-Esquivel C., Bakare MO., Barnes J., Bavle AD., Beck CT., Bindt C., Boyce PM., Bunevicius A., Castro e Couto T., Chaudron LH., Correa H., de Figueiredo FP., Eapen V., Favez N., Felice E., Fernandes M., Figueiredo B., Fisher JRW., Garcia-Esteve L., Giardinelli L., Helle N., Howard LM., Khalifa DS., Kohlhoff J., Kozinszky Z., Kusminskas L., Lelli L., Leonardou AA., Maes M., Meuti V., Radoš SN., García PN., Nishi D., Luwa E-Andjafono DO., Pawlby SJ., Quispel C., Robertson-Blackmore E., Rochat TJ., Rowe HJ., Sharp DJ., Siu BWM., Skalkidou A., Stein A., Stewart RC., Su KP., Sundström-Poromaa I., Tadinac M., Tandon SD., Tendais I., Thiagayson P., Töreki A., Torres-Giménez A., Tran TD., Trevillion K., Turner K., Vega-Dienstmaier JM., Wynter K., Yonkers KA.
Objective: To evaluate, across multiple sample sizes, the degree to which data-driven methods (1) identify optimal cutoffs that differ from the population optimal cutoff and (2) bias accuracy estimates.

Study design and setting: From a database (n = 13,255) synthesized to assess the screening accuracy of the Edinburgh Postnatal Depression Scale (EPDS), 1,000 samples were randomly drawn at each of four sample sizes (100, 200, 500 and 1,000) to simulate studies of different sizes. Optimal cutoffs were selected by maximizing Youden's J (sensitivity + specificity − 1). Optimal cutoffs and accuracy estimates in the simulated samples were compared with population values.

Results: Optimal cutoffs in simulated samples ranged from ≥ 5 to ≥ 17 for n = 100, ≥ 6 to ≥ 16 for n = 200, ≥ 6 to ≥ 14 for n = 500, and ≥ 8 to ≥ 13 for n = 1,000. The percentage of simulated samples identifying the population optimal cutoff (≥ 11) was 30% for n = 100, 35% for n = 200, 53% for n = 500, and 71% for n = 1,000. Mean overestimation of sensitivity and underestimation of specificity were 6.5 percentage points (pp) and −1.3 pp for n = 100, 4.2 pp and −1.1 pp for n = 200, 1.8 pp and −1.0 pp for n = 500, and 1.4 pp and −1.0 pp for n = 1,000.

Conclusions: Small accuracy studies that use data-driven methods may identify inaccurate optimal cutoffs and overstate screening accuracy.
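The cutoff-selection and resampling procedure described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the population comes from a hypothetical `synthetic_population` helper with invented score distributions (it is not the EPDS individual participant data); ties in Youden's J go to the lowest cutoff; each sample is drawn without replacement and redrawn if it lacks either cases or non-cases; and bias is taken as the sample estimate minus the population value at the same cutoff. All function names here are hypothetical.

```python
import random

MAX_SCORE = 30  # EPDS total scores run from 0 to 30


def accuracy_table(scores, is_case):
    """Map each cutoff c to (sensitivity, specificity) for the screen 'score >= c'."""
    cases = [0] * (MAX_SCORE + 1)
    noncases = [0] * (MAX_SCORE + 1)
    for s, d in zip(scores, is_case):
        if not 0 <= s <= MAX_SCORE:
            raise ValueError(f"score {s} outside 0..{MAX_SCORE}")
        (cases if d else noncases)[s] += 1
    n_cases, n_noncases = sum(cases), sum(noncases)
    if n_cases == 0 or n_noncases == 0:
        raise ValueError("need at least one case and one non-case")
    table = {}
    tp, fp = n_cases, n_noncases  # at cutoff 0 everyone screens positive
    for c in range(MAX_SCORE + 1):
        table[c] = (tp / n_cases, (n_noncases - fp) / n_noncases)
        tp -= cases[c]  # scores equal to c screen negative from cutoff c + 1 on
        fp -= noncases[c]
    return table


def youden_optimal_cutoff(table):
    """Cutoff maximizing Youden's J = sens + spec - 1; ties go to the lowest cutoff."""
    return max(sorted(table), key=lambda c: table[c][0] + table[c][1] - 1)


def simulate(pop_scores, pop_is_case, n, n_samples=1000, seed=0):
    """Draw n_samples samples of size n; compare sample-optimal cutoffs with the population."""
    if n < 2 or n_samples < 1:
        raise ValueError("need n >= 2 and n_samples >= 1")
    rng = random.Random(seed)
    pop_table = accuracy_table(pop_scores, pop_is_case)
    pop_cutoff = youden_optimal_cutoff(pop_table)
    cutoffs, sens_bias, spec_bias = [], [], []
    while len(cutoffs) < n_samples:
        idx = rng.sample(range(len(pop_scores)), n)  # without replacement (assumption)
        is_case = [pop_is_case[i] for i in idx]
        if all(is_case) or not any(is_case):
            continue  # redraw samples lacking cases or non-cases (assumption)
        table = accuracy_table([pop_scores[i] for i in idx], is_case)
        c = youden_optimal_cutoff(table)
        cutoffs.append(c)
        # Bias: sample estimate minus population value at the SAME cutoff (assumption)
        sens_bias.append(table[c][0] - pop_table[c][0])
        spec_bias.append(table[c][1] - pop_table[c][1])
    return {
        "population_cutoff": pop_cutoff,
        "cutoff_range": (min(cutoffs), max(cutoffs)),
        "pct_at_population_cutoff": 100 * cutoffs.count(pop_cutoff) / n_samples,
        "mean_sens_bias_pp": 100 * sum(sens_bias) / n_samples,
        "mean_spec_bias_pp": 100 * sum(spec_bias) / n_samples,
    }


def synthetic_population(size=13_255, prevalence=0.11, seed=1):
    """Hypothetical stand-in for the EPDS database: made-up score distributions."""
    rng = random.Random(seed)
    scores, is_case = [], []
    for _ in range(size):
        case = rng.random() < prevalence
        mu, sd = (14, 5) if case else (6, 4)  # invented parameters, not real EPDS data
        scores.append(min(MAX_SCORE, max(0, round(rng.gauss(mu, sd)))))
        is_case.append(case)
    return scores, is_case


if __name__ == "__main__":
    pop_scores, pop_is_case = synthetic_population()
    for n in (100, 200, 500, 1000):
        print(n, simulate(pop_scores, pop_is_case, n, n_samples=200))
```

Comparing each sample's estimate with the population value at that sample's own cutoff isolates the optimism that arises from choosing the cutoff in the same data used to estimate accuracy. The demo loop mirrors the four sample sizes, but its output depends entirely on the invented population and should not be read as reproducing the reported results.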