Toward Generalizable and Transdiagnostic Tools for Psychosis Prediction: An Independent Validation and Improvement of the NAPLS-2 Risk Calculator in the Multisite PRONIA Cohort
Koutsouleris N., Worthington M., Dwyer DB., Kambeitz-Ilankovic L., Sanfelici R., Fusar-Poli P., Rosen M., Ruhrmann S., Anticevic A., Addington J., Perkins DO., Bearden CE., Cornblatt BA., Cadenhead KS., Mathalon DH., McGlashan T., Seidman L., Tsuang M., Walker EF., Woods SW., Falkai P., Lencer R., Bertolino A., Kambeitz J., Schultze-Lutter F., Meisenzahl E., Salokangas RKR., Hietala J., Brambilla P., Upthegrove R., Borgwardt S., Wood S., Gur RE., McGuire P., Cannon TD.
Background: Transition to psychosis is among the most adverse outcomes of clinical high-risk (CHR) syndromes encompassing ultra-high risk (UHR) and basic symptom states. Clinical risk calculators may facilitate an early and individualized interception of psychosis, but their real-world implementation requires thorough validation across diverse risk populations, including young patients with depressive syndromes.
Methods: We validated the previously described NAPLS-2 (North American Prodrome Longitudinal Study 2) calculator in 334 patients (26 with transition to psychosis) with CHR or recent-onset depression (ROD) drawn from the multisite European PRONIA (Personalised Prognostic Tools for Early Psychosis Management) study. Patients were categorized into three risk enrichment levels, ranging from UHR, through CHR, to a broad-risk population comprising patients with CHR or ROD (CHR|ROD). We assessed how risk enrichment and different predictive algorithms influenced prognostic performance using reciprocal external validation.
Results: After calibration, the NAPLS-2 model predicted psychosis with a balanced accuracy (BAC) (sensitivity, specificity) of 68% (73%, 63%) in the PRONIA-UHR cohort, 67% (74%, 60%) in the CHR cohort, and 70% (73%, 66%) in patients with CHR|ROD. Multiple model derivation in PRONIA–CHR|ROD and validation in NAPLS-2–UHR patients confirmed that broader risk definitions produced more accurate risk calculators (CHR|ROD-based vs. UHR-based performance: 67% [68%, 66%] vs. 58% [61%, 56%]). Support vector machines were superior in CHR|ROD (BAC = 71%), while ridge logistic regression and support vector machines performed similarly in CHR (BAC = 67%) and UHR cohorts (BAC = 65%). Attenuated psychotic symptoms predicted psychosis across risk levels, while younger age and reduced processing speed became increasingly relevant for broader risk cohorts.
Conclusions: Clinical-neurocognitive machine learning models operating in young patients with affective and CHR syndromes facilitate a more precise and generalizable prediction of psychosis. Future studies should investigate their therapeutic utility in large-scale clinical trials.
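Note on the performance metric: the Results report balanced accuracy (BAC) alongside sensitivity and specificity. BAC is the mean of the two, which matters here because transitions are rare (26 of 334 patients): a trivial rule that predicts no transition for anyone would reach about 92% plain accuracy but only 50% BAC. The minimal Python sketch below recomputes BAC from the sensitivity and specificity pairs quoted above. The function name `balanced_accuracy` and the cohort labels used as dictionary keys are illustrative assumptions, not part of the study's code. Because the published percentages are themselves rounded, a recomputed value can differ from the reported BAC by a rounding step.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Return balanced accuracy, the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# (sensitivity %, specificity %) pairs as quoted in the Results.
# The cohort labels are illustrative keys, not identifiers from the study.
reported = {
    "PRONIA-UHR": (73, 63),
    "PRONIA-CHR": (74, 60),
    "PRONIA-CHR|ROD": (73, 66),
}

for cohort, (sens, spec) in reported.items():
    # Inputs are rounded percentages, so the result is approximate.
    print(f"{cohort}: BAC = {balanced_accuracy(sens, spec):.1f}%")
```

For the CHR|ROD cohort the rounded inputs give 69.5%, consistent with the reported 70%.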