BACKGROUND: Few at-risk adults are identified by specialized services prior to the development of a first episode of psychosis. A transdiagnostic risk calculator, predicting psychosis using electronic health record (EHR) data, was developed in London, UK to identify patients at risk, using structured data and 14 natural language processing (NLP)-derived symptom and substance use concepts. We report the adaptation and internal validation of this risk calculator in a Southeast England region.

METHODS: In a retrospective cohort study using EHR patient notes we identified individuals accessing mental healthcare in Southeast England (Nov-1992 to Jan-2023) who received a primary diagnosis of a non-psychotic or non-organic mental disorder. We developed new machine-learning NLP algorithms for diagnosis, symptom and substance use concepts by fine-tuning existing open-source transformer models. Baseline and outcome coded diagnoses were supplemented with NLP-derived diagnosis data. Cox regression was used to predict psychosis and prior weights were applied; discrimination (Harrell's C) was assessed.

RESULTS: Nearly all NLP concepts achieved an F1-measure of accuracy above 0.8 following development. In an analysis sample of 63,922 patients with complete data, the risk calculator had acceptable but lower accuracy in Southeast England (Harrell's C 0.71) compared to the London benchmark (Harrell's C 0.85).

CONCLUSIONS: The risk calculator performed similarly in Southeast England to other external validation studies, discriminating acceptably, suggesting that this calculator may be adapted successfully for new patient populations, services and geographic areas. Differences in accuracy may be due to different cultures of data capture, different NLP approaches, or differences in the patient cohort.
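For readers unfamiliar with the discrimination statistic reported in the abstract, the following is a minimal sketch of how Harrell's C can be computed for right-censored follow-up data. It is illustrative only and is not the authors' code: the function name `harrell_c`, its argument names, and the toy data are assumptions introduced here. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of who develops the outcome sooner.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times       -- follow-up time for each patient
    events      -- 1 if the outcome was observed, 0 if censored
    risk_scores -- predicted risk; higher means earlier expected outcome
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the patient with the shorter
        # follow-up time actually experienced the outcome.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk, earlier outcome
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.3, 0.5, 0.1]
toy_c = harrell_c(toy_times, toy_events, toy_risks)
```

In this toy example five pairs are comparable and four are ranked correctly, so `toy_c` is 0.8. The sketch uses a simple quadratic loop for clarity; for a cohort of the size reported in the abstract, an established survival-analysis library would be a more practical choice.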

Original publication

DOI

10.3389/fpsyt.2025.1584719

Type

Journal article

Journal

Front Psychiatry

Publication Date

2025

Volume

16

Keywords

at risk mental state, electronic health records, mental health care, natural language processing, psychosis, risk assessment