PURPOSE: The contribution of social and behavioural factors to the development of mental health conditions and to treatment effectiveness is widely supported, yet population-level data sources on social and behavioural determinants of mental health are weak. Filling these data gaps will be crucial to accelerating precision medicine. Some have suggested broader use of electronic health records (EHRs) as a source of non-clinical determinants, although social and behavioural information is not systematically collected in EHRs internationally.

OBJECTIVE: In this commentary, we highlight the nature and quality of key available structured and unstructured social and behavioural data, using a case example of value counts from secondary mental health data available in the UK from the UK Clinical Record Interactive Search (CRIS) database; describe the methodological challenges in the use of such data; and outline possible solutions and opportunities involving natural language processing (NLP) of unstructured EHR text.

CONCLUSIONS: Most structured non-clinical data fields within secondary care mental health EHR data have too much missing data for adequate use. The utility of other non-clinical fields that are reported semi-consistently (e.g., ethnicity and marital status) depends entirely on treating them appropriately in analyses and on quantifying the many reasons behind missingness with selection bias in mind. Advances in NLP offer new opportunities to exploit unstructured text from secondary care EHR data, particularly given that clinical notes and attachments are available for large numbers of patients and are more routinely completed by clinicians. Finding ways to re-use, harmonize, and improve existing and future secondary care mental health data, leveraging advanced analytics such as NLP, is worth the effort: it can help fill the data gap on social and behavioural contributors to mental health conditions, and it will be necessary to cover all of the domains needed to inform personalized interventions.

Journal article

J Biomed Inform

Keywords: Data quality, Electronic health records, Mental health, Natural language processing, Precision medicine, Selection bias