Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

BACKGROUND: Natural language processing (NLP) holds promise to transform psychiatric research and practice. A pertinent example is the success of NLP in the automatic detection of speech disorganization in formal thought disorder (FTD). However, we lack an understanding of precisely what common NLP metrics measure and how they relate to theoretical accounts of FTD. We propose tackling these questions by using deep generative language models to simulate FTD-like narratives by perturbing computational parameters instantiating theory-based mechanisms of FTD. METHODS: We simulated FTD-like narratives using Generative-Pretrained-Transformer-2 by either increasing word selection stochasticity or limiting the model's memory span. We then examined the sensitivity of common NLP measures of derailment (semantic distance between consecutive words or sentences) and tangentiality (how quickly meaning drifts away from the topic) in detecting and dissociating the 2 underlying impairments. RESULTS: Both parameters led to narratives characterized by greater semantic distance between consecutive sentences. Conversely, semantic distance between words was increased by increasing stochasticity, but decreased by limiting memory span. An NLP measure of tangentiality was uniquely predicted by limited memory span. The effects of limited memory span were nonmonotonic in that forgetting the global context resulted in sentences that were semantically closer to their local, intermediate context. Finally, different methods for encoding the meaning of sentences varied dramatically in performance. CONCLUSIONS: This work validates a simulation-based approach as a valuable tool for hypothesis generation and mechanistic analysis of NLP markers in psychiatry. To facilitate dissemination of this approach, we accompany the paper with a hands-on Python tutorial.

Original publication




Journal article


Biol Psychiatry Cogn Neurosci Neuroimaging

Publication Date





1013 - 1023


Computational psychiatry, GPT-2, Natural language processing, Psychosis, Schizophrenia, Thought disorder, Humans, Natural Language Processing, Frontotemporal Dementia, Schizophrenia, Semantics, Cognition