Research-ready data for multi-cohort analyses: The Dementias Platform UK (DPUK) C-Surv data model
Bauermeister S., Bauermeister JR., Bridgman R., Felici C., Newbury M., North L., Orton C., Squires E., Thompson S., Young S., Gallacher JEJ.
Abstract Research-ready data (data curated to a defined standard) increases scientific opportunity and rigour by integrating the data environment. The development of research platforms has highlighted the value of research-ready data, particularly for multi-cohort analyses. Following user consultation, a standard data model (C-Surv), optimised for data discovery, was developed using data from 12 population and clinical cohort studies. The model uses a four-tier nested structure based on 18 data themes and 137 domains, selected according to user behaviour or technology. Standard variable naming conventions are applied so that each variable is uniquely identified within the context of longitudinal studies. The model was used to develop a harmonised dataset for 11 cohorts. This dataset populated the Cohort Explorer data discovery tool, which is used to assess the feasibility of an analysis before a data access request is made. It was concluded that developing and applying a standard data model (C-Surv) for research cohort data is feasible and useful.
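As an informal illustration of the four-tier nested structure and the role of a naming convention in longitudinal data, the following Python sketch may help. It is not the C-Surv specification: only the "theme" and "domain" tiers are named in the abstract, so `group` and `item` are placeholder names for the two lower tiers, the `_w<wave>` suffix is an assumed way of distinguishing repeated measures, and the example values (`cognition`, `memory`, and so on) are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableKey:
    """Hypothetical four-tier address for one variable.

    'theme' and 'domain' come from the abstract; 'group' and 'item'
    are placeholder names for the two lower tiers (an assumption).
    """
    theme: str
    domain: str
    group: str
    item: str

    def name(self, wave: int) -> str:
        # Illustrative naming scheme only (not the C-Surv rule):
        # join the tiers and append the study wave so that repeated
        # measures of the same item remain distinct.
        return f"{self.theme}_{self.domain}_{self.group}_{self.item}_w{wave}"


def build_tree(keys):
    """Nest keys as theme -> domain -> group -> sorted list of items."""
    tree = {}
    for k in keys:
        items = (
            tree.setdefault(k.theme, {})
            .setdefault(k.domain, {})
            .setdefault(k.group, [])
        )
        if k.item not in items:
            items.append(k.item)
            items.sort()
    return tree


if __name__ == "__main__":
    # Invented example values, for illustration only.
    keys = [
        VariableKey("cognition", "memory", "recall", "immediate"),
        VariableKey("cognition", "memory", "recall", "delayed"),
    ]
    print(keys[0].name(wave=2))
    print(build_tree(keys))
```

Under this sketch, a discovery tool could walk the nested dictionary from theme down to item, while the wave suffix keeps the same item measured at different time points from colliding.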