Mapping the circulating proteome across neurodegeneration: A harmonized, consortium-scale framework for uncovering molecular pathophysiology.
Finney CA., An L., Winchester LM., Vogel J., Wilkins HM., Burns JM., Swerdlow RH., Slawson C., Rothstein JD., Global Neurodegeneration Proteomics Consortium (GNPC)., Lutz MW., Saloner R., Shvetcov A.
Large-scale plasma proteomics offers unprecedented opportunities to investigate the systemic biology of neurodegeneration, yet technical heterogeneity, site-specific artifacts, and clinical confounding remain major barriers to reproducible discovery. Leveraging data from 13,733 individuals with Alzheimer's disease (AD), Parkinson's disease (PD), frontotemporal dementia (FTD), Parkinson's disease dementia (PDD), or amyotrophic lateral sclerosis (ALS), together with non-impaired controls, in the Global Neurodegeneration Proteomics Consortium (GNPC), we present a scalable and generalizable analytical framework for harmonizing and interpreting consortium-scale proteomic datasets. Using a high-dimensional perturbation framework, we systematically benchmark five commonly used batch correction methods across a range of realistic confounding structures, including site-disease imbalance, nonlinear effects, and heteroskedasticity. Empirical Bayes modelling via limma consistently emerged as the most robust method, best balancing removal of site-related technical variance against retention of disease-relevant biological signal. On this harmonized foundation, we resolve plasma signatures of neurodegenerative disease, including a shared immune-metabolic axis in AD and PD, neuromuscular disruption in ALS, and proteostatic imbalance in PD. Tissue and cell-type enrichment analyses highlight widespread immune-endocrine involvement in AD and hematopoietic activation in PD. Demographically matched analyses nominate distinct candidate biomarkers for each disease, including lipid, redox, and complement factors in AD, lysosomal and cytoskeletal proteins in PD, and muscle-derived markers in ALS. This study establishes a scalable analytical framework for integrating real-world proteomic data and provides a disease-resolved catalogue of circulating signatures to inform biomarker development and targeted intervention across neurodegenerative diseases.
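
The trade-off described above, removing site-related variance while retaining disease signal, can be illustrated with a minimal sketch of linear site-effect removal. This is not the authors' pipeline: limma is an R package, and the Python/NumPy code below only mimics the general idea of fitting site and disease terms jointly and subtracting the site component alone. It omits the empirical Bayes step entirely, and the function name `remove_site_effect` and its arguments are hypothetical.

```python
import numpy as np


def _sum_coded(labels):
    """Sum-to-zero coding: k levels -> k-1 columns, last level coded -1."""
    levels = np.unique(labels)
    cols = np.zeros((labels.size, levels.size - 1))
    for j, level in enumerate(levels[:-1]):
        cols[labels == level, j] = 1.0
    cols[labels == levels[-1], :] = -1.0
    return cols


def _treatment_coded(labels):
    """Treatment coding: k levels -> k-1 indicator columns (first is reference)."""
    levels = np.unique(labels)
    cols = np.zeros((labels.size, levels.size - 1))
    for j, level in enumerate(levels[1:]):
        cols[labels == level, j] = 1.0
    return cols


def remove_site_effect(X, site, disease):
    """Subtract fitted site effects from X while protecting disease effects.

    X       : (n_samples, n_proteins) array of log-scale abundances (assumed)
    site    : (n_samples,) site labels (the unwanted technical factor)
    disease : (n_samples,) diagnosis labels (the signal to retain)
    """
    X = np.asarray(X, dtype=float)
    site = np.asarray(site)
    disease = np.asarray(disease)

    intercept = np.ones((X.shape[0], 1))
    D = _treatment_coded(disease)   # protected biological covariate
    S = _sum_coded(site)            # technical covariate to remove
    design = np.hstack([intercept, D, S])

    # Ordinary least squares for every protein at once.
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)

    # Keep intercept and disease terms; subtract only the site component.
    site_beta = beta[design.shape[1] - S.shape[1]:]
    return X - S @ site_beta
```

Because site and disease are fitted in the same model, the site coefficients are estimated conditional on diagnosis, which is what keeps a site-disease imbalance from being absorbed into the correction. If a diagnosis were recruited at only one site, the two terms would not be separable and no linear adjustment of this kind could distinguish them.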
