A psychometric evaluation of the 12-item EPQ-R neuroticism scale in 384,183 UK Biobank participants using item response theory (IRT)
Bauermeister S., Gallacher J.
<jats:title>Abstract</jats:title><jats:sec><jats:title>Background</jats:title><jats:p>Neuroticism has been described as a broad and pervasive personality dimension or ‘heterogeneous’ trait measuring components of mood instability, worry, anxiety, irritability, moodiness, self-consciousness and sadness. Consistent with depression and anxiety-related disorders, increased neuroticism leaves an individual vulnerable to other unipolar and bipolar mood disorders. However, the measurement of neuroticism through a self-report scale remains a challenge. Our aim was to identify psychometrically efficient items and to inform decisions on whether to retain redundant items across the 12-item EPQ-R Neuroticism scale (S. B. Eysenck, Eysenck, & Barrett, 1985) using Item Response Theory (IRT).</jats:p></jats:sec><jats:sec><jats:title>Methods</jats:title><jats:p>The 12-item binary EPQ-R Neuroticism scale was evaluated by estimating a two-parameter logistic (2-PL) IRT model on data from 384,183 UK Biobank participants aged 39 to 73 years. Post-estimation mathematical assumptions were tested, and all analyses were run in Stata SE 15.1 (StataCorp, 2018) on the Dementias Platform UK (DPUK) Data Portal (Bauermeister et al., Preprint).</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>A plot of θ values (Item Information functions) showed that most items clustered around the mid-range, where discrimination values ranged from 1.34 to 2.27. Difficulty values for individual item θ scores ranged from -0.14 to 1.25. A Mokken analysis suggested a weak to medium level of monotonicity between the items; no item reached strong scalability (H=0.35-0.47). Systematic item deletion and rescaling showed that an 8-item scale is more efficient and reliable, with information ranging from 1.43 to 2.36 and strong scalability (H=0.43-0.53). A 3-item scale is highly discriminatory but covers only a narrow range of person ability (difficulty).
A logistic regression differential item functioning (DIF) analysis revealed significant gender item bias, functioning uniformly across all versions of the scale.</jats:p></jats:sec><jats:sec><jats:title>Conclusions</jats:title><jats:p>Across 384,183 UK Biobank participants, the 12-item EPQ-R neuroticism scale exhibited psychometric inefficiency, with poor discrimination at the extremes of the scale range. High and low scores are relatively poorly represented and uninformative, suggesting that high neuroticism scores derived from the EPQ-R are a function of cumulative mid-range values. The scale also shows evidence of gender item bias; future scale development should take this into account and consider selective item deletions and validation of new items to increase scale informativeness and reliability.</jats:p></jats:sec>
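To illustrate why items whose difficulties sit in the mid-range say little about the extremes of the trait, the standard 2-PL item response function and its item information function can be sketched as below. This is a minimal illustration, not the authors' Stata analysis: the function names are hypothetical, and the pairing of a discrimination of 2.27 with a difficulty of 0.5 is an assumed example (the abstract reports only the ranges of each parameter, not per-item values).

```python
import math


def icc_2pl(theta, a, b):
    """Probability of endorsing a binary item under the 2-PL model.

    theta: latent trait level, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2-PL item at trait level theta: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Assumed illustrative item: a = 2.27 is the top of the reported
    # discrimination range; b = 0.5 is a made-up mid-range difficulty.
    a, b = 2.27, 0.5
    for theta in (-3.0, -1.5, 0.0, 0.5, 1.5, 3.0):
        info = item_information(theta, a, b)
        print(f"theta={theta:+.1f}  P={icc_2pl(theta, a, b):.3f}  info={info:.3f}")
```

Information peaks where θ equals the item difficulty (at a²/4) and falls away quickly on either side, so a scale built from items with similar mid-range difficulties measures very high or very low trait levels imprecisely.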