A psychometric evaluation of the 12-item EPQ-R neuroticism scale in 502,591 UK Biobank participants using item response theory (IRT)
Bauermeister S., Gallacher J.
<jats:title>Abstract</jats:title><jats:sec><jats:title>Background</jats:title><jats:p>Neuroticism has been described as a broad and pervasive personality dimension or ‘heterogeneous’ trait measuring components of mood instability such as worry; anxiety; irritability; moodiness; self-consciousness; sadness and irritabililty. Consistent with depression and anxiety-related disorders, increased neuroticism places an individual vulnerable for other unipolar and bipolar mood disorders. However, the measurement of neuroticism remains a challenge. Our aim was to identify psychometrically efficient items and inform the inclusion of redundant items across the 12-item EPQ-R Neuroticism scale using Item Response Theory (IRT).</jats:p></jats:sec><jats:sec><jats:title>Methods</jats:title><jats:p>The 12-item binary EPQ-R Neuroticism scale was evaluated by estimating a two-parameter (2-PL) IRT model on data from 502,591 UK Biobank participants aged 37 to 73 years (<jats:italic>M</jats:italic> = 56.53 years; <jats:italic>SD</jats:italic> = 8.05), 54% female. Models were run listwise (n= 401,648) and post-estimation mathematical assumptions were computed. All analyses were conducted in STATA 16 SE on the Dementias Platform UK (DPUK) Data Portal.</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>A plot of θ values (Item Information functions) showed that most items clustered around the mid-range where discrimination values ranged from 1.34 to 2.28. Difficulty values for individual item θ scores ranged from −0.13 to 1.41. A Mokken analysis suggested a weak to medium level of monotonicity between the items, no items reach strong scalability (H=0.35-0.47). Systematic item deletions and rescaling found that an 7-item scale is more efficient and with information (discrimination) ranging from 1.56 to 2.57 and stronger range of scalability (H=0.47-0.52). A 3-item scale is highly discriminatory but offers a narrow range of person ability (difficulty). A logistic regression differential item function (DIF) analysis exposed significant gender item bias functioning uniformly across all versions of the scale.</jats:p></jats:sec><jats:sec><jats:title>Conclusions</jats:title><jats:p>Across 401,648 UK Biobank participants, the 12-item EPQ-R neuroticism scale exhibited psychometric inefficiency with poor discrimination at the extremes of the scale-range. High and low scores are relatively poorly represented and uninformative suggesting that high neuroticism scores derived from the EPQ-R are a function of cumulative mid-range values. The scale also shows evidence of gender item bias and future scale development should consider the former along with item deletions.</jats:p></jats:sec>