I’m working on an fMRI dataset and have a question about how to preprocess parametric modulators before including them in a first-level GLM (using FSL FEAT).
For my task-related parametric regressors (e.g., trial-by-trial ratings such as perceived price or perceived sweetness), I z-scored each regressor within subject before modeling. I did this to normalize across subjects and ensure the regressors were on comparable scales, especially since their original ranges differed (e.g., one ranged from 0–5, another from 1–10).
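For concreteness, this is essentially what I did per subject (a minimal numpy sketch; the rating values are made up for illustration):

```python
import numpy as np

# Trial-by-trial ratings for one subject (illustrative values only)
price_ratings = np.array([2.0, 4.5, 1.0, 3.5, 5.0])        # original range 0-5
sweet_ratings = np.array([7.0, 3.0, 9.0, 5.0, 1.0])        # original range 1-10

def zscore(x):
    """Demean and scale to unit variance within subject."""
    return (x - x.mean()) / x.std(ddof=1)

price_z = zscore(price_ratings)
sweet_z = zscore(sweet_ratings)
```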
I’ve seen many tutorials and papers recommend demeaning parametric modulators to reduce collinearity with the intercept, but not all mention z-scoring. This makes me wonder:
Is z-scoring actually necessary, or is demeaning sufficient?
Does rescaling by the standard deviation (i.e., z-scoring) change the interpretation of the beta weights or influence model estimation in important ways?
When is z-scoring preferred over just demeaning, especially when multiple modulators with different scales are included in the same GLM?
I would greatly appreciate any clarification or references on best practices here. Thanks in advance!
In theory, neither of these is needed, since both operations (subtracting a constant, dividing by a constant) are linear transformations that won't change the significance of linear relationships. However, on a practical level, statistical software often behaves better numerically when predictors are z-scored (e.g., iterative model fitting converges more reliably).
The model significance and t-stats should not change. The beta weights will change if you z-score: the interpretation shifts so that the beta coefficient is expressed per standard deviation of the predictor rather than per raw unit of the predictor.
As long as you do not care about the scale of the beta coefficients, I z-score as a rule of thumb. Even if you do care about the scale, you can divide the standardized beta by the predictor's standard deviation to get it back into raw units. I have noticed z-scoring particularly helps when you have predictors of very small and very large magnitude in the same model.
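If it helps, here's a toy check of both claims (plain numpy OLS on simulated data, not FEAT itself): the t-statistic is identical for the raw and z-scored predictor, and dividing the standardized beta by the SD recovers the raw-unit beta.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)             # raw-unit predictor
y = 0.5 * x + rng.normal(0, 1, n)     # simulated signal

def ols_t(x, y):
    """OLS with an intercept; return (beta, t) for the predictor."""
    X = np.column_stack([np.ones_like(x), x])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]
    sigma2 = res[0] / dof
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se

x_z = (x - x.mean()) / x.std(ddof=1)

b_raw, t_raw = ols_t(x, y)
b_z,   t_z   = ols_t(x_z, y)

print(t_raw, t_z)                    # identical t-statistics
print(b_z / x.std(ddof=1), b_raw)    # divide standardized beta by SD -> raw-unit beta
```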
Thank you so much — this is really clear and helpful!
Just to check my understanding:
Demeaning is primarily done to reduce collinearity with the constant term in the model, so that the parametric regressor can more independently explain variance in the BOLD signal (quick numerical check below).
Z-scoring goes one step further by rescaling the regressor to unit variance, which is especially helpful when multiple regressors differ greatly in magnitude.
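To convince myself of the collinearity point, I checked the design-matrix cross-products directly: the cross-product between a regressor column and the constant column is n times the regressor's mean, so demeaning makes the two columns exactly orthogonal (a quick numpy sketch, values just illustrative):

```python
import numpy as np

x = np.array([2.0, 4.5, 1.0, 3.5, 5.0])   # raw modulator
const = np.ones_like(x)                    # constant/intercept column

print(const @ x)               # n * mean(x): nonzero, shares variance with the constant
print(const @ (x - x.mean()))  # ~0: demeaned column is orthogonal to the constant
```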
Let me know if I’ve misunderstood anything — otherwise, I really appreciate the explanation!
I think most of what needs to be said has already been said, but I'll add one extra comment about the effect of demeaning: it changes the interpretation of the intercept. Strictly speaking, one should be careful calling it an "intercept" once the non-constant regressors are demeaned: the constant term (beta0) becomes the mean of the signal if all non-constant regressors are demeaned.
I also like to think of it as the difference between the "baseline" (essentially the intercept) and the mean of the signal.
Note that in many cases (though not all) we don't care about the beta0 values, so this may be irrelevant in your context.
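A toy check of the beta0 claim (illustrative numpy OLS on simulated data, nothing FEAT-specific):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(1, 10, n)
y = 10 + 0.8 * x1 - 0.3 * x2 + rng.normal(0, 1, n)

def fit(X, y):
    """OLS betas with an intercept prepended; X is a list of regressors."""
    return np.linalg.lstsq(np.column_stack([np.ones(len(y)), *X]), y, rcond=None)[0]

# Raw regressors: beta0 is the predicted signal at x1 = x2 = 0 (an extrapolation)
b_raw = fit([x1, x2], y)

# Demeaned regressors: beta0 equals the mean of the signal
b_dm = fit([x1 - x1.mean(), x2 - x2.mean()], y)

print(b_raw[0])           # not the mean of y
print(b_dm[0], y.mean())  # these two match
```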