I am confused by grand mean scaling in fMRI analysis (i.e., in linear models of task-based designs), and I am hoping that someone can clarify it and/or point me to a reference. Grand mean scaling is the practice of dividing the timeseries of all voxels within a run by a single number: the average of those timeseries over all voxels and timepoints. The result is then often multiplied by a constant such as 100. The main rationale I am familiar with is that the raw signal varies from run to run, so grand mean scaling puts the regression coefficients from different runs in the same units and avoids bias when comparing them. For example, if run two has a ‘generally higher’ signal than run one, grand mean scaling both runs makes the regression coefficients from models fit to each run comparable.
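To make the definition concrete, here is a minimal sketch of what I understand grand mean scaling to be. The array layout (voxels by timepoints), the function name `grand_mean_scale`, and the simulated numbers are my own assumptions for illustration, not any particular package's implementation.

```python
import numpy as np

def grand_mean_scale(run_data, target=100.0):
    """Scale one run by a single number: its mean over all voxels and timepoints.

    run_data: hypothetical array of shape (n_voxels, n_timepoints).
    target:   value the grand mean is set to after scaling (e.g., 100).
    """
    grand_mean = run_data.mean()  # one scalar for the whole run
    return run_data / grand_mean * target

# Two simulated runs whose raw signal levels differ (made-up values).
rng = np.random.default_rng(0)
run1 = 1000.0 + rng.normal(scale=5.0, size=(50, 200))
run2 = 1500.0 + rng.normal(scale=5.0, size=(50, 200))

scaled1 = grand_mean_scale(run1)
scaled2 = grand_mean_scale(run2)
```

After scaling, both runs have a grand mean equal to `target`, although individual voxels can still have different means because only one number is used per run.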

However, isn’t it the case that this only makes sense when ‘generally higher’ means something very specific? It can’t just mean that the average signal is higher, because if that were the case, then including a run-wise intercept would already account for the difference. Instead, it seems like grand mean scaling carries the assumption that both the magnitude of the signal (e.g., the BOLD response to a stimulus) and the noise vary linearly with the mean of the signal.
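Here is a toy simulation of the distinction I have in mind, as a sketch under simplifying assumptions: a single voxel (so the "grand mean" is just that timeseries' mean), a made-up boxcar regressor, and the same noise reused across runs so the comparison is exact. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = (np.arange(n) % 20 < 10).astype(float)  # boxcar task regressor (made up)
X = np.column_stack([np.ones(n), x])        # design: run-wise intercept + task

def fit_beta(y):
    """Ordinary least squares; return the task coefficient."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def gms(y, target=100.0):
    """Grand mean scaling for a single timeseries."""
    return y / y.mean() * target

baseline, effect = 1000.0, 10.0
noise = rng.normal(scale=2.0, size=n)

run1 = baseline + effect * x + noise
run2_additive = run1 + 500.0  # 'higher' = additive offset only
run2_gain = 1.5 * run1        # 'higher' = everything multiplied by a gain

# Without scaling: the intercept absorbs the offset, but not the gain.
beta_run1 = fit_beta(run1)
beta_additive = fit_beta(run2_additive)
beta_gain = fit_beta(run2_gain)

# With grand mean scaling.
beta_run1_scaled = fit_beta(gms(run1))
beta_additive_scaled = fit_beta(gms(run2_additive))
beta_gain_scaled = fit_beta(gms(run2_gain))

print(beta_run1, beta_additive, beta_gain)
print(beta_run1_scaled, beta_additive_scaled, beta_gain_scaled)
```

In this setup the unscaled coefficients agree under the additive offset and differ by the gain under multiplicative scaling. Grand mean scaling restores agreement in the multiplicative case but makes the coefficients disagree in the purely additive case, which is what makes me think the proportionality assumption matters.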

So, I have two questions:

1. Is it correct that the validity of grand mean scaling rests on the assumption that both signal and noise are linearly related to the average BOLD signal?
2. Is there a reference for 1?

Thanks