I am working with rs-fMRI data collected from four different sites; the scanning parameters are identical across three of them. I want to control for site effects in the 2nd-level analysis by including site as a 2nd-level covariate in the CONN toolbox.
To do this, I was planning to create separate 0/1 dummy variables for each site. For example:
Site1: [1 1 0 0 0 0 0 0]
Site2: [0 0 1 1 0 0 0 0]
Site3: [0 0 0 0 1 1 0 0]
Site4: [0 0 0 0 0 0 1 1]
Is this the correct approach for setting up site covariates in CONN? If not, could you suggest a better way to control for site effects in 2nd-level analysis?
Yes, that is perfectly correct. In your second-level analysis, simply include those Site* effects in your model and enter 0s in their corresponding positions within the “between-subjects contrast” vector. For example, a model with the effects GroupA, GroupB, Site1, Site2, Site3, and Site4, and a between-subjects contrast vector [-1 1 0 0 0 0], will evaluate the difference between GroupA and GroupB while controlling for potential confounding effects of site.
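As a rough illustration of the setup described above, here is a minimal NumPy sketch that builds the 0/1 site covariates from a vector of site labels and assembles the matching design matrix and contrast. CONN itself is a MATLAB toolbox, so this is only meant to make the layout concrete. The `site` labels follow the 8-subject example in the question; the group assignment (`group_a`, `group_b`) is a hypothetical one invented for this sketch.

```python
import numpy as np

# Site label of each subject: two subjects per site, as in the example above.
site = np.array([1, 1, 2, 2, 3, 3, 4, 4])

# One 0/1 column per site (rows = subjects, columns = Site1..Site4).
site_dummies = (site[:, None] == np.arange(1, 5)).astype(float)

# Hypothetical group membership (not given in the thread):
# one GroupA and one GroupB subject at each site.
group_a = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
group_b = 1.0 - group_a

# Design matrix with columns GroupA, GroupB, Site1, Site2, Site3, Site4.
X = np.column_stack([group_a, group_b, site_dummies])

# Between-subjects contrast: GroupB minus GroupA, with 0s for the site columns.
c = np.array([-1, 1, 0, 0, 0, 0], dtype=float)

print(site_dummies.T)  # each printed row is one Site* covariate
```

Each row of `site_dummies.T` corresponds to one of the Site1..Site4 vectors you would enter as a 2nd-level covariate, and `c` has one entry per column of `X`.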
As a related note, in some models the constant term (e.g. average connectivity across all subjects) is part of the effect of interest, for example if you want to evaluate the adjusted means separately within each group in the model above. In those cases you may want to center the Site* covariates, so that the site control removes only the differences between sites and not the average/constant term. In your case that would mean using the centered variables instead (subtracting from each covariate its average value across subjects):
Site1: [3/4 3/4 -1/4 -1/4 -1/4 -1/4 -1/4 -1/4]
Site2: [-1/4 -1/4 3/4 3/4 -1/4 -1/4 -1/4 -1/4]
Site3: [-1/4 -1/4 -1/4 -1/4 3/4 3/4 -1/4 -1/4]
Site4: [-1/4 -1/4 -1/4 -1/4 -1/4 -1/4 3/4 3/4]
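The centered vectors above can be reproduced with a couple of lines of NumPy. This is only a sketch of the arithmetic, using the same 8-subject, two-per-site layout as the example; the variable names are made up for illustration.

```python
import numpy as np

# Site label of each subject: two subjects per site, as in the example.
site = np.array([1, 1, 2, 2, 3, 3, 4, 4])

# 0/1 site covariates (rows = subjects, columns = Site1..Site4).
site_dummies = (site[:, None] == np.arange(1, 5)).astype(float)

# Center each covariate by subtracting its mean across subjects.
site_centered = site_dummies - site_dummies.mean(axis=0)

print(site_centered.T)  # each printed row is one centered Site* covariate
```

The value subtracted from each column is that site's share of the subjects (1/4 here). With unequal site sizes the centered values would therefore differ from 3/4 and -1/4, but each centered covariate would still sum to zero.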
In the example model above (looking at group differences), centering the Site* variables has no effect (the results are identical either way), but in other models/tests it does make a difference, so when in doubt it is generally recommended to center all control variables.
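To see this numerically, the sketch below fits the same toy data with raw and with centered site covariates and compares two contrasts: the group difference [-1 1 0 0 0 0] and the GroupA adjusted mean [1 0 0 0 0 0]. Everything here is an assumption made for illustration: the connectivity values `y` and the group assignment are invented, and the fit uses a minimum-norm pseudo-inverse solution, which is a choice of this sketch rather than a statement about CONN's internals.

```python
import numpy as np

# Two subjects per site; hypothetical groups with one GroupA and one GroupB per site.
site = np.array([1, 1, 2, 2, 3, 3, 4, 4])
group_a = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
group_b = 1.0 - group_a

# Raw 0/1 site covariates and their centered versions.
site_dummies = (site[:, None] == np.arange(1, 5)).astype(float)
site_centered = site_dummies - site_dummies.mean(axis=0)

# Made-up connectivity value per subject (illustration only).
y = np.array([0.30, 0.50, 0.20, 0.45, 0.35, 0.55, 0.25, 0.40])


def contrast_estimate(X, y, c):
    """Minimum-norm least-squares fit of y on X, then the contrast c applied to the betas."""
    beta = np.linalg.pinv(X, rcond=1e-10) @ y
    return float(c @ beta)


X_raw = np.column_stack([group_a, group_b, site_dummies])
X_cen = np.column_stack([group_a, group_b, site_centered])

c_diff = np.array([-1, 1, 0, 0, 0, 0], dtype=float)    # GroupB minus GroupA
c_mean_a = np.array([1, 0, 0, 0, 0, 0], dtype=float)   # GroupA adjusted mean

diff_raw = contrast_estimate(X_raw, y, c_diff)
diff_cen = contrast_estimate(X_cen, y, c_diff)
mean_a_raw = contrast_estimate(X_raw, y, c_mean_a)
mean_a_cen = contrast_estimate(X_cen, y, c_mean_a)

print("group difference, raw vs centered:", diff_raw, diff_cen)
print("GroupA mean, raw vs centered:", mean_a_raw, mean_a_cen)
print("observed GroupA average:", y[group_a == 1].mean())
```

In this balanced toy design the group-difference estimate is the same under both codings, whereas the GroupA term matches the observed GroupA average only when the site covariates are centered. With raw 0/1 covariates, part of the constant term is absorbed by the site columns.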