I’m thinking about denoising some resting-state data using aCompCor in a similar way to model 9 in “Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity”. That model used the first 5 WM components and the first 5 CSF components, whereas fMRIPrep doesn’t distinguish between WM and CSF in column names; it just numbers them based on variance accounted for (although which mask each component came from can be found in the .json file).
This has me wondering: is it preferred to use an equal number of WM and CSF CompCor components, or is it better to just use the first n components, regardless of their origin? Looking at my first subject, their first 9 components are from the CSF mask. If I use the first 5 WM and 5 CSF (rather than the first 10 components), I am leaving some “variance accounted for” on the table. On the other hand, the paper linked above tested 5 WM and 5 CSF, so that method may be more of a known quantity, and maybe there are theoretical reasons to want representation from both tissue types.
I’m curious to hear opinions. Do you prefer taking an equal number from WM and CSF, or just the first n components, regardless?
Edit - I initially said you can find what each component is in the .tsv file, I meant .json
If you look at the confounds JSON produced by fMRIPrep, you can find which mask (WM, CSF, or combined) each CompCor component is associated with.
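In case it helps, here is a minimal sketch of that lookup. It assumes each `a_comp_cor_XX` entry in the confounds JSON carries `Method`, `Mask`, and `Retained` fields (check your own file, since this may vary by fMRIPrep version). `components_by_mask` is a hypothetical helper name, and the toy metadata is made up purely for illustration.

```python
from collections import defaultdict


def components_by_mask(meta):
    """Group retained aCompCor component names by the mask they came from."""
    groups = defaultdict(list)
    for name, info in meta.items():
        # Skip anything that is not an aCompCor component (e.g. tCompCor).
        if not isinstance(info, dict) or info.get("Method") != "aCompCor":
            continue
        # Skip components flagged as not retained (assumed field name).
        if not info.get("Retained", True):
            continue
        groups[info["Mask"]].append(name)
    # Order each group by the numeric suffix of the column name.
    return {
        mask: sorted(names, key=lambda n: int(n.rsplit("_", 1)[1]))
        for mask, names in groups.items()
    }


# Toy stand-in for the parsed confounds JSON; values are made up.
toy_meta = {
    "a_comp_cor_00": {"Method": "aCompCor", "Mask": "CSF", "Retained": True},
    "a_comp_cor_01": {"Method": "aCompCor", "Mask": "CSF", "Retained": True},
    "a_comp_cor_02": {"Method": "aCompCor", "Mask": "WM", "Retained": True},
    "a_comp_cor_03": {"Method": "aCompCor", "Mask": "combined", "Retained": True},
    "t_comp_cor_00": {"Method": "tCompCor", "Retained": True},
}

by_mask = components_by_mask(toy_meta)
for mask, names in by_mask.items():
    print(mask, names)
```

With a real dataset you would load the sidecar with `json.load` and pass the resulting dict in place of `toy_meta`.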
I think you could justify regressing out either the same number of components for WM and CSF or regressing out components such that X% variance is explained. I don’t think there is anything special about the number 5, per se, but it is a fairly lenient choice that uses few regressors and so preserves more temporal degrees of freedom. But if you have long and/or heavily sampled acquisitions, you can more easily get away with regressing out the components that explain 50% of the variance (which is what fMRIPrep outputs by default). However, there is also some dependence on how you are planning on denoising your data. For example, if you’re working with a high-motion cohort (young, clinical, etc.) and expect several volumes will be scrubbed for motion artifacts, you might want to be more conservative with CompCor (i.e., use fewer components).
For this reason, I do not have a set preference, and will make a decision according to the data I’m working with and how similar data have been treated in past publications. Also, it depends on what kind of analysis I am doing. For example, if I am looking for some network effect, I might not want to choose a strategy that introduces distance-dependent artifacts. Hope this helps!
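To make the degrees-of-freedom trade-off concrete, here is a rough bookkeeping sketch. It simply subtracts scrubbed volumes and nuisance regressors from the number of volumes; it ignores temporal filtering and autocorrelation, so treat it as a back-of-the-envelope estimate only. The function name and all numbers are hypothetical.

```python
def remaining_dof(n_volumes, n_scrubbed, n_motion, n_compcor, n_other=0):
    """Rough count of temporal degrees of freedom left after denoising.

    Each scrubbed volume and each nuisance regressor is assumed to cost
    one degree of freedom (a simplification).
    """
    return n_volumes - n_scrubbed - n_motion - n_compcor - n_other


# Hypothetical short, high-motion run: 150 volumes, 30 scrubbed, 24 motion regressors.
few_components = remaining_dof(150, n_scrubbed=30, n_motion=24, n_compcor=10)
many_components = remaining_dof(150, n_scrubbed=30, n_motion=24, n_compcor=30)

print("5 WM + 5 CSF components:", few_components)
print("30 components (e.g. a 50%-variance set):", many_components)
```

The point is only that the cost of each extra component is proportionally larger when the run is short or heavily scrubbed.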
Oh, I’ve just noticed in my .json file that fMRIPrep counts cumulative variance explained for CSF separately from WM and the combined mask. So the first columns are the CSF components that capture 50% of the variance, then the next columns are WM, then combined. So you wouldn’t ever want to just take the first 10 columns, as I was initially thinking.
Thanks for the comment, I may not have caught that otherwise.
Yup! It’s sneaky like that. So if you’re taking a “use X components” approach, you need to make sure that you either “use X combined-mask components” or “use X/2 CSF and X/2 WM components”.
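For anyone finding this later, a minimal sketch of picking the columns by mask rather than by position. It assumes the JSON entries have `Method`, `Mask`, and `Retained` fields and that, within a mask, lower-numbered components explain more variance. `pick_acompcor` is a hypothetical helper and the toy metadata is made up.

```python
def pick_acompcor(meta, n, masks=("CSF", "WM")):
    """Return the first n retained aCompCor column names for each requested mask."""
    selected = []
    for mask in masks:
        names = [
            name
            for name, info in meta.items()
            if isinstance(info, dict)
            and info.get("Method") == "aCompCor"
            and info.get("Mask") == mask
            and info.get("Retained", True)
        ]
        # Order by the numeric suffix of the column name (a_comp_cor_XX).
        names.sort(key=lambda s: int(s.rsplit("_", 1)[1]))
        if len(names) < n:
            raise ValueError(f"only {len(names)} {mask} components available, need {n}")
        selected.extend(names[:n])
    return selected


# Toy metadata: 6 CSF columns, then 6 WM, then 3 combined (made-up layout).
toy_meta = {}
index = 0
for mask, count in [("CSF", 6), ("WM", 6), ("combined", 3)]:
    for _ in range(count):
        toy_meta[f"a_comp_cor_{index:02d}"] = {
            "Method": "aCompCor",
            "Mask": mask,
            "Retained": True,
        }
        index += 1

cols_5_5 = pick_acompcor(toy_meta, 5)                          # 5 CSF + 5 WM
cols_combined = pick_acompcor(toy_meta, 3, masks=("combined",))  # 3 combined
print(cols_5_5)
print(cols_combined)
```

The returned names can then be used to subset the confounds table (for example, as a column list for a pandas DataFrame read from the .tsv).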