Different results when preprocessing data together vs. separately using nilearn's NiftiMasker

I want to compare male and female sMRI images using mass-univariate independent-samples t-tests (one test per voxel). For each of the two groups I have an array containing paths to the respective NIfTI files. I am using NiftiMasker from nilearn to preprocess my data. My standard pipeline was:

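from nilearn.input_data import NiftiMasker  # nilearn.maskers in nilearn >= 0.9

# niftimasker_cache and target_affine are defined elsewhere in my script
masker = NiftiMasker(standardize=True,
                     smoothing_fwhm=8,
                     memory=niftimasker_cache,
                     mask_strategy='template',
                     target_affine=target_affine
                     )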

My first idea was to process male and female data separately (i.e. running the same pipeline once on the male data and once on the female data). I thought that it wouldn’t make sense to standardize male and female data together (see my explanation in 2.) below). However, this separate preprocessing gives me strange results when calculating the t-tests: all p-values are then equal to 1.

However, if I preprocess male and female data together, everything seems to work fine (my p-values have variance).

Does anyone have an explanation for this? Why does separate preprocessing lead to a p-value array with no variance?

I also have some more general questions or assumptions (correct me if I am wrong about any of these assumptions):

1.) I assumed that my pipeline steps in NiftiMasker are all separate from each other, meaning that all those steps (z-standardizing, smoothing, reslicing to target affine, masking according to MNI152 space) do not interact with each other (you could change the order and it wouldn’t make a difference). Is this assumption true?

2.) I also expect that it wouldn’t make a difference if I process my groups together or separately, because those methods are not data-driven (e.g. reslicing to target affine, masking to MNI152 work on each image, there is no ‘learning’ from the data). Exception: I did expect a difference for z-standardizing, because it makes a difference if you calculate the total mean and variance for men and women together or if you z-standardize the groups separately according to the mean and variance of the male and female data. However, as I said above, my logic was to stick to the separate preprocessing. Ironically this method gave me the strange results.

Is this assumption true? I am also not exactly sure what happens in mask_strategy = 'template' and whether there are any data-driven steps I am not aware of. For example, I know that mask_strategy = 'background' is data-driven, because it “guesses the value of the background from the border of the image” (see the documentation). You can directly see this effect when you preprocess male and female data separately, because the output arrays then do not have the same shape.

Any help is greatly appreciated!

I think that if you center and standardize both groups separately, they will have exactly the same mean and variance, so there will be no difference between them and the t-test will yield a p-value of 1. But it would help to see the whole pipeline to be sure we understand what you are doing.
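
To illustrate the point about standardization, here is a minimal NumPy sketch on made-up data (it is not nilearn code). It assumes that standardizing means z-scoring each voxel across the images passed in one call, which is my reading of what standardize=True does. The array sizes, the group means, and the helper names zscore and t_statistic are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: rows are subjects, columns are voxels (all values are made up).
male = rng.normal(loc=2.0, scale=1.0, size=(20, 5))
female = rng.normal(loc=0.0, scale=1.0, size=(25, 5))


def zscore(x):
    """Z-score each voxel (column) across the subjects (rows) in x."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def t_statistic(a, b):
    """Student's two-sample t statistic per voxel (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1)
                  + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled_var * (1 / na + 1 / nb))


# Separate standardization: every voxel ends up with mean 0 in both groups.
t_separate = t_statistic(zscore(male), zscore(female))

# Joint standardization: z-score the stacked data, then split it again.
pooled = zscore(np.vstack([male, female]))
t_joint = t_statistic(pooled[:len(male)], pooled[len(male):])

print(np.abs(t_separate).max())  # numerically zero, so every p-value is 1
print(np.abs(t_joint).min())     # the group difference survives
```

With separate standardization the group means are forced to zero at every voxel, so the t statistic is zero everywhere. With joint standardization each voxel only undergoes one common shift and rescaling, which leaves the t statistic identical to that of the raw data.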

Regarding the masking: the ‘template’ strategy means using the MNI152 mask, so indeed the mask will always be the same and does not depend on the data.
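
On question 1.), a small toy example suggests that the steps are not interchangeable in general, even with a fixed mask. The sketch below uses a made-up 1D "image" and a three-point kernel as a stand-in for Gaussian smoothing; it does not reproduce what NiftiMasker does internally, and all names and values are hypothetical.

```python
import numpy as np

# Toy 1D "image": bright values outside a small brain region (made-up numbers).
image = np.array([9.0, 9.0, 1.0, 1.0, 1.0, 9.0, 9.0])
mask = np.array([0, 0, 1, 1, 1, 0, 0], dtype=bool)
kernel = np.array([0.25, 0.5, 0.25])  # simple stand-in for a Gaussian kernel


def smooth(x):
    """Smooth a 1D signal with the fixed kernel, keeping the same length."""
    return np.convolve(x, kernel, mode="same")


# Order A: smooth the full image, then keep only the voxels inside the mask.
smooth_then_mask = smooth(image)[mask]

# Order B: zero everything outside the mask first, then smooth.
mask_then_smooth = smooth(np.where(mask, image, 0.0))[mask]

print(smooth_then_mask)  # edge voxels are pulled up to 3.0 by outside values
print(mask_then_smooth)  # edge voxels are pulled down to 0.75 by the zeros
```

The two orders disagree at the edge of the mask because smoothing mixes in neighbouring voxels, and what those neighbours contain depends on whether masking has already happened.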