Fitting many files with NiftiMasker: partial fit?

over9k · October 28, 2020, 6:58am

Hello,

I am trying to fit a large number of files (over 10000) using NiftiMasker. I am doing something like this:

from nilearn.input_data import NiftiMasker

# filenames: list of 10000 filenames such as file_1.nii, file_2.nii, etc.
masker = NiftiMasker()
masker.fit(filenames)

This should create a mask using all 10000 of the images. However, I am having memory issues with this. If I do a much smaller subset of the files, for example 500, I have no issues.

Is there a way to get around this? Could I do something like a partial fit (or call fit multiple times) on different subsets of the data? I don’t see anything like this in the docs: https://nilearn.github.io/modules/generated/nilearn.input_data.NiftiMasker.html

Any help is appreciated.

ofgulban · October 28, 2020, 8:27am

Hi @over9k,

If I understand correctly, your fit operation is done on a per-voxel basis. If you are getting a memory error during fitting, a quick solution might be to divide your mask into multiple pieces (e.g. one mask per hemisphere), instead of loading a smaller number of files and calling fit multiple times with different masks. You can later merge the results in space.

However, if you are getting the memory error earlier (when masking or loading your 10000 files), maybe you can check the precision at which your NIfTI files are loaded into memory. If it makes sense numerically, you can reduce the precision (e.g. floats instead of doubles, or ints instead of floats). Hope it helps.
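To get a feel for how much the precision matters, a rough back-of-the-envelope estimate can help. This is only a sketch: the image shape (91, 109, 91) is an assumption for illustration, since the thread does not state the image dimensions, and `estimate_gb` is a hypothetical helper, not part of nilearn.

```python
from math import prod


def estimate_gb(n_files, shape, bytes_per_value):
    """Approximate memory (in GB, 1 GB = 1e9 bytes) to hold n_files 3D volumes at once."""
    return n_files * prod(shape) * bytes_per_value / 1e9


# Assumed shape, for illustration only; substitute the real dimensions of your images.
shape = (91, 109, 91)

# Bytes per value for a few common data types.
for name, nbytes in [("float64", 8), ("float32", 4), ("int16", 2)]:
    print(f"{name}: ~{estimate_gb(10000, shape, nbytes):.1f} GB")
```

Under these assumptions, 10000 volumes held in memory at once take roughly 72 GB as doubles and roughly 36 GB as single-precision floats, so halving the precision halves the footprint but may still not be enough on its own.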

over9k · October 28, 2020, 9:25pm

I hadn’t thought of that, thanks, @ofgulban.

Is it possible to have multiple mask files, however? For example, one mask for the first 2000 files, another mask for the second 2000 files and so on. And then do a final fit with the five masks that I would have?

Thanks

over9k · October 30, 2020, 7:50am

I seem to have found a solution (cc @ofgulban) in case anyone else needs this.

I created separate mask images for subsets of the dataset using NiftiMasker. I then used nilearn.masking.intersect_masks to combine them all as a union, with a threshold of 0. An intersection is also possible, with a threshold of 1. Things are much more memory efficient this way.
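For anyone who wants to try this, the approach described above could be sketched roughly as follows. This is a minimal sketch rather than the exact code used in this thread: `chunked` and `combined_mask` are hypothetical helper names, the chunk size of 500 is only taken from the subset size that was reported to work, and the fallback import assumes that newer nilearn releases expose NiftiMasker under `nilearn.maskers`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def combined_mask(filenames, chunk_size=500, threshold=0):
    """Fit one NiftiMasker per chunk of files, then combine the per-chunk masks.

    threshold=0 gives the union of the masks, threshold=1 their intersection.
    """
    # Imported lazily so that `chunked` can be used without nilearn installed.
    try:
        from nilearn.input_data import NiftiMasker  # import path used in this thread
    except ImportError:
        from nilearn.maskers import NiftiMasker  # assumed path in newer releases
    from nilearn.masking import intersect_masks

    chunk_masks = []
    for chunk in chunked(filenames, chunk_size):
        masker = NiftiMasker()
        masker.fit(chunk)  # only this chunk of images is loaded for the fit
        chunk_masks.append(masker.mask_img_)

    # Note: intersect_masks also has a `connected` argument (default True),
    # which keeps only the largest connected component of the result.
    return intersect_masks(chunk_masks, threshold=threshold)
```

The resulting image can then be passed as `mask_img` to a new NiftiMasker, so that every file is masked with the same combined mask.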

ofgulban · November 1, 2020, 10:53am

Happy to hear this, @over9k. Good luck with your analyses.