Hi,
From what I understand, nibabel.load uses lazy loading by default. Is there a way to disable this behaviour?
As for the reason: I need to set up a workflow that passes data between processing steps in memory only, without saving anything to disk. Since several tools (e.g. ANTs) always save data to disk (sometimes a bit sneakily, in the tmp directory), this is not exactly possible. Instead, I need to allow these intermediate data to be saved to disk, then load them back into memory and delete the files on disk. However, with lazy loading I cannot just delete the file after loading, correct? Hence the question about disabling it. I am also happy to hear about any other solution that I might be overlooking.
Thanks!
Julia
You can disable memmaps and populate the cache:
import nibabel as nb
import numpy as np

# float64 is the default dtype of get_fdata();
# pass another floating-point dtype if you prefer.
def load_and_cache(fname, dtype=np.float64):
    # Turn off mmap so cached arrays are fully loaded into memory
    img = nb.load(fname, mmap=False)
    # get_fdata() will cache and return the same array
    # whenever you use the same dtype.
    # Changing dtypes will reset the cache.
    img.get_fdata(dtype=dtype)
    return img
If you don’t want to play around with caching or limit yourself to files that are sensibly interpreted as floating point:
def load_and_copy(fname):
    img = nb.load(fname)
    # Fully load the data array and return a fully in-memory image
    return img.__class__(np.asarray(img.dataobj), img.affine, img.header)
Excellent, thank you so much!
Is there any disadvantage to the second option? Otherwise it seems to be the safer bet, right?
There are a few pieces of metadata like the filename that are stored in an image loaded from disk, but as long as you don’t care (and I never do) the second option has the least magic. I treat the image objects (the literal Nifti1Image) as disposable and create new images all the time.
The only time when it isn’t a safe bet is when you want to leave the data on disk until the last possible moment because of memory concerns.
Perfect, thanks for the quick answer and explanations. This is exactly what I need.
For completeness, it’s img.dataobj in the last line, not np.dataobj, correct? Would I not still need to disable mmap? Is img.dataobj definitely fully loaded here?
You’re right, it’s img.dataobj (I’ll update that).
Using np.asarray() (as opposed to np.array() or np.asanyarray()) ensures that you always get a np.ndarray object, whether it receives an np.ndarray or a np.memmap. So it doesn’t matter whether you disable mmap or not.
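For anyone who wants to check what the three conversion functions return, here is a small NumPy-only sketch. The function name compare_conversions and the file name demo.dat are made up for illustration. It records the type each function returns for a memmap input and whether the result still shares memory with the mapped file.

```python
import os
import tempfile

import numpy as np


def compare_conversions():
    """Return {function name: (result type name, shares memory with the memmap)}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.dat")
        # A small memory-mapped array backed by a temporary file
        mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(4,))
        mm[:] = [1.0, 2.0, 3.0, 4.0]

        conversions = [
            ("asarray", np.asarray),
            ("asanyarray", np.asanyarray),
            ("array", np.array),
        ]
        for name, convert in conversions:
            out = convert(mm)
            results[name] = (type(out).__name__, bool(np.shares_memory(mm, out)))

        # Drop references to the mapping before the directory is cleaned up
        del out, mm
    return results


results = compare_conversions()
```

In this sketch, np.asarray() does return a plain np.ndarray, but as a view on the mapped buffer rather than a copy, while np.array() makes an independent copy. If a fully independent in-memory array is what matters, for instance before deleting the file on a platform that refuses to remove mapped files, loading with mmap=False or copying explicitly is probably the more cautious choice.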
Great, makes sense. Thanks so much