I wrote a function that standardizes an image array "cluster-wise" based on extracted regions. I was surprised to find that the get_fdata() method seems to return only a reference to the image array and not a real copy of it. I found out about this because, after running my function, my original image array had also been modified. I find this behavior a little unintuitive, since get_fdata or get_data intuitively sound like they would create a copy of the data array. If I were to write a get_data function, my first instinct would probably be to make copying the default and referencing the special case (just like in pandas, where a copy is always returned unless you provide inplace=True). For a real copy you apparently have to use copy_data_array = np.asarray(img.dataobj), which is not as straightforward to read. Does anyone know why copying the image data array is not the default in get_fdata()?
Can you clarify what the source of your image data is? Is it being created in memory or loaded from a file? These will have slightly different behaviors.
Assuming that you’ve loaded from file, the way this actually works is that get_fdata() does return a copy of the image data, but that copy is cached as an _fdata_cache object. Further calls just return _fdata_cache, so if you modify the returned value, that modification will persist.
If you don’t want to use caching, you can say:
img.get_fdata(caching="unchanged")
If it’s already cached, then you can uncache:
img.uncache()
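Put together, the caching behavior can be sketched with a small stand-in class. Note that FakeImage and its internals are a hypothetical mock written to illustrate the idea, not nibabel's real implementation:

```python
import numpy as np

class FakeImage:
    """Hypothetical stand-in mimicking nibabel's get_fdata() caching."""

    def __init__(self, dataobj):
        self.dataobj = dataobj
        self._fdata_cache = None

    def get_fdata(self, caching="fill"):
        # If a cached copy exists, every call returns that same array.
        if self._fdata_cache is not None:
            return self._fdata_cache
        # Always materialize a fresh float copy, like reading from file.
        data = np.array(self.dataobj, dtype=np.float64)
        if caching == "fill":
            self._fdata_cache = data
        return data

    def uncache(self):
        self._fdata_cache = None

img = FakeImage(np.array([1, 2, 3]))

# caching="unchanged": a fresh copy, and the cache stays empty
a = img.get_fdata(caching="unchanged")
a[0] = 99.0
print(img.get_fdata()[0])   # 1.0 -- the edit did not leak back

# default caching: later calls return the same cached array
b = img.get_fdata()
b[1] = 77.0
img.uncache()
print(img.get_fdata()[1])   # 2.0 -- uncache() discarded the edited copy
```

The key point the mock shows: modifying the return value of a cached get_fdata() call modifies the cache itself, which is exactly why the original poster's input image appeared to change.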
If you’ve created the image yourself, then yeah, you’re probably going to need to make a copy, as nibabel will only make a copy if it has to change the dtype, in an effort to avoid multiplying your memory consumption.
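That dtype rule mirrors NumPy's own np.asarray semantics, which is worth keeping in mind when using np.asarray(img.dataobj) on an in-memory image. A quick illustration:

```python
import numpy as np

# np.asarray returns its input unchanged when it is already an ndarray of
# the requested dtype; it only copies when a conversion is required.
in_memory = np.zeros((2, 2), dtype=np.float64)

same = np.asarray(in_memory)                         # no conversion -> alias
converted = np.asarray(in_memory, dtype=np.float32)  # dtype change -> copy

print(same is in_memory)       # True
print(converted is in_memory)  # False
```

So for an in-memory image whose dataobj is already a float array, np.asarray(img.dataobj) aliases rather than copies; for a file-backed image, the array proxy reads a fresh array from disk each time.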
The image is loaded from disk. I then run the following function with the image and a mask image as arguments:
import numpy as np
from nilearn.image import iter_img, new_img_like
from nilearn.masking import apply_mask
from nilearn.regions import connected_regions

def cluster_standardize(stat_img, mask_img, ignore_zeros=True):
    region_imgs, indices = connected_regions(stat_img,
                                             extract_type='connected_components',
                                             min_region_size=1,
                                             smoothing_fwhm=None)
    stat_img_data = np.asarray(stat_img.dataobj)
    for idx, region_img in enumerate(iter_img(region_imgs)):
        region_img_indices = np.where(region_img.get_fdata() != 0)
        # get the size of each region
        region_size = len(region_img_indices[0])
        stat_img_data_region_values = stat_img_data[region_img_indices]
        # if the region consists of more than one voxel, calculate the mean
        # and standard deviation from all voxels within that region
        if region_size > 1:
            stat_img_data_region_mean = stat_img_data_region_values.mean()
            stat_img_data_region_std = stat_img_data_region_values.std()
        # if the region consists of only one voxel, standardize that voxel
        # according to the mean and standard deviation of the whole brain
        else:
            stat_img_data_masked = apply_mask(stat_img, mask_img)
            if ignore_zeros:
                stat_img_data_masked = stat_img_data_masked[np.nonzero(stat_img_data_masked)]
            stat_img_data_region_mean = stat_img_data_masked.mean()
            stat_img_data_region_std = stat_img_data_masked.std()
        # z-standardize region values
        stat_img_data_region_values_std = ((stat_img_data_region_values
                                            - stat_img_data_region_mean)
                                           / stat_img_data_region_std)
        # replace original values with z-standardized values
        stat_img_data[region_img_indices] = stat_img_data_region_values_std
    return new_img_like(stat_img, stat_img_data)
The critical line is:
stat_img_data = np.asarray(stat_img.dataobj)
The error happened when this line was:
stat_img_data = stat_img.get_fdata()
When using .get_fdata() inside the function, the original data array from the input image was also changed.
I see. Yes, it makes sense that this would cause you issues. You can definitely use the cache-manipulation strategies mentioned above. I might also consider creating a new data array, rather than modifying the contents of the get_fdata() result:
new_img_data = np.zeros_like(stat_img.get_fdata())
...
new_img_data[region_img_indices] = stat_img_data_region_values_std
This feels cleaner to me, as there’s less chance of original values remaining in the output, for example, if the set of non-zero voxels in the two images differs.
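As a toy, pure-NumPy sketch of that pattern (the array values here are made up for illustration, standing in for a single region of the statistical image):

```python
import numpy as np

# Input "image" data; the zero voxel is outside the region of interest.
stat_img_data = np.array([[2.0, 4.0], [0.0, 6.0]])
region_img_indices = np.where(stat_img_data != 0)

vals = stat_img_data[region_img_indices]

# Write standardized values into a fresh zero-initialized array
# instead of overwriting the input in place.
new_img_data = np.zeros_like(stat_img_data)
new_img_data[region_img_indices] = (vals - vals.mean()) / vals.std()

print(stat_img_data[0, 0])  # 2.0 -- the input array is untouched
print(new_img_data[1, 0])   # 0.0 -- voxels outside the region stay zero
```

Because the output starts from zeros, any voxel the loop never touches is guaranteed to be zero in the result rather than silently keeping a stale value from the input.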