Consequence of using single (float32) or double (float64) precision for saving interpolated data


When saving interpolated data (after linear or non-linear warps, but also for internal representations) we often face the decision of whether single (float32) or double (float64) precision should be used. I was wondering if there are any comparisons between the two approaches using MRI (especially fMRI) data that would show how much precision is lost when choosing float32, and whether that precision difference has real-world consequences.

The reason I’m asking is that using float64 comes with performance and memory (RAM and HDD) penalties which are especially painful when dealing with high-resolution short TR data.

On the other hand, for data coming from a multiband sequence (say x8) acquired using 16-bit ADCs, the last step of the reconstruction involves combining 32 int16 values, so there is potentially quite a bit more than 16 bits of information there, which could justify using float64.


In the scenario you describe above, I think that float32 is almost certainly sufficient.

I am assuming that your original data is in a range of, say, 0-1000. float32 carries about 7 significant digits, i.e. a relative precision of roughly 1e-7.

(For example, if we are interested in measuring change in position over a 1 kilometer distance (1000 meters), we can reliably measure changes in distance of about 1e-7 × 1e3 = 1e-4 meters, a tenth of a millimeter, before we lose precision. Of course double precision, with a relative precision of about 1e-16, would allow us to measure changes of about 1e-16 × 1e3 = 1e-13 meters, but in most circumstances it is just not beneficial to have that level of precision.)
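The kilometer example can be made concrete with numpy: `np.spacing` reports the gap between a value and the next representable number in a given precision.

```python
import numpy as np

# Gap between adjacent representable numbers near 1000.0 ("meters"):
gap32 = np.spacing(np.float32(1000.0))  # 2**-14, about 6.1e-5 -> sub-0.1 mm steps at 1 km
gap64 = np.spacing(np.float64(1000.0))  # 2**-43, about 1.1e-13 -> far below anything measurable
print(gap32, gap64)
```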

On the other hand when storing parameters where small deviations can result in large changes in results, then double precision is best.

I typically store displacement fields as float32, and affine transform parameters as float64.


For your original question, interpolation does not increase the available precision, so you only need to worry about rounding error. If your input is float32 or worse, float32 is fine for the interpolation output.
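One quick way to see this: widening a float32 value to float64 is exact and round-trips back unchanged, so an interpolation fed float32 inputs carries no information beyond float32 to begin with. A minimal numpy check:

```python
import numpy as np

x32 = np.float32(0.1)    # a float32 input (already rounded once, at acquisition/storage)
x64 = np.float64(x32)    # widening to float64 is exact...
print(np.float32(x64) == x32)  # ...and round-trips: no precision was gained
```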

Regarding your update about the combination of values in reconstruction: the ideal condition for combining independent measurements is repeatedly measuring the same quantity, in which case you gain log2(N) bits of precision - for 32 independent measurements of the same quantity at 16 bits each, you gain 5 bits, for a total of 21 bits. float32 has 24 bits of precision, so it has room to spare.
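A quick sanity check of that bit count, assuming the combination is a plain sum of 32 channels of 16-bit values:

```python
import numpy as np

n = 32
adc_bits = 16
# A sum of n values of b bits each spans n * 2**b levels,
# which needs b + log2(n) bits to represent exactly.
total_bits = adc_bits + int(np.log2(n))
print(total_bits)                       # 21
print(np.finfo(np.float32).nmant + 1)   # 24-bit significand (incl. implicit leading bit)
```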

However, if they are independent measurements but there are some weighting factors involved, it will not approach this ideal. If they are not of the same quantity, such that there is cancellation involved, you can instead have less precision than the inputs. This also ignores the more practical issue of SNR.

As a more intuitive guideline for this, float32 has a precision ratio (take a value and the closest next representable number, divide the difference between them by their average) in the range [2^-24, 2^-23] = [10^-7.22, 10^-6.92]. For comparison, 7 significant figures in scientific notation has a precision ratio in the range [10^-7, 10^-6] (because 1.000002 and 9.999998 both have the same absolute step to the next number, but differ by an order of magnitude).

So you can just ask yourself: does it make sense to have more than 7 significant figures? If not, store it as float32.
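The precision-ratio bounds quoted above can be checked directly with numpy (the helper name here is mine; the two probe values sit just above and just below a power of two, the worst and best cases within a binade):

```python
import numpy as np

def precision_ratio(x):
    """Gap to the next representable float32, divided by the midpoint of the pair."""
    x = np.float32(x)
    nxt = np.nextafter(x, np.float32(np.inf))
    return float((nxt - x) / ((nxt + x) / 2))

print(precision_ratio(1.0))    # ~2**-23: worst case, just above a power of two
print(precision_ratio(1.999))  # ~2**-24: best case, just below a power of two
```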


Hey all,

I’ve wondered the same thing for a while now. With fMRI, resting state especially (as I understand it), the signal left after detrending, nuisance regression, and temporal bandpassing is small relative to the original data range. So does the extra precision of 64-bit over 32-bit actually help with the calculation of correlations, or is the signal dominated by noise at that level anyway?

Thanks for entertaining my musings on this older thread.



I recently did some tests after converting float64 data to float32 and int16. Results are interesting:

In [17]: np.abs((nii_int16.get_data() - nii_float64.get_data())).mean()
Out[17]: 0.09107306744975145

In [19]: np.abs((nii_float32.get_data() - nii_float64.get_data())).mean()
Out[19]: 3.63454324763137e-05

In [21]: np.abs((nii_int16.get_data() - nii_float32.get_data())).mean()
Out[21]: 0.09107308637217748

It seems that pushing all the way to int16 incurs a real precision cost.
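The effect is easy to reproduce on synthetic data (a sketch with random values in a 0-1000 range standing in for the image, and a naive unscaled int16 cast assumed, not the actual files above):

```python
import numpy as np

rng = np.random.default_rng(0)
data64 = rng.uniform(0, 1000, size=100_000)  # float64 "ground truth"

data32 = data64.astype(np.float32)
data16 = np.round(data64).astype(np.int16)   # naive cast: whole-unit quantisation steps

print(np.abs(data32 - data64).mean())  # ~1e-5: float32 rounding error
print(np.abs(data16 - data64).mean())  # ~0.25: int16 quantisation error
```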


Hi Chris, since you are at it, would you mind adding uint16 to this comparison?

In [31]: np.abs((nii_int16.get_data() - nii_uint16.get_data())).mean()
Out[31]: 0.0017041164935648729

In [32]: np.abs((nii_uint16.get_data() - nii_float32.get_data())).mean()
Out[32]: 0.09107304279264676

In [33]: np.abs((nii_uint16.get_data() - nii_float64.get_data())).mean()
Out[33]: 0.09107302352344504

Comparable precision loss.


When converting floats to integers it is a very good idea to rescale the data so it uses the full range of your datatype - it is free precision. For example, if you are saving your data as uint16 (which stores values as discrete whole numbers from 0…65535), then you should set your scl_inter to the minimum value in your image and your scl_slope to (maximum-minimum)/65535. The SPM/Matlab script nii_change_datatype.m illustrates this method. I think this explains a lot of the error you are seeing when comparing the means.
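A minimal numpy sketch of that rescaling (the helper name is mine; real = stored × scl_slope + scl_inter is the NIfTI convention, which nibabel/SPM apply automatically on load):

```python
import numpy as np

def rescale_to_uint16(data):
    """Map data onto the full 0..65535 range; return stored codes plus NIfTI scaling."""
    lo, hi = float(data.min()), float(data.max())
    scl_inter = lo
    scl_slope = (hi - lo) / 65535.0  # real = stored * scl_slope + scl_inter
    stored = np.round((data - scl_inter) / scl_slope).astype(np.uint16)
    return stored, scl_slope, scl_inter

data = np.linspace(-3.5, 812.0, 10_000)
stored, slope, inter = rescale_to_uint16(data)
recovered = stored * slope + inter
print(np.abs(recovered - data).max())  # worst-case error is half a quantisation step
```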

I also think you need to be realistic regarding the SNR/CNR of modern MRI sequences - we can see the noise very well in grayscale images, and yet humans tend to have trouble discriminating more than about 100 gray levels. Traditionally, MRI was sampled at 12 bits (0…4095), which is probably more than sufficient for most modalities. Now most scanners use 16-bit integers (0…65535), largely because most DICOM images have 16 bits allocated. Furthermore, in my experience, most of the sequences we use have the FFT scale set very conservatively to avoid clipping, so if you look at your raw data you will see it rarely uses the full 16-bit range.

I am sure there are many other articles on the topic, but as a behavioral scientist I was really influenced by the article Good news for bad clocks, which shows that through repetitive sampling we are able to accurately discern small differences in noisy data.

I would argue that even 32-bit float is overkill for typical neuroimaging data, but disk drives are fast and modern computers handle floating point exceptionally well compared to those in the past.