Accessing the same voxel across multiple nifti files in constant time

Hello fellow neuro stars!

My use-case is the following: for a given individual, I want to quickly access a given voxel across multiple fMRI nifti images. I’m working with Python and I want this for many individuals.

My current workflow involves loading all images into RAM with nibabel, but it does not scale for big datasets (here I’m considering a ~1 TB dataset).
Another workflow could be to memmap these maps on disk, but it’s a bit cumbersome and would take about twice as much space on disk.

Does anyone have a better idea? :blush:


Would the slicer method work?

import tempfile
import numpy as np
import nibabel as nb

with tempfile.NamedTemporaryFile(suffix=".nii.gz") as f:
    # voxel to load
    x, y, z = 10, 10, 10
    # prep fake image
    Z = np.random.normal(size=(91, 108, 91))
    n = nb.Nifti1Image(Z, affine=np.eye(4))
    n.to_filename(f.name)

    # load image back
    img: nb.Nifti1Image = nb.load(f.name)
    n: nb.Nifti1Image = nb.load(f.name).slicer[x:(x+1), y:(y+1), z:(z+1)]
    print(f"{img.get_fdata().nbytes=}")
    print(f"{n.get_fdata().nbytes=}")

# img.get_fdata().nbytes=7154784
# n.get_fdata().nbytes=8

edit: a timed version

import numpy as np
import nibabel as nb
import timeit
import tempfile


with tempfile.NamedTemporaryFile(suffix=".nii.gz") as f:
    # prep fake image
    Z = np.random.normal(size=(91, 108, 91))
    n = nb.Nifti1Image(Z, affine=np.eye(4))
    n.to_filename(f.name)

    setup = f"""
import numpy as np
import nibabel as nb
# load full image and then slice
def load_full(f, x, y, z) -> None:
    img: nb.Nifti1Image = nb.load(f)
    _ = img.get_fdata()[x : (x + 1), y : (y + 1), z : (z + 1)]


# load sliced image only
def load_sliced(f, x, y, z) -> None:
    n: nb.Nifti1Image = nb.load(f).slicer[x : (x + 1), y : (y + 1), z : (z + 1)]
    _ = n.get_fdata()
f = '{f.name}'
x, y, z = 10, 10, 10
    """

    print(f"{timeit.timeit(stmt='load_full(f, x, y, z)', setup=setup, number=100)=}")
    print(f"{timeit.timeit(stmt='load_sliced(f, x, y, z)', setup=setup, number=100)=}")


# timeit.timeit(stmt='load_full(f, x, y, z)', setup=setup, number=100)=3.3801059580000583
# timeit.timeit(stmt='load_sliced(f, x, y, z)', setup=setup, number=100)=0.4113531250040978

Thank you for your answer!

So that’s the thing:

  1. if I use nb.load() at runtime, then loading the value of a given voxel across maps will take me O(n \times m), where m is the number of maps and n is the number of voxels per map (because nb.load() will load each map completely, no? Maybe @effigies can confirm)
  2. if I pre-load these maps so as to simply use a slicer at runtime, then they are all loaded in RAM, which won’t fit in my case

What I’m looking for is a method that runs in O(1) with minimal memory footprint. I guess, in essence, what I’m looking for is something comparable to what the glTF format does, but for NIfTI images.
The glTF format stores metadata in a .gltf file and data arrays in binary files. I think I would like to do the same with NIfTI files, that is, store their data not as a base64 string inside the file, but rather as an uncompressed binary file (HDF5 maybe?).

Here is an example of a .gltf file for you to get a better sense of what I am saying (notice how the file refers to vertices_pial_left.bin and triangles_pial_left.bin, where the vertices and triangles of this mesh are stored):

{
  "accessors": [
    {
      "bufferView": 0,
      "byteOffset": 0,
      "componentType": 5126,
      "count": 163842,
      "type": "VEC3",
      "max": [
        1.7827589511871338,
        68.96263122558594,
        78.02754974365234
      ],
      "min": [
        -68.7943344116211,
        -104.82635498046875,
        -47.05241394042969
      ]
    },
    {
      "bufferView": 1,
      "byteOffset": 0,
      "componentType": 5125,
      "count": 983040,
      "type": "SCALAR"
    }
  ],
  "asset": {
    "version": "2.0"
  },
  "buffers": [
    {
      "uri": "vertices_pial_left.bin",
      "byteLength": 1966104
    },
    {
      "uri": "triangles_pial_left.bin",
      "byteLength": 3932160
    }
  ],
  "bufferViews": [
    {
      "buffer": 0,
      "byteOffset": 0,
      "byteLength": 1966104,
      "target": 34962
    },
    {
      "buffer": 1,
      "byteOffset": 0,
      "byteLength": 3932160,
      "target": 34963
    }
  ],
  "meshes": [
    {
      "primitives": [
        {
          "attributes": {
            "POSITION": 0
          },
          "indices": 1
        }
      ]
    }
  ],
  "nodes": [
    {
      "mesh": 0
    }
  ],
  "scenes": [
    {
      "nodes": [
        0
      ]
    }
  ]
}
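
To make this more concrete, here is a rough, hypothetical sketch of what I have in mind for a stack of 3D maps: one raw binary buffer plus a small JSON sidecar, glTF-style. The filenames, the sidecar layout and the stacking are all made up for illustration; this is not an existing NIfTI mechanism.

import json

import numpy as np
import nibabel as nb

# Hypothetical sketch: dump the data of several 3D maps into one raw binary
# "buffer" plus a small JSON sidecar, then memmap the buffer.
files = ["map_01.nii.gz", "map_02.nii.gz"]  # placeholder file names
arrays = [nb.load(f).get_fdata(dtype=np.float32) for f in files]
stack = np.stack(arrays, axis=-1)           # shape (X, Y, Z, n_maps), C order

stack.tofile("maps.bin")
with open("maps.json", "w") as fp:
    json.dump({"uri": "maps.bin", "dtype": "float32", "shape": stack.shape}, fp)

# Reading one voxel across all maps now touches a single contiguous run of bytes.
mm = np.memmap("maps.bin", dtype=np.float32, mode="r", shape=stack.shape)
values = np.array(mm[10, 10, 10, :])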

Oh! I think I understand now. Though, I don’t think it’s true that the whole map is loaded when calling nb.load(filename).slicer[x, y, z]. See the edited post above for a timing comparison of calling get_fdata() on the original and the sliced image. Even so, this couldn’t be \mathcal{O}(1); presumably \mathcal{O}(m) at best. But varying the size of the nifti file shows that there’s also at least some dependence on n.

import numpy as np
import nibabel as nb
import timeit
import tempfile

with tempfile.NamedTemporaryFile(suffix=".nii.gz") as f:
    # prep fake image
    Z = np.random.normal(size=(512, 512, 512))
    n = nb.Nifti1Image(Z, affine=np.eye(4))
    n.to_filename(f.name)

    setup = f"""
import numpy as np
import nibabel as nb
# load full image and then slice
def load_full(f, x, y, z) -> None:
    img: nb.Nifti1Image = nb.load(f)
    _ = img.get_fdata()[x : (x + 1), y : (y + 1), z : (z + 1)]


# load sliced image only
def load_sliced(f, x, y, z) -> None:
    n: nb.Nifti1Image = nb.load(f).slicer[x : (x + 1), y : (y + 1), z : (z + 1)]
    _ = n.get_fdata()
f = '{f.name}'
x, y, z = 10, 10, 10
    """

    print(f"{timeit.timeit(stmt='load_full(f, x, y, z)', setup=setup, number=10)=}")
    print(f"{timeit.timeit(stmt='load_sliced(f, x, y, z)', setup=setup, number=10)=}")

# timeit.timeit(stmt='load_full(f, x, y, z)', setup=setup, number=10)=49.05634537500737
# timeit.timeit(stmt='load_sliced(f, x, y, z)', setup=setup, number=10)=0.9410956670035375

Sorry, I’m not sure.

Indeed that’s a very interesting method you are proposing!
Empirically, I think the loading duration is short enough that I can try to use this for my application!

However, I still have two inquiries:

  1. it feels like the slicer loading procedure still depends on the size of the image, as exemplified below
  2. is this method applicable to gifti files as well? I couldn’t find a slicer method in nibabel for gifti objects; if it does not exist, do you think it’s implementable?

Example illustrating the fact that slicer still scales linearly in the number of voxels:
I just tried the following snippet (I divided the output durations by the number of iterations to approximate how long a single load actually takes):

# %%
import numpy as np
import nibabel as nb
import timeit
import tempfile

# %%
number = 100

# %%
with tempfile.NamedTemporaryFile(suffix=".nii.gz") as f:
    # prep fake image
    Z = np.random.normal(size=(512, 512, 512))
    n = nb.Nifti1Image(Z, affine=np.eye(4))
    n.to_filename(f.name)

    setup = f"""
import numpy as np
import nibabel as nb
# load full image and then slice
def load_full(f, x, y, z) -> None:
    img: nb.Nifti1Image = nb.load(f)
    _ = img.get_fdata()[x : (x + 1), y : (y + 1), z : (z + 1)]


# load sliced image only
def load_sliced(f, x, y, z) -> None:
    n: nb.Nifti1Image = nb.load(f).slicer[x : (x + 1), y : (y + 1), z : (z + 1)]
    _ = n.get_fdata()
f = '{f.name}'
x, y, z = 10, 10, 10
    """
    print(f"{timeit.timeit(stmt='load_full(f, x, y, z)', setup=setup, number=number)/number}")
    print(f"{timeit.timeit(stmt='load_sliced(f, x, y, z)', setup=setup, number=number)/number}")

which yields (careful, it takes 10 minutes to run haha):

5.1362942935799945
0.10730613767998876

Running the same snippet with an image size of (256, 256, 256) yields

0.6240910130699922
0.02270433284007595

which is roughly 8x faster than with images of size (512, 512, 512), which seems to indicate a linear dependency on n (the number of voxels in the image).

@alexis is the idea to do this one-off for a single voxel, or to do this multiple times for different voxels in the time series?

  1. It is worth noting that standard gzip does not support random access, so one needs to unpack the whole file. In your example, you are loading a gzipped image (".nii.gz"). Indeed, this is precisely why SPM refuses to support .nii.gz files, insisting on uncompressed .nii files. One can create indexed gzip files that do allow better performance: the files must be created with the indices, there is a slight file-size penalty, and while all gzip tools should be able to handle them, they will only be faster with indexed-gzip-aware tools.
  2. The NIfTI specification requires that the first three dimensions are spatial. This is an inherently poor design for operations that want to operate on voxels across time (e.g. virtually all non-spatial operations: temporal smoothing, slice-time correction, statistics, etc.). In other words, 4D NIfTI time series are always stored XYZT, with time being the 4th dimension, so successive timepoints of a single voxel are stored far apart. The magic of the defunct VoxBo was that its 4D files were stored in TXYZ order, so the timepoints for a given voxel were stored contiguously, allowing reads along cache lines and dramatically improving performance. If you want to do a lot of voxelwise calculations, you may want to swizzle your datasets to TXYZ and then save these as uncompressed files that can be accessed with a memmap (a minimal sketch follows this list). For full disclosure, when the NIfTI format was being specified I advocated for allowing this configuration, but it was not the consensus.
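
A minimal sketch of that re-layout with numpy and nibabel, assuming one extra uncompressed copy of each dataset is acceptable; the filenames are placeholders:

import numpy as np
import nibabel as nb

# Placeholder filenames; this creates one extra uncompressed copy of the data.
img = nb.load("func.nii.gz")
data = np.ascontiguousarray(img.get_fdata(dtype=np.float32))  # (X, Y, Z, T), C order

# With time as the last, fastest-varying axis, data[x, y, z, :] is one
# contiguous run of bytes (the same byte layout as a Fortran-ordered TXYZ volume).
data.tofile("func_txyz.dat")

# Later: random access to one voxel's time series only reads T values.
mm = np.memmap("func_txyz.dat", dtype=np.float32, mode="r", shape=data.shape)
ts = np.array(mm[10, 10, 10, :])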

Oh I see, this insight is very interesting!
Fortunately, in this case, I’m only dealing with fMRI contrast maps, thus I don’t have the timeseries issue you mention.

Based on your comment, I just reran the previous experiment setting suffix=".nii" instead of suffix=".nii.gz" and was very surprised by the results:

(512, 512, 512) image size yields

0.0002763139399996817
0.0005048864000002595

(256, 256, 256) image size yields

0.0002674535299991021
0.0005311758999994254

I see three interesting pieces of information:

  1. loading is much, much faster!
  2. loading duration does not depend on image size anymore
  3. slicing now takes a bit longer than full loading

In my case, this speedup comes at the cost of gunzipping all files, which I estimate will take about 4x more space on disk (400 KB → 1.4 MB).

@alexisthual gz is an old standard that does not leverage modern computer architecture. You can get dramatically faster decompression with zstd. Further, standard non-indexed gzip cannot use parallel decompression (though you can calculate the CRC32 on a separate thread for a tiny performance benefit).

However, gz is the compression adopted by NIfTI, and it is so common that it is easy to support in any language. One thing to evaluate is the library used for gzip compression. FSL 6.0.4 became faster at writing and reading .nii.gz files by moving to the CloudFlare zlib. You can test various libraries yourself with this Python script; last I tested, CloudFlare was about 50% faster at decompressing than the standard zlib, but CloudFlare decompression has recently adopted some new optimizations. libdeflate is the gz decompression speed king, but it uses a different API and, as I recall, has much larger memory requirements for large files.

Since raw data in our field is often stored as 16-bit integers, one can improve compression ratio by using byte shuffling, and one can leverage multi-dimensional tricks. Both methods are used by blosc so you could consider zarr locally, but these files will not be readable by other tools in our field.
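
If you want to experiment with that route, a rough sketch with the zarr-python package might look like the following; the chunk sizes and filenames are arbitrary, and the exact keyword arguments differ a bit between zarr versions:

import numpy as np
import zarr

# Sketch: store a stack of maps in a chunked, compressed zarr array with
# small spatial chunks and the full map axis per chunk, so reading one voxel
# across all maps only decompresses one small chunk.
data = np.random.normal(size=(91, 108, 91, 50)).astype(np.float32)  # fake (X, Y, Z, n_maps)
z = zarr.open("maps.zarr", mode="w", shape=data.shape,
              chunks=(8, 8, 8, data.shape[-1]), dtype="float32")
z[:] = data

values = z[10, 10, 10, :]  # touches a single (8, 8, 8, n_maps) chunk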


Sorry, I was without power yesterday and have only skimmed this. Have you seen this doc? NiBabel Manual - Images and Memory

The basic idea is to slice the ArrayProxy img.dataobj[x, y, z] instead of calling the method img.get_fdata()[x, y, z]. So you might use the following approach:

image_list = [nb.load(fname) for fname in file_list]

def get_location_across_images(x, y, z):
    return [img.dataobj[x, y, z] for img in image_list]

(You could also write it as a generator so that you do not necessarily need to read from every file in order to start processing data from the first file…)
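
A sketch of that generator variant, reusing image_list from above (the function name is just for illustration):

def iter_location_across_images(images, x, y, z):
    # Yield the voxel value one image at a time; each file is only read when
    # its item is consumed, so you can start processing before touching the rest.
    for img in images:
        yield img.dataobj[x, y, z]

# e.g. values = list(iter_location_across_images(image_list, 10, 10, 10))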

I probably would not use img.slicer[] if you’re working with many files. Just to give you a sense of the costs, img.slicer[...].get_fdata() will calculate a new affine based on the slices (so that the spatial location of each voxel is preserved after slicing), create a new Nifti1Image object in memory with that affine and the sliced dataobj (see below), convert the sliced array (if necessary) to float64, and store the resulting numpy array or memmap in _fdata_cache. You will then need to be careful not to keep references around so that garbage collection can drop them, but you will have little control over when that happens.

If you just need the time series, img.dataobj[...] will do the minimum necessary to get you an array (or memmap) with correct values. The wrinkle here is that the type of the resulting array depends on the on-disk data type and the values of the scaling parameters scl_slope and scl_inter. You may prefer something like img.dataobj[...].astype(np.float32).

Optimization if you need to reduce copies and casts to their absolute minimum

If you want to optimize even further, you can use the underlying ArrayProxy._get_scaled() function that ArrayProxy.__getitem__ calls:

img.dataobj._get_scaled(np.float32, (i, j, k))

And as @neurolabusc mentioned, you will see much better performance if indexed-gzip is used. Nibabel will use it automatically if it is installed. I don’t know much about pre-creating indices or how to make sure they’re picked up, but the speed improvements are impressive even without pre-created indices.
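
For reference, the indexed_gzip package does expose index creation directly. A rough, untested sketch with placeholder filenames (I have not checked how, or whether, nibabel picks up an exported index automatically):

import indexed_gzip as igzip

# Build a full seek index for a compressed image once and save it alongside
# the file, so later readers can seek without decompressing from the start.
f = igzip.IndexedGzipFile("map_01.nii.gz")   # placeholder file name
f.build_full_index()
f.export_index("map_01.gzidx")
f.close()

# Later, import the saved index before seeking into the file.
f = igzip.IndexedGzipFile("map_01.nii.gz")
f.import_index("map_01.gzidx")
f.seek(352)       # e.g. jump past the 348-byte NIfTI-1 header and padding
chunk = f.read(4)
f.close()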


These are all very valuable insights, thank you!

I have another question then: what about gifti files? Is there also a way for me to efficiently access the same vertex index across many files?

No. Unfortunately, GIFTIs are extremely inefficient for slicing. I really wouldn’t recommend them for anything except surface or parcellation data. You’ll be better off converting to another format than attempting anything clever. CIFTI-2 is, for better or worse, the best supported at the moment. There is the bmsh format based on the JData family of data formats, which is very promising, but I haven’t yet dived into it myself. See ENH: Reading and writing JSON/binary JSON based surface data (.jmsh and .bmsh) by fangq · Pull Request #1114 · nipy/nibabel · GitHub for a proposed implementation in nibabel.

This has been an interesting discussion, and I thought I would add a few small points that might be useful for other readers, even if not really “pythonic”…

As Chris R. (neurolabusc) noted, the gzip compression is going to stymie any kind of efficiency you might hope to get, but uncompressed NIFTI (.nii or .hdr/.img) or AFNI HEAD/BRIK should be good. The proposed Python functions look pretty good, but you can also do this with some other simple tools, albeit possibly less efficiently.

  1. nifti_tool. Rick Reynolds let me know about this common tool’s ability to get an ijk index with the option “-disp_ci i j k -1 0 0 0” to get the voxel values across time. The -1 can act as a wildcard to get slice data, but here it goes across the 4th dimension. It uses seek to jump to each block to read, rather than reading in an entire block.
  2. file_tool. A generic AFNI file-reading program with offset and length options.
  3. 3dmaskdump. An AFNI tool for querying by index or mask. The -ibox option can limit output to a specific index.

For data on surfaces, we generally use standard meshes where there is nodal correspondence across subjects. The GIFTI gii files (or niml.dset or text 1D.dset) can then be queried at each node in a similar way to the volumetric NIFTI files. An example with 3dmaskdump:

3dmaskdump -quiet -noijk -ibox 32 0 0 test.thick.gii
3.621048

For simple analysis across many subjects, I often use 3dMean. That can take wildcards for the input datasets, but only loads a single volume at a time. That works well for mean or std dev and similar simple statistics. Otherwise, single slices are usually good enough for analysis.

https://afni.nimh.nih.gov/afni/community/board/read.php?1,166268,166285#msg-166285

There have been some attempts to do more complex analysis with large data that have high memory requirements like light sheet microscopy data.
https://www.janelia.org/sites/default/files/Library/nmeth.3041.pdf