Hello! Has anyone here ever written NIfTI I/O code that wraps the nifti_clib code directly in Cython? Was that a big win in terms of run-time? We are interested in reading many files in parallel, so we’d like to use OpenMP in the Cython code to parallelize reads. It seems like this would be the best way to do it, if we don’t particularly care about portability? Or maybe we’re thinking about this the wrong way?
I doubt you’ll get a lot of benefit from nifti_clib. Loading and correctly handling headers is not the bottleneck; most often it’s gzip. If you have indexed_gzip installed, you will get all the benefits of Cython, although not of a parallel gzip algorithm. I have seen (but not used) rapidgzip, which might be of interest before writing your own library.
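To make the indexed_gzip point concrete, here is a minimal sketch, assuming a 4D .nii.gz with a hypothetical filename. nibabel picks up indexed_gzip automatically when it is installed, so random access through the data proxy goes through the compiled zran code rather than Python-level gzip:

```python
# Minimal sketch: nibabel uses indexed_gzip automatically when installed.
# The filename is a hypothetical placeholder.
import nibabel as nib

img = nib.load("sub-01_bold.nii.gz")   # header parsed, data not yet read

# Pulling a single volume only decompresses the blocks that cover it
# (fast with indexed_gzip, much slower with the stdlib gzip fallback).
vol0 = img.dataobj[..., 0]

# Reading everything still pays the full decompression cost once.
data = img.get_fdata(dtype="float32")
```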
Assuming what you want at the end of the day is either a nibabel image or a numpy array, then I would suggest that you consider the ArrayProxy class, which is how we represent the information needed to load data on-demand: nibabel/nibabel/arrayproxy.py at master · nipy/nibabel · GitHub. In particular, the __array__() and __getitem__() methods are the main ways people will access the data (np.asanyarray(img.dataobj)). An optimizing loader could work against the ArrayProxy spec.
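As a rough sketch (with a hypothetical filename), these are the two access paths such a loader would need to honour:

```python
# Sketch of the two ArrayProxy access paths an optimizing loader would
# need to support; the filename is hypothetical.
import numpy as np
import nibabel as nib

img = nib.load("example.nii.gz")
proxy = img.dataobj                 # ArrayProxy: nothing read yet

full = np.asanyarray(proxy)         # __array__(): read the whole array
slab = proxy[:, :, 10]              # __getitem__(): read just one slice
# scl_slope / scl_inter scaling is applied by the proxy in both cases
```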
Just in case: have you had a chance to confirm and test that the filesystem you’d be working on handles parallel I/O well? I ask because the cluster I tend to work on has only one filesystem that is guaranteed to be secure enough for files with personally identifying information. That filesystem is, tragically, unable to cope with many simultaneous filesystem interactions, and it’s slow, too. Even simpler forms of parallelism have caused problems after a certain scale (e.g., running tens of participants through fMRIPrep).
@effigies: to your point, what if we plan to unzip the files we are working with in advance? We’d much rather uncompress once up front, since for our application we’re going to need to make multiple passes over the data. Would you expect a benefit from Cython in that case?
Any thoughts about going full zarr for this use-case?
We’ll definitely need to do some empirical benchmarking of the different approaches, but it’s really helpful to hear about issues we might run into, and to get the pointer to ArrayProxy, of course.
Regarding zarr, could you say a bit more about the specific scenario? For example, is this for fitting some DL model, perhaps with tensorflow or pytorch? If so, I’d hazard that it’s worth at least starting with their built-in dataset tools and seeing whether the achieved speed is sufficient. By built-in tools, I mean tensorflow records (see their discussion of I/O optimizations) or pytorch datasets/dataloaders (see their discussion of parallel file access, which I understand to be based on a multiprocess model, where each process reads files that contain only the data array as either a serialized numpy or pyarrow object). If that’s not your scenario, then sorry for the noise (though the tensorflow discussion of I/O optimizations may still be helpful, since it covers not only parallel data extraction but also strategies like overlapping I/O with compute through prefetching).
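A minimal sketch of that pytorch multiprocess pattern, assuming each volume has already been exported to its own .npy file and that all volumes share a shape (paths, batch size, and worker count are hypothetical):

```python
# Each DataLoader worker process reads its own .npy files in parallel.
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class VolumeDataset(Dataset):
    def __init__(self, root):
        self.paths = sorted(Path(root).glob("*.npy"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        vol = np.load(self.paths[idx])      # plain array, no NIfTI parsing
        return torch.from_numpy(vol)        # assumes all volumes share a shape


loader = DataLoader(VolumeDataset("volumes/"), batch_size=4,
                    num_workers=8, pin_memory=True)
for batch in loader:
    ...  # feed the model
```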
Definitely one approach. And you could just as easily apply scale factors and save as float32 (or any appropriate dtype) at that point, which would mean you can memmap or use any other method to achieve random access.
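Something like the following one-time conversion, with hypothetical filenames, is what I have in mind: get_fdata() applies the scale factors, the result is written out as raw float32, and later passes memmap it for random access.

```python
# One-time conversion sketch: apply the NIfTI scale factors, write a raw
# float32 .npy, then memmap it on later passes. Filenames are hypothetical.
import numpy as np
import nibabel as nib

img = nib.load("example.nii.gz")
data = img.get_fdata(dtype=np.float32)   # scl_slope / scl_inter applied
np.save("example_f32.npy", data)

# Later passes: random-access reads served through the OS page cache.
mm = np.load("example_f32.npy", mmap_mode="r")
frame = np.asarray(mm[..., 0])
```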
Possibly a very specific access mechanism, but it’s going to be hard to beat numpy.memmap(). Possibly the ArrayProxy.__getitem__ slicer calculations could be sped up, but I have always assumed that the I/O is the slow bit, and we farm that out to numpy. That said, since that code was written 15+ years ago, it’s possible that numpy now provides functions that eliminate the need for most of it.
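For an uncompressed .nii, a direct memmap of the data block looks roughly like this (hypothetical filename; note this gives you the on-disk dtype and does not apply scl_slope/scl_inter):

```python
# Sketch: memmap the data block of an uncompressed .nii directly, taking
# offset, dtype, and shape from the header.
import numpy as np
import nibabel as nib

img = nib.load("example.nii")          # must be uncompressed
hdr = img.header

mm = np.memmap("example.nii", mode="r",
               dtype=hdr.get_data_dtype(),
               offset=hdr.get_data_offset(),
               shape=hdr.get_data_shape(),
               order="F")               # NIfTI data is Fortran-ordered
frame = np.asarray(mm[..., 0])
```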
If you’re going to be putting chunks on S3, I don’t see any reason not to, although I’m not 100% sure what all going full zarr implies. You may want to look at GitHub - neuroscales/nifti-zarr: A draft specification for the nifti-zarr format (PR#1 has significant discussion, PR#7 has the most recent proposal) for a round-trippable Zarr interpretation of Nifti.
> I’m not 100% sure what all going full zarr implies.
Yeah - basically what you said: put chunks on S3, ignoring the fact that the data originally came from a NIfTI, and keep only minimal metadata (I like the idea of casting everything to a space-conserving dtype!). Use zarr from then on. That’s admittedly a kludge.
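Roughly what I’m picturing, as a sketch only (bucket name, chunking, the s3fs credential setup, and the zarr-python v2-style API are all assumptions on my part):

```python
# Rough sketch of the "full zarr" kludge: cast, rechunk, and write the
# array to an S3-backed store, keeping only minimal metadata.
import numpy as np
import nibabel as nib
import zarr
import s3fs

img = nib.load("example.nii.gz")
data = img.get_fdata(dtype=np.float32)         # scale factors applied, space-conserving dtype

fs = s3fs.S3FileSystem()                       # assumes credentials are already configured
store = s3fs.S3Map(root="my-bucket/example.zarr", s3=fs)

z = zarr.open(store, mode="w", shape=data.shape,
              chunks=(64, 64, 64, 1), dtype="float32")
z[:] = data                                    # chunked write to S3
z.attrs["affine"] = img.affine.tolist()        # the minimal metadata we keep
```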
Hi all, I don’t think I can add much to this discussion, apart from mentioning that, if you’re going to use e.g. numpy.memmap, there’s probably no need to use Cython. Cython is a very good choice if you need to interface with an existing C library, but it sounds like that may not be necessary for you.
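For example, something like the following already reads many files in parallel with no compiled extension involved; the directory layout, worker count, and per-file reduction are hypothetical placeholders.

```python
# Hedged sketch: plain multiprocessing gives parallel file reads without
# any Cython/OpenMP. Paths and the per-file computation are placeholders.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

import numpy as np
import nibabel as nib


def frame_means(path):
    img = nib.load(str(path))              # uncompressed .nii -> memmapped access
    data = np.asanyarray(img.dataobj)
    return data.mean(axis=-1)              # any per-file computation goes here


if __name__ == "__main__":
    paths = sorted(Path("data/").glob("*.nii"))
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(frame_means, paths))
```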