Using docker to distribute Highres Neuroimaging software

juhuntenburg · May 11, 2017, 12:23am

Hi,

We are currently working on a piece of neuroimaging software for high-resolution image processing and thinking about the best way to handle distribution. The code is originally written in Java but we are working on Python wrappers to integrate with other Nipy tools. We use JCC to call the original Java classes from Python, which produces C++ and makes installation platform-dependent.

One idea to still keep the installation simple is to use docker. But beyond all the hype, we are wondering:

if neuroimaging software users actually (know how to) use docker
if the docker overhead will slow down computations, which is a crucial concern for highres data

Happy for any input!
Julia

ChrisGorgolewski · May 11, 2017, 12:44am

I think the answer depends on the objective:

If you are looking to distribute a library that is intended to be integrated with other software, wrapping it in Docker will make that hard, if not impossible. Here, building a PyPI or conda package makes more sense (although I do not know how you are going to deal with the Java dependency; do you still need the JVM after compilation using JCC?).
If you are looking to distribute a command line tool with a complex set of dependencies, Docker is a good fit.

In other words distributing nilearn in Docker makes little sense, but distributing FMRIPREP in Docker makes more sense.
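To make the command-line case concrete, here is a minimal sketch of how a containerised tool is typically invoked with a host data folder mounted into the container. The image name, tool arguments and paths are hypothetical placeholders; only the `docker run`, `--rm` and `-v` parts are standard Docker CLI.

```python
import shlex


def docker_run_command(image, tool_args, host_data_dir, container_data_dir="/data"):
    """Build a `docker run` argument list that bind-mounts a host data folder.

    `image` and `tool_args` are hypothetical placeholders for a packaged tool.
    """
    mount = f"{host_data_dir}:{container_data_dir}"
    return [
        "docker", "run",
        "--rm",        # remove the container once the tool exits
        "-v", mount,   # expose the host data folder inside the container
        image,
        *tool_args,
    ]


cmd = docker_run_command(
    image="example/highres-tool:latest",  # hypothetical image name
    tool_args=["--input", "/data/sub-01_T1w.nii.gz"],  # hypothetical arguments
    host_data_dir="/home/user/study",
)
print(shlex.join(cmd))
```

The tool only sees what is mounted, which is why this pattern suits self-contained pipelines better than libraries that need to be imported into another program.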

To answer your other questions:

There is no data on what percentage of neuroimagers know how to use Docker. There is, however, a growing list of publications and training materials that use Docker in the context of neuroimaging.
Docker is slightly slower than bare metal on Windows and Mac, where it runs inside a Linux virtual machine, but runs as fast as native on Linux.
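If you want to check the overhead on your own data rather than take this on trust, a rough approach is to time the same command natively and inside a container. The sketch below is only a starting point: the workload is a stand-in so that it runs anywhere, and the commented-out Docker command uses a hypothetical image name.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of `cmd`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs anywhere; swap in the real tool.
native_cmd = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
native_seconds = time_command(native_cmd)
print(f"native: {native_seconds:.3f} s")

# Hypothetical containerised equivalent (image name is an assumption):
# docker_cmd = ["docker", "run", "--rm", "example/highres-tool:latest"]
# print(f"docker: {time_command(docker_cmd):.3f} s")
```

Taking the minimum over a few repeats reduces noise from caching and other processes; for long-running highres jobs the container start-up time should be a small part of the total.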

juhuntenburg · May 11, 2017, 2:45pm

Thanks so much for your input Chris, that’s really helpful.

I will discuss with the others, but in my opinion an integrable library would be the dream, so docker might not be ideal. Also, bare metal might indeed be the preferable option when it comes to cutting the cortex into razor-thin slices.

We still need the JVM after JCC, so not sure how that would work with pypi or conda packages. I will look into it. Or maybe someone here has done this kind of thing before?
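One option for a pip or conda package that still depends on a JVM is to check for one when the wrapper is imported and fail with a clear message. The following is a minimal sketch under that assumption; `find_java` is a hypothetical helper, not part of JCC, and it only looks for the `java` launcher. JCC-built extensions load the JVM shared library, so a real check may need to go further.

```python
import os
import shutil


def find_java(env=None, which=shutil.which):
    """Return a path to a `java` executable, or None if none is found.

    Checks JAVA_HOME first, then falls back to searching PATH.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        for name in ("java", "java.exe"):
            candidate = os.path.join(java_home, "bin", name)
            if os.path.isfile(candidate):
                return candidate
    return which("java")


java = find_java()
if java is None:
    print("No JVM found: install a Java runtime or set JAVA_HOME.")
else:
    print(f"Using JVM at {java}")
```

With conda, the Java runtime could in principle be declared as a package dependency, which would make such a check a safety net rather than the main mechanism.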

Thanks again, much appreciated!

Grant · May 13, 2017, 12:24am

Using Docker as a means to distribute and use neuroimaging software is a very forward-thinking thing to do, though Docker’s adoption as a scientific tool is still in very early stages, so as Chris said, there isn’t much quantitative info available on its usage right now.

As with any tool, like git or GNU Make, it has a learning curve that you’ll have to consider, as not many uses in scientific software exist yet. Docker itself has undergone quite a few significant changes just in the recent year or two, including the division between Enterprise and Community editions. These are all good developments, for the most part, but it does mean that the installation procedure for Docker is always being tweaked, which may impact its utility to you as a distribution tool.

That said, there’s nothing preventing you from releasing your software through more traditional means in addition to Docker. It could be excellent to have traditional downloads for the application (or for something like the JAR file, since you mention your project is coded in Java) as well as putting your officially supported Docker image on something like Docker Hub. If your project is hosted on an online repository, it’s a trivial addition (in terms of storage) to add a Dockerfile to the top directory, once you’ve made sure it works as intended on your machine.

Whether or not Docker adds a lot of overhead to your pipeline depends in part on what base image you use. A popular minimal base image is Alpine Linux, which is much smaller on disk than Ubuntu (note that it uses musl libc rather than glibc, which can matter for compiled code), and there are already many images on Docker Hub that have Java Development Kits and the like included. If you build your Docker distribution on one of those, you should see negligible slowdown in the execution of your code, since containers are very fast to start/stop and are not fully-fledged virtual machines (technically, they’re just a combination of Linux namespaces and cgroups, but that’s a different story). Unless something sub-optimal is done in creating the image or Dockerfile, you shouldn’t notice a significant difference in time executing your code with Docker (except for the initial download and build of the Docker image) vs. running locally.
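For the curious, the cgroup side of this is visible from inside any Linux process: the kernel lists a process's cgroup memberships in `/proc/self/cgroup`, one `hierarchy-ID:controllers:path` entry per line. The sketch below just parses that file; the helper names are my own, and what the paths look like inside a container varies (under cgroup v1 they often contain `docker`, under cgroup v2 they may simply be `/`).

```python
def parse_cgroup_lines(text):
    """Parse /proc/self/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        controller_list = controllers.split(",") if controllers else []
        entries.append((hierarchy_id, controller_list, path))
    return entries


def read_own_cgroups(proc_file="/proc/self/cgroup"):
    """Return the current process's cgroup entries, or [] if unavailable."""
    try:
        with open(proc_file) as handle:
            return parse_cgroup_lines(handle.read())
    except OSError:
        return []  # not Linux, or /proc is not mounted


for hierarchy_id, controllers, path in read_own_cgroups():
    print(hierarchy_id, controllers, path)
```

Running this natively and then inside a container shows the same process-level mechanism in both cases, which is the reason there is no hypervisor-style overhead on Linux.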

Either way, it sounds like a cool project, so best of luck!

juhuntenburg · June 2, 2017, 1:33pm

Grant, thank you so much for all of that information, that’s super helpful. It seems that traditional distribution plus an additional Docker image will be the way to go.
Thanks!