Can nipype restart stalled nodes?

Is there a way for nipype to restart stalled nodes? I’m using a preprocessing pipeline (code here) we wrote in-lab, and every so often one of the nodes (typically concat_rigids or merge_epis, which are MapNodes that call fsl.ConvertXFM and fsl.Merge, respectively) stalls out. The MapNode spawns a huge number of sub-processes (in the hundreds), each of which takes a minute or two at most, but sometimes the node stalls and the entire pipeline freezes. From the log, it looks like the pipeline is just waiting to hear back from a sub-process (it doesn’t appear to have run out of memory or anything), though I haven’t checked whether the issue is nipype not realizing the sub-process finished or the sub-process itself. This doesn’t happen when the plugin is Linear, only when it’s either MultiProc or SLURM, and I believe it only happens when run on our HPC cluster, which uses the SLURM job scheduler. (I say “I believe” because no one else has run into this issue and the others in the lab run the pipeline locally; it’s too memory-intensive for my machine, so I’ve never done a local run myself to compare.)

The way I typically handle this is to kill the whole pipeline and restart it, which has worked every time. It doesn’t take as long as running from scratch, because it finds the cached outputs and continues from there, but it would obviously be more convenient if I could just restart the problem sub-process instead of the whole thing. Is there a way to do that? Something like specifying the maximum length of time to let a sub-process run for and, if it exceeds that, killing it and trying again?

Given that it only happens when the job is parallelized on the cluster, and that restarting fixes it every time, I suspect the output is being created and nipype is just not realizing it. But I feel like a fix for that would be more complicated and the issue may be setup-specific, whereas the timeout-and-retry approach above might be more general.

Thanks!

This is a very accurate assessment of the problem. You are right: one or more subnodes got killed (I’d bet on a memory allocation problem) and MultiProc sits there waiting for the job to finish. With SLURM you can easily check in the error log whether the OOM killer kicked in.

In FMRIPREP we’ve experienced this problem a number of times. These are the measures we have taken (and they have worked for FMRIPREP):

  1. We experienced great memory savings when we switched multiprocessing to the so-called 'forkserver' mode (https://github.com/poldracklab/fmriprep/blob/4314431f98e1ec68d4666ae03be684b414785d22/fmriprep/cli/run.py#L211). That works well for MultiProc (see the first sketch after this list).

  2. I highly recommend updating to the latest release of nipype, since it seems to me you are not using current master (we tried to address this issue previously).

  3. Limit the number of parallel subnodes using the n_procs property of that MapNode (also shown in the first sketch below).

  4. Replace the MapNode with a Node that iterates internally over the inputs (https://github.com/poldracklab/fmriprep/blob/master/fmriprep/interfaces/itk.py#L81-L155); a rough sketch of this pattern follows below as well.
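
To make points 1 and 3 concrete, here is a rough, untested sketch; the interface, the iterfields, and the cap of 4 are just placeholders for whatever your pipeline actually uses:

```python
import multiprocessing as mp

from nipype import MapNode
from nipype.interfaces import fsl

# Point 1: use the 'forkserver' start method. This has to run before any
# worker processes are spawned, i.e. early in the script that eventually
# calls workflow.run(plugin='MultiProc', ...).
mp.set_start_method('forkserver', force=True)

# Point 3: cap how many sub-processes this MapNode may run at once.
concat_rigids = MapNode(
    fsl.ConvertXFM(concat_xfm=True),
    iterfield=['in_file', 'in_file2'],
    name='concat_rigids',
)
concat_rigids.n_procs = 4  # placeholder cap; tune to your node and cluster
```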
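
And a rough sketch of point 4, replacing the MapNode with a single Function node that loops over the inputs itself (the input names here are placeholders; FMRIPREP uses a full custom interface instead):

```python
from nipype import Node
from nipype.interfaces.utility import Function


def concat_rigids_serial(in_files, in_files2):
    """Concatenate each pair of rigid transforms in a plain loop, so the
    whole job runs inside one node instead of hundreds of sub-processes."""
    # Imports must live inside the function for a Function node.
    from nipype.interfaces import fsl
    out_files = []
    for in_file, in_file2 in zip(in_files, in_files2):
        res = fsl.ConvertXFM(in_file=in_file, in_file2=in_file2,
                             concat_xfm=True).run()
        out_files.append(res.outputs.out_file)
    return out_files


concat_rigids = Node(
    Function(input_names=['in_files', 'in_files2'],
             output_names=['out_files'],
             function=concat_rigids_serial),
    name='concat_rigids',
)
```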

If I think of further solutions, I’ll add them.

Hope this helps!

In addition to setting n_procs, if you can estimate the amount of memory each process is going to use and add mem_gb=<VAL> to your Node and MapNode arguments, nipype will only run jobs when resources are available. (Note that this only works with MultiProc. Also, it is accounting, not profiling: if your jobs use more CPUs or memory than you estimate, nipype will run too many jobs simultaneously; if you estimate too high, you might under-utilize your resources.)
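
For example (the interface and the numbers here are purely illustrative; substitute your own estimates):

```python
from nipype import MapNode
from nipype.interfaces import fsl

# Per-job resource estimates: MultiProc only dispatches a job when its
# estimated memory and CPUs fit within the remaining budget.
merge_epis = MapNode(
    fsl.Merge(dimension='t'),
    iterfield=['in_files'],
    name='merge_epis',
    mem_gb=2,    # estimated RAM per sub-process (illustrative)
    n_procs=1,   # estimated CPUs per sub-process (illustrative)
)
```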

Thanks for your responses! I was using version 0.14.1, so I’ll try updating to the newest version (1.0.1 is what I just got from pip) and see if that helps.

I’ll also try switching multiprocessing to 'forkserver' and see if that helps.

I’m passing n_procs and memory_gb to the MultiProc plugin and that doesn’t seem to help (which is why I thought it wasn’t a memory issue). I’d rather avoid passing those arguments directly to the MapNode or switching it to a Node if possible, but we’ll see what I have to do.
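
For reference, this is roughly how I’m invoking it (the numbers are just placeholders for what I actually request):

```python
from nipype import Workflow

wf = Workflow(name='preproc')  # nodes and connections omitted here

# Resource caps handed to the plugin itself (not to individual nodes):
wf.run(plugin='MultiProc',
       plugin_args={'n_procs': 16, 'memory_gb': 64})
```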

Passing n_procs and memory_gb to MultiProc will do very little if any of your jobs use more than one core or more than the default amount of RAM (250 MB, I believe, off the top of my head), unless you tag the individual Nodes with more accurate estimates.

Really? It seems to be helping somewhat: the cluster job scheduler was killing my job really quickly until I set n_procs and memory_gb so that they matched SLURM’s cpus-per-task and mem options. (Without me tagging the individual nodes with anything.)

Ah, yes, if you’re restricted to fewer cores than are present on the compute node, that will make a difference. I suspect that the memory tag makes less of a difference.