Large virtual memory allocations in fMRIPrep on PBS system

@mitchem890 Which IU resource are you running this on? I believe “mem” is ignored on Carbonate; I have to set vmem instead. Without setting vmem, Carbonate defaults it to 16gb.
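For reference, a PBS header that requests vmem explicitly looks something like this (values illustrative, adjust to your allocation):

#PBS -l nodes=1:ppn=8,walltime=3:00:00,vmem=32gb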

@effigies I just disabled the plugin parameters (to keep things simple). Here are the commands I am running now, and the error message.

[stderr log]
++ singularity run --cleanenv docker://poldracklab/fmriprep:1.2.6-1 --notrack --resource-monitor --skip_bids_validation --stop-on-first-crash --mem-mb 30000 --omp-nthreads 8 --output-space fsaverage5 fsnative T1w template --template-resampling-grid=2mm --force-bbr --force-syn --skull-strip-template=NKI --fs-license-file=/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/.license.txt --work-dir=/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/ --participant_label=sub-TTTEMPSUB /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripinput/ /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripout/ participant
WARNING: Warning reading tar header: Ignoring malformed pax extended attribute
WARNING: Warning reading tar header: Ignoring malformed pax extended attribute
ERROR : Could not write to /N/soft/rhel7/singularity/2.6.1/var/singularity/mnt/final/etc/localtime: Too many levels of symbolic links
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/site-packages/nilearn/datasets/neurovault.py:16: DeprecationWarning: Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
from collections import Container
/usr/local/miniconda/lib/python3.7/site-packages/skimage/__init__.py:80: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/pytest.py’ mode=‘r’ encoding=‘utf-8’>
imp.find_module(‘pytest’)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/site-packages/networkx/classes/reportviews.py:95: DeprecationWarning: Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
from collections import Mapping, Set, Iterable
/usr/local/miniconda/lib/python3.7/site-packages/bids/layout/bids_layout.py:121: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripinput/dataset_description.json’ mode=‘r’ encoding=‘UTF-8’>
self.description = json.load(open(target, ‘r’))
/usr/local/miniconda/lib/python3.7/site-packages/grabbit/core.py:436: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/bids/layout/config/bids.json’ mode=‘r’ encoding=‘UTF-8’>
domain = json.load(open(domain, ‘r’))
/usr/local/miniconda/lib/python3.7/site-packages/bids/layout/bids_layout.py:208: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripinput/sub-TTTEMPSUB/ses-SSSES/func/sub-TTTEMPSUB_ses-SSSES_task-rest_bold.json’ mode=‘r’ encoding=‘utf-8’>
encoding=‘utf-8’))
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘DeprecationWarning’>): Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
Captured warning (<class ‘ResourceWarning’>): unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/pytest.py’ mode=‘r’ encoding=‘utf-8’>
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/site-packages/nilearn/datasets/neurovault.py:16: DeprecationWarning: Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
from collections import Container
/usr/local/miniconda/lib/python3.7/site-packages/skimage/__init__.py:80: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/pytest.py’ mode=‘r’ encoding=‘utf-8’>
imp.find_module(‘pytest’)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
Captured warning (<class ‘PendingDeprecationWarning’>): the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
Captured warning (<class ‘PendingDeprecationWarning’>): the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
=>> PBS: job killed: vmem 32281870336 exceeded limit 32212254720

[stdout - last few lines]

190719-08:21:38,185 nipype.workflow INFO:
[Node] Finished “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.fixhdr_skullstrip2”.
190719-08:21:40,103 nipype.workflow INFO:
[Node] Setting-up “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.combine_masks” in “/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/combine_masks”.
190719-08:21:40,124 nipype.workflow INFO:
[Node] Running “combine_masks” (“nipype.interfaces.fsl.maths.BinaryMaths”), a CommandLine Interface with command:
fslmaths /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/skullstrip_first_pass/ref_image_corrected_brain_mask.nii.gz -mul /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/fixhdr_skullstrip2/uni_xform_mask_xform.nii.gz /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/combine_masks/ref_image_corrected_brain_mask_maths.nii.gz
190719-08:21:40,459 nipype.workflow INFO:
[Node] Finished “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.combine_masks”.
190719-08:21:42,103 nipype.workflow INFO:
[Node] Setting-up “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.apply_mask” in “/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask”.
190719-08:21:42,121 nipype.workflow INFO:
[Node] Running “apply_mask” (“nipype.interfaces.fsl.maths.ApplyMask”), a CommandLine Interface with command:
fslmaths /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/fixhdr_unifize/uni_xform.nii.gz -mas /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/combine_masks/ref_image_corrected_brain_mask_maths.nii.gz /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask/uni_xform_masked.nii.gz
190719-08:21:42,376 nipype.workflow INFO:
[Node] Finished “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.apply_mask”

As you can see, fMRIPrep seems to be (at least temporarily) exceeding --mem-mb. Strangely, if I just rerun the job, it sometimes succeeds. I believe this is either because the batch scheduler isn’t quick enough to notice that fMRIPrep is exceeding the requested memory size, or because fMRIPrep only exceeds the maximum occasionally… I am not sure which, but either way fMRIPrep shouldn’t exceed the maximum memory. Have you run a detailed memory-footprint analysis of fMRIPrep?

A couple things.

  1. You say above that you use n_procs = 4 and here you have --omp-nthreads 8. What this means is that a job that can parallelize will claim 8 threads and be told it can use 8 threads, so if it’s a job that does fork-exec (more on this in a bit), you might get temporary bumps of 8x memory claims for that job.
  2. I would actually recommend against using --omp-nthreads without a very good reason, because our default value is min(n-cpus - 1, 8), which tends to be pretty sensible.
  3. You don’t specify --n-cpus here, which means fMRIPrep will try to use all of the CPUs available on the system. When using a plugin.yml, this should be fine, but just a note for this run.

I don’t see anything in your logs that makes it clear which jobs were running at the time of death. Could you try -vv to bump up the verbosity?

I suspect it’s the latter. What we’ve seen in the past is a fork-exec issue on operating systems that have strict memory overcommit policies. Because fork-exec makes a copy of the original process with its full address space, the virtual memory doubles until the exec call. If you can overcommit, it’s fine, because you probably don’t actually touch more than one page, so copy-on-write saves you from ever actually allocating the memory. If you can’t, then the apparent memory usage can get catastrophically high. I’m guessing you’re in this latter category.
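If you want to check which policy a given compute node is using, the kernel exposes it under /proc; a quick sketch (run on the compute node itself):

# 0 = heuristic overcommit (the usual default), 1 = always overcommit, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory
# Under strict accounting (2), this ratio of physical memory caps what can be committed
cat /proc/sys/vm/overcommit_ratio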

What we need is two memory estimates for every node: the actual memory consumed and the virtual memory claimed. If we strictly use the latter, then everybody’s performance drops, as the utilized memory will be far lower than desired on systems that don’t have strict overcommit policies.

One additional reason that restarting can sometimes get past this is that it changes the relative start times of processes: if two processes hit a memory peak (real or virtual) at the same time on the first run, the timing can be different on a second run.

@oesteban was working on the resource monitor at one point, with the idea of collecting stats on the memory and virtual memory footprints of all processes so we can see where our estimates are off. Possibly he can give an update on its status, but no, AFAIK we don’t have anything like what we would need to definitively address this at this point.


In the short term, I can make some recommendations:

  1. Use the LegacyMultiProc plugin, which is able to take advantage of a technique for cheaper forking (see the sketch after this list).
  2. Under-estimate --mem-mb.
  3. Leave --omp-nthreads unset or set it to < n_procs. Once you get something running and want to optimize, you can try tweaking, but I don’t think you’re going to get much out of it.
  4. Re-run.
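For item 1, a minimal plugin file might look like the sketch below (memory_gb and n_procs are illustrative and should track, or slightly undershoot per item 2, whatever you actually request from the scheduler); it is then passed to fMRIPrep with --use-plugin=multi_proc.yml:

# Write a minimal nipype plugin file for fMRIPrep's --use-plugin option (values illustrative)
cat > multi_proc.yml <<'EOF'
plugin: LegacyMultiProc
plugin_args: {maxtasksperchild: 1, memory_gb: 26, n_procs: 8, raise_insufficient: false}
EOF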

I’m sorry that we don’t have a fully satisfactory option at this point. It would be nice if we could actually constrain the jobs, but nipype does not have that capability.

@effigies Thanks for the tip!

I’ve updated the plugin config with the following content

plugin: LegacyMultiProc
plugin_args: {maxtasksperchild: 1, memory_gb: 28, n_procs: 8, raise_insufficient: false}

Running with the following PBS directives.

#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=3:00:00,vmem=30gb
#PBS -N app-run-fmriprep

I then updated the singularity command line as follows (removing the CPU/memory-related options):

singularity run --cleanenv docker://poldracklab/fmriprep:1.2.6-1 --notrack --resource-monitor --skip_bids_validation --stop-on-first-crash --use-plugin=multi_proc.yml --output-space fsaverage5 fsnative T1w template --template-resampling-grid=2mm --force-bbr --force-syn --skull-strip-template=NKI --fs-license-file=.license.txt --work-dir=fmripworkdir/ --participant_label=sub-TTTEMPSUB fmripinput/ fmripout/ participant
WARNING: Warning reading tar header: Ignoring malformed pax extended attribute
WARNING: Warning reading tar header: Ignoring malformed pax extended attribute
ERROR  : Could not write to /N/soft/rhel7/singularity/2.6.1/var/singularity/mnt/final/etc/localtime: Too many levels of symbolic links
=>> PBS: job killed: vmem 36544569344 exceeded limit 32212254720

As you can see, fMRIPrep is still allocating >28gb of vmem (aggregate) near startup. I’ve checked the overcommit setting and it seems to be enabled (/proc/sys/vm/overcommit_memory is set to 0). By the way, all of our HPC / cloud compute resources have swap disabled… I know most desktop / laptop environments have swap enabled. This might be contributing to the problem; I am not sure.
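For reference, this is roughly how the node state can be confirmed from inside a job (a sketch):

free -h            # the Swap row shows 0B when swap is disabled
cat /proc/swaps    # lists active swap devices; only the header line appears when swap is off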

When fMRIPrep spikes in virtual memory, maybe this could be avoided by staggering the forking of each process?

This seems very strange… @oesteban have you seen anything like this?

I’m inclined to think that doesn’t matter in practice if you’re setting a vmem limit, instead of mem, but good to know. I hadn’t considered that the batch system and the OS might treat virtual memory differently.

This is doable in principle by adding a new argument to the plugin in Nipype, but I haven’t scoped out how involved it would be.

I realized this was getting long, so I moved it to a new thread. Also, to be clear, is the 36.5GB allocation during the workflow run? fMRIPrep spends a while preparing the workflow, and if we’re exceeding the memory limits at that point, that’s a different problem altogether.

That peak in vmem allocation at the very beginning is in agreement with my findings from when I was working on this most intensively: https://github.com/nipy/nipype/issues/2776#issuecomment-439568215

According to brainlife’s task info, the spike happens at the very beginning, and it’s from processes called “pandoc-citeproc html” and “pandoc”. I know it’s kind of hard to see with all the colors… but I can send you the raw data if you are interested.

We only poll memory stats every 30 seconds, so there might be other processes that are spiking… but not showing up on our graph.
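If finer-grained numbers would help, here is a rough sketch of a poll over /proc that could be used (it only walks direct children of the PID you give it, so it may still miss grandchildren; the 5-second interval is arbitrary):

#!/bin/bash
# Usage: poll_vmem.sh <top-level fmriprep PID>; prints VmPeak/VmSize/VmRSS every 5 seconds
PID=$1
while kill -0 "$PID" 2>/dev/null; do
  ts=$(date +%T)
  for p in "$PID" $(pgrep -P "$PID"); do
    # /proc/<pid>/status reports virtual (VmSize, VmPeak) and resident (VmRSS) memory in kB
    vm=$(grep -E '^Vm(Peak|Size|RSS)' "/proc/$p/status" 2>/dev/null | tr -s '[:space:]' ' ')
    [ -n "$vm" ] && echo "$ts pid=$p $vm"
  done
  sleep 5
done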


Just wanted to echo @Soichi_Hayashi’s experience. Having run fMRIPrep (ver 1.2.5… so a while back) on the IU clusters, for some subjects (no apparent pattern to which subjects, perhaps ~10%) I suspect there was a massive ‘spike’ of memory requested at the beginning of the jobs, exceeding my vmem allocation of 124gb. In my experience (the logs have since been wiped, unfortunately) these jobs were killed within 10 seconds of the node starting. I reran those subjects with even more vmem requested and they then succeeded. Hope this bit of info helps y’all!

This is super useful, thanks!

EDIT: pandoc is a memory hog (its devs’ words, not mine).

@Soichi_Hayashi, I’m going to get this fixed ASAP - stay tuned!

@oesteban Thanks for the quick action! I see you are working on the fix already. On the pandoc thread you mentioned, it also says this: https://github.com/jgm/pandoc/issues/3169#issuecomment-496360494

If pandoc is compiled with -rtsopts (as it should be by default unless someone specifically turns this off), then you can specify the maximum heap size as follows:
pandoc +RTS -M30m -RTS -f markdown -t html MANUAL.txt
The +RTS -M30m -RTS sets the max heap size to 30M. Max stack size defaults to 80% of heap size.

You’ve probably seen that already… but just FYI.

Yep, I saw that. The problem is that I don’t think it is worth recompiling pandoc to put it in our docker image :(. I think people can live with having their individual reports without the boilerplate if they hit the max VMS allocation. Would this be a particular problem for you?

Have we verified that the version installed doesn’t allow us to add those parameters?

Nope, but that would be very easy to check, of course.
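Something along these lines, run against the container, should settle it (a sketch; a binary linked without -rtsopts should refuse the +RTS flags with an error rather than printing the version):

singularity exec docker://poldracklab/fmriprep:1.2.6-1 pandoc +RTS -M30m -RTS --version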

Hello! Is there any update to this issue?

Can you give 1.5.0rc1 a try? That pre-release includes the fix.

@oesteban We updated our script to use the 1.5.1rc2 container.

Our jobs are finishing successfully about 80% of the time, but some are still failing. The process gets almost all the way to the end, and I see this in the log at the end.

fMRIPrep failed: Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/legacymultiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 479, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 569, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 662, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 382, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.7/site-packages/niworkflows/interfaces/nilearn.py", line 132, in _run_interface
    new_nii.to_filename(self._results['out_file'])
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/filebasedimages.py", line 335, in to_filename
    self.to_file_map()
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/analyze.py", line 1049, in to_file_map
    arr_writer.to_fileobj(imgf)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/arraywriters.py", line 562, in to_fileobj
    nan2zero=self._needs_nan2zero())
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/volumeutils.py", line 660, in array_to_file
    pre_clips=pre_clips)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/volumeutils.py", line 833, in _write_data
    fileobj.write(dslice.tostring())
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/openers.py", line 220, in write
    return self.fobj.write(*args, **kwargs)
  File "/usr/local/miniconda/lib/python3.7/gzip.py", line 264, in write
    self.fileobj.write(self.compress.compress(data))
OSError: [Errno 28] No space left on device

=>> PBS: job killed: vmem 22177083392 exceeded limit 17179869184
/usr/local/miniconda/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 5 leaked semaphores to clean up at shutdown
  len(cache))

It’s interesting that I see both “No space left on device” and “vmem exceeded limit” at the same time. Does fmriprep use some kind of memory-mapped tmp space? Maybe it’s using too much of tmpfs?

fMRIPrep’s default working directory (where nipype intermediate results are stored) is $PWD/work, and the Docker image sets the default WORKDIR to /tmp, so without other intervention, the working directory will be /tmp/work.

On a system that uses a ramdisk for /tmp, this will definitely be problematic, unless you can assume many gigabytes of RAM.

I’m not entirely sure whether Singularity takes the working directory to be the calling directory or reads it from the image. In any event, passing the -w flag to fMRIPrep will allow you to make a better selection, if that does turn out to be the problem.
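For example (a sketch; the scratch path is illustrative):

df -hT /tmp    # a filesystem type of "tmpfs" means /tmp is RAM-backed
# If it is, point the working directory at disk instead, e.g.:
#   -w /N/dc2/scratch/<username>/fmriprep-work
# (or the long form --work-dir, as in your original command)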


Based on our suspicion that pandoc is requesting this surge in memory, which is trouble for our compute cluster setup, would it be possible to propose a flag to skip pandoc altogether? A flag to set a variable that negates the conditional here? https://github.com/poldracklab/fmriprep/blob/master/fmriprep/cli/run.py#L447 Is this the only place where pandoc comes up?

I second @faskowit’s suggestion to make this an optional step. I just sent a PR to do just that: https://github.com/poldracklab/fmriprep/pull/1876 I’ve tested it on our system and it seems to work as expected (it skips generation of the HTML file from citation.md).
