Large virtual memory allocations in fMRIPrep on PBS system

Soichi_Hayashi · July 18, 2019, 3:08pm

@mitchem890 @effigies We are running into a similar issue on IU carbonate. Despite setting --use-plugin pointing to a file containing the following content …

plugin: LegacyMultiProc
plugin_args: {maxtasksperchild: 1, memory_gb: 22, n_procs: 4, raise_insufficient: false}

fmriprep is getting killed by our PBS scheduler with the following error message.

PBS: job killed: vmem 139623743488 exceeded limit 85899345920

We are running it with the following PBS directives.

#PBS -l nodes=1:ppn=4,walltime=12:00:00,vmem=80gb
#PBS -N app-run-fmriprep

I am wondering, although most of the fmriprep code is designed to honer memory_gb, or --mem-mb options, there are some parts of the code that tries to allocate more memory at least temporarily up to all available memory on the system? The node we are running has 256g of memory, but we don’t want it to more than 80gb so that multiple fmriprep jobs can run in parallel.

We are running fmriprep container through singularity

singularity run --cleanenv docker://poldracklab/fmriprep:1.2.6-1 …
This issue might be somehow related to singularity, but I am not sure…

Do you have any suggestions / comments

mitchem890 · July 18, 2019, 5:14pm

Maybe our cluster is setup differently but I was told we don’t have to explicitly set the vmem when running job.

here’s a sample of one of our fmriprep pbs’s:

#!/bin/bash
#PBS -l nodes=1:ppn=12,walltime=30:00:00,mem=64gb
#PBS -N fMRIPrep-150423
#PBS -j oe
#PBS -o fMRIPrep-150423.log

module load singularity-2.5.2

singularity run --cleanenv \
-B /scratch/mjeffers/fMRIPrep/sub-150423:/tmp \
/home/ccp_hcp/fmriprep/SingularityImages/fmriprep-1.1.7.simg \
--fs-license-file /tmp/.license/freesurfer/license.txt \
-w /tmp /tmp/fMRIPrep_input /tmp/fMRIPrep_output participant \
--participant_label 150423 \
--output-space T1w fsnative fsaverage fsaverage5 \
-n-cpus 12 \
--omp-nthreads 4 \
--mem-mb 64000 \
-vv --use-plugin /tmp/plugin.yml

and my plugin.yml:

plugin: LegacyMultiProc
plugin_args: {maxtasksperchild: 1, memory_gb: 64, n_procs: 12, raise_insufficient: false}`

Maybe try to just set the ‘mem’ parameter\

Thanks,
Mitch

effigies · July 18, 2019, 6:41pm

Hi @Soichi_Hayashi,

What’s your full fMRIPrep command? Just a warning that if you set --mem-mb, then it will override memory_gb in your plugin_args.

Do you have the stdout log? There is a possibility that we have some jobs that are not well-tagged with their expected memory consumption, so if you can identify which were running when the memory request was made, we can look into those specifically. It may help to use --verbose to ensure that we give better indications of which jobs are running at any given time.

Best,
Chris

Soichi_Hayashi · July 19, 2019, 12:31pm

@mitchem890 Which IU resource are you running this on? I believe “mem” is ignored on carbonate. I have to set vmem instead. Without setting vmem, carbonate defaults it to 16gb.

@effigies I just disabled the plugin parameters (to keep things simple). Here is the commands that I am running now - and the error message.

[stderr log]
++ singularity run --cleanenv docker://poldracklab/fmriprep:1.2.6-1 --notrack --resource-monitor --skip_bids_validation --stop-on-first-crash --mem-mb 30000 --omp-nthreads 8 --output-space fsaverage5 fsnative T1w template --template-resampling-grid=2mm --force-bbr --force-syn --skull-strip-template=NKI --fs-license-file=/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/.license.txt --work-dir=/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/ --participant_label=sub-TTTEMPSUB /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripinput/ /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripout/ participant
e[33mWARNING: Warning reading tar header: Ignoring malformed pax extended attribute
e[0me[33mWARNING: Warning reading tar header: Ignoring malformed pax extended attribute
e[0me[91mERROR : Could not write to /N/soft/rhel7/singularity/2.6.1/var/singularity/mnt/final/etc/localtime: Too many levels of symbolic links
e[0m/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/site-packages/nilearn/datasets/neurovault.py:16: DeprecationWarning: Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
from collections import Container
/usr/local/miniconda/lib/python3.7/site-packages/skimage/init.py:80: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/pytest.py’ mode=‘r’ encoding=‘utf-8’>
imp.find_module(‘pytest’)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/site-packages/networkx/classes/reportviews.py:95: DeprecationWarning: Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
from collections import Mapping, Set, Iterable
/usr/local/miniconda/lib/python3.7/site-packages/bids/layout/bids_layout.py:121: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripinput/dataset_description.json’ mode=‘r’ encoding=‘UTF-8’>
self.description = json.load(open(target, ‘r’))
/usr/local/miniconda/lib/python3.7/site-packages/grabbit/core.py:436: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/bids/layout/config/bids.json’ mode=‘r’ encoding=‘UTF-8’>
domain = json.load(open(domain, ‘r’))
/usr/local/miniconda/lib/python3.7/site-packages/bids/layout/bids_layout.py:208: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripinput/sub-TTTEMPSUB/ses-SSSES/func/sub-TTTEMPSUB_ses-SSSES_task-rest_bold.json’ mode=‘r’ encoding=‘utf-8’>
encoding=‘utf-8’))
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
Captured warning (<class ‘DeprecationWarning’>): Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
Captured warning (<class ‘ResourceWarning’>): unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/pytest.py’ mode=‘r’ encoding=‘utf-8’>
Captured warning (<class ‘ImportWarning’>): can’t resolve package from spec or package, falling back on name and path
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
/usr/local/miniconda/lib/python3.7/site-packages/nilearn/datasets/neurovault.py:16: DeprecationWarning: Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
from collections import Container
/usr/local/miniconda/lib/python3.7/site-packages/skimage/init.py:80: ResourceWarning: unclosed file <_io.TextIOWrapper name=’/usr/local/miniconda/lib/python3.7/site-packages/pytest.py’ mode=‘r’ encoding=‘utf-8’>
imp.find_module(‘pytest’)
/usr/local/miniconda/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can’t resolve package from spec or package, falling back on name and path
return f(*args, **kwds)
Captured warning (<class ‘PendingDeprecationWarning’>): the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
Captured warning (<class ‘PendingDeprecationWarning’>): the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
=>> PBS: job killed: vmem 32281870336 exceeded limit 32212254720

[stdout - last few lines]
…
190719-08:21:38,185 nipype.workflow INFO:
[Node] Finished “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.fixhdr_skullstrip2”.
190719-08:21:40,103 nipype.workflow INFO:
[Node] Setting-up “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.combine_masks” in “/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/combine_masks”.
190719-08:21:40,124 nipype.workflow INFO:
[Node] Running “combine_masks” (“nipype.interfaces.fsl.maths.BinaryMaths”), a CommandLine Interface with command:
fslmaths /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/skullstrip_first_pass/ref_image_corrected_brain_mask.nii.gz -mul /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/fixhdr_skullstrip2/uni_xform_mask_xform.nii.gz /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/combine_masks/ref_image_corrected_brain_mask_maths.nii.gz
190719-08:21:40,459 nipype.workflow INFO:
[Node] Finished “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.combine_masks”.
190719-08:21:42,103 nipype.workflow INFO:
[Node] Setting-up “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.apply_mask” in “/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask”.
190719-08:21:42,121 nipype.workflow INFO:
[Node] Running “apply_mask” (“nipype.interfaces.fsl.maths.ApplyMask”), a CommandLine Interface with command:
fslmaths /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/fixhdr_unifize/uni_xform.nii.gz -mas /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/combine_masks/ref_image_corrected_brain_mask_maths.nii.gz /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/fmriprep_wf/single_subject_TTTEMPSUB_wf/func_preproc_ses_SSSES_task_rest_wf/bold_t1_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask/uni_xform_masked.nii.gz
190719-08:21:42,376 nipype.workflow INFO:
[Node] Finished “fmriprep_wf.single_subject_TTTEMPSUB_wf.func_preproc_ses_SSSES_task_rest_wf.bold_t1_trans_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.apply_mask”

As you can see, fmriprep seems to be (at least temporarily) exceeding the --mem-mb. Strangely, if I just rerun the job, the job sometimes succeeds. I believe this is because the batch scheduler isn’t quick enough to discover that the fmriprep is exceeding the requested memory size, and/or fmriprep only exceeds max memory size - occasionally… I am not sure which is which, but either way, fmriprep shouldn’t exceed the max memory. Have you run a detailed memory footprint analysis with fmriprep?

effigies · July 19, 2019, 1:35pm

Soichi_Hayashi:

singularity run --cleanenv docker://poldracklab/fmriprep:1.2.6-1 \
    --notrack --resource-monitor --skip_bids_validation \
    --stop-on-first-crash --mem-mb 30000 --omp-nthreads 8 \
    --output-space fsaverage5 fsnative T1w template \
    --template-resampling-grid=2mm --force-bbr --force-syn \
    --skull-strip-template=NKI \
    --fs-license-file=/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/.license.txt \
    --work-dir=/N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripworkdir/ \
    --participant_label=sub-TTTEMPSUB \
    /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripinput/ \
    /N/dc2/scratch/brlife/carbonate-workflows/5d29cd0f78d9fa1cee2a2474/5d31341178d9fa1cee2b9eeb/fmripout/
    participant

A couple things.

You say above that you use n_procs = 4 and here you have --omp-nthreads 8. What this means is that a job that can parallelize will claim 8 threads and be told it can use 8 threads, so if it’s a job that does fork-exec (more on this in a bit), you might get temporary bumps of 8x memory claims for that job.
I would actually recommend against using --omp-nthreads without a very good reason, because our default value is min(n-cpus - 1, 8) which tends to be pretty sensible.
You don’t specify --n-cpus here, which will mean it will try to use all of the CPUs available on the system. When using plugin.yml, this should be fine, but just a note for this run.

I don’t see anything in your logs that makes it clear which jobs were running at the time of death. Could you try -vv to bump up the verbosity?

I suspect it’s the latter. What we’ve seen in the past is a fork-exec issue on operating systems that have strict memory overcommit policies. Because fork-exec makes a copy of the original process with its full address space, the virtual memory doubles until the exec call. If you can overcommit, it’s fine, because you probably don’t actually touch more than one page, so copy-on-write saves you from ever actually allocating the memory. If you can’t, then the apparent memory usage can get catastrophically high. I’m guessing you’re in this latter category.

What we need is two memory estimates for every node: the actual memory consumed and the virtual memory claimed. If we strictly use the latter, then everybody’s performance drops, as the utilized memory will be far lower than desired on systems that don’t have strict overcommit policies.

One additional reason that restarting can sometimes bypass it is that it affects the relative start times of pairs of processes. If on the first run, two processes hit a memory (real or virtual) peak at the same time, the timing can be different on a second run.

@oesteban was working on the resource monitor at one point, with the idea of collecting stats on the memory and virtual memory footprints of all processes so we can see where our estimates are off. Possibly he can update on the status, but no, we don’t AFAIK have anything like what we would need to definitively address this at this point.

In the short term, I can make some recommendations:

Use LegacyMultiProc plugin. This is able to take advantage of a technique for cheaper forking.
Under-estimate --mem-mb.
Leave --omp-nthreads unset or set it to < n_procs. Once you get something running and want to optimize, you can try tweaking, but I don’t think you’re going to get much out of it.
Re-run.

I’m sorry that we don’t have a fully satisfactory option at this point. It would be nice if we could actually constrain the jobs, but nipype does not have that capability.

Soichi_Hayashi · July 19, 2019, 7:23pm

@effigies Thanks for th tip!

I’ve updated the plugin config with the following content

plugin: LegacyMultiProc
plugin_args: {maxtasksperchild: 1, memory_gb: 28, n_procs: 8, raise_insufficient: false}

Running with the following PBS directives.

#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=3:00:00,vmem=30gb
#PBS -N app-run-fmriprep

I then updated the singularity command line with the following (removed cpu/memory related options)

singularity run --cleanenv docker://poldracklab/fmriprep:1.2.6-1 --notrack --resource-monitor --skip_bids_validation --stop-on-first-crash --use-plugin=multi_proc.yml --output-space fsaverage5 fsnative T1w template --template-resampling-grid=2mm --force-bbr --force-syn --skull-strip-template=NKI --fs-license-file=.license.txt --work-dir=fmripworkdir/ --participant_label=sub-TTTEMPSUB fmripinput/ fmripout/ participant
e[33mWARNING: Warning reading tar header: Ignoring malformed pax extended attribute
e[0me[33mWARNING: Warning reading tar header: Ignoring malformed pax extended attribute
e[0me[91mERROR  : Could not write to /N/soft/rhel7/singularity/2.6.1/var/singularity/mnt/final/etc/localtime: Too many levels of symbolic links
e[0m=>> PBS: job killed: vmem 36544569344 exceeded limit 32212254720

As you can see, the fmriprep is still allocating >28gb of vmem (aggregate) near the startup time. I’ve checked the overcommit setting and it seems to be enabled (/proc/sys/vm/overcommit_memory set to 0). By the way, all of our HPC / cloud compute resources have swap memory disabled… I know most desktop / laptop type environment all has swap memory enabled. This might be contributing to the problem, I am not sure.

When fmriprep spikes with the VM, maybe this can be avoided by staggering the forking for each process?

effigies · July 19, 2019, 8:10pm

This seems very strange… @oesteban have you seen anything like this?

I’m inclined to think that doesn’t matter in practice if you’re setting a vmem limit, instead of mem, but good to know. I hadn’t considered that the batch system and the OS might treat virtual memory differently.

This is doable in principle by adding a new argument to the plugin in Nipype, but I haven’t scoped out how involved it would be.

effigies · July 19, 2019, 8:14pm

Realized this was getting long, so moved to a new thread. Also, to be clear, is the 36.5GB allocation during workflow run? fMRIPrep spends a while preparing the workflow, and if we’re exceeding the memory limits at that point, that’s a different problem altogether.

oesteban · July 19, 2019, 9:46pm

That peak in vmem allocation at the very beginning is in agreement with my findings when I most intensely worked on this: https://github.com/nipy/nipype/issues/2776#issuecomment-439568215

Soichi_Hayashi · July 20, 2019, 3:01pm

According to brainlife’s task info, the spike happens at the very beginning, and it’s from processes called “pandoc-citeproc html” and “pandoc”. I know it’s kind of hard to see with all the colors… but I can send you the raw data if you are interested.

We only polls memory stat every 30 seconds, so there might be other processes that are spiking … but not showing up on our graph.

faskowit · July 20, 2019, 6:51pm

Just wanted to echo @Soichi_Hayashi’s experience…having run fMRIPrep (ver 1.2.5… so a while back) on the IU clusters, for some subjects (no apparent pattern about which subjects, perhaps ~10% of subjects) I suspect there was a massive ‘spike’ of memory requested at the beginning of the jobs, exceeding my vmem allocation which was 124gb. In my experience (the logs have since been wiped unfortunately…) these jobs were killed within 10 seconds of the node running. Reran the subjects with even more vmem requested and then they succeeded. Hope this bit of info could help y’all!

oesteban · July 20, 2019, 9:09pm

this is super-useful, thanks!

EDIT: pandoc is a memory hog (its devs’ words, not mine).

oesteban · July 20, 2019, 9:30pm

@Soichi_Hayashi, I’m going to get this fixed ASAP - stay tuned!

Soichi_Hayashi · July 22, 2019, 6:58pm

@oesteban Thanks for the quick action! I see you are working on the fix already. On the thread you mentioned from pandoc, it says this also > https://github.com/jgm/pandoc/issues/3169#issuecomment-496360494

If pandoc is compiled with -rtsopts (as it should be by default unless someone specifically turns this off), then you can specify the maximum heap size as follows:
pandoc +RTS -M30m -RTS -f markdown -t html MANUAL.txt
The +RTS -M30m -RTS sets max heap size to 40M. Max stack size defaults to 80% heap size.

You’ve probably seen that already… but just FYI.

oesteban · July 22, 2019, 9:06pm

Yep, I saw that. The problem is that I don’t think it is worth recompiling pandas to put it in our docker image :(. I think people can live having their individual reports without boilerplate if they hit the max VMS allocation. Would this be a particular problem for you?

effigies · July 22, 2019, 10:28pm

Have we verified that the version installed doesn’t allow us to add those parameters?

oesteban · July 22, 2019, 10:47pm

Nope, that would be very easy of course

Soichi_Hayashi · August 7, 2019, 1:27am

Hello! Is there any update to this issue?

oesteban · August 7, 2019, 4:45am

Can you give 1.5.0rc1 a try? That pre-release includes the fix.

Soichi_Hayashi · November 9, 2019, 2:42pm

@oesteban We updated our script to use 1.5.1rc2 container.

Our jobs are finishing successfully about 80% of the time, but some are still failing. The process finished almost all the way to the end, and I see this in the log at the end.

fMRIPrep failed: Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/legacymultiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 479, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 569, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 662, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 382, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.7/site-packages/niworkflows/interfaces/nilearn.py", line 132, in _run_interface
    new_nii.to_filename(self._results['out_file'])
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/filebasedimages.py", line 335, in to_filename
    self.to_file_map()
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/analyze.py", line 1049, in to_file_map
    arr_writer.to_fileobj(imgf)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/arraywriters.py", line 562, in to_fileobj
    nan2zero=self._needs_nan2zero())
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/volumeutils.py", line 660, in array_to_file
    pre_clips=pre_clips)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/volumeutils.py", line 833, in _write_data
    fileobj.write(dslice.tostring())
  File "/usr/local/miniconda/lib/python3.7/site-packages/nibabel/openers.py", line 220, in write
    return self.fobj.write(*args, **kwargs)
  File "/usr/local/miniconda/lib/python3.7/gzip.py", line 264, in write
    self.fileobj.write(self.compress.compress(data))
OSError: [Errno 28] No space left on device

=>> PBS: job killed: vmem 22177083392 exceeded limit 17179869184
/usr/local/miniconda/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 5 leaked semaphores to clean up at shutdown
  len(cache))

It’s intresting that I see both “No space left on device” and “vmem exceeded limit” at the same time. Does fmriprep use some kind of a memory mapped tmp space? Maybe it’s using too much of tmpfs?