Fmriprep - autorecon3 error: missing *h.inflated / antsRegistration error: segmentation fault

Hello,

I have been trying to get a BIDS data set analyzed with fmriprep. I initially used a local Mac OS X machine, on which I was able to analyze data from several subjects. That machine had several issues, however: unpredictable Docker crashes, massive CPU over-usage despite limiting the number of CPUs in the Docker settings, and occasional system-level crashes. After some googling, it appears that these might be fundamental issues with running Docker on a Mac rather than fmriprep issues, so let's put them aside for now.

Because of those issues, I decided to run fmriprep on Stanford's Sherlock HPC cluster instead, which is going a lot better. I use an sbatch script like this:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --time=48:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem-per-cpu=2G
#SBATCH -p normal

module load system singularity
singularity run ${SCRATCH}/singularity/fmriprep_latest-2018-02-23-98613986e605.img ${SCRATCH}/raw/headmodel ${SCRATCH}/analyzed/headmodel participant --participant_label 0019 --nthreads 18 --omp-nthreads 16 -w ${SCRATCH}/temp --output-space T1w fsnative --fs-license-file ${HOME}/license.txt

I do, however, still get some errors. One type of error is a recon-all error that I only get when I try to run two subjects (rather than one) at the same time (e.g. --participant_label 0016 0017). I am able to run each of these subjects on their own, so I assume it is some kind of resource management error. Quoting the relevant part of the crash log here:

Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/plugins/multiproc.py", line 68, in run_node
result['result'] = node.run(updatehash=updatehash)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/engine/nodes.py", line 487, in run
result = self._run_interface(execute=True)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/engine/nodes.py", line 571, in _run_interface
return self._run_command(execute)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/engine/nodes.py", line 650, in _run_command
result = self._interface.run(cwd=outdir)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/interfaces/base/core.py", line 516, in run
runtime = self._run_interface(runtime)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/interfaces/base/core.py", line 1023, in _run_interface
self.raise_exception(runtime)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/interfaces/base/core.py", line 960, in raise_exception
).format(**runtime.dictcopy()))
RuntimeError: Command:
recon-all -autorecon3 -hemi lh -openmp 8 -subjid sub-0017 -sd /scratch/users/pjkohler/analyzed/headmodel/freesurfer
Standard output:
Subject Stamp: freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.1-f53a55a
Current Stamp: freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.1-f53a55a
INFO: SUBJECTS_DIR is /scratch/users/pjkohler/analyzed/headmodel/freesurfer
Actual FREESURFER_HOME /opt/freesurfer
-rw-rw-r-- 1 pjkohler amnorcia 37262 Jun 22 19:56 /scratch/users/pjkohler/analyzed/headmodel/freesurfer/sub-0017/scripts/recon-all.log
Linux sh-101-03.int 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
‘/opt/freesurfer/bin/recon-all’ -> ‘/scratch/users/pjkohler/analyzed/headmodel/freesurfer/sub-0017/scripts/recon-all.local-copy’
#--------------------------------------------
#@# Sphere lh Fri Jun 22 22:34:10 UTC 2018
/scratch/users/pjkohler/analyzed/headmodel/freesurfer/sub-0017/scripts

mris_sphere -rusage /scratch/users/pjkohler/analyzed/headmodel/freesurfer/sub-0017/touch/rusage.mris_sphere.lh.dat -seed 1234 ../surf/lh.inflated ../surf/lh.sphere

setting seed for random number genererator to 1234
$Id: mris_sphere.c,v 1.61 2016/01/20 23:42:15 greve Exp $
$Id: mrisurf.c,v 1.781.2.6 2016/12/27 16:47:14 zkaufman Exp $
MRISread(../surf/lh.inflated): could not open file

== Number of threads available to mris_sphere for OpenMP = 8 ==
No such file or directory
mris_sphere: could not read surface file ../surf/lh.inflated
No such file or directory
Linux sh-101-03.int 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

recon-all -s sub-0017 exited with ERRORS at Fri Jun 22 22:34:10 UTC 2018

There are similar crash logs for the other hemisphere and for the other subject.

The other error seems more serious, because it happens consistently for some subjects but not others (3-4 of ~20 so far). It appears to be an error with antsRegistration; crash log here:

Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/plugins/multiproc.py", line 68, in run_node
result['result'] = node.run(updatehash=updatehash)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/engine/nodes.py", line 487, in run
result = self._run_interface(execute=True)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/engine/nodes.py", line 571, in _run_interface
return self._run_command(execute)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/engine/nodes.py", line 650, in _run_command
result = self._interface.run(cwd=outdir)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/interfaces/base/core.py", line 516, in run
runtime = self._run_interface(runtime)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/interfaces/report_base.py", line 51, in _run_interface
ReportCapableInterface, self)._run_interface(runtime)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/interfaces/fixes.py", line 43, in _run_interface
runtime, correct_return_codes)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/interfaces/ants/registration.py", line 948, in _run_interface
runtime = super(Registration, self)._run_interface(runtime)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/interfaces/base/core.py", line 1023, in _run_interface
self.raise_exception(runtime)
File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/interfaces/base/core.py", line 960, in raise_exception
).format(**runtime.dictcopy()))
RuntimeError: Command:
antsRegistration --collapse-output-transforms 1 --dimensionality 3 --float 1 --initialize-transforms-per-stage 0 --interpolation LanczosWindowedSinc --output [ transform, transform_Warped.nii.gz, transform_InverseWarped.nii.gz ] --transform Translation[ 0.05 ] --metric Mattes[ /scratch/users/pjkohler/temp/fmriprep_wf/single_subject_0018_wf/func_preproc_ses_01_task_MT_run_03_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask/uni_xform_masked.nii.gz, /scratch/users/pjkohler/temp/fmriprep_wf/single_subject_0018_wf/func_preproc_ses_01_task_MT_run_03_wf/fmap_wf/bet/sub-0018_ses-01_run-01_magnitude_ras_squeezed_mcf_corrected_brain.nii.gz, 1, 64, Random, 0.5 ] --convergence [ 500, 1e-07, 200 ] --smoothing-sigmas 8.0mm --shrink-factors 2 --use-estimate-learning-rate-once 1 --use-histogram-matching 1 --transform Affine[ 0.01 ] --metric Mattes[ /scratch/users/pjkohler/temp/fmriprep_wf/single_subject_0018_wf/func_preproc_ses_01_task_MT_run_03_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask/uni_xform_masked.nii.gz, /scratch/users/pjkohler/temp/fmriprep_wf/single_subject_0018_wf/func_preproc_ses_01_task_MT_run_03_wf/fmap_wf/bet/sub-0018_ses-01_run-01_magnitude_ras_squeezed_mcf_corrected_brain.nii.gz, 1, 64, Random, 0.5 ] --convergence [ 200, 1e-08, 100 ] --smoothing-sigmas 2.0mm --shrink-factors 1 --use-estimate-learning-rate-once 1 --use-histogram-matching 1 --winsorize-image-intensities [ 0.005, 0.998 ] --write-composite-transform 1
Standard output:

Standard error:
Segmentation fault
Return code: 139

Happy to try any changes to my sbatch command that might address this issue.

Thanks!

/Peter

That's surprising. I have been using Docker for Windows on a regular basis and it has been pretty stable for me (with occasional issues mounting the working directory with -w; I just skip that and use the default, which is a directory within the container).

This is unusual - the two subjects run in separate directories and should not interfere with each other (unless you specify the same participant label twice). We have run multiple subjects per fmriprep run before and never run into this issue. Was there anything else different about this run, like a different working directory?

Segmentation faults are always tricky to debug. This could be a memory issue, although since this step is coregistering the magnitude image with the BOLD reference image, it should not be that resource-consuming. Maybe the wrong image was used as the magnitude when you prepared the dataset? Despite the crash, a report should still have been generated - if you could share it here, we might be able to investigate more.
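For reference, with the kind of fieldmap fmriprep appears to be picking up here (a "_magnitude" plus "_fieldmap" pair), the BIDS layout usually looks roughly like this. The file names are modeled on the paths in your crash log; the Units value and IntendedFor entry are illustrative assumptions, not taken from your data:

sub-0018/ses-01/fmap/sub-0018_ses-01_run-01_magnitude.nii.gz
sub-0018/ses-01/fmap/sub-0018_ses-01_run-01_fieldmap.nii.gz
sub-0018/ses-01/fmap/sub-0018_ses-01_run-01_fieldmap.json

with the fieldmap JSON sidecar containing something like:

{
  "Units": "Hz",
  "IntendedFor": ["ses-01/func/sub-0018_ses-01_task-MT_run-03_bold.nii.gz"]
}

If the magnitude image there is actually from some other acquisition, that could explain a failed coregistration with the BOLD reference.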

I don't think this is the reason for your issues, but there are some weird things about your sbatch script. First, you are asking SLURM for 20 cores but then limiting fmriprep to 18. Secondly, you are setting --omp-nthreads, which in general is not necessary (fmriprep will find an optimal value for you), and we don't really see much improvement above a setting of 8. It might also be good to try --mem instead of --mem-per-cpu.
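For example, something along these lines (just a sketch based on the script above; the exact core and memory numbers are placeholders to adapt to your allocation):

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --time=48:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH -p normal

module load system singularity
# --nthreads matches --cpus-per-task, --omp-nthreads is dropped, and total memory is set with --mem
singularity run ${SCRATCH}/singularity/fmriprep_latest-2018-02-23-98613986e605.img \
  ${SCRATCH}/raw/headmodel ${SCRATCH}/analyzed/headmodel participant \
  --participant_label 0019 --nthreads 16 -w ${SCRATCH}/temp \
  --output-space T1w fsnative --fs-license-file ${HOME}/license.txt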

What I am describing is not an issue with mounting directories or getting fmriprep started; it happens during use. It is closer to what is being described here, although none of those fixes alleviate the issue for me. macOS 10.13.5.

Nothing unusual about the two subjects: same experiment, same directory. I am forced to conclude that it has something to do with running multiple subjects, since running each subject separately immediately afterwards worked fine. I tried the two-subject run twice and got the same error both times before running each subject separately, which should rule out some kind of fluke.

I checked the report and think there is something very wrong with the field maps I am trying to apply to these older scans; they appear to be wrong for everyone but only cause failures for some subjects. I will run with --ignore fieldmaps now to test this hypothesis. I suppose it is even possible that the multiple-subjects error is also somehow a result of these improper field maps.
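Concretely, the plan is to re-run the same singularity call from my sbatch script with the flag added, i.e. something like this (subject 0018 here is just one of the failing subjects; other options as in my original script, omitted for brevity):

singularity run ${SCRATCH}/singularity/fmriprep_latest-2018-02-23-98613986e605.img \
  ${SCRATCH}/raw/headmodel ${SCRATCH}/analyzed/headmodel participant \
  --participant_label 0018 --ignore fieldmaps \
  -w ${SCRATCH}/temp --output-space T1w fsnative \
  --fs-license-file ${HOME}/license.txt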

Okay, I will ask SLURM and fmriprep to use the same number of cores, not set --omp-nthreads, and try --mem instead of --mem-per-cpu. Thanks.

Okay, okay, okay. I was able to run five subjects through with --ignore fieldmaps. I have a feeling this is because my wonky field maps were causing both issues, rather than because of the minor changes I made to the sbatch script. So the only real complaint I have is that fmriprep seems to fail rather ungracefully when given wonky field maps. I will let you know if anything else comes up.

Thanks for following up. It would be interesting to see what the "wonky" fieldmaps look like.

I honestly have no idea what we were doing with those field maps. They do not have the same coverage as the functional scans. They were collected right after I joined the lab and were never incorporated into our analysis pipeline. The "topup" field maps we are collecting now are analyzed without problems and look fine.

btw, “wonky” is a technical term.


It seems that this bug has been fixed in version 18.05.0-ce-mac67 (2018-06-07) of Docker for Mac (part of the Edge channel). See https://docs.docker.com/docker-for-mac/edge-release-notes/