Issue running fmriprep to completion on HPC for 76 subjects in parallel

ldaumail · March 28, 2019, 2:58pm

Dear fmriprep experts,

I am currently trying to run fmriprep (latest version, in a Singularity version 3.01 image) for 76 subjects in parallel, on a dataset containing T1w, T2w, T2star and 3 BOLD functional files per subject.
I tried running it multiple times, and several issues came up:
- For the same subject, the recon-all output differed across runs, and this happened for all subjects. Do you have an explanation for this?
The differences across runs fall into the following 2 cases:
1- When recon-all doesn’t complete, it skips some steps, especially those producing these files:
aparc.a2009s+aseg.mgz
aparc.DKTatlas+aseg.mgz
and less frequently: aseg.mgz

I don’t get any error in the recon-all.log file, except that I see that recon-all skips these steps. In the /output/fmriprep/log/ log file (crash file), it says that recon-all errors occurred. And in the visual HTML report, I see “_autorecon30” or “_autorecon31”.

2- When recon-all went well, then I get a different type of error. There is no error in the recon-all.log file, and regarding the error in the /output/fmriprep/log/ log file, I obtain this:

Node: fmriprep_wf.single_subject_004_wf.func_preproc_ses_1_task_compassion_wf.bold_surf_wf.medial_nans
Working directory: /mnt/data/loic2/work/fmriprep_wf/single_subject_004_wf/func_preproc_ses_1_task_compassion_wf/bold_surf_wf/_hemi_lh/medial_nans

Node inputs:

in_file = ['/mnt/data/loic2/work/fmriprep_wf/single_subject_004_wf/func_preproc_ses_1_task_compassion_wf/bold_surf_wf/_hemi_lh/sampler/mapflow/_sampler0/lh.fsaverage5.gii']
subjects_dir = <undefined>
target_subject = ['fsaverage5']

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 1253, in _run_interface
    self.config['execution']['stop_on_first_crash'])))
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 1128, in _collate_results
    for i, nresult, err in nodes:
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 99, in nodelist_runner
    result = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 557, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 637, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 369, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.7/site-packages/niworkflows/interfaces/freesurfer.py", line 328, in _run_interface
    newpath=runtime.cwd)
  File "/usr/local/miniconda/lib/python3.7/site-packages/niworkflows/interfaces/freesurfer.py", line 463, in medial_wall_to_nan
    darray.data[medial] = np.nan
ValueError: assignment destination is read-only

Moreover, in the visual report, I obtained this error (note that the first task in my resting-state BOLD data is compassion):

fmriprep_wf.single_subject_004_wf.func_preproc_ses_1_task_compassion_wf.bold_surf_wf.medial_nans

Finally, in this case (2), which is the run that got furthest, I don’t get any ‘/output/fmriprep/func/’ directory, and this doesn’t seem normal to me.

I am running my jobs on a cluster of 12 nodes, each with 20 processors of 2 threads each, for a total of 40 threads per node and 480 threads overall.
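
To set these numbers against the job request in the script further down (an array of 76 tasks with --cpus-per-task=20), here is a rough sketch of how many array tasks could run at once. It assumes Slurm counts either each hardware thread or each physical core as one CPU; which one applies depends on the cluster configuration, and the function name is purely illustrative.

```python
def max_concurrent_tasks(nodes, cpus_per_node, cpus_per_task):
    """Upper bound on array tasks running simultaneously.

    Each task is assumed to stay on a single node, so whole tasks
    are counted per node before multiplying by the node count.
    """
    tasks_per_node = cpus_per_node // cpus_per_task
    return nodes * tasks_per_node


# Figures from the post: 12 nodes, 20 processors x 2 threads, 20 CPUs per task.
by_thread = max_concurrent_tasks(nodes=12, cpus_per_node=40, cpus_per_task=20)
by_core = max_concurrent_tasks(nodes=12, cpus_per_node=20, cpus_per_task=20)

print(by_thread)  # 24 if each hardware thread counts as a CPU
print(by_core)    # 12 if each physical core counts as a CPU
```

Either way, fewer than 76 tasks fit at the same time, so the remaining array tasks would be expected to wait in the queue. Memory limits, which this sketch ignores, may reduce the number further.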

I read in previous reports that fmriprep hadn’t been tested and wasn’t reliable on datasets of more than 4 subjects (for jobs run in parallel). Would this really be an issue?
I also think that my functional data might have issues such as oversized brains… would it be an issue if some voxels end up empty because parts of the brain are split off during registration?

Here is the script that I submit to Slurm (sbatch):

#! /bin/bash
#
#SBATCH --job-name=limited_mindfcomp_22_03

#SBATCH --output=limited_mindfcomp_22_03.txt

#SBATCH --error=limited_mindfcomp_22_03.err

#SBATCH --cpus-per-task=20
#SBATCH --array=1-76

SUBJ=(002 004 005 007 010 011 012 014 016 017 018 022 025 026 028 029 030 032 034 035 036 037 038 040 042 050 052 053 054 055 056 057 058 059 060 062 063 064 065 067 068 069 070 071 072 073 074 075 076 077 078 079 080 081 082 083 087 089 090 091 092 093 094 095 096 097 098 099 101 102 103 104 105 106 108 109)

singularity run --cleanenv -B /mnt:/mnt /mnt/data/singularity_images/fmriprep-latest.simg /mnt/data/loic2/RSBIDS4 /mnt/data/loic2/fmriprep_output_tw_less participant --participant-label ${SUBJ[$SLURM_ARRAY_TASK_ID-1]} --low-mem --stop-on-first-crash --medial-surface-nan --use-aroma --cifti-output --notrack --output-space template fsaverage5 --fs-license-file /mnt/data/loic2/license.txt
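
The script selects one participant per array task with ${SUBJ[$SLURM_ARRAY_TASK_ID-1]}, i.e. a 1-based task ID indexing a 0-based list. A minimal sketch of the same lookup with an explicit bounds check, in case the array range and the label list ever get out of sync; `label_for_task` is a hypothetical helper and not part of fmriprep or Slurm.

```python
def label_for_task(labels, task_id):
    """Return the participant label for a 1-based Slurm array task ID."""
    if not 1 <= task_id <= len(labels):
        raise IndexError(f"task ID {task_id} is outside 1..{len(labels)}")
    return labels[task_id - 1]


# First three labels from the SUBJ array in the script above.
labels = ["002", "004", "005"]

print(label_for_task(labels, 1))  # task 1 maps to the first label, "002"
```

With the full list of 76 labels, --array=1-76 covers every entry exactly once.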

Thanks in advance for any suggestion.

Best,

Loïc

oesteban · March 28, 2019, 6:51pm

No, it shouldn’t. I’d guess recon-all is reaching different points of the workflow. Are you cleaning up the *IsRunning* files from the freesurfer folder?
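
For reference, recon-all writes its IsRunning lock files (for example IsRunning.lh+rh) into each subject's scripts/ folder under the FreeSurfer subjects directory. A minimal sketch for listing any leftovers, assuming the usual <subjects_dir>/<subject>/scripts/ layout; the function name is illustrative.

```python
from pathlib import Path


def find_isrunning_files(subjects_dir):
    """List FreeSurfer IsRunning* lock files under each subject's scripts/ folder."""
    return sorted(Path(subjects_dir).glob("*/scripts/IsRunning*"))


# Example call (path is an assumption, adjust to your own output folder):
#   for lock in find_isrunning_files("/path/to/output/freesurfer"):
#       print(lock)
```

It only lists the files. Whether to delete them is a separate decision, and it is only safe when no recon-all process is still working on that subject.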

This is surprising because those nodes will check that the recon-all log contains “finished without error”. Could you post the contents of the crashfiles corresponding to these _autorecon3{0,1} problems?
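
A quick way to run a similar check by hand on a subject's scripts/recon-all.log is sketched below. It is not the code fmriprep uses. Looking only at the last few non-empty lines is an assumption made here because the log is appended to across recon-all stages, so a success message from an earlier stage may sit higher up in the file.

```python
from pathlib import Path


def recon_all_finished(log_path, tail=5):
    """True if one of the last `tail` non-empty log lines reports a clean finish."""
    text = Path(log_path).read_text(errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    return any("finished without error" in line for line in lines[-tail:])


# Example call (path is an assumption):
#   recon_all_finished("/path/to/output/freesurfer/sub-002/scripts/recon-all.log")
```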

ldaumail:

2- When recon-all went well, then I get a different type of error. There is no error in the recon-all.log file, and regarding the error in the /output/fmriprep/log/ log file, I obtain this:

Node: fmriprep_wf.single_subject_004_wf.func_preproc_ses_1_task_compassion_wf.bold_surf_wf.medial_nans
Working directory: /mnt/data/loic2/work/fmriprep_wf/single_subject_004_wf/func_preproc_ses_1_task_compassion_wf/bold_surf_wf/_hemi_lh/medial_nans

Node inputs:

in_file = ['/mnt/data/loic2/work/fmriprep_wf/single_subject_004_wf/func_preproc_ses_1_task_compassion_wf/bold_surf_wf/_hemi_lh/sampler/mapflow/_sampler0/lh.fsaverage5.gii']
subjects_dir = <undefined>
target_subject = ['fsaverage5']

I’d bet this is derived from a faulty execution of recon-all. I’d try to get recon-all to work first.

ldaumail · March 29, 2019, 10:35am

Hi, thanks for your response!
I am not sure where I should find the “IsRunning” freesurfer files… I didn’t find any!

Here is what I get in the _autorecon30 crash file:

Node: _autorecon30
Working directory: /mnt/data/loic2/work/fmriprep_wf/single_subject_002_wf/anat_preproc_wf/surface_recon_wf/autorecon_resume_wf/autorecon3/mapflow/_autorecon30

Node inputs:

FLAIR_file = <undefined>
T1_files = <undefined>
T2_file = <undefined>
args = <undefined>
big_ventricles = <undefined>
brainstem = <undefined>
directive = autorecon3
environ = {}
expert = <undefined>
flags = <undefined>
hemi = lh
hippocampal_subfields_T1 = <undefined>
hippocampal_subfields_T2 = <undefined>
hires = <undefined>
mprage = <undefined>
mri_aparc2aseg = <undefined>
mri_ca_label = <undefined>
mri_ca_normalize = <undefined>
mri_ca_register = <undefined>
mri_edit_wm_with_aseg = <undefined>
mri_em_register = <undefined>
mri_fill = <undefined>
mri_mask = <undefined>
mri_normalize = <undefined>
mri_pretess = <undefined>
mri_remove_neck = <undefined>
mri_segment = <undefined>
mri_segstats = <undefined>
mri_tessellate = <undefined>
mri_watershed = <undefined>
mris_anatomical_stats = <undefined>
mris_ca_label = <undefined>
mris_fix_topology = <undefined>
mris_inflate = <undefined>
mris_make_surfaces = <undefined>
mris_register = <undefined>
mris_smooth = <undefined>
mris_sphere = <undefined>
mris_surf2vol = <undefined>
mrisp_paint = <undefined>
openmp = 8
parallel = <undefined>
subject_id = sub-002
subjects_dir = /mnt/data/loic2/fmriprep_output_tw_less/freesurfer
talairach = <undefined>
use_FLAIR = False
use_T2 = True
xopts = <undefined>

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 557, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 637, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 369, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 752, in _run_interface
    self.raise_exception(runtime)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 689, in raise_exception
    ).format(**runtime.dictcopy()))
RuntimeError: Command:
recon-all -autorecon3 -hemi lh -openmp 8 -subjid sub-002 -sd /mnt/data/loic2/fmriprep_output_tw_less/freesurfer -T2pial -nosphere -nosurfreg -nojacobian_white -noavgcurv -nocortparc -nopial
Standard output:
Subject Stamp: freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.1-f53a55a
Current Stamp: freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.1-f53a55a
INFO: SUBJECTS_DIR is /mnt/data/loic2/fmriprep_output_tw_less/freesurfer
Actual FREESURFER_HOME /opt/freesurfer
-rw-rw-r-- 1 loic.daumail crnl 150758 Mar 22 14:38 /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/scripts/recon-all.log
Linux node2 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
'/opt/freesurfer/bin/recon-all' -> '/mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/scripts/recon-all.local-copy'
#--------------------------------------------
#@# Refine Pial Surfs w/ T2/FLAIR Fri Mar 22 15:54:18 UTC 2019

 bbregister --s sub-002 --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz --lta /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.lta --init-coreg --T2 

tmp /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095
Log file is /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.dat.log
Fri Mar 22 15:54:18 UTC 2019

setenv SUBJECTS_DIR /mnt/data/loic2/fmriprep_output_tw_less/freesurfer
cd /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/scripts
/opt/freesurfer/bin/bbregister --s sub-002 --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz --lta /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.lta --init-coreg --T2

$Id: bbregister,v 1.75 2016/05/10 20:02:28 greve Exp $
Linux node2 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
FREESURFER_HOME /opt/freesurfer
mri_convert /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii
mri_convert.bin /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii 
$Id: mri_convert.c,v 1.226 2016/02/26 16:15:24 mreuter Exp $
reading from /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz...
TR=2500.00, TE=0.00, TI=0.00, flip angle=0.00
i_ras = (0.997575, 0.0555938, -0.0418801)
j_ras = (-0.0509711, 0.993234, 0.104349)
k_ras = (0.0473979, -0.101961, 0.993659)
writing to /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii...
mri_coreg --s sub-002 --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii --regdat /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/reg.init.dat --reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/mri_coreg.lta --nthreads 8 --dof 6 --sep 4 --ftol .0001 --linmintol .01 --no-ref-mask

$Id: mri_coreg.c,v 1.27 2016/04/30 15:11:49 greve Exp $
cwd /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/scripts
cmdline mri_coreg --s sub-002 --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii --regdat /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/reg.init.dat --reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/mri_coreg.lta --nthreads 8 --dof 6 --sep 4 --ftol .0001 --linmintol .01 --no-ref-mask 
sysname  Linux
hostname node2
machine  x86_64
user     loic.daumail
dof    6
nsep    1
cras0    1
ftol    0.000100
linmintol    0.010000
bf       1
bflim    30.000000
bfnsamp    30
SmoothRef 0
SatPct    99.990000
MovOOB 0
optschema 1
Reading in mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii
Reading in ref /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/brainmask.mgz
Setting cras translation parameters to align centers
Creating random numbers for coordinate dithering
Performing intensity dithering
Initial parameters -0.9970  0.0266 -1.0281  0.0000  0.0000  0.0000  1.0000  1.0000  1.0000  0.0000  0.0000  0.0000 
Separation list (1):  4   min = 4
DoSmoothing 1
DoCoordDither 1
DoIntensityDither 1
nitersmax 4
ftol 1.000e-04
linmintol 1.000e-02
SatPct 99.990000
Hist FWHM 7.000000 7.000000
nthreads 8
movsat = 804.0000
mov gstd 1.8914 1.8914 1.8914
Smoothing mov
refsat = 116.0000
ref gstd 1.8914 1.8914 1.8914
Smoothing ref
COREGpreproc() done
Testing if mov and target overlap
Init cost   -1.0568892680
nhits = 170008 out of 16777216, Percent Overlap:  64.9
Initial  RefRAS-to-MovRAS
 1.00000   0.00000   0.00000  -0.99701;
 0.00000   1.00000   0.00000   0.02655;
 0.00000   0.00000   1.00000  -1.02806;
 0.00000   0.00000   0.00000   1.00000;
Initial  RefVox-to-MovVox
-0.99757   0.04188   0.05559   202.26289;
 0.05097  -0.10435   0.99323   7.66836;
-0.04740  -0.99366  -0.10196   273.23480;
 0.00000   0.00000   0.00000   1.00000;
sep = 4 -----------------------------------
COREGoptBruteForce() 30 1 30
Turning on MovOOB for BruteForce Search
#BF# sep= 4 iter=0 lim=30.0 delta=2.00   1.00299   0.02655   0.97194   0.00000   0.00000   0.00000   -1.0467087
Turning  MovOOB back off after brute force search


---------------------------------
Init Powel Params dof = 6
Starting OpenPowel2(), sep = 4
InitialCost        -1.0572760105 
#@#  4  188  1.00299 0.02655 0.97194 0.00000 0.00000 0.00000   -1.0572760
fs_powell::minimize
  nparams 6
  maxfev 4
  ftol   0.000100
  linmin_xtol_   0.010000
  powell nthiter 0: fret = -1.057276
#@#  4  191  -0.61504 0.02655 0.97194 0.00000 0.00000 0.00000   -1.0573967
#@#  4  192  0.11910 0.02655 0.97194 0.00000 0.00000 0.00000   -1.0577884
#@#  4  196  0.13861 0.02655 0.97194 0.00000 0.00000 0.00000   -1.0577886
#@#  4  199  0.13861 1.02655 0.97194 0.00000 0.00000 0.00000   -1.0580703
#@#  4  203  0.13861 0.64458 0.97194 0.00000 0.00000 0.00000   -1.0580761
#@#  4  204  0.13861 0.81765 0.97194 0.00000 0.00000 0.00000   -1.0580909
#@#  4  209  0.13861 0.79503 0.97194 0.00000 0.00000 0.00000   -1.0580913
#@#  4  210  0.13861 0.80503 0.97194 0.00000 0.00000 0.00000   -1.0580916
#@#  4  213  0.13861 0.80503 -0.64609 0.00000 0.00000 0.00000   -1.0586066
#@#  4  214  0.13861 0.80503 -0.16249 0.00000 0.00000 0.00000   -1.0588244
#@#  4  218  0.13861 0.80503 -0.08688 0.00000 0.00000 0.00000   -1.0588297
#@#  4  223  0.13861 0.80503 -0.09688 1.00000 0.00000 0.00000   -1.0591702
#@#  4  228  0.13861 0.80503 -0.09688 1.02644 0.00000 0.00000   -1.0591707
#@#  4  235  0.13861 0.80503 -0.09688 1.02644 -0.61803 0.00000   -1.0591748
#@#  4  237  0.13861 0.80503 -0.09688 1.02644 -0.31850 0.00000   -1.0591984
#@#  4  238  0.13861 0.80503 -0.09688 1.02644 -0.32850 0.00000   -1.0591986
#@#  4  239  0.13861 0.80503 -0.09688 1.02644 -0.35713 0.00000   -1.0591999
#@#  4  249  0.13861 0.80503 -0.09688 1.02644 -0.35713 0.38197   -1.0592102
#@#  4  250  0.13861 0.80503 -0.09688 1.02644 -0.35713 0.22318   -1.0592223
  powell nthiter 1: fret = -1.059222
#@#  4  260  0.07486 0.80503 -0.09688 1.02644 -0.35713 0.22318   -1.0592262
#@#  4  272  0.07486 0.77989 -0.09688 1.02644 -0.35713 0.22318   -1.0592269
#@#  4  281  0.07486 0.77989 0.04400 1.02644 -0.35713 0.22318   -1.0592389
#@#  4  283  0.07486 0.77989 0.02400 1.02644 -0.35713 0.22318   -1.0592394
#@#  4  303  0.07486 0.77989 0.02400 1.02644 -0.31540 0.22318   -1.0592400
Powell done niters total = 1
OptTimeSec  6.7 sec
OptTimeMin  0.11 min
nEvals 319
Final parameters   0.07486256   0.77989459   0.02399519   1.02643967  -0.31539670   0.22318122 
Final cost   -1.059239958466621
 

---------------------------------
mri_coreg utimesec    260.153866
mri_coreg stimesec    2.444657
mri_coreg ru_maxrss   471136
mri_coreg ru_ixrss    0
mri_coreg ru_idrss    0
mri_coreg ru_isrss    0
mri_coreg ru_minflt   863477
mri_coreg ru_majflt   20
mri_coreg ru_nswap    0
mri_coreg ru_inblock  1054
mri_coreg ru_oublock  16
mri_coreg ru_msgsnd   0
mri_coreg ru_msgrcv   0
mri_coreg ru_nsignals 0
mri_coreg ru_nvcsw    8147
mri_coreg ru_nivcsw   335
Final  RefRAS-to-MovRAS
 0.99998   0.00390  -0.00550   0.07486;
-0.00380   0.99983   0.01791   0.77989;
 0.00557  -0.01789   0.99982   0.02400;
 0.00000   0.00000   0.00000   1.00000;
Final  RefVox-to-MovVox
-0.99711   0.04637   0.06022   202.34358;
 0.05416  -0.12240   0.99100   10.01273;
-0.05332  -0.99140  -0.11954   276.70767;
 0.00000   0.00000   0.00000   1.00000;
Final parameters  0.0749  0.7799  0.0240  1.0264 -0.3154  0.2232 
nhits = 168792 out of 16777216, Percent Overlap:  64.4
mri_coreg RunTimeSec 49.4 sec
To check run:
   tkregisterfv --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii --targ /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/brainmask.mgz --reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/mri_coreg.lta --s sub-002 --surfs 

mri_coreg done

mri_segreg --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii --init-reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/reg.init.dat --out-reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/bbr.pass1.dat --subsamp-brute 100 --subsamp 100 --tol 1e-4 --tol1d 1e-3 --brute -4 4 4 --surf white --gm-proj-frac 0.5 --gm-gt-wm 0.5
$Id: mri_segreg.c,v 1.113 2016/05/10 03:23:20 greve Exp $
setenv SUBJECTS_DIR /mnt/data/loic2/fmriprep_output_tw_less/freesurfer
cd /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/scripts
mri_segreg --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii --init-reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/reg.init.dat --out-reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/bbr.pass1.dat --subsamp-brute 100 --subsamp 100 --tol 1e-4 --tol1d 1e-3 --brute -4 4 4 --surf white --gm-proj-frac 0.5 --gm-gt-wm 0.5 
sysname  Linux
hostname node2
machine  x86_64
user     loic.daumail
movvol /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/template.nii
regfile /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/reg.init.dat
subject sub-002
dof 6
outregfile /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/bbr.pass1.dat
UseMask 0
UseLH 1
UseRH 1
nsubsamp 100
PenaltySign  -1
PenaltySlope 0.500000
PenaltyCenter 0.000000
surfname white
GMProjFrac 0.500000
WMProjAbs 2.000000
lhcostfile (null)
rhcostfile (null)
interp  trilinear (1)
frame  0
TolPowell 0.000100
nMaxItersPowell 36
n1dmin  3
Profile   0
Gdiag_no  -1
AddNoise  0 (0)
SynthSeed 1554181523
TransRandMax 0.000000
RotRandMax 0.000000
Translations 0.000000 0.000000 0.000000
Rotations   0.000000 0.000000 0.000000
Input reg
-0.99711  -0.06022   0.04637  -0.35699;
 0.05332  -0.11954   0.99140  -0.31728;
 0.05416  -0.99100  -0.12240  -0.12563;
 0.00000   0.00000   0.00000   1.00000;

Loading mov
Projecting LH Surfs
Loading lh.white surf
Loading lh.thickness for GM
GM Proj: 1 0.500000 2.000000
WM Proj: 0 0.500000 2.000000
Projecting RH Surfs
Loading rh.white surf
Loading rh.thickness for GM
Projecting RH Surfs
Using lh.cortex.label
Using rh.cortex.label
Computing relative cost
 0  -25.0 -25.0 -25.0   1.042457
 1  -25.0 -25.0  25.0   0.979156
 2  -25.0  25.0 -25.0   0.960287
 3  -25.0  25.0  25.0   0.994496
 4   25.0 -25.0 -25.0   1.030461
 5   25.0 -25.0  25.0   1.023065
 6   25.0  25.0 -25.0   1.014829
 7   25.0  25.0  25.0   1.007578
REL:  8  0.216440    8.052328  1.006541 rel = 0.215033 
Initial costs ----------------
Number of surface hits 2385
WM  Intensity   266.9800 +/-  31.1710
Ctx Intensity   310.3461 +/-  33.6154
Pct Contrast     14.9797 +/-  13.0611
Cost   0.2164
RelCost   0.2150

------------------------------------
Brute force preopt -4 4 4, n = 729
     0  -4.0000  -4.0000  -4.0000  -4.0000  -4.0000  -4.0000      0.9686   0.9686  0.0
     1  -4.0000  -4.0000  -4.0000  -4.0000  -4.0000   0.0000      0.9578   0.9578  0.0
     3  -4.0000  -4.0000  -4.0000  -4.0000   0.0000  -4.0000      0.9315   0.9315  0.0
    15  -4.0000  -4.0000  -4.0000   0.0000   4.0000  -4.0000      0.8552   0.8552  0.0
    21  -4.0000  -4.0000  -4.0000   4.0000   0.0000  -4.0000      0.8377   0.8377  0.0
    39  -4.0000  -4.0000   0.0000   0.0000   0.0000  -4.0000      0.8193   0.8193  0.0
   102  -4.0000   0.0000  -4.0000   4.0000   0.0000  -4.0000      0.7495   0.7495  0.0
   120  -4.0000   0.0000   0.0000   0.0000   0.0000  -4.0000      0.6649   0.6649  0.0
   346   0.0000   0.0000  -4.0000   4.0000   0.0000   0.0000      0.6284   0.6284  0.0
   364   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000      0.2164   0.2164  0.0
Brute Force --------------------------
Min cost was 0.216440
Number of iterations   729
Search time 1.774000 sec
Parameters at best (transmm, rotdeg)
  0.000   0.000   0.000  0.000  0.000  0.000 
--------------------------------------------

Starting Powell Minimization
Init Powel Params dof = 6
0 0
1 0
2 0
3 0
4 0
5 0
fs_powell::minimize
  nparams 6
  maxfev 36
  ftol   0.000100
  linmin_xtol_   0.001000
  powell nthiter 0: fret = 0.216440
   8  0.019  0.000  0.000  0.000  0.000  0.000   0.2164148657
  10  0.013  0.000  0.000  0.000  0.000  0.000   0.2164040448
  11  0.012  0.000  0.000  0.000  0.000  0.000   0.2164034195
  19  0.010 -0.618  0.000  0.000  0.000  0.000   0.1789231543
  21  0.010 -0.515  0.000  0.000  0.000  0.000   0.1771607954
  22  0.010 -0.506  0.000  0.000  0.000  0.000   0.1770838917
  24  0.010 -0.487  0.000  0.000  0.000  0.000   0.1770623870
  25  0.010 -0.495  0.000  0.000  0.000  0.000   0.1770601943
  28  0.010 -0.494  0.000  0.000  0.000  0.000   0.1770599980
  30  0.010 -0.494  1.000  0.000  0.000  0.000   0.1740782232
  34  0.010 -0.494  0.618  0.000  0.000  0.000   0.1592437975
  38  0.010 -0.494  0.617  0.000  0.000  0.000   0.1592435776
  47  0.010 -0.494  0.617  0.006  0.000  0.000   0.1591032219
  48  0.010 -0.494  0.617  0.028  0.000  0.000   0.1589818823
  50  0.010 -0.494  0.617  0.024  0.000  0.000   0.1589602718
  51  0.010 -0.494  0.617  0.022  0.000  0.000   0.1589576634
  61  0.010 -0.494  0.617  0.022 -0.006  0.000   0.1589354493
  62  0.010 -0.494  0.617  0.022 -0.012  0.000   0.1589306338
  63  0.010 -0.494  0.617  0.022 -0.011  0.000   0.1589301592
  71  0.010 -0.494  0.617  0.022 -0.011 -0.041   0.1583929888
  74  0.010 -0.494  0.617  0.022 -0.011 -0.032   0.1583913871
  76  0.010 -0.494  0.617  0.022 -0.011 -0.037   0.1583828887
  powell nthiter 1: fret = 0.158383
  86 -0.002 -0.494  0.617  0.022 -0.011 -0.037   0.1583388113
  88 -0.004 -0.494  0.617  0.022 -0.011 -0.037   0.1583377925
  97 -0.004 -0.480  0.617  0.022 -0.011 -0.037   0.1582864401
  98 -0.004 -0.475  0.617  0.022 -0.011 -0.037   0.1582759008
 100 -0.004 -0.464  0.617  0.022 -0.011 -0.037   0.1582633996
 102 -0.004 -0.465  0.617  0.022 -0.011 -0.037   0.1582633758
 103 -0.004 -0.466  0.617  0.022 -0.011 -0.037   0.1582622992
 112 -0.004 -0.466  0.522  0.022 -0.011 -0.037   0.1572756185
 113 -0.004 -0.466  0.484  0.022 -0.011 -0.037   0.1561060689
 116 -0.004 -0.466  0.460  0.022 -0.011 -0.037   0.1558337568
 128 -0.004 -0.466  0.460  0.048 -0.011 -0.037   0.1556342523
 130 -0.004 -0.466  0.460  0.042 -0.011 -0.037   0.1556091045
 132 -0.004 -0.466  0.460  0.043 -0.011 -0.037   0.1556084653
 140 -0.004 -0.466  0.460  0.043 -0.046 -0.037   0.1545353725
 142 -0.004 -0.466  0.460  0.043 -0.119 -0.037   0.1530244057
 144 -0.004 -0.466  0.460  0.043 -0.136 -0.037   0.1528863553
 145 -0.004 -0.466  0.460  0.043 -0.143 -0.037   0.1528384732
 146 -0.004 -0.466  0.460  0.043 -0.167 -0.037   0.1527488838
 148 -0.004 -0.466  0.460  0.043 -0.159 -0.037   0.1527418226
 149 -0.004 -0.466  0.460  0.043 -0.161 -0.037   0.1527374518
 150 -0.004 -0.466  0.460  0.043 -0.162 -0.037   0.1527370455
 158 -0.004 -0.466  0.460  0.043 -0.162 -0.089   0.1526257110
 159 -0.004 -0.466  0.460  0.043 -0.162 -0.069   0.1523648073
 160 -0.004 -0.466  0.460  0.043 -0.162 -0.065   0.1523493677
 161 -0.004 -0.466  0.460  0.043 -0.162 -0.061   0.1523423990
  powell nthiter 2: fret = 0.152342
 171 -0.007 -0.466  0.460  0.043 -0.162 -0.061   0.1523172977
 173 -0.021 -0.466  0.460  0.043 -0.162 -0.061   0.1522755817
 174 -0.018 -0.466  0.460  0.043 -0.162 -0.061   0.1522683377
 175 -0.017 -0.466  0.460  0.043 -0.162 -0.061   0.1522681976
 185 -0.017 -0.451  0.460  0.043 -0.162 -0.061   0.1521558635
 197 -0.017 -0.451  0.461  0.043 -0.162 -0.061   0.1521547719
 210 -0.017 -0.451  0.461  0.048 -0.162 -0.061   0.1521243295
 224 -0.017 -0.451  0.461  0.048 -0.163 -0.061   0.1521237995
 234 -0.017 -0.451  0.461  0.048 -0.163 -0.074   0.1520770380
 236 -0.017 -0.451  0.461  0.048 -0.163 -0.071   0.1520728596
 239 -0.029 -0.436  0.463  0.053 -0.164 -0.080   0.1520684640
  powell nthiter 3: fret = 0.152073
 248 -0.030 -0.451  0.461  0.048 -0.163 -0.071   0.1520472349
 249 -0.025 -0.451  0.461  0.048 -0.163 -0.071   0.1520335407
 251 -0.026 -0.451  0.461  0.048 -0.163 -0.071   0.1520334190
 261 -0.026 -0.448  0.461  0.048 -0.163 -0.071   0.1520254613
 262 -0.026 -0.447  0.461  0.048 -0.163 -0.071   0.1520251352
 273 -0.026 -0.447  0.456  0.048 -0.163 -0.071   0.1520089362
 274 -0.026 -0.447  0.457  0.048 -0.163 -0.071   0.1520086199
 286 -0.026 -0.447  0.457  0.050 -0.163 -0.071   0.1520005464
 300 -0.026 -0.447  0.457  0.050 -0.161 -0.071   0.1519979390
 311 -0.026 -0.447  0.457  0.050 -0.161 -0.081   0.1519761680
 313 -0.026 -0.447  0.457  0.050 -0.161 -0.078   0.1519728157
 316 -0.036 -0.443  0.453  0.053 -0.158 -0.084   0.1519397183
 323 -0.036 -0.443  0.453  0.053 -0.158 -0.085   0.1519395817
 325 -0.036 -0.443  0.453  0.053 -0.158 -0.085   0.1519393161
 329 -0.036 -0.443  0.453  0.053 -0.158 -0.085   0.1519393058
  powell nthiter 4: fret = 0.151939
 361 -0.036 -0.443  0.451  0.053 -0.158 -0.085   0.1519346732
 394 -0.035 -0.443  0.452  0.053 -0.159 -0.084   0.1519342443
 398 -0.034 -0.443  0.452  0.053 -0.159 -0.083   0.1519342362
 401 -0.034 -0.443  0.452  0.053 -0.159 -0.083   0.1519341215
Powell done niters = 4
Computing relative cost
 0  -25.0 -25.0 -25.0   1.042045
 1  -25.0 -25.0  25.0   0.998006
 2  -25.0  25.0 -25.0   0.974316
 3  -25.0  25.0  25.0   0.994356
 4   25.0 -25.0 -25.0   1.041316
 5   25.0 -25.0  25.0   1.034572
 6   25.0  25.0 -25.0   1.005030
 7   25.0  25.0  25.0   1.006803
REL:  8  0.151934    8.096445  1.012056 rel = 0.150124 
Number of iterations     4
Min cost was 0.151934
Number of FunctionCalls   404
TolPowell 0.000100
nMaxItersPowell 36
OptimizationTime 0.946000 sec
Parameters at optimum (transmm) -0.03433 -0.44343  0.45183
Parameters at optimum (rotdeg)  0.05279 -0.15871 -0.08344 
Final costs ----------------
Number of surface hits 2385
WM  Intensity   265.5035 +/-  28.4921
Ctx Intensity   308.5524 +/-  30.6311
Pct Contrast     14.9874 +/-  11.3154
Cost   0.1519
RelCost   0.2150
Reg at min cost was 
-0.99718  -0.05765   0.04815  -0.39143;
 0.05472  -0.11854   0.99144  -0.76007;
 0.05145  -0.99127  -0.12136   0.32492;
 0.00000   0.00000   0.00000   1.00000;

Writing optimal reg to /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/tmp.bbregister.382095/bbr.pass1.dat, type = 14 
Original Reg 
-0.99711  -0.06022   0.04637  -0.35699;
 0.05332  -0.11954   0.99140  -0.31728;
 0.05416  -0.99100  -0.12240  -0.12563;
 0.00000   0.00000   0.00000   1.00000;

Original Reg - Optimal Reg
 0.00007  -0.00257  -0.00178   0.03444;
-0.00140  -0.00100  -0.00004   0.44279;
 0.00271   0.00027  -0.00104  -0.45055;
 0.00000   0.00000   0.00000   0.00000;

Computing change in lh position
LH rmsDiffMean 0.764431
Computing change in rh position
Surface-RMS-Diff-mm 0.687359 0.094218 0.869883
mri_segreg done
mri_
 0.00000   0.00000   0.00000   1.00000;

Writing optimal reg to /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.dat, type = 14 
Original Reg 
-0.99718  -0.05765   0.04815  -0.39143;
 0.05472  -0.11854   0.99144  -0.76007;
 0.05145  -0.99127  -0.12136   0.32492;
 0.00000   0.00000   0.00000   1.00000;

Original Reg - Optimal Reg
 0.00005   0.00071   0.00186  -0.15171;
 0.00186  -0.00177  -0.00031   0.01617;
-0.00102   0.00017  -0.00183   0.01207;
 0.00000   0.00000   0.00000   0.00000;

Computing change in lh position
LH rmsDiffMean 0.141307
Computing change in rh position
Surface-RMS-Diff-mm 0.157359 0.044769 0.291182
mri_segreg done
MinCost: 0.161232 265.581031 308.454638 14.905878 
tkregister2_cmdl --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz --reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.dat --noedit --ltaout /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.lta
tkregister_tcl /opt/freesurfer/tktools/tkregister2.tcl
INFO: no target volume specified, assuming FreeSurfer orig volume.
target  volume orig
movable volume /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz
reg file       /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.dat
LoadVol        0
ZeroCRAS       0
$Id: tkregister2.c,v 1.132.2.1 2016/08/02 21:17:29 greve Exp $
Diagnostic Level -1
---- Input registration matrix --------
-0.99722  -0.05835   0.04628  -0.23972;
 0.05286  -0.11678   0.99175  -0.77623;
 0.05247  -0.99144  -0.11954   0.31285;
 0.00000   0.00000   0.00000   1.00000;
float2int = 0
---------------------------------------
INFO: loading target /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig.mgz
Ttarg: --------------------
-1.00000   0.00000   0.00000   128.00000;
 0.00000   0.00000   1.00000  -128.00000;
 0.00000  -1.00000   0.00000   128.00000;
 0.00000   0.00000   0.00000   1.00000;
INFO: loading movable /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz
Tmov: --------------------
-1.00000   0.00000   0.00000   88.00000;
 0.00000   0.00000   1.00000  -128.00000;
 0.00000  -1.00000   0.00000   128.00000;
 0.00000   0.00000   0.00000   1.00000;
mkheaderreg = 0, float2int = 0
---- Input registration matrix --------
-0.99722  -0.05835   0.04628  -0.23972;
 0.05286  -0.11678   0.99175  -0.77623;
 0.05247  -0.99144  -0.11954   0.31285;
 0.00000   0.00000   0.00000   1.00000;
Determinant -1
subject = sub-002
RegMat ---------------------------
-0.99722  -0.05835   0.04628  -0.23972;
 0.05286  -0.11678   0.99175  -0.77623;
 0.05247  -0.99144  -0.11954   0.31285;
 0.00000   0.00000   0.00000   1.00000;
Cleaning up
 
Started at Fri Mar 22 15:54:18 UTC 2019 
Ended   at Fri Mar 22 15:56:18 UTC 2019
BBR-Run-Time-Sec 120
 
bbregister Done
To check results, run:
tkregisterfv --mov /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz --reg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.lta --surfs 
 

 cp /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.auto.lta /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.lta 


 mri_convert -odt float -at /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.lta -rl /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig.mgz /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.prenorm.mgz 

mri_convert.bin -odt float -at /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.lta -rl /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig.mgz /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.prenorm.mgz 
$Id: mri_convert.c,v 1.226 2016/02/26 16:15:24 mreuter Exp $
reading from /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig/T2raw.mgz...
TR=2500.00, TE=0.00, TI=0.00, flip angle=0.00
i_ras = (0.997575, 0.0555938, -0.0418801)
j_ras = (-0.0509711, 0.993234, 0.104349)
k_ras = (0.0473979, -0.101961, 0.993659)
INFO: Reading transformation from file /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.lta...
Reading transform with LTAreadEx()
reading template info from volume /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/orig.mgz...
INFO: Applying transformation from file /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/transforms/T2raw.lta...
---------------------------------
INFO: Transform Matrix (linear_ras_to_ras)
 0.99998  -0.00206   0.00529   0.00581;
 0.00214   0.99989  -0.01502  -0.30241;
-0.00526   0.01503   0.99987   0.51841;
 0.00000   0.00000   0.00000   1.00000;
---------------------------------
Applying LTAtransformInterp (resample_type 1)
writing to /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.prenorm.mgz...

 mri_normalize -sigma 0.5 -nonmax_suppress 0 -min_dist 1 -aseg /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/aseg.presurf.mgz -surface /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/surf/rh.white identity.nofile -surface /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/surf/lh.white identity.nofile /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.prenorm.mgz /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.norm.mgz 

mghRead(/mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.prenorm.mgz): could not read 262144 bytes at slice 52
using Gaussian smoothing of bias field, sigma=0.500
disabling nonmaximum suppression
retaining  points that are at least 1.000mm from the boundary
using segmentation for initial intensity normalization
reading from /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.prenorm.mgz...
mri_normalize: could not open source file /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/mri/T2.prenorm.mgz
Linux node2 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

recon-all -s sub-002 exited with ERRORS at Fri Mar 22 15:56:23 UTC 2019

For more details, see the log file /mnt/data/loic2/fmriprep_output_tw_less/freesurfer/sub-002/scripts/recon-all.log
To report a problem, see http://surfer.nmr.mgh.harvard.edu/fswiki/BugReporting

Standard error:

Return code: 1

Regarding the second part, I was talking about a case in which recon-all truly worked well. Are you sure it is an issue with recon-all, then?

Thanks for your help!

Sincerely,

Loïc

ldaumail · March 29, 2019, 10:36am

I had to remove part of the crash file… I am unable to upload any content because I am a newcomer!

oesteban · March 29, 2019, 5:55pm

Your recon-all errors are a duplicate of this thread: Could not read error : while file ":/out/freesurfer/sub-001/mri/T2.prenorm.mgz" exist

I believe you may be hitting memory limitations. Can you try the following sbatch config? I modified your $SUBJ variable to contain two subjects each with one or the other error you are getting.

#!/bin/bash
#SBATCH --job-name=limited_mindfcomp_22_03
#SBATCH --output=limited_mindfcomp_22_03.txt
#SBATCH --error=limited_mindfcomp_22_03.err
#SBATCH -n 1
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=4G
#SBATCH --array=1-2

SUBJ=(002 004)
singularity run --cleanenv -B /mnt:/mnt /mnt/data/singularity_images/fmriprep-latest.simg \
      /mnt/data/loic2/RSBIDS4 /mnt/data/loic2/fmriprep_output_tw_less participant \
      --participant-label ${SUBJ[$SLURM_ARRAY_TASK_ID-1]} --low-mem --stop-on-first-crash \
      --medial-surface-nan --use-aroma --cifti-output --notrack \
      --output-space template fsaverage5 --fs-license-file /mnt/data/loic2/license.txt \
      --omp-nthreads 8 --nthreads 12 --mem_mb 30000

Before running that, please make sure you run:

find /mnt/data/loic2/fmriprep_output_tw_less/freesurfer -name "*IsRunning*" -delete

to get rid of all the IsRunning files.

ldaumail · April 1, 2019, 3:21pm

Thanks a lot for your help!

The script you sent me gave reproducible outputs across the 76 subjects. I am not sure I understand how the CPU and memory requests in the slurm command lines should relate to the CPU and memory options of fmriprep.
Are there important things to consider?
Also, should there be a relationship between --omp-nthreads and --nthreads?

However, I still get this error in the fmriprep/log/ crash file for all 76 subjects:

Node: fmriprep_wf.single_subject_109_wf.func_preproc_ses_1_task_restingstate_wf.bold_surf_wf.medial_nans
Working directory: /mnt/data/loic2/work/fmriprep_wf/single_subject_109_wf/func_preproc_ses_1_task_restingstate_wf/bold_surf_wf/_hemi_rh/medial_nans

Node inputs:

in_file = ['/mnt/data/loic2/work/fmriprep_wf/single_subject_109_wf/func_preproc_ses_1_task_restingstate_wf/bold_surf_wf/_hemi_rh/sampler/mapflow/_sampler0/rh.fsaverage5.gii']
subjects_dir = <undefined>
target_subject = ['fsaverage5']

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 1253, in _run_interface
    self.config['execution']['stop_on_first_crash'])))
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 1128, in _collate_results
    for i, nresult, err in nodes:
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 99, in nodelist_runner
    result = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 557, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 637, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 369, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.7/site-packages/niworkflows/interfaces/freesurfer.py", line 328, in _run_interface
    newpath=runtime.cwd)
  File "/usr/local/miniconda/lib/python3.7/site-packages/niworkflows/interfaces/freesurfer.py", line 463, in medial_wall_to_nan
    darray.data[medial] = np.nan
ValueError: assignment destination is read-only

Note that restingstate is my last functional data file.

Thanks again for your help.

oesteban · April 1, 2019, 4:42pm

Glad it worked out.

Most HPCs now use cgroups or some other mechanism to ensure users don’t exceed their resource allocations. Memory-wise, that usually translates into disallowing memory overcommitting (i.e. requesting more memory than physically available). Although fMRIPrep does not take a lot of physical memory, the way Python interacts with the Linux kernel, combined with some additional factors, results in fMRIPrep overcommitting too much memory.

With

#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=4G

you are allocating 64GB which is A LOT of memory. Unless you have lots (and by lots I mean >15) of BOLD runs per subject, 64GB should suffice.

Then this configuration interacts with --omp-nthreads and --nthreads. --omp-nthreads tells fMRIPrep how many CPUs a single process (i.e. one task of the processing workflow) can take. Generally, you won’t see any substantial improvement with --omp-nthreads above 8. This is because the two major bottlenecks in fMRIPrep are ANTs registration and FreeSurfer. The former does not improve much above 8 threads. The latter does not handle parallelization very granularly, so this setting will not influence it much. Then --nthreads limits the number of processes run in parallel. This one has a direct impact on the memory allocated at a given time: the more processes are allowed to run, the more memory will be allocated. Therefore, lowering it from 20 CPUs to 12 made an actual difference.
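
As a rough illustration of the rules of thumb above, here is a minimal Python sketch that checks a thread configuration before submitting a job. The helper check_thread_flags is hypothetical (it is not part of fMRIPrep or Slurm). The check that --omp-nthreads should not exceed --nthreads is an assumption added here, and the threshold of 8 is simply the figure quoted in this post.

```python
from typing import List


def check_thread_flags(cpus_per_task: int, nthreads: int, omp_nthreads: int) -> List[str]:
    """Return warnings for a Slurm + fMRIPrep thread configuration (hypothetical helper)."""
    warnings = []
    # fMRIPrep should not be told to use more CPUs than Slurm allocated to the task.
    if nthreads > cpus_per_task:
        warnings.append("--nthreads is larger than --cpus-per-task")
    # Assumption: a single task should not be given more threads than the global limit.
    if omp_nthreads > nthreads:
        warnings.append("--omp-nthreads is larger than --nthreads")
    # Rule of thumb from this thread: little benefit above 8 threads per process.
    if omp_nthreads > 8:
        warnings.append("--omp-nthreads above 8 is unlikely to help")
    return warnings


# Configuration from the sbatch script in this thread: no warnings expected.
print(check_thread_flags(cpus_per_task=12, nthreads=12, omp_nthreads=8))
# A configuration that oversubscribes the allocation.
print(check_thread_flags(cpus_per_task=12, nthreads=20, omp_nthreads=16))
```

The first call returns an empty list. The second flags the oversubscribed --nthreads and the high --omp-nthreads.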

Finally, regarding your last question. I’m going to ping @effigies here: is there any change in nibabel recently that makes this operation not allowed: https://github.com/poldracklab/fmriprep/pull/1438/files#diff-ed2c216942d7f90b051cb664bca3d06bL477?

effigies · April 1, 2019, 5:30pm

That’s actually a change in numpy causing that issue. It may be worth opening an issue on nibabel so we can provide writable memory.

ldaumail · April 1, 2019, 6:16pm

Okay, thanks for the pointer. I will send an email once I am accepted to the nibabel mailing list.

effigies · April 1, 2019, 6:17pm

Oh, sorry, I just meant to open an issue at: https://github.com/nipy/nibabel/issues

ldaumail · April 1, 2019, 6:51pm

Opened an issue here: https://github.com/nipy/nibabel/issues/746

ldaumail · April 1, 2019, 7:12pm

Here is what they responded:
"
This looks like it’s related to numpy/numpy#11739, #697, #700, #702.

GIFTI data arrays are being built with base64.b64decode and zlib.decompress, which return bytes objects instead of bytearray, so they’re not writable. We’ll need to try to figure out a way to create these arrays without doubling memory.

On the other hand, GIFTI darrays are one-per-timestep, so the cost of copying to bytearrays would be time more than memory, so that might not be too bad.

The quick fix will be to use numpy 1.15.x until we can fix this.
"
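
The behaviour described in that reply can be reproduced outside fMRIPrep. The sketch below is not nibabel's actual parsing code: the helper decode_like_gifti and the toy payload are made up for illustration. It only shows that an array built with np.frombuffer on a bytes object is read-only, and that a copy (which duplicates the data) is writable.

```python
import base64
import zlib

import numpy as np


def decode_like_gifti(payload: bytes) -> np.ndarray:
    """Decode a base64 + zlib payload into a float32 array (hypothetical helper)."""
    raw = zlib.decompress(base64.b64decode(payload))  # 'raw' is an immutable bytes object
    return np.frombuffer(raw, dtype=np.float32)       # the array shares that read-only buffer


# Toy payload standing in for one encoded data array.
values = np.arange(6, dtype=np.float32)
payload = base64.b64encode(zlib.compress(values.tobytes()))

readonly = decode_like_gifti(payload)
try:
    readonly[0] = np.nan          # same kind of assignment as in medial_wall_to_nan
    assignment_failed = False
except ValueError:                # "assignment destination is read-only"
    assignment_failed = True

# Copying gives an array that owns its memory and can be modified.
writable = readonly.copy()
writable[[0, 2]] = np.nan
print(assignment_failed, writable)
```

Copying is the simplest workaround in user code. As the reply notes, the cost is an extra copy of each data array.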

oesteban · April 1, 2019, 10:12pm

Hi @ldaumail, that is kind of a triage of the problem.

Since it will take a little while to get that fixed, I’d suggest you don’t use --medial-surface-nan.

ldaumail · April 2, 2019, 3:21pm

Hi @oesteban, thanks a lot for all the help you gave me,
the run went perfectly well for all the 76 subjects!
I have one last question, if I may: could you clarify how you arrived at 64G for the memory allocation?
For 12 CPUs per task and 4G per CPU, I would rather expect a memory allocation of 48G… Am I wrong?

Thanks in advance, and thanks again for all your help !

Sincerely,

Loïc

oesteban · April 3, 2019, 4:52pm

Sorry, that is me being really bad at maths
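
For reference, the arithmetic can be written out as a short Python sketch. The function name slurm_memory_mb is hypothetical, and the conversion assumes that Slurm's "G" suffix means 1024 MB. The value 30000 is the --mem_mb setting from the sbatch script earlier in this thread.

```python
def slurm_memory_mb(cpus_per_task: int, mem_per_cpu_gb: int) -> int:
    """Total memory requested from Slurm, in MB (assuming 1 GB = 1024 MB)."""
    return cpus_per_task * mem_per_cpu_gb * 1024


# --cpus-per-task=12 and --mem-per-cpu=4G from the sbatch script.
allocated_mb = slurm_memory_mb(cpus_per_task=12, mem_per_cpu_gb=4)
# --mem_mb 30000 passed to fmriprep in the same script.
fmriprep_mem_mb = 30000
headroom_mb = allocated_mb - fmriprep_mem_mb

print(allocated_mb // 1024, "GB allocated by Slurm")  # 12 * 4 = 48
print(headroom_mb, "MB of headroom above --mem_mb")
```

So the job requests 48 GB in total, and the --mem_mb value given to fmriprep stays well below that allocation.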