Failed (exit code 1) on Sherlock server

Hi all,

First, I used the following command line on Sherlock:

singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.3.1-2017-03-25-c38ac0136e8c.img /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant --participant_label sub-01 -w $LOCAL_SCRATCH

After 40 hours, the preprocessing of a single subject had still not finished (with 64 GB of RAM and 16 cores per node), so I decided to kill the process.

Then I changed the command line, adding the --no-freesurfer flag:
singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.3.1-2017-03-25-c38ac0136e8c.img /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant --participant_label sub-01 -w $LOCAL_SCRATCH --no-freesurfer

The preprocessing takes just 2 hours but fails.

The email from Stanford shows:

Job ID: 14230866
Cluster: sherlock
User/Group: rmaatoug/aetkin
State: FAILED (exit code 1)
Nodes: 1
Cores per node: 16
CPU Utilized: 13:35:36
CPU Efficiency: 60.27% of 22:33:20 core-walltime
Memory Utilized: 40.16 GB
Memory Efficiency: 64.26% of 62.50 GB

I attach the error file:

/usr/local/miniconda/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/usr/local/miniconda/lib/python3.6/site-packages/nipype/workflows/dmri/mrtrix/group_connectivity.py:16: UserWarning: cmp not installed
  warnings.warn('cmp not installed')
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/nipype/interfaces/base.py:431: UserWarning: Input convergence_threshold requires inputs: number_of_iterations
  warn(msg)
/usr/local/miniconda/lib/python3.6/site-packages/nipype/interfaces/base.py:431: UserWarning: Input sampling_percentage requires inputs: sampling_strategy
  warn(msg)
/usr/local/miniconda/lib/python3.6/site-packages/nipype/interfaces/base.py:431: UserWarning: Input sigma_units requires inputs: smoothing_sigmas
  warn(msg)
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
/usr/local/miniconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:430: UserWarning: The behaviour of affine_transform with a one-dimensional array supplied for the matrix parameter has changed in scipy 0.18.0.
  "The behaviour of affine_transform with a one-dimensional "
Traceback (most recent call last):
  File "/usr/local/miniconda/bin/fmriprep", line 11, in <module>
    load_entry_point('fmriprep==0.3.1', 'console_scripts', 'fmriprep')()
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/run_workflow.py", line 88, in main
    create_workflow(opts)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/run_workflow.py", line 183, in create_workflow
    subject_label, run_uuid=run_uuid)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/viz/reports.py", line 174, in run_reports
    report = Report(reportlet_path, config, out_dir, run_uuid, out_filename)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/viz/reports.py", line 93, in __init__
    self._load_config(config)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/viz/reports.py", line 106, in _load_config
    self.index()
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/viz/reports.py", line 124, in index
    subject = re.search('^(?P<subject_id>sub-[a-zA-Z0-9]+)$', subject_dir).group()
AttributeError: 'NoneType' object has no attribute 'group'

I would really appreciate any help! :)
Thank you,
Redwan

This is definitely too long. I suspect the submillimeter FreeSurfer pipeline kicked in - what is the resolution of your T1 data?
You should try passing the --nthreads 16 and --ants-nthreads 16 arguments to speed things up. If this does not work, you can try disabling the submillimeter FreeSurfer pipeline using the --no-submm-recon flag.
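For example, the same command as in the first post with those flags appended might look roughly like this (a sketch, not a verified invocation; --no-submm-recon only matters if the T1 is higher resolution than 1 mm):

singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.3.1-2017-03-25-c38ac0136e8c.img /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant --participant_label sub-01 -w $LOCAL_SCRATCH --nthreads 16 --ants-nthreads 16 --no-submm-recon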

This bug has been fixed in version 0.3.2. In general, you should pass labels without the sub- prefix.
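To illustrate why the prefixed label trips the report indexer: the pattern in the traceback above is anchored, so a directory name like sub-sub-01 (a hypothetical doubled prefix resulting from a label that already contains sub-) does not match, re.search() returns None, and calling .group() raises the AttributeError shown above. A quick check with the same pattern:

echo "sub-sub-01" | grep -E '^sub-[a-zA-Z0-9]+$'   # no match: the second hyphen is not in [a-zA-Z0-9]
echo "sub-01"     | grep -E '^sub-[a-zA-Z0-9]+$'   # prints sub-01: matches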

Hi Chris,

Thank you for your answer,
Now everything works like a charm with the new release of fmriprep and the following command line:

singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.3.2-2017-04-08-6bcffd8d4693.img /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant --participant_label 01 -w $LOCAL_SCRATCH --nthreads 16 --ants-nthreads 16

Just for information, with my data (multiband, TR = 400 ms):
- the preprocessing takes about 8 hours with surface reconstruction
- the preprocessing takes about 2 hours without surface reconstruction

Have a nice day, and thank you for your very helpful work!
Redwan


Hi guys,

I have tried to run the preprocessing on the Sherlock server and I have the following issue. The .err file shows:
WARNING: Non existant 'bind path' source: '/oak'
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/103900/smaps
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/107462/smaps
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/22974/smaps
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/33432/smaps
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/43782/smaps
slurmstepd: error: *** JOB 16973902 ON sh-18-6 CANCELLED AT 2017-08-25T03:58:25 DUE TO TIME LIMIT ***

I have submitted a job with the following command line:

singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.5.3-2017-07-18-ad83dbe794fc.img /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant --participant_label 12 -w $LOCAL_SCRATCH --nthreads 16 --no-freesurfer

As you can see, I am not using the latest release of fmriprep (but that should not be an issue?!)
It is not a timing issue, because I asked for 12 hours.

The last time fmriprep worked perfectly, it was with almost the same command line (what I removed in the current command line is the --ants-nthreads 16 argument):

singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.5.3-2017-07-18-ad83dbe794fc.img /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant --participant_label 12 -w $LOCAL_SCRATCH --nthreads 16 --ants-nthreads 16 --no-freesurfer

I removed it because when I tried to run the preprocessing with that argument, I got the following error: "--ants-nthreads 16" command not found.

If someone could help,
Thank you very much,
Redwan

Depending on the resolution of the input data and the number of functional runs, FMRIPREP can take more than 12 hours to run. Try increasing the limit.
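A minimal sbatch sketch with a longer wall-time (the job name, time, and memory values below are assumptions to adapt to your data; the singularity command is the one from your post):

#!/bin/bash
#SBATCH --job-name=fmriprep_sub-12   # hypothetical job name
#SBATCH --time=48:00:00              # raise the wall-time well beyond 12 hours
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G

singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.5.3-2017-07-18-ad83dbe794fc.img \
    /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant \
    --participant_label 12 -w $LOCAL_SCRATCH --nthreads 16 --no-freesurfer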

Thanks Chris,
I am going to try that.
But could you tell me what that WARNING is for, and what the slurmstepd error means?

Thank you

And what about this option: --ants-nthreads 16?
Has the option been deprecated, or did you just change its name?

Thanks
Redwan

The name of that parameter has changed to --omp-nthreads. You can find an up-to-date list of command-line flags here: http://fmriprep.readthedocs.io/en/stable/usage.html#command-line-arguments
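So your previously working command should run again with the flag renamed, roughly like this (a sketch, not tested):

singularity run /share/PI/russpold/singularity_images/poldracklab_fmriprep_0.5.3-2017-07-18-ad83dbe794fc.img /scratch/PI/aetkin/redwan/framing/sourcedata/ /scratch/PI/aetkin/redwan/preprocessed/ participant --participant_label 12 -w $LOCAL_SCRATCH --nthreads 16 --omp-nthreads 16 --no-freesurfer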

I'm not sure what the slurmstepd error indicates. It seems Slurm-related, so you should probably talk to the Sherlock admins about it.

Thank you Chris for the answer,
Regarding the slurmstepd error, here is the answer from the Sherlock people:

“Statistics are collected when a job has finished, including PSS, which is a measure of memory usage. The error message means that when Slurm tries to collect all info to calculate PSS, the file exposing kernel statistics for the process is already gone. This is probably due to the cleaning process being slightly out of sync. But since this is something that happens after the job has finished, results should not be affected”