Fmriprep stuck on resume recon-all

pettitta · January 17, 2019, 5:28pm

Hello,

I’m currently working on a dataset with two waves of MRI data. I’ve been trying to debug this step for a few weeks and I’ve hit a wall. I’ve been running the following fmriprep command for a single test participant on a cluster computer using fmriprep 1.2.4 with 100GB allotted to processing.

singularity run --bind "${group_dir}":"${group_dir}" ${image} ${bids_dir} ${derivatives} participant \
--participant_label ${subid} \
-w ${working_dir} \
-t ${task} --use-aroma --write-graph \
--output-space {'T1w','template','fsaverage5','fsnative'} \
--mem-mb 100000 \
--debug -vvvv \
--fs-license-file $FS_LICENSE

Unfortunately, it keeps getting stuck on a freesurfer step. This is especially odd because I’ve run freesurfer for this participant before, separately, and it’s never had any issues. Additionally, I’ve tried using the freesurfer output for fmriprep and it’s run into the same issue. So I removed the freesurfer output completely and instructed fmriprep to make its own freesurfer output and still I get the error. I’ve attached the output and error logs for the single participant. Any help you could provide would be greatly appreciated.

Side note: I ran the exact same script on a different dataset and it worked beautifully, but that dataset had only 1 wave, so could this potentially be due to having multiple waves of data?

ADS5226_fmripreprest_error.txt (5.5 KB)
ADS5226_fmripreprest_output.txt (3.1 MB)

effigies · January 17, 2019, 5:32pm

Hi @pettitta,

I don’t see anything in the logs that indicates a problem. It seems to indicate that it was in the middle of running some of the FreeSurfer pipeline when the log ended.

90115-21:30:56,551 nipype.workflow INFO:
	 [MultiProc] Running 2 tasks, and 0 jobs ready. Free memory (GB): 87.66/97.66, Free processors: 12/28.
                     Currently running:
                       * _autorecon_surfs1
                       * _autorecon_surfs0

Did you verify that FreeSurfer stopped running? Is it possible you hit a timeout for the entire process, and re-running might help? Note that if the cluster killed the job, you probably have freesurfer/<subject>/scripts/IsRunning* files that need to be removed to resume.
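
If the job was killed mid-run, the leftover lock files can be cleared with something like the following. This is only a minimal sketch: the function name `clear_isrunning` and the example path are made up for illustration, and it should be run only once you are sure no recon-all process is still working on that subject.

```shell
# Remove stale FreeSurfer lock files for one subject.
# Usage: clear_isrunning <path to freesurfer/<subject>>  (function name is illustrative)
clear_isrunning() {
    local subject_dir="$1"
    # Lock files sit in <subject>/scripts, e.g. IsRunning.lh, IsRunning.rh, IsRunning.lh+rh.
    # -print lists each file as it is deleted, so there is a record of what was removed.
    find "${subject_dir}/scripts" -maxdepth 1 -type f -name 'IsRunning*' -print -delete
}

# Example (path assumed from the command earlier in this thread):
#   clear_isrunning "${derivatives}/freesurfer/sub-${subid}"
```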

pettitta · January 17, 2019, 5:48pm

Thanks so much for your quick response! The problem is that it stays on that specific step for ~2 days (in addition to the extra time it took to get to this step). This has happened more than 5 times, all stopping at the exact same point. Each time I rerun fmriprep I completely remove the freesurfer directory, the working_bids_fmripreprest directory, and the fmriprep directory to make sure that it’s a clean space. How would I know if I hit a timeout, and would it consistently happen at the same spot over and over again?

effigies · January 17, 2019, 6:10pm

I guess the question is what it means to “stop” at the same point. If it’s a particularly difficult reconstruction, perhaps it could run for that long, but that seems unlikely. However, before killing your process, I would verify that it is in fact still running, using top or ps, or similar. You can also look in freesurfer/<subject>/scripts/ to find log files that might give an indication of something going wrong.
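
One way to tell "slow" from "hung" is to check whether the FreeSurfer log is still being written. The sketch below assumes the usual log name `recon-all.log` in the scripts directory; the helper name `recently_updated` and the paths in the usage comments are illustrative, and the process check has to be run on the compute node where the job lives.

```shell
# Succeeds if the file was modified within the last N minutes (default 60).
# A recon-all.log that keeps changing suggests FreeSurfer is slow rather than hung.
recently_updated() {
    local file="$1" minutes="${2:-60}"
    # find prints the path only if its modification time is newer than N minutes ago
    [ -n "$(find "$file" -mmin "-${minutes}" 2>/dev/null)" ]
}

# Example usage (paths are assumptions based on this thread):
#   pgrep -u "$USER" -fl recon-all        # is any recon-all process alive?
#   log="${derivatives}/freesurfer/sub-${subid}/scripts/recon-all.log"
#   recently_updated "$log" 30 && echo "log changed in the last 30 minutes"
#   tail -n 5 "${log%.log}-status.log"    # most recent recon-all stages
```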

By timeout, I’m referring to the point at which your job on a batch system (e.g. SLURM, SGE, etc.) simply kills fMRIPrep. If it’s 2 days in, that’s a little weird. If you’re able to share a defaced T1w image that works in standard FreeSurfer but causes FreeSurfer in fMRIPrep to fail, we can see if we can reproduce it. Assuming your skull stripping works okay, it’s difficult to see what the difference would be.
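
On a SLURM cluster with job accounting enabled, the scheduler records whether a job was killed for exceeding its wall-time limit. The following is a sketch under that assumption; `JOB_ID` is a placeholder for your own job number and `hit_time_limit` is an illustrative helper, not part of SLURM. Other schedulers report this differently.

```shell
# Succeeds if any job state read from stdin is TIMEOUT, which is the state
# SLURM assigns to jobs that reached their time limit.
# Reading from stdin keeps the check usable without a scheduler at hand.
hit_time_limit() {
    grep -q '^TIMEOUT'
}

# Example usage on a SLURM cluster (JOB_ID is a placeholder):
#   sacct -j "$JOB_ID" --format=JobID,State,Elapsed,Timelimit
#   sacct -j "$JOB_ID" -X -n -P --format=State | hit_time_limit \
#       && echo "job was killed at its wall-time limit"
```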

If you have an already-run copy of the FreeSurfer (6.0) directory, you can simply place that in your outputs, and fMRIPrep will only verify that all of the outputs are present, and not attempt to run if they’re found.

pettitta · January 17, 2019, 6:28pm

I’ve run the process with 4 separate participants and they’ve all stalled at the same place. I’ll attach the log files that are found in the freesurfer scripts directory. I think they indicate that something is still running, but it’s taking days, and it seems, as you say, a little weird. Additionally, when I put in the already-run freesurfer output like you indicated, it still stalls and runs for days even though the freesurfer output is there.

recon-all-lh.txt (133.3 KB)
recon-all-rh.txt (117.5 KB)
recon-all-status-lh.txt (674 Bytes)
recon-all-status-rh.txt (674 Bytes)
recon-all-status.txt (977 Bytes)
recon-all.txt (157.1 KB)

effigies · January 17, 2019, 6:45pm

What version of FreeSurfer did you pre-run, and where did you place it?

pettitta · January 17, 2019, 6:48pm

I used freesurfer 6.0.0, and the file structure for freesurfer was ${bids_data}/derivatives/freesurfer.

effigies · January 17, 2019, 7:21pm

So we look for FS results in <outputs>/freesurfer, because we can’t guarantee write access to the BIDS directory. If your <outputs> is just <bids_root>/derivatives, then this is equivalent, but if your output directory is not that, then we won’t be finding and using your precomputed FreeSurfer runs.

pettitta · January 17, 2019, 7:33pm

Sorry if I’m about to ask a stupid question. But where do I specify the output directory?

Currently I have this:

bids_dir="${group_dir}""${study}"/data/BIDS_data
derivatives="${bids_dir}"/derivatives
working_dir="${derivatives}"/working_bids_fmripreprest/
image="${group_dir}""${container}"

singularity run --bind "${group_dir}":"${group_dir}" ${image} ${bids_dir} ${derivatives} participant \
--participant_label ${subid} \
-w ${working_dir} \
-t ${task} --use-aroma --write-graph \
--output-space {'T1w','template','fsaverage5','fsnative'} \
--mem-mb 100000 \
--skip_bids_validation \
--debug -vvvv \
--fs-license-file $FS_LICENSE

So would I need to place the freesurfer output into the <bids_root>/derivatives/fmriprep directory for this to work?

effigies · January 17, 2019, 7:41pm

derivatives="${bids_dir}"/derivatives
...
singularity run --bind "${group_dir}":"${group_dir}" ${image} ${bids_dir} ${derivatives} participant
...

Looks like your output directory is ${derivatives}, so ${derivatives}/freesurfer would be the right place for it.
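
A quick way to confirm that a precomputed run sits where fMRIPrep will look for it is to test for `<outputs>/freesurfer/sub-<label>`. This is a minimal sketch; `has_precomputed_fs` is an illustrative name, and it only checks that the directory exists, not that the reconstruction inside it is complete. Note that a directory named with a session suffix (for example sub-X_ses-1) does not match the sub-<label> name.

```shell
# Succeeds if a FreeSurfer subject directory exists at
# <output_dir>/freesurfer/sub-<label>. Function name is illustrative.
has_precomputed_fs() {
    local output_dir="$1" label="$2"
    # ${output_dir%/} drops a trailing slash so the path has no double slash
    [ -d "${output_dir%/}/freesurfer/sub-${label}" ]
}

# Example usage with the variables from the script above:
#   has_precomputed_fs "${derivatives}" "${subid}" \
#       && echo "found ${derivatives}/freesurfer/sub-${subid}" \
#       || echo "no precomputed FreeSurfer directory for sub-${subid}"
```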

Hmm. I’ll think about this some more. Sorry, I’m trying to juggle a few things. Ping me tomorrow if I don’t get back before then.

pettitta · January 17, 2019, 7:42pm

No worries, thanks for your help! I’ll try rerunning it with the freesurfer output in ${derivatives} one more time to see if it works this time.

pettitta · January 21, 2019, 7:35pm

So, I think I’ve figured out what’s going on, and it may be a little complicated, so I’ll try to make this as non-confusing as possible. Over the weekend, I batch ran 5 participants. For 3 of the participants, I had their individual timepoint freesurfer directories present (i.e., derivatives/freesurfer/sub-X_ses-1 and derivatives/freesurfer/sub-X_ses-2). For one of the participants, I also placed their base image into the freesurfer directory (i.e., derivatives/freesurfer/sub-Y_ses-1 and derivatives/freesurfer/sub-Y_ses-2 AND derivatives/freesurfer/sub-Y). This participant, with the base image already made, stopped 2 hours in with the following error:

ERROR: It appears that this subject ID is an existing base/template from longitudinal processing (-base): sub-ADS4241 If you are trying to re-run a -base template you need to pass the -base and all -tp flags:

   \' -base <templateid> -tp <tpNid> ... -all \'

   (Instead of -all you can pass other flags, such
   as -autorecon2 -autorecon3 to run only parts.)
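
To see ahead of time which directories would trigger this error, one can list the subject directories that look like longitudinal bases. The sketch below rests on an assumption that should be checked against your FreeSurfer version: that a base/template directory is marked by a file named `base-tps` at its top level. The function name is illustrative.

```shell
# List subject directories under a FreeSurfer SUBJECTS_DIR that look like
# longitudinal base/template directories.
# Assumption: a base directory contains a file named "base-tps".
list_longitudinal_bases() {
    local subjects_dir="$1"
    local marker
    for marker in "${subjects_dir%/}"/*/base-tps; do
        # If nothing matches, the glob stays literal; skip it
        [ -e "$marker" ] || continue
        basename "$(dirname "$marker")"
    done
}

# Example usage (path assumed from this thread):
#   list_longitudinal_bases "${derivatives}/freesurfer"
```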

However, for the other participants, without the base image present, they’ve been running for about 4 days now. I’ve been checking the logs in freesurfer/sub-X/scripts, and as you said, they’ve actually been progressing. I compared the size of the directory being produced by fmriprep (e.g., derivatives/freesurfer/sub-X) with the size of the directory produced when I independently ran the base image: the fmriprep directory is about 1.4 GB, and the one I ran independently is about 250 MB.

Additionally, I was also interested in whether simply removing all of the freesurfer directories for a participant and running fmriprep would produce the same thing. So for the 5th participant, I ran fmriprep with no freesurfer output, and it’s following the exact same steps as the ones where I had derivatives/freesurfer/sub-X_ses-1 and derivatives/freesurfer/sub-X_ses-2. So I’m assuming that fmriprep is simply ignoring the freesurfer output from sessions one and two and just making a single unbiased image, rather than making an individual image for each timepoint and then making a base image from that.

Does that sound correct? Should it be taking 4+ days for this whole process? It looks like it’s working, it’s just taking a lot longer than I previously had thought (based on running freesurfer independent of fmriprep).

Thanks so much for your time and help!

Adam

jbwexler · November 26, 2019, 1:46am

Hey Adam, I was wondering if you ever resolved this issue? I seem to be having a similar issue. About 3/4 of the subjects in the dataset ran just fine all the way to the end of fmriprep. But the other 1/4 got stuck at autorecon_surfs0. This step took the successful subjects only about an hour. I was wondering if you might have any advice.

Thanks,
Joe

sanchezj · April 13, 2020, 3:48pm

Hey everyone.
We are having the same issue with fmriprep recon-all for a small subset of our data. Was there ever a resolution?
J -

pettitta · April 15, 2020, 5:37pm

Hi everyone,

I didn’t really find any resolution for this. I actually just let fmriprep run and some participants took 10+ days, but they finished. It just took a very very long time. I’m not sure if this is the correct solution, but I can’t think of anything else.