Computer went down (power outage) during fMRIPrep - how to continue?

orduek · January 22, 2019, 1:36pm

During a long (more than 20h) preprocessing run, my lab computer went down (power outage). Is there a way to continue preprocessing, or do I have to begin from scratch? I’ve noticed that the tmp folder is not empty.

effigies · January 22, 2019, 2:41pm

As long as you still have the same working directory available (usually the case unless you’re running through Docker and not mounting a working directory from outside the container), you can pick back up just by making sure -w points to the same working directory. Also, if you’re already at 20h, it’s likely that FreeSurfer’s recon-all was what took so long. fMRIPrep will only run the portions of recon-all that haven’t finished.

You may need to remove freesurfer/sub-*/scripts/IsRunning* from your output directory, otherwise recon-all will detect those and assume another job is already running.
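A minimal sketch of that cleanup in Python, assuming the layout above (`freesurfer/sub-*/scripts/IsRunning*` under the output directory). The function names and the `dry_run` switch are hypothetical, added for illustration. Only run it when you are sure no recon-all job is still active.

```python
from pathlib import Path


def find_stale_markers(output_dir):
    """Return IsRunning* files under <output_dir>/freesurfer/sub-*/scripts/."""
    return sorted(Path(output_dir).glob("freesurfer/sub-*/scripts/IsRunning*"))


def remove_stale_markers(output_dir, dry_run=True):
    """List the marker files; delete them only when dry_run is False."""
    markers = find_stale_markers(output_dir)
    for marker in markers:
        if not dry_run:
            marker.unlink()
    return markers
```

Calling `remove_stale_markers(output_dir)` first with the default `dry_run=True` shows what would be deleted before anything is touched.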

orduek · January 22, 2019, 2:43pm

I actually run with the --fs-no-reconall flag. We’re talking about 7 subjects with 2 sessions per subject.
I had --n-threads 10 and I have 32GB RAM, so I also wondered why it takes that long…

effigies · January 22, 2019, 3:10pm

Are you running all subjects in the same fmriprep instance? I think it’s reasonably safe to assume at least an hour per BOLD series (amortizing the anatomical portion; so the marginal cost of each series goes down), so 14h as a base number is a good low-end estimate. There are a number of factors that could make them take longer, such as longer time series, multiple BOLD series per session, susceptibility-distortion-correction, etc. All this is to say that, on its face, 20h for 14 sessions isn’t overly worrying. Did any of the subjects finish? You can get a sense of the completion by looking at the output directory and comparing the number of output BOLD series to the number you expect.
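One rough way to do that comparison is sketched below. The filename pattern is an assumption and depends on your fMRIPrep version and requested output spaces, so adjust it to what you actually see on disk; `completion_summary` is a hypothetical helper, not part of fMRIPrep.

```python
from pathlib import Path

# Assumption: preprocessed BOLD outputs end in this suffix. Change it to
# match the files your fMRIPrep version writes.
PREPROC_PATTERN = "*_desc-preproc_bold.nii.gz"


def completion_summary(output_dir, expected_series, pattern=PREPROC_PATTERN):
    """Compare preprocessed BOLD files found on disk with the expected count."""
    found = sorted(Path(output_dir).rglob(pattern))
    return {
        "found": len(found),
        "expected": expected_series,
        "remaining": max(expected_series - len(found), 0),
    }
```

If several output spaces were requested, each series produces more than one matching file, so narrow the pattern to a single space before trusting the count.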

Additionally, if you use --n-threads 10, the default number of threads for multi-threaded jobs (controllable via --omp-nthreads) will be 8 (we don’t see much improvement beyond 8, so that leaves space for single-core jobs). However, this means that you won’t have room to run two 8-core jobs simultaneously, so you might be spending a lot of time with two cores mostly idle.
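The arithmetic behind that can be sketched as follows. This is plain integer division for illustration, not how the fMRIPrep scheduler actually allocates work, and `thread_budget` is a hypothetical name.

```python
def thread_budget(n_threads, omp_nthreads):
    """Return (multi-threaded jobs that fit at once, threads left over)."""
    if omp_nthreads < 1:
        raise ValueError("omp_nthreads must be at least 1")
    if n_threads < omp_nthreads:
        raise ValueError("n_threads must be at least omp_nthreads")
    jobs = n_threads // omp_nthreads
    leftover = n_threads - jobs * omp_nthreads
    return jobs, leftover


print(thread_budget(10, 8))  # (1, 2): one 8-thread job, two threads left over
```

The leftover threads can still pick up single-core jobs, but they sit idle whenever only multi-threaded work is waiting.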

Our general recommendation for running many subjects is to run a separate fMRIPrep instance per subject, and if possible, run them in parallel on a cluster computing environment.
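As a sketch, one command per subject could be assembled like this. The paths and subject labels are placeholders, and `per_subject_commands` is a hypothetical helper. `--participant-label` is not mentioned elsewhere in this thread, so check `fmriprep --help` for your version before relying on it. The commands are only printed here, not executed.

```python
import shlex


def per_subject_commands(bids_dir, output_dir, work_dir, subjects, extra_args=()):
    """Build one fmriprep argument list per subject label."""
    commands = []
    for label in subjects:
        cmd = [
            "fmriprep", bids_dir, output_dir, "participant",
            "--participant-label", label,
            "-w", work_dir,  # reuse the earlier working directory to resume
            *extra_args,
        ]
        commands.append(cmd)
    return commands


# Placeholder paths and labels; replace them with your own.
for cmd in per_subject_commands(
    "/data/bids", "/data/derivatives", "/data/work", ["01", "02"],
    extra_args=["--fs-no-reconall"],
):
    print(shlex.join(cmd))
```

Each printed line can then be launched as its own process or submitted as a separate cluster job.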

orduek · January 22, 2019, 3:23pm

Thank you for your help!
I think some of them did finish (producing 3 nii.gz files per BOLD file).
I have a 12-core computer, so maybe I’ll run them subject by subject with n-threads 5 per subject.
Moving all the data to the cluster might take a lot of time by itself, so I wonder if it is the best solution…

effigies · January 22, 2019, 3:27pm

Okay. I’ll note that 5 cores will noticeably slow down ANTs registration relative to 8 cores, but it is possible that it will give the scheduler a bit more room to make fuller use of your 10 cores. Another thought: if you’re running on your regular workstation and your memory is being used by your other activities, you might be hitting swap, and that will absolutely kill performance.