Pre-processing additional func sessions

Hi Experts,

I am pre-processing a publicly available dataset with fMRIPrep. Each subject has 14 sessions, each containing 3 functional tasks inside /func (movie, rest, inscapes) and fieldmap data inside /fmap. I have processed two of the 14 sessions for each subject and would now like to process the remaining 12. Would it be appropriate to just add --fs-subjects-dir to my command so that fMRIPrep does not rerun the surface reconstruction, or is FreeSurfer required for each session? In addition, my plan was to add the remaining 12 sessions into the /sourcedata folder, which at present only contains the 2 sessions I have already pre-processed. Will fMRIPrep automatically register that sessions 1 and 2 have already been pre-processed and skip them, or will I need to add --session-id and then list all 12 sessions I want to be processed?

Hope that makes sense!

For reference, here is the command I run (the additional flag being --fs-subjects-dir):

singularity run --cleanenv --bind /cubric/scratch/c1749990/HBN/sourcedata --bind /cubric/scratch/c1749990/HBN/derivatives --bind /cubric/scratch/c1749990/HBN/work /cubric/scratch/c1749990/singularity.images/fmriprep_20.2.7.simg /cubric/scratch/c1749990/HBN/sourcedata /cubric/scratch/c1749990/HBN/derivatives participant --participant_label $SUBJ$i --output-spaces fsnative fsaverage T1w func MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2 --stop-on-first-crash --fs-license-file $HOME/freesurfer_license.txt --fs-subjects-dir /cubric/scratch/c1749990/HBN/derivatives/freesurfer -w /cubric/scratch/c1749990/HBN/work --n_cpus $SLURM_CPUS_PER_TASK --omp-nthreads 4

Best wishes,
Mason

Hi Mason,

In general this is right! fMRIPrep makes a single-subject T1 template across all structural scans across sessions, so by default you will have one FreeSurfer directory per subject. If this is not convenient for you (e.g. the sessions are years apart, so you do not want to create a single template combining them), then you would want to hack things differently (I think some other threads here have discussed this).

This would not be best BIDS practice; sourcedata should contain things such as your DICOMs. If you want to run only certain sessions, you should instead pass a BIDS filter file (--bids-filter-file) to the fMRIPrep command.
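For illustration (the file path and session labels below are placeholders, not values from this thread), a filter file passed with fMRIPrep's --bids-filter-file flag could look something like this:

```shell
# Sketch of a BIDS filter file restricting fMRIPrep to certain sessions.
# Session labels "3" and "4" are placeholders -- substitute your own
# (use the label without the "ses-" prefix).
cat > /tmp/filter_sessions.json <<'EOF'
{
    "bold": {"datatype": "func", "suffix": "bold", "session": ["3", "4"]},
    "t1w": {"datatype": "anat", "suffix": "T1w"}
}
EOF

# Then append to the fMRIPrep call:
#   --bids-filter-file /tmp/filter_sessions.json
cat /tmp/filter_sessions.json
```

Each top-level key ("bold", "t1w", etc.) filters the query fMRIPrep uses to find that file type, so only sessions matching the list are picked up.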

If you are reusing the work directory and all the contents from the original run are in the work directory, most of the processing for already processed runs will be skipped.

A quick style note on your fMRIPrep command: it looks like you only need one --bind for /cubric/scratch/c1749990, since binds are recursive.

Best,
Steven

Hi Steven,

Thanks for the quick reply. Let me just add a few more details to double-check we’re both on the same page. You’re correct: I use /sourcedata as my BIDS-valid directory containing the data to be pre-processed, when in fact it should be called /rawdata or something similar. I’ll rename it so it’s in line with BIDS practice. I have also deleted my work directory after QC’ing the fMRIPrep outputs because of issues with storage space. At present, each subject directory within /rawdata contains ses-1 and ses-2; my plan was to move the ses-3 to ses-14 data into /rawdata and call fMRIPrep (pointing to /rawdata). If I do this, and without the work dir, will the two sessions I’ve already pre-processed be included and re-run?

Also, each of the 14 sessions contains an anat folder in which a T1 is stored. I am fairly confident it shouldn’t matter which T1 I use, as these subjects were scanned in quick succession. Therefore, I’m happy with the template FreeSurfer has generated from the two sessions I have already pre-processed. Would you recommend removing the anat folders from the sessions I want to run now (sessions 3-14), or leaving them in there and just using --fs-subjects-dir?

Perhaps it might be easier to just run fMRIPrep from scratch with all 14 sessions and thus all 14 anat folders? My concern is that it will take quite some time and given the FreeSurfer reconstruction looks good for the two sessions, it would save time to just reuse that data.

In short, I essentially want to say to fMRIPrep “you did a great job at pre-processing sessions 1 and 2, can you do the same thing for sessions 3-14 but reuse the FreeSurfer data you’ve already output and save me some time?”

The “without the work dir” part will be a problem: without it, fMRIPrep will try to reprocess the data by default. But again, the BIDS filter file can specify which sessions you want to analyze.

Definitely do not remove the T1s from the session directories. Having a T1 is a requirement. The BOLD gets registered to the T1 as part of the anatomical workflow.

Personally, if you are still in the preprocessing stage and haven’t begun analyzing, I would rerun everything to have the most representative surface reconstruction. Do NOT use the --longitudinal flag, because that template will take forever to generate with 14 images. Since you are confident that the T1 images shouldn’t change considerably between sessions, the default (not unbiased) template generation should be fine. But ultimately it is your call regarding reusing those FreeSurfer derivatives.

Best,
Steven

Hi Steven,

Thanks for your advice. I bit the bullet and went ahead and tried to pre-process the entire dataset from scratch. I appear to be running into some issues I hoped you might be able to shed some light on.

I have tried pre-processing half the data by running each subject (14 sessions) in a single Singularity container (see below for the command/batch setup). I don’t have a great amount of storage (1TB) and notice the work directory takes up a significant portion of this (over 500GB). My first question: can the size of the work directory be reduced?

Each subject has been running on the cluster for ~70 hours without finishing. Some of the error logs are blank, but others seem to suggest there was an issue with memory or storage. Below are some excerpts from an error log:

exception calling callback for <Future at 0x7f409922bf28 state=finished raised FileNotFoundError>
concurrent.futures.process._RemoteTraceback:
OSError: [Errno 122] Disk quota exceeded:

'/cubric/newscratch/294_MRvis/HBN/work/fmriprep_wf/single_subject_0031123_wf/func_preproc_ses_SSV5_task_RESTINGSTATE_wf/bold_t1_trans_wf/bold_to_t1w_transform/result_bold_to_t1w_transform.pklz.tmp'

If indeed you believe the issue here is disk space, a way to reduce the size of the work directory might be helpful. However, if it is a memory issue, can I get your advice on the way I have set up my batch script? You can see I specify --mem=12g, --n_cpus 8, --omp-nthreads 4. This worked fine for a smaller dataset (two sessions per subject), but perhaps it is not suitable for the entire dataset (14 sessions = ~10GB of data).

My final question is whether fMRIPrep is able to pick up where it left off/crashed as I still have the work directories? Or would you advise deleting the derivatives and work directory and starting again with an improved setup?

Thanks again for all your help, it is much appreciated!

#SBATCH -p cubric-centos7
#SBATCH --job-name=S33_RunfMRIPrep
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=12g
#SBATCH -o Reports/S33_RunfMRIPrep%j.out
#SBATCH -e Reports/S33_RunfMRIPrep%j.err
#SBATCH --time=72:00:00


SUBJ="0031133"
echo $SUBJ$i

singularity run --cleanenv --bind /cubric/newscratch/294_MRvis/HBN/rawdata --bind /cubric/newscratch/294_MRvis/HBN/derivatives --bind /cubric/newscratch/294_MRvis/HBN/work /cubric/collab/494_viscortex/singularity.images/fmriprep_20.2.7.simg /cubric/newscratch/294_MRvis/HBN/rawdata /cubric/newscratch/294_MRvis/HBN/derivatives participant --participant_label $SUBJ --output-spaces fsnative fsaverage T1w func MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2 --stop-on-first-crash --fs-license-file $HOME/license.txt -w /cubric/newscratch/294_MRvis/HBN/work --n_cpus $SLURM_CPUS_PER_TASK --omp-nthreads 4

Not really.

It could be helpful to determine if this is happening in the data drive or in your home drive. fMRIPrep may try to write things such as templateflow to your home drive. Usually on HPCs your home drive is space limited, which could be leading to this error.
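If the home-drive quota does turn out to be the culprit, one common workaround (a sketch; the scratch path below is a placeholder, not a path from this thread) is to point TemplateFlow at a roomier location before launching the container:

```shell
# Placeholder scratch location -- substitute your real scratch path,
# e.g. somewhere under /cubric/newscratch/294_MRvis.
SCRATCH=/tmp/scratch_example

# Singularity passes SINGULARITYENV_-prefixed variables into the
# container, so TemplateFlow will cache templates here instead of $HOME:
export SINGULARITYENV_TEMPLATEFLOW_HOME="$SCRATCH/templateflow"
mkdir -p "$SINGULARITYENV_TEMPLATEFLOW_HOME"

# Remember to also bind the directory in the singularity call, e.g.:
#   --bind "$SCRATCH/templateflow"
echo "$SINGULARITYENV_TEMPLATEFLOW_HOME"
```

With this in place, template downloads land on scratch rather than counting against the home-directory quota.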

This is probably too little memory for 14 runs; I might even go to 32GB and 16 CPUs if you can. You also might want to reconsider whether you need all of these output spaces: fsnative fsaverage T1w func MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2
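As a sketch, the resource bump could look like this in the batch header (the numbers are suggestions, not values tested on this dataset):

```shell
#SBATCH --cpus-per-task=16
#SBATCH --mem=32g
# Keep fMRIPrep's threading consistent with the SLURM allocation, e.g.:
#   --n_cpus $SLURM_CPUS_PER_TASK --omp-nthreads 8
# fMRIPrep can also be capped explicitly so it stays under the SLURM
# limit, e.g. --mem_mb 30000.
```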

As long as you do not change versions, reusing this work directory should be fine.

You can bind only /cubric/newscratch/294_MRvis/HBN, since it contains the other folders.

Best,
Steven