Looking for some troubleshooting suggestions for fMRIPrep

We’re implementing fMRIPrep via a Singularity container on a Linux cluster. Our system architecture serves as an interface for copying MRI data from an archive and as a behind-the-scenes environment for running bash scripts that string together complex data-processing pipelines. In a typical scenario, this system pulls our fMRI/sMRI data, copies it to a working storage array (i.e., a ‘work’ location), and then executes a series of command-line calls (e.g., to set env or path settings, move files, or create directories) and/or invokes various bash scripts in sequence. This whole string of ‘pipeline’ commands is served through Sun Grid Engine to execute on the various compute nodes of our cluster.

I can get fMRIPrep to start on this system easily enough. But it crashes roughly 30-45 minutes into the run with an error about access to the ‘fsaverage’ file structure:

shutil.Error: [('/opt/freesurfer/subjects/fsaverage/label',
'/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage/label', 
"[Errno 1] Operation not permitted:
'/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage/label'"),

This list of errors goes on for about a dozen files within …/fsaverage (including the contents of mri, surf, and xhemi) and ends with an overall message that seems (?) to say it just doesn’t like the entire ‘fsaverage’ folder that fMRIPrep copies over into the output directory:

PosixPath('/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage'), 
"[Errno 1] Operation not permitted:
'/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage'")]

(Please note, the path name in those errors refers to ‘fmriprep_ciftify’, but the whole ciftify thing is a secondary step for older, archived data we’re playing with. Here, I’m really talking about running just ‘fMRIPrep’ itself, using a recent version, v20.2.0. Don’t be thrown off by the name; the ciftify code has nothing to do with this pipeline.)

The crash happens with any dataset we try, but ONLY when the system I described above is processing the data. In contrast, if I run things by hand (that is, use the same data file structure that our system set up and invoke the EXACT SAME fMRIPrep bash script call we created, in the same location), it all runs fine, A to Z. So something about how our system architecture and the Singularity container interact is going wrong in a way I can’t quite figure out, and it leads specifically to this ‘fsaverage’-focused set of Errno 1 messages.

I did some Google searching on the error, but only turned up one thing that looked relevant. A prior listserv post says they ran into something similar using the -u UID option. The issue there was a file-permission incompatibility between the Linux account that set up the data and the account that was trying to process it. This felt like a plausible issue here given our architecture. But a) I wasn’t using the -u UID option to begin with (so any incompatibility would have to arise from something I’m not aware of), and b) everything I tried in order to work around it manually (e.g., copying over fsaverage BEFORE fMRIPrep tried to do it itself, using chmod to give it full 777 read/write permissions, etc.) failed.
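
One way to narrow down a "works by hand, fails in the pipeline" difference is to record the process identity and the output directory's ownership from inside both runs and compare the two reports. The following is only a minimal sketch, assuming Python 3 is available wherever the job runs; `describe_identity` is a hypothetical helper name, and "." is a placeholder for the real output directory.

```python
import os


def describe_identity(path):
    """Return the process identity and the ownership of `path` as a dict."""
    st = os.stat(path)
    # os.umask() sets a new mask and returns the old one, so set it
    # to 0 briefly and restore the original value right away.
    current_umask = os.umask(0)
    os.umask(current_umask)
    return {
        "uid": os.getuid(),
        "gid": os.getgid(),
        "groups": sorted(os.getgroups()),
        "umask": oct(current_umask),
        "path_owner_uid": st.st_uid,
        "path_group_gid": st.st_gid,
        "path_mode": oct(st.st_mode & 0o7777),
    }


if __name__ == "__main__":
    # Placeholder path (an assumption); point it at the output directory.
    for key, value in describe_identity(".").items():
        print(f"{key}: {value}")
```

Calling this once from the job script submitted through the scheduler and once from an interactive shell gives two reports to diff; a mismatch in `gid`, `groups`, or `path_group_gid` would point toward an ownership problem rather than a plain permission-bit problem.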

I’m wondering if anyone can suggest a few things to look into to troubleshoot this issue? It might be as simple as making clever use of the -u UID option as a fix… something I haven’t really tried in depth yet. Or this might be an altogether different problem than the issue above from the prior listserv post I dug up. But I’d appreciate any direction in how to solve things. It’s kinda hard to troubleshoot something when it feels like the error message you get is merely the tip of the iceberg and not about a specific problem. Also, I’m not entirely sure where this copy of ‘fsaverage’ is coming from… I assume it’s being simply copied, verbatim, from inside the container. But if that’s not right, it might help me think through possibilities of things to check.

Best,
Mike

Did you ever find a solution? I experienced the same error recently using an almost identical setup and found a solution.

In our setup, we run fMRIPrep in a Singularity container on an HPC cluster. We have to copy working data to scratch space and submit jobs that work on the copied data there. Then we copy the processed data back to storage. On the HPC, I would run into the same error, but not when running fMRIPrep on a local machine.

The problem for me was that the scratch folder with the copied data had root group ownership. The error occurs when niworkflows copies the fsaverage directory. It uses the copytree function from shutil, and this function wants to change the copied files’ group ownership to your default group. The parent directory’s group ownership would propagate to child directories and files, so all files would start out with root group ownership for me. Changing root group ownership to a different group is not a permitted operation, so we get that error.
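
To check whether this diagnosis applies to a given working directory, it may help to list every entry whose group differs from the group the job runs under. This is a rough sketch rather than anything fMRIPrep or niworkflows provides; `entries_with_foreign_group` is a hypothetical helper name.

```python
import os


def entries_with_foreign_group(root, expected_gid=None):
    """List paths at or under `root` whose group ID is not `expected_gid`.

    If `expected_gid` is omitted, the primary group of the current
    process is used.
    """
    if expected_gid is None:
        expected_gid = os.getgid()
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name)
                     for name in dirnames + filenames)
    # lstat() inspects symlinks themselves instead of following them.
    return [p for p in paths if os.lstat(p).st_gid != expected_gid]
```

An empty list means everything under `root` already belongs to the expected group; any paths returned are candidates for the ownership mismatch described above.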

On our scratch space, our user accounts owned the folders where we copy data, so I changed the folder’s group ownership to my default group and set the setgid bit (the ‘group sticky bit’) on it. This solved the issue for me.
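
For anyone wanting to script that fix, here is a minimal sketch. It assumes your account owns the directory and belongs to the target group; `adopt_group_and_setgid` is a hypothetical helper name. It does roughly what `chgrp` followed by `chmod g+s` would do from the shell.

```python
import os
import stat


def adopt_group_and_setgid(directory, gid=None):
    """Give `directory` the group `gid` and turn on its setgid bit.

    With setgid set on a directory, entries created inside it inherit
    the directory's group. Defaults to the caller's primary group.
    """
    if gid is None:
        gid = os.getgid()
    # -1 leaves the owning user unchanged; only the group is updated.
    os.chown(directory, -1, gid)
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode | stat.S_ISGID)
    return os.stat(directory)
```

This only touches the directory itself. Files copied in before the change keep their old group, so it is best applied before the pipeline copies data into the folder.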