We’re implementing fMRIPrep via a Singularity container on a Linux cluster where we have a system architecture that functions as an interface to copy MRI data from an archive, as well as a behind-the-scenes bash script-running environment to string together complex data-processing pipelines. In a typical scenario, this system will a) pull our fMRI/sMRI data and copy it over to a working storage array (i.e., a ‘work’ location), then b) execute a series of command line calls (e.g., to set env or path settings, move files or create directories, etc.) and/or invoke various bash scripts in sequence… This whole string of ‘pipeline’ commands is served through Sun Grid Engine to execute on the various compute nodes of our cluster.
I can get fMRIPrep to start on this system easily enough. But after it has been running for a while it crashes… oh, maybe 30-45 minutes in… with an error about access to the ‘fsaverage’ file structure:
shutil.Error: [('/opt/freesurfer/subjects/fsaverage/label', '/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage/label', "[Errno 1] Operation not permitted: '/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage/label'"),
This list of errors goes on for about a dozen files within …/fsaverage (including the contents of mri, surf, and xhemi) and ends with an overall message that seems (?) to say it just doesn’t like the entire ‘fsaverage’ folder that fMRIPrep copies over into the output directory:
PosixPath('/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage'), "[Errno 1] Operation not permitted: '/home/pipeline/onrc/data2/pipelineb/AutOO_fmriprep_ciftify/S0211BRU/1/derivatives/freesurfer/fsaverage'")]
(Please note, the path name in those errors refers to ‘fmriprep_ciftify’, but the whole ciftify thing is a secondary step for older, archived data we’re playing with… Here, I’m really talking about running just ‘fMRIPrep’ itself, using a recent v20.2.0 version. Don’t be thrown off by the name… that code has nothing to do with this pipeline.)
Now, the crash happens with any dataset we try. But interestingly, the crash ONLY happens when the system I described above is processing the data. In contrast, if I try to run things by hand… that is, use the same data file structure that was set up by our system, and invoke the EXACT SAME fMRIPrep bash script call we created (i.e., in the same location), it all runs fine… A-to-Z. So there’s something about how our system architecture and the Singularity container fail to get along that I can’t quite figure out… Something that leads specifically to this ‘fsaverage’-focused set of Errno 1 messages.
I did some Google searching on the error, but only turned up one thing that looked relevant. A prior listserv post somewhere says they ran into something similar using the -u UID option. The issue there was a file-permission mismatch between the Linux account that set up the data and the account trying to process it. This felt like a plausible issue here given our architecture.
But a) I wasn’t using the -u UID option to begin with (so any mismatch wasn’t something I introduced on purpose, merely one that might arise from issues I’m not aware of), and b) the various things I tried by hand to get around this issue (e.g., copying over fsaverage BEFORE fMRIPrep tried to do it itself, using chmod to make sure it had full 777 read/write permissions, etc.) all failed.
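One way to test the account-mismatch idea directly is a small ownership probe. The traceback has the list-of-tuples shape that Python’s shutil.copytree raises, and copytree also copies mode bits and timestamps onto the destination. On Linux, changing a directory’s mode or setting its timestamps requires owning it (or being root), no matter how open the 777 bits are, which would fit the failed chmod workaround. This is a hypothesis, not something the error output confirms. The sketch below assumes GNU coreutils; the function name can_set_metadata is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fMRIPrep or Singularity).
# Returns 0 if the current user can rewrite a directory's mode and mtime,
# non-zero otherwise. Both operations need ownership (or root), regardless
# of how permissive the read/write/execute bits are.
can_set_metadata() {
    local dir="$1" mode
    # Read the current octal mode (GNU stat); bail out if the path is missing.
    mode=$(stat -c '%a' "$dir" 2>/dev/null) || return 2
    # Re-apply the same mode and the same mtime, so nothing actually changes.
    chmod "$mode" "$dir" 2>/dev/null \
        && touch -c -m -r "$dir" "$dir" 2>/dev/null
}
```

Calling can_set_metadata on the derivatives/freesurfer directory (and on a pre-copied fsaverage, if one exists) from inside the pipeline job and again from a manual login would show whether the two runs differ in ownership rather than in permission bits.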
Can anyone suggest a few things to look into to troubleshoot this issue? It might be as simple as making clever use of the -u UID option as a fix… something I haven’t really tried in depth yet. Or this might be an altogether different problem from the one in the prior listserv post I dug up. But I’d appreciate any direction on how to solve things. It’s kinda hard to troubleshoot something when it feels like the error message you get is merely the tip of the iceberg and not about a specific problem. Also, I’m not entirely sure where this copy of ‘fsaverage’ is coming from… I assume it’s simply being copied, verbatim, from inside the container. But if that’s not right, knowing the real source might help me think through possibilities of things to check.
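Since the same call works by hand and fails under the pipeline, a low-effort first step might be to record the runtime context in both situations and diff the two outputs. The sketch below assumes GNU coreutils on the compute nodes; report_context is a hypothetical helper name, and the directory passed to it would be the output location from the error messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print who is running the job and who owns a directory.
# Run it once from the pipeline (under Sun Grid Engine) and once by hand,
# then compare the two outputs line by line.
report_context() {
    local target="$1"
    echo "uid=$(id -u) gid=$(id -g)"
    echo "groups=$(id -G)"
    echo "umask=$(umask)"
    echo "host=$(hostname)"
    # Numeric owner, group, and octal mode of the target (GNU stat).
    echo "target_owner=$(stat -c '%u:%g mode=%a' "$target")"
    # Filesystem type backing the target, e.g. nfs or ext2/ext3.
    echo "target_fstype=$(stat -f -c '%T' "$target")"
}
```

For example, `report_context /path/to/derivatives/freesurfer > context_$(hostname).txt` (path shortened here as a placeholder) in both runs. On the question of where fsaverage comes from: the source path in the error, /opt/freesurfer/subjects/fsaverage, is not a host path in the setup described, so it is presumably inside the image; running `ls -ld` on that path through `singularity exec` with your own image file would confirm that.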