Keeping track in case someone else has a similar issue and sees this.
I tested multiple participants.
The only change I could come up with was that I had moved the working directory to an external hard drive, but putting it back on the main volume didn’t help.
I rebooted.
I changed the output directory to a clean one on the main volume.
Tried on a completely new dataset, which happens to have MBME RS scans.
Tried switching from --output-type censored to interpolate because the error was in Node censor_report.
I’m baffled because I didn’t change anything. I had successfully tested one participant and was just about to start the batch run when this error started happening.
Also (from the container)
Apptainer> python
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib.font_manager
>>> matplotlib.font_manager.findSystemFonts()
['/usr/share/fonts/truetype/dejavu/DejaVuSerif-Bold.ttf',
'/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
'/usr/share/fonts/truetype/dejavu/DejaVuSerif.ttf',
'/usr/share/fonts/truetype/dejavu/DejaVuSansMono-Bold.ttf',
'/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf',
'/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf']
I was eventually able to get this to work by emptying my ~/.cache, which I then emulated by creating an empty cache for each run. I’m assuming the issue was with ~/.cache/fontconfig. I had to keep a cache around to store the TemplateFlow files.
That worked for one participant, but then crashed on the next.
However, @Steven it now crashes on the regress_and_filter_bold step, without writing to the output log file at all. I’m baffled again. I’ve rebooted, and made sure there is room on the drive where $wkdir is. I’ve also added --mem-mb and --low-mem flags, but I’m not getting a message from the OOM handler anyway.
Can confirm the new issue isn’t related to the old fonts issue.
I did end up solving the fonts issue by creating a fake empty cache for every run, which duplicates a little download time, but it’s not the end of the world. I think you could bind the templateflow cache back over the fake cache.
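Here is a minimal sketch of that fake-cache idea, in case it helps anyone. It is not the exact command I ran. The function name per_run_cache_binds and its arguments are made up for illustration. It assumes the cache inside the container lives at $HOME/.cache, and that Apptainer needs an existing empty directory to mount the TemplateFlow cache onto, which I have not tested on every setup.

```python
import tempfile
from pathlib import Path


def per_run_cache_binds(templateflow_cache, container_home, scratch_root=None):
    """Return (fake_cache_dir, apptainer_bind_args) for a single run.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # Fresh directory that stands in for ~/.cache during this run only,
    # so no stale fontconfig/matplotlib cache is carried over.
    fake_cache = Path(tempfile.mkdtemp(prefix="fake_cache_", dir=scratch_root))

    # Empty mount point so the real TemplateFlow cache can be bound on top
    # (assumption: the bind destination has to exist already).
    (fake_cache / "templateflow").mkdir()

    binds = [
        "--bind", f"{fake_cache}:{container_home}/.cache",
        "--bind", f"{templateflow_cache}:{container_home}/.cache/templateflow",
    ]
    return fake_cache, binds
```

The returned list would be spliced into the apptainer call for each participant, and the fake cache directory can be deleted once the run finishes.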
It turns out that MNI152NLin6Asym res-4 is 0.7 mm isotropic, unlike MNI152NLin2009cAsym res-4, which is 4 mm. So when I switched, my BOLD files were blowing up in size and causing a memory problem.
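To give a rough sense of the size difference, here is the back-of-the-envelope arithmetic. It takes the two voxel sizes above at face value and assumes both grids are isotropic and cover the same field of view. The helper voxel_count_ratio is just for illustration.

```python
def voxel_count_ratio(coarse_mm, fine_mm):
    """How many times more voxels a fine isotropic grid has than a coarse one
    covering the same volume (illustrative helper, not part of any tool)."""
    # Each of the three axes gets (coarse_mm / fine_mm) times more samples.
    return (coarse_mm / fine_mm) ** 3


ratio = voxel_count_ratio(4.0, 0.7)
print(f"about {ratio:.0f}x more voxels per volume")  # about 187x
```

Under those assumptions every BOLD volume holds roughly 187 times as many voxels, which would be consistent with the memory problem.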
Hope it’s okay to add to this thread. I just wanted to note that this seemingly idiosyncratic error happened to me as well.
I ran into the same censor_report / matplotlib findfont crash on an HPC system using Apptainer after having processed almost 1k subjects without any issues (1 subject per run), and then suddenly every run started crashing.
As was the case for @Trevor_Day, binding /usr/share/fonts into the container did not resolve the problem; it kept failing with the same error. The issue was fixed by @Trevor_Day’s solution: forcing per-run writable caches, i.e.: