Hi - I am new to nipype and FSL, not great w/ python.
My lab uses a Python script as a wrapper for nipype to run first-level analyses with FSL. We are running SpecifyModel() and doing some spatial smoothing with fsl.preprocess.create_susan_smooth().
I am not certain it comes entirely from the smoothing, but at some point in our preprocessing and analysis pipeline, some randomness is being introduced. If I run the same subjects through the pipeline with the same data, I ultimately get slightly different results: when I extract univariate responses from an atlas-defined ROI, the mean magnitudes of the GLM betas differ between runs. The differences between iterations are small and centered around 0.
I am trying to narrow it down to something in fmriprep or the first level analysis, which is where I think it is coming from.
My questions are:
Has anyone encountered this before?
If I want to set a random seed, store it, and feed it into nipype so that every run of the pipeline uses the same value, can I do that? Do I have to build this from scratch, or is there some built-in parameter that would allow something like this?
Since I don’t really know what I’m talking about I might be totally off – is this a problem that I’m trying to resolve in a totally incorrect way?
fMRIPrep is not entirely deterministic. I believe if you run it with --omp-nthreads 1 --random-seed <some seed>, then it does become deterministic. (cc @ltetrel for verification). Still, it would probably make the most sense to stick with one fMRIPrep output and see if your first-level analysis is deterministic given the same preprocessed data.
Nipype does not have a universal random seed. Individual interfaces may have a random seed that can be set. And for interfaces that don’t have a specific seed input, but the underlying tool does, you can still set the generic args input.
Setting random seeds is a reasonable thing to do from a determinism perspective. It’s a bit of a double-edged sword from an analytical perspective, as a result that depends on a random seed is problematic. Still, it’s good to know where the sources of your variability are.
Indeed, I confirm what @effigies just said. I imagine that because threads are not tied to a physical CPU, they are more "volatile", which induces this non-reproducibility.
However, using --nprocs, --nthreads, --n_cpus or --n-cpus seems to be reproducible (--random-seed is still mandatory here).
Overall, I cannot say whether this comes from nipype or FSL.
@ltetrel and @effigies – Thank you both for your responses and advice, this was really helpful to start working out a solution.
I have a few follow up thoughts –
I was reading this thread:
… which is leaving me confused as to whether using --random-seed with fMRIPrep actually produces deterministic results. As of the end of that thread in September, it seems the answer is no, but that it is an fMRIPrep issue, and one that may also apply to any other preprocessing pipeline. After reading that thread, it sounds like there is still some confusion and it may not be resolved among the developers. Maybe this isn't a question for either of you, but if not, please point me in the right direction so I can find some updated info about it!
Much of the variability is coming from fmriprep, but some is still introduced in the first-level analysis run through the same fmriprep outputs (thanks again @effigies for the great troubleshooting suggestion).
I need to decide whether we want to try to implement a direct-reproduction option in our lab pipeline (i.e., build in a toggle to set the seeds or not), but first I'd really like to understand whether that will be effective and whether it makes sense. @effigies, your comment about a double-edged sword is a little confusing to me with respect to this question:
a result that depends on a random seed is problematic.
If the seed is not fixed and I get a single output, isn't the same thing still true, just in a way that isn't reproducible? The results still depend on the seed even if I am not the one setting it. For the results truly not to depend on a seed at all, we'd have to do something like what Winkler suggested in the thread above, which is basically to run the process many times and merge the outputs across iterations, producing results that do not depend on any single fixed (or random) seed.
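That merge-across-seeds idea can be sketched in plain Python. Here `extract_roi_beta` is a hypothetical stand-in for the whole pipeline, with seed-dependent noise simulating run-to-run variability; the numbers are made up:

```python
import random
from statistics import mean

def extract_roi_beta(seed):
    # hypothetical stand-in for the full pipeline: in reality this
    # would run preprocessing + the first-level GLM with `seed` and
    # return the mean ROI beta; here noise simulates seed dependence
    rng = random.Random(seed)
    true_beta = 0.8
    return true_beta + rng.gauss(0, 0.05)

# run with many seeds and merge, so the reported value no longer
# hinges on any single seed choice
betas = [extract_roi_beta(s) for s in range(100)]
print(mean(betas))
```

The merged estimate converges toward the seed-independent value as the number of iterations grows, which is exactly why a result that survives this procedure is more trustworthy than one tied to any single seed.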
Could you explain the logic of why a process dependent upon a fixed seed is more problematic than results dependent upon a random seed? (I think I might be missing something about how it’s used?)
OK, my other reply aside, a more pressing question is: how do I actually set the seed for my first-level analysis? I am only using the FSL interface, and I'm not finding anything in the documentation on how to set a seed. I'm also not sure how to feed in a seed as a generic arg.
--omp-nthreads 1 --random-seed <some seed> is deterministic. If either of these flags is changed, you will get different results. One additional source of variation is FreeSurfer, but if you use the same FreeSurfer directory (instead of recalculating it), then that will not introduce non-determinism.
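Putting those pieces together, a fully pinned invocation would combine the threading and seed flags with a reused FreeSurfer directory. A sketch of the argument list (the paths, participant label, and seed value are hypothetical):

```python
# hypothetical paths and seed; the flags are the ones discussed above
argv = [
    "fmriprep", "/data/bids", "/data/derivatives", "participant",
    "--participant-label", "01",
    "--omp-nthreads", "1",       # one thread per process
    "--nprocs", "1",             # one process
    "--random-seed", "12345",    # fixed seed for the workflow
    "--skull-strip-fixed-seed",  # fixed seed for skull-stripping too
    # reuse a precomputed FreeSurfer directory instead of
    # recalculating it, removing that source of variation
    "--fs-subjects-dir", "/data/freesurfer",
]
print(" ".join(argv))
```

Storing that exact argument list (or the seed value) alongside the outputs also gives you a record of what to rerun if you ever need to reproduce a result.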
Sorry, I just meant that any result that critically depends on the value of a seed is spurious. If a method does not converge, you have a problem that can’t be fixed with determinism. I agree with everything you said above.
Even when I read the same $randseed from a file on each run, I still do not get totally deterministic results. The variability I am seeing is minor, but it is also no smaller than the variability I get if I remove --omp-nthreads 1 --random-seed $randseed --skull-strip-fixed-seed. Why would this be?