Nipype fsl - results not exactly replicating, need to set random seed?

kolydic · October 8, 2021, 6:15pm

Hi - I am new to nipype and FSL, not great w/ python.

My lab uses a Python script as a wrapper for nipype to run first-level analyses with FSL. We run SpecifyModel() and do some spatial smoothing with fsl.preprocess.create_susan_smooth().

I am not certain that it comes entirely from the smoothing, but at some point in our preprocessing and analysis pipeline some randomness is being introduced. If I run the same subjects through the pipeline with the same data, I get slightly different results: when I extract univariate responses from an atlas-defined ROI, the mean magnitudes of the GLM betas differ. The differences between repeated runs of the pipeline are centered around 0.

I am trying to narrow it down to something in fmriprep or the first level analysis, which is where I think it is coming from.

My questions are:

1. Has someone encountered this before?
2. If I want to set a random seed and store the info, then feed the seed into nipype, such that each time I run through the pipeline, the seed is set as the same value, can I do that? Do I have to do this from scratch, or is there some built-in parameter that would allow something like this?
3. Since I don’t really know what I’m talking about I might be totally off – is this a problem that I’m trying to resolve in a totally incorrect way?

Thanks!

effigies · October 8, 2021, 7:34pm

fMRIPrep is not entirely deterministic. I believe if you run it with --omp-nthreads 1 --random-seed <some seed>, then it does become deterministic. (cc @ltetrel for verification). Still, it would probably make the most sense to stick with one fMRIPrep output and see if your first-level analysis is deterministic given the same preprocessed data.

Nipype does not have a universal random seed. Individual interfaces may have a random seed input that can be set. For interfaces that don’t have a specific seed input, but whose underlying tool does accept a seed, you can still pass it through the generic args input.

Setting random seeds is a reasonable thing to do from a determinism perspective. It’s a bit of a double-edged sword from an analytical perspective, as a result that depends on a random seed is problematic. Still, it’s good to know where the sources of your variability are.
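
As a rough illustration of the generic args route mentioned above, here is a minimal sketch of a helper that builds the string you would assign to an interface's args input. The helper name and the `--seed` flag are hypothetical placeholders: whether a given FSL tool accepts a seed at all, and under what flag, has to be checked in that tool's own documentation or source.

```python
import shlex


def append_seed_arg(existing_args, seed, flag="--seed"):
    """Return an args string with `<flag>=<seed>` present exactly once.

    `flag` is a placeholder; substitute whatever seed option the
    underlying command-line tool really accepts (if it has one).
    """
    tokens = shlex.split(existing_args or "")
    # Drop any earlier occurrence of the flag so reruns do not stack seeds.
    tokens = [t for t in tokens if not t.startswith(flag + "=")]
    tokens.append(f"{flag}={int(seed)}")
    return " ".join(shlex.quote(t) for t in tokens)


# Intended use with a nipype command-line interface (not executed here,
# since it needs nipype and FSL installed):
#   node.inputs.args = append_seed_arg("", 1234)
```

Checking the node's generated command line before running it is a cheap way to confirm that the extra argument actually ends up where the tool expects it.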

ltetrel · October 8, 2021, 8:06pm

Indeed, I confirm what @effigies just said. I imagine that because threads are not tied to a physical cpu, they are more “volatile”, inducing this non-reproducibility.
However using --nprocs, --nthreads, --n_cpus or --n-cpus seems to be reproducible (still --random-seed is mandatory here).
Overall, I cannot say whether this comes from nipype or from FSL.

kolydic · October 16, 2021, 3:37pm

@ltetrel and @effigies – Thank you both for your responses and advice, this was really helpful to start working out a solution.

I have a few follow up thoughts –

I was reading this thread:

.. which is leaving me confused as to whether using --random-seed with fmriprep actually produces deterministic results. As of the end of that thread in September, it seems like the answer is no, but that it’s not only an fmriprep issue and may also apply to any other preproc pipeline. After reading that thread it sounds like there is still some confusion and it may not be resolved among developers? Maybe this isn’t a question for either of you, but if not, please point me in the right direction so I can find some updated info about that!

Much of the variability is coming from fmriprep, but some is still introduced in the first-level analysis run through the same fmriprep outputs (thanks again @effigies for the great troubleshooting suggestion).

I need to decide whether we want to try and implement a direct-reproduction option into our lab pipeline (i.e. build in a toggle to set the seeds or not), but first I’d really like to understand whether that is going to be effective and if it makes sense. @effigies your comment about a double-edged sword is a little confusing to me w/r/t this question:

a result that depends on a random seed is problematic.

If the seed is not fixed and I get a single output, isn’t the same thing still true but just in a way that isn’t reproducible? As in, the results are still dependent upon the seed, even if I am not the one setting it. In order for it to really not depend on the seed at all, we’d have to do something like what winkler suggested in the thread above, which is basically to run the process many times and merge the outputs across iterations to produce results that were not dependent upon a single fixed (or random) seed at all.

Could you explain the logic of why a process dependent upon a fixed seed is more problematic than results dependent upon a random seed? (I think I might be missing something about how it’s used?)

thanks again for your help

kolydic · October 18, 2021, 2:17pm

ok my other reply aside, a more pressing Q is how do I actually set the seed for my first level analysis? I am only using the FSL interface, and I’m not finding anything in the documentation on how to set a seed. Not sure how to feed in a seed as a generic arg either

Any info / links to documentation and examples?

thanks!!

effigies · October 19, 2021, 2:39pm

--omp-nthreads 1 --random-seed <some seed> is deterministic. If either of these flags is changed, you will get different results. One additional source of variation is FreeSurfer, but if you use the same FreeSurfer directory (instead of recalculating it), then that will not introduce non-determinism.
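
For the "store the seed and feed it back in" part of the original question, a minimal sketch could look like the following. The function names and the JSON file layout are assumptions made for illustration; only the `--omp-nthreads`, `--random-seed` and `--participant_label` flags are taken from this thread.

```python
import json
import secrets
from pathlib import Path


def load_or_create_seed(seed_file):
    """Return the seed stored in `seed_file`, creating it on first use."""
    path = Path(seed_file)
    if path.exists():
        return int(json.loads(path.read_text())["random_seed"])
    # Draw a positive integer that fits in a signed 32-bit value.
    seed = secrets.randbelow(2**31 - 1) + 1
    path.write_text(json.dumps({"random_seed": seed}))
    return seed


def fmriprep_command(bids_dir, out_dir, subject, seed):
    """Build an fMRIPrep argument list with the two flags discussed above."""
    return [
        "fmriprep", str(bids_dir), str(out_dir), "participant",
        "--participant_label", str(subject),
        "--omp-nthreads", "1",
        "--random-seed", str(seed),
    ]


# Example (paths are placeholders; the list could be passed to subprocess.run):
#   seed = load_or_create_seed("fmriprep_seed.json")
#   cmd = fmriprep_command("/data/bids", "/data/derivatives", "01", seed)
```

Keeping the seed file next to the derivatives records which seed produced which outputs.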

kolydic:

a result that depends on a random seed is problematic.

If the seed is not fixed and I get a single output, isn’t the same thing still true but just in a way that isn’t reproducible? As in, the results are still dependent upon the seed, even if I am not the one setting it. In order for it to really not depend on the seed at all, we’d have to do something like what winkler suggested in the thread above, which is basically to run the process many times and merge the outputs across iterations to produce results that were not dependent upon a single fixed (or random) seed at all.

Could you explain the logic of why a process dependent upon a fixed seed is more problematic than results dependent upon a random seed? (I think I might be missing something about how it’s used?)

Sorry, I just meant that any result that critically depends on the value of a seed is spurious. If a method does not converge, you have a problem that can’t be fixed with determinism. I agree with everything you said above.
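
One way to check whether a result "critically depends" on the seed is to repeat the stochastic step over several seeds and look at the spread. The sketch below uses a toy bootstrap estimate of a mean as a stand-in for any seeded processing step; the function names and the toy data are purely illustrative and have nothing to do with fMRIPrep or FSL internals.

```python
import random
import statistics


def noisy_estimate(data, seed, n_resamples=200):
    """Bootstrap estimate of the mean: a stand-in for a seeded step."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]
    return statistics.fmean(means)


def seed_sensitivity(data, seeds):
    """Return (mean, standard deviation) of the estimate across seeds."""
    estimates = [noisy_estimate(data, seed) for seed in seeds]
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    toy_data = [0.1 * i for i in range(50)]
    center, spread = seed_sensitivity(toy_data, range(20))
    print(f"estimate across seeds: {center:.3f} +/- {spread:.3f}")
```

If the spread across seeds is small relative to the effect of interest, fixing the seed only buys reproducibility. If it is large, fixing the seed hides the problem rather than solving it.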

What’s the tool? If you can’t find docs, you can try asking on the FSL mailing list or looking in the FSL source code (here’s a public copy: src · master · Michael Krause / fsl · GitLab) to see if there’s a hidden flag.

kolydic · October 21, 2021, 2:50pm

@effigies ok awesome thank you for clarifying all this and pointing me in the right direction. super helpful!!

kolydic · April 8, 2022, 9:29pm

I am opening this thread again because I have finally circled back to implementing the fixed-seed commands with an updated fmriprep. I run the command below, using the fmriprep:latest container from Docker Hub (retrieved a couple of weeks ago):

fmriprep $bids_dir $out_dir participant --participant_label $subject --mem_mb 15000 --ignore slicetiming --nthreads 1 --omp-nthreads 1 --random-seed $randseed --skull-strip-fixed-seed --use-aroma -w $scratch --fs-license-file /cm/shared/openmind/freesurfer/6.0.0/.license --output-spaces MNI152NLin2009cAsym:res-2

If I rerun with $randseed read from the same file, I still do not get fully deterministic results. The variability I am seeing is minor, but it is no smaller than the variability I get if I remove --omp-nthreads 1 --random-seed $randseed --skull-strip-fixed-seed. Why would this be?

kolydic · April 8, 2022, 9:55pm

E.g., I am seeing some minor variations in the raw functional timecourses, and also at the edges of the reconstruction (T1, brain mask); the amount of variability is identical with and without the flags.
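
To narrow down which outputs differ between two runs, a first pass can compare file contents across the two output directories. This is a minimal sketch with made-up function names. It decompresses `.gz` files before hashing because the gzip header can embed a timestamp, so two `.nii.gz` files with identical voxel data may still differ byte for byte. It only flags files that differ; measuring how large the voxel-wise differences are would need an image library. Reports and logs that contain timestamps will presumably always show up as different.

```python
import gzip
import hashlib
from pathlib import Path


def content_digest(path):
    """SHA-256 of a file's contents, decompressing .gz files first."""
    opener = gzip.open if path.suffix == ".gz" else open
    digest = hashlib.sha256()
    with opener(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(run_a, run_b):
    """Return relative paths present in both runs whose contents differ."""
    run_a, run_b = Path(run_a), Path(run_b)
    differing = []
    for file_a in sorted(p for p in run_a.rglob("*") if p.is_file()):
        rel = file_a.relative_to(run_a)
        file_b = run_b / rel
        if file_b.is_file() and content_digest(file_a) != content_digest(file_b):
            differing.append(rel.as_posix())
    return differing


# Example (directory names are placeholders):
#   for name in differing_files("out_run1", "out_run2"):
#       print(name)
```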

effigies · April 9, 2022, 2:08am

You’re reusing a previously computed FreeSurfer directory?

kolydic · April 13, 2022, 2:52pm

No, I’m not reusing an existing Freesurfer dir, so I do expect some variation from that as you explained before.

Is this actually the only source of variation? I just don’t understand why using these flags / reusing a seed resulted in the same amount of variability as not using them

DiegoRg · September 10, 2023, 7:51pm

Does this mean that fmriprep results can’t be fully reproduced with datalad rerun, given that recon-all is non-deterministic (assuming that I am running recon-all inside fmriprep’s pipeline)?

Also, should I use a different random seed for every subject or is that irrelevant?