Robust-tedana implementation taking extremely long - seeking advice on optimal parameters

Summary of what happened:

Hi everyone,

I’m implementing the Robust-tedana pipeline from Tahayori et al. (2025) on my multi-echo, multi-band fMRI data, but I’m experiencing very long processing times and would appreciate guidance on parameter selection.

My pipeline:

  1. MPPCA denoising on raw multi-echo data (Done)
  2. fMRIPrep preprocessing on MPPCA-denoised data (Done)
  3. Modified tedana with:
  • --tedpca 0.99999 (to preserve all variance, as paper states PCA should preserve all variance after MPPCA)
  • --ica-method robustica
  • --n-robust-runs 50
  • Modified decision tree from the paper

My data:

  • TR = 1.5s
  • ~320-550 volumes per run
  • Voxel size: 2.5×2.5×2.5 mm

Problem: With --tedpca 0.99999, I’m getting 340 PCA components, and each robustica iteration is taking ~3 hours. This means:

  • ~150 hours per run (50 iterations × 3 hours)
  • Computationally infeasible for my dataset

Standard tedana (with default options) completes in 1.5-4 hours per run depending on the subject.

Questions:

  1. Is --tedpca 0.99999 the correct interpretation of “preserve all variance” from the paper? The paper’s dataset was only 202 volumes, much smaller than mine.
  2. Would --tedpca 0.95 or --tedpca 0.90 still be consistent with the robust-tedana approach, given that MPPCA has already removed thermal noise?
  3. Is reducing --n-robust-runs from 50 to 30 reasonable for balancing robustness with computational time?
  4. Are there other parameters I should adjust for larger datasets?

Any guidance would be appreciated!

Command used (and if a helper script was used, a link to the helper script or the command generated):

   "${TEDANA_BIN}" \
    -d "${echo1}" "${echo2}" "${echo3}" \
    -e ${TES} \
    --mask "${mask}" \
    --out-dir "${OUT_RUN_DIR}" \
    --tedpca 0.99999 \
    --ica-method robustica \
    --n-robust-runs 50 \
    --tree "${DECISION_TREE}" \
    --n-threads ${SLURM_CPUS_PER_TASK}

Bahman Tahayori mentioned in a tedana issue that --n-robust-runs plateaus around 20-30, so I agree that reducing the number of runs would be good. Other than that, increasing the number of threads is the only thing I can think of. @handwerkerd any suggestions?

For Robust-tedana, you are doing what is recommended in the paper. I’ve had some back-and-forth with the co-authors because the --tedpca 0.99999 setting confuses me. The logic is: if MPPCA removes most of the noise, then most of the remaining components are signal, and --tedpca 0.99999 should still be modelable with few components. It works on the data in the manuscript and is accurately presented there.

If you’re getting 340 PCA components with 350 volumes, then this assumption is failing on your data. I think your intuition is correct that setting --tedpca to 0.95 or 0.9 (or lower) should be consistent with the overall approach. I would not trust tedana’s dimensionality estimation methods after you’ve run MPPCA, since I’m not sure how the two would interact. That said, if you run tedana with one of those options (e.g. --tedpca aic), it will generate figures/pca_variance_explained.png, which plots the number of components retained across variance levels. For your existing run with --tedpca 0.99999, you can also generate a curve of explained variance against number of components using the variance explained column of desc-PCA_metrics.tsv. I suggest finding the point where variance explained stops rising rapidly on your data, and I expect that point to be around 1/3 of your total number of volumes.
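A quick way to turn that variance explained column into a component count is to cumulate it and search for a threshold. A minimal sketch, assuming a tsv laid out as described above (the exact column name is an assumption — check your own file’s header):

```python
import numpy as np

def components_for_threshold(var_explained, threshold):
    """Number of PCA components needed to reach `threshold` cumulative
    variance explained (threshold given as a fraction in [0, 1])."""
    cum = np.cumsum(np.asarray(var_explained, dtype=float))
    cum = cum / cum[-1]  # normalize in case the column sums to ~100, not 1
    return int(np.searchsorted(cum, threshold) + 1)

# Against real tedana output (column name is an assumption -- check the
# header of your own desc-PCA_metrics.tsv):
#   import pandas as pd
#   metrics = pd.read_csv("desc-PCA_metrics.tsv", sep="\t")
#   ve = metrics["variance explained"].to_numpy()
#   for t in (0.90, 0.95, 0.99):
#       print(t, components_for_threshold(ve, t))
```

Sweeping a few thresholds this way gives you the same curve as pca_variance_explained.png without rerunning the decomposition.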

Let me know if that partially helps.

That all said, 1.5-4 hours for a run with 2.5mm^3 voxels and only 350 volumes seems very slow. That takes less than 10 minutes on my Mac laptop from a few years ago. One thing that really causes slowing is getting >340 components for 350 volumes. If that’s happening even without MPPCA, then I again suggest setting a lower number of components using --tedpca. FWIW, I’ve recently been digging into estimates of the proper number of components, and it seems like this is a generally less solved issue in fMRI than is often claimed. So, until I see a clear justification for a truly correct estimate, I’m more and more fine with people picking something that seems stable on their data (i.e. 1/3-1/2 as many components as time points).
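The 1/3-to-1/2 rule of thumb above can be sketched in one line; since --tedpca also accepts an integer component count, the result could be passed straight to tedana (whether a fixed integer or a variance threshold suits your data better is a judgment call, not an official recommendation):

```python
# Rule-of-thumb sketch (not an official tedana recommendation): keep
# roughly 1/3 of the number of time points as components. tedana's
# --tedpca also accepts an integer, so this count can be passed directly.
def suggested_n_components(n_volumes, fraction=1 / 3):
    return max(1, int(n_volumes * fraction))

print(suggested_n_components(341))  # -> 113 (movie task)
print(suggested_n_components(542))  # -> 180 (decision-making task)
```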

Dan


Hi Taylor and Dan,

Thank you both for such a quick reply! Your suggestions were spot-on.
Just for a bit of context, I’m working with two types of fMRI data: a naturalistic movie-viewing paradigm with ~350 volumes, and a highly ecological event-related decision-making task with 3 runs of approximately 500 volumes each.
Following Dan’s advice, I analyzed the variance-explained curve and discovered that default tedana with AIC selected 43 components explaining 63.6% of the variance, while Robust-tedana with tedpca 0.99999 selected 340 components, nearly as many as the available volumes. Since my movie task for this specific participant has 341 volumes, the 0.99999 threshold was keeping essentially everything, resulting in approximately 3 hours per robustica iteration and making the analysis computationally extremely demanding.
Dan’s 1/3 rule was remarkably accurate for my data. For the movie task with 341 volumes, one-third equals 113 components, which capture 87.5% of the variance. For the decision-making task with 542 volumes, one-third equals 180 components. The 113-component threshold sits right at the elbow where diminishing returns begin, so your prediction held up perfectly on my data.
I’m now running with tedpca 0.875 to get approximately 113 components for the movie task while letting it scale appropriately for the decision-making runs, combined with n-robust-runs 30 (reduced from 50).
Also, just to mention: before implementing Robust-tedana, I compared fMRIPrep outputs alone versus fMRIPrep followed by “default” tedana and computed tSNR for both. I saw tSNR improvements across all tested subjects in the decision-making task when tedana was applied.
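For reference, the tSNR comparison described above boils down to a voxelwise mean-over-std along the time axis. A minimal numpy sketch (the nibabel lines and filename in the comment are illustrative assumptions, not my exact script):

```python
import numpy as np

def tsnr(data_4d, eps=1e-8):
    """Voxelwise temporal SNR: mean over time divided by std over time.
    `data_4d` is an (x, y, z, t) array."""
    return data_4d.mean(axis=-1) / (data_4d.std(axis=-1) + eps)

# Loading real NIfTI files would typically go through nibabel, e.g.:
#   import nibabel as nib
#   img = nib.load("sub-01_desc-optcomDenoised_bold.nii.gz")  # hypothetical name
#   tsnr_map = tsnr(img.get_fdata())
```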
I’ll report back once the runs complete with how the results compare to “default” tedana. Thanks again for the expert guidance!

Best,
Anastasios