Tedana - Evaluating Output Measures for Quality Control

Hi everyone,

I’m new to tedana and multi-echo BOLD data, and I had a few questions about how to evaluate the tedana outputs for the purpose of quality control.

I’m working with a resting state fMRI dataset of multi-echo data (2.5 mm isotropic, TR = 1670 ms, TEs = 15.60 ms, 38.20 ms, 60.80 ms, 83.40 ms, GRAPPA acceleration factor 2, multiband factor 4) in depressed but otherwise healthy adults. Each participant has two runs (AP and PA) of 300 volumes each. The data has been pre-processed using fMRIPrep 23.2.3 with the --me-output-echos flag.

I ran a preliminary test of tedana 24.0.1 on 50 participants (100 runs) after removing dummy volumes by following the example here. I used the options tedpca="aic", fittype="curvefit", and tedort=True, with the intention of using the rejected components as custom confounds in XCP-D.
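
For reference, the call looked roughly like this via the Python interface (file names are placeholders for the echo-wise fMRIPrep outputs):

```python
from tedana.workflows import tedana_workflow

# Echo-wise preprocessed files from fMRIPrep (placeholder names)
echo_files = [
    "sub-01_task-rest_dir-AP_echo-1_desc-preproc_bold.nii.gz",
    "sub-01_task-rest_dir-AP_echo-2_desc-preproc_bold.nii.gz",
    "sub-01_task-rest_dir-AP_echo-3_desc-preproc_bold.nii.gz",
    "sub-01_task-rest_dir-AP_echo-4_desc-preproc_bold.nii.gz",
]

tedana_workflow(
    echo_files,
    [15.60, 38.20, 60.80, 83.40],  # echo times in ms
    out_dir="tedana/sub-01_dir-AP",
    tedpca="aic",        # AIC criterion for PCA dimensionality
    fittype="curvefit",  # nonlinear monoexponential decay fit
    tedort=True,         # orthogonalize rejected against accepted components
)
```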

For the PCA step, the number of included components using AIC is on average 44.6 (range 21 to 113), and these components explain 63% of the variance in the data (range 41% to 87%). Considering that a single run has 300 volumes, 45 PCA components feels low. I have seen general guidelines on these and other forums that the number of components should be less than 1/2 of the number of volumes, but more than 1/5th.

After the ICA, on average there are 29 (11 to 70) accepted components and 15.5 (4 to 43) rejected components, and the variance explained by decomposition is 85.99% (65.42% to 95.30%).
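
In case it helps, here is the sort of sketch I used to tally these numbers from each run's component table (this assumes tedana's BIDS-style desc-tedana_metrics.tsv output with its "classification" and "variance explained" columns; file and column names may differ across versions):

```python
from pathlib import Path

import pandas as pd

rows = []
# One tedana output directory per run (hypothetical layout: tedana/<run>/)
for metrics_file in Path("tedana").glob("*/desc-tedana_metrics.tsv"):
    comps = pd.read_csv(metrics_file, sep="\t")
    counts = comps["classification"].value_counts()
    rows.append({
        "run": metrics_file.parent.name,
        "n_accepted": counts.get("accepted", 0),
        "n_rejected": counts.get("rejected", 0),
        # summing per-component values gives total variance explained
        "total_varex": comps["variance explained"].sum(),
    })

summary = pd.DataFrame(rows)
print(summary.describe())
```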

To summarize my questions:

  • For this dataset, does the number of components or the variance explained after PCA seem too low considering 300 volumes? If so, should I try tedpca="kundu"?
  • Similarly, is the variance explained by ICA acceptable?
  • Are there other metrics that are useful in quality control?
  • In general, are there acceptable ranges for these metrics (for example, variance explained by decomposition should be > 80%), or thresholds where a participant should be excluded (either hard thresholds, or based on standard deviation)?

I know there often aren’t hard cutoffs for metrics like these, and every dataset is different, but any guidance would be appreciated. We plan to manually inspect the components from a subset of participants, but we will have up to 600 resting state runs, so I’d like to come up with some guidelines for data-driven or automated quality control. Thanks for the help!

Sincerely,
Keith Jones

Hi Keith,

These are great questions and I’ll try to answer them. First, I’ll say that the PCA dimensionality estimation step is one place where we’re still seeing problems and instability, so you’re fully right to look at this carefully and to be concerned. A previous discussion that covers some of this is here: Tedana - ME-ICA fails to converge or finds no BOLD components. We are also working on an approach that might be more stable (Adding robustica option to ICA decomposition to achieve consistent results by BahmanTahayori · Pull Request #1013 · ME-ICA/tedana · GitHub). I’d be happy to discuss these issues more (particularly if you might be interested in helping solve them).

  • Variance explained can appropriately vary widely even in a dataset with constant acquisition parameters. For example, if a scan has a particularly large linear drift or a larger amount of head motion, then more of the signal is structured in a way that can be modeled by PCA & ICA. I haven’t directly tested this, but, if you have the same acquisition parameters and the same task designs, I’d expect the total variance (not variance explained) of the accepted components to be slightly more stable.
  • Your average variance explained of 63% seems a bit low to me, but this also depends on acquisition parameters. Similarly, 21 components is definitely too low. As mentioned in the other Neurostars question linked above, the tedana output includes ./figures/pca_criteria.png, which plots the cost function used for estimating the number of PCA components. If you look at the other post, there are example curves. They generally drop gradually, reach a local minimum (the number of components), and then rise gradually. If you see that, I’d say the estimate is plausible. There are some examples shown in that post where there is a steep drop and a rapid rise; that’s a sign that something went wrong with the cost function. The other failure mode is when the cost function keeps gradually increasing and the estimated number of components is close to the total number of volumes. It doesn’t look like you’re seeing this now, but it is another observed failure.
  • tedpca="kundu" might work, but it might also fail in different ways. Feel free to try it.
  • Short of waiting for us to implement a more robust solution, the recommendation I’ve given to a number of people is to identify a typical number of estimated components and set tedpca to a fixed number of components, for either the runs that failed or all runs. For example, if, after excluding the failed runs, the 70th percentile run has 100 components, use tedpca=100 (see the sketch below). This isn’t ideal, but it is reasonable. In the end, we are using the components to identify which ones to reject, so as long as there are not so many components that the ICA has trouble converging stably, you get back to a single denoised time series per voxel. This is also essentially what other ICA packages, like GIFT, do by using a consistent number of ICA components to align data across runs.
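
As a concrete sketch of that recipe (the component counts and the 70th percentile here are just illustrative placeholders):

```python
import numpy as np

# Number of PCA components per run from the first pass,
# with clear failure runs already excluded (made-up values)
n_components = [88, 92, 95, 100, 104, 110, 121]

# Pick a typical dimensionality, e.g., the 70th percentile run
fixed_dim = int(np.percentile(n_components, 70))
print(fixed_dim)

# Then re-run tedana on the failed runs (or all runs) with tedpca=fixed_dim
```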

Let me know if this makes sense and if you have additional questions.

Best

Dan


Hi Dan,

Thanks so much for your thorough explanation! The thread you linked was also very helpful. I inspected some of the PCA criteria curves from my original tedpca=aic run, and it does seem that the curves look off (a steep initial drop, then a medium to large rise) for participants with very few components (30 or fewer). The curves looked more reasonable for participants who ended up with 50 or 60 components. So, it may be that the highly accelerated nature of our data is causing issues with the PCA cost function, as hypothesized in the linked thread.

I ran a subset of the data using tedpca=kundu, which gave an average of 223 total components (as high as 298). This seems excessive for data with 300 volumes. I was also seeing a lot of warnings along these lines: "WARNING utils:dice:304 264 of 298 components have empty maps, resulting in Dice values of 0." It does seem that, for this dataset, this method isn’t a great option and may include many components that are just Gaussian noise.

After these tests, I decided to run the full dataset using tedpca=75, or 1/4 of the total volumes, based on your recommendation. This seemed to give more reasonable results:

  • Variance explained after PCA: 71% ± 8% (range 50% to 95%)
  • Accepted components: 49 ± 8 (range 21 to 67)
  • Variance explained by decomposition: 90% ± 5% (range 68% to 99%)

How would you suggest using these data to decide whether to exclude participants from further analysis? I’d be most concerned about the lower end of these ranges, especially runs that fall multiple standard deviations below the mean, but it’s not clear to me how best to define a threshold for inclusion or exclusion.
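
Concretely, the kind of automated check I have in mind looks something like this (building on a per-run summary table like the one in my first post; the 2.5 SD cutoff is an arbitrary placeholder, not a validated threshold):

```python
import pandas as pd

def flag_low_outliers(summary: pd.DataFrame, metric: str, n_sd: float = 2.5) -> pd.DataFrame:
    """Return runs more than n_sd standard deviations below the sample mean."""
    cutoff = summary[metric].mean() - n_sd * summary[metric].std()
    return summary[summary[metric] < cutoff]

# e.g., flag runs with unusually low variance explained by the decomposition
# or unusually few accepted components
print(flag_low_outliers(summary, "total_varex"))
print(flag_low_outliers(summary, "n_accepted"))
```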

I additionally looked at the variance explained by accepted vs. rejected components. As you mentioned, this can change with movement, etc. However, one run (which, upon inspection, ended up having extremely high movement) stood out as having 99% of the explained variance in rejected components. Do you have a rule of thumb where, if X% of the variance lies in rejected components, there isn’t really enough good data left for further analysis? Thanks again for the great explanation!
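
For reference, I computed the rejected-variance fraction per run along these lines (again assuming tedana's BIDS-style component table; the path is a placeholder):

```python
import pandas as pd

# Component classification table for one run (placeholder path)
comps = pd.read_csv("tedana/sub-XX_dir-AP/desc-tedana_metrics.tsv", sep="\t")

varex = comps.groupby("classification")["variance explained"].sum()
rejected_frac = varex.get("rejected", 0.0) / varex.sum()
print(f"{rejected_frac:.1%} of explained variance is in rejected components")
```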

Sincerely,
Keith Jones

Hi Keith,

I’m glad using a fixed number of components gave slightly more stable results.

I don’t have a strict rule for rejecting runs based on the percentage of rejected variance. If you have a high-motion subject and a lot of the rejected variance is motion, that’s a good thing. That said, if a run has motion artifacts of that magnitude, you might want to exclude it due to motion (rather than specifically because of the tedana outputs).

One thing to possibly look at is the number of accepted components. If there are only 10 accepted components, the brain should have more structured signal than that. Even 21 sounds reasonable, but that also depends on what you are seeing across your dataset. You might want to look at the tedana HTML report for the subjects with the most and fewest accepted components to see if the classifications are plausible. Descriptions of the figures in the report are here: Outputs of tedana — tedana 24.0.1 documentation

Primarily look for rejected components that should have been accepted, because those are what might cause the most problems in later analysis steps.

Best

Dan