Hi Keith,
These are great questions and I'll try to answer them. First I'll say that the PCA dimensionality estimation step is one place where we're still seeing problems and instability, so you're fully right to look carefully at this and have concerns. A previous discussion that covers some of this is here: Tedana - ME-ICA fails to converge or finds no BOLD components. We are also working on an approach that might be more stable (Adding robustica option to ICA decomposition to achieve consistent results by BahmanTahayori · Pull Request #1013 · ME-ICA/tedana · GitHub). I'd be happy to discuss these issues more (particularly if you might be interested in helping solve them).
- Variance explained can appropriately vary widely even in a dataset with constant acquisition parameters. For example, if a scan has a particularly large linear drift or a larger amount of head motion, then more of the signal is structured in a way that can be modeled by PCA & ICA. I haven't directly tested this, but if you have the same acquisition parameters and the same task designs, I'd expect the total variance (not variance explained) of the accepted components to be slightly more stable.
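To make that distinction concrete, here is a minimal NumPy sketch with made-up numbers (not a tedana API or real outputs): two hypothetical runs carry a similar amount of accepted-component variance, but one has a large drift that PCA/ICA can model and reject, which inflates its variance-explained fraction.

```python
import numpy as np

# Hypothetical per-run variance budgets (illustration only).
# Both runs carry similar BOLD-like (accepted) variance, but run B
# also has a large drift that PCA/ICA models and rejects.
accepted_var = np.array([600.0, 620.0])    # absolute variance of accepted comps
rejected_var = np.array([150.0, 700.0])    # drift/motion etc., larger in run B
unmodeled_var = np.array([250.0, 280.0])   # unstructured residual

total_var = accepted_var + rejected_var + unmodeled_var
frac_explained = (accepted_var + rejected_var) / total_var

# The variance-explained fraction differs notably between runs...
print(np.round(frac_explained, 2))
# ...while the absolute variance of the accepted components stays similar.
print(accepted_var)
```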
- Your average variance explained of 63% seems a bit low to me, but this also depends on acquisition parameters. Similarly, 21 components is definitely too low. As mentioned in the other Neurostars question linked above, the tedana output includes
./figures/pca_criteria.png, which shows the cost function used for estimating the number of PCA components. If you look at the other post, there are example curves. They generally drop gradually, reach a local minimum (the number of components), and then gradually rise. If you see that, I'd say the estimate is plausible. There are some examples shown in that post with a steep drop and a rapid rise; that's a sign that something went wrong with the cost function. The other failure mode is when the cost function keeps gradually increasing and the estimated number of components is close to the total number of volumes. It doesn't look like you're seeing that now, but it is another observed failure. tedpca="kundu" might work, but it might also fail in different ways. Feel free to try.
- Short of waiting for us to implement a more robust solution, the recommendation I've given to a bunch of people is to identify a typical number of components across runs and set tedpca to a fixed number of components for either the runs that failed or all runs. For example, if, when excluding the failed runs, the 70th percentile run has 100 components, use tedpca=100. This isn't ideal, but it's reasonable. In the end, we are using the components to identify which ones to reject, so as long as there are not so many components that the ICA has trouble converging stably, you still get back a single denoised time series per voxel. This is also essentially what other ICA packages, like GIFT, do by using a consistent number of ICA components to align data across runs.
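As a sketch of that rule of thumb (plain NumPy; the component counts below are made up, not from any real dataset):

```python
import numpy as np

# Hypothetical component counts from the runs whose PCA estimate
# looked plausible (failed runs excluded).
n_components = np.array([85, 92, 96, 100, 104, 110, 118])

# Pick e.g. the 70th percentile as a fixed dimensionality for
# re-running the failed runs (or all runs, for consistency).
fixed_dim = int(np.percentile(n_components, 70))
print(fixed_dim)
```

That value can then be passed to tedana as the tedpca setting (e.g. `--tedpca 105` on the command line), instead of one of the automatic criteria.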
Let me know if this makes sense and if you have additional questions.
Best
Dan