Automatic estimation of the number of ICA components in nilearn (CanICA)

Is there any Pythonic solution for estimating the number of ICA components (ICAsso, BIC, Laplacian methods…) that I could use in combination with nilearn’s CanICA? I’m looking for something similar to what Melodic provides: an automatic estimation of the optimal lower dimensionality. Any pointers would be very much appreciated.

BTW this is a nice summary of this topic across programming languages:

I don’t know whether this answers your question, but we used a cross-validation approach to a related question: how many PCs should you include in a noise regressor pool, to denoise task-based fMRI data.

This was done by keeping aside some data in each round and checking to see when prediction of these data reached an optimum. Details here:, with code available in Matlab (sorry…).

I imagine a similar approach could be used with canICA. If applied to rsfMRI, you would have to consider what exactly you are trying to predict, because predicting voxel time-series is probably out of the question, in contrast to the task-based fMRI case.
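To make the cross-validation idea above concrete, here is a rough Python sketch of a leave-one-run-out search over the number of noise PCs. It is not the Matlab code mentioned above. The function name `choose_n_noise_pcs`, its arguments, and the choice to score predictions only on voxels outside the noise pool are all assumptions made for illustration.

```python
import numpy as np


def choose_n_noise_pcs(runs, designs, noise_mask, max_pcs=10):
    """Pick the number of noise PCs by leave-one-run-out cross-validation.

    runs       : list of (n_timepoints, n_voxels) arrays, one per run
    designs    : list of (n_timepoints, n_regressors) task design matrices
    noise_mask : boolean (n_voxels,) array marking the noise-pool voxels
    Returns (best_k, scores), where scores[k] is the cross-validated R^2.
    """
    # Noise PCs per run: left singular vectors of the centered noise pool.
    pcs = []
    for y in runs:
        pool = y[:, noise_mask]
        pool = pool - pool.mean(axis=0)
        u, _, _ = np.linalg.svd(pool, full_matrices=False)
        pcs.append(u[:, :max_pcs])

    signal_mask = ~noise_mask
    scores = []
    for k in range(max_pcs + 1):
        sse, sst = 0.0, 0.0
        for test in range(len(runs)):
            betas = []
            for i in range(len(runs)):
                if i == test:
                    continue
                # Task regressors + k noise PCs + intercept.
                x = np.hstack([designs[i], pcs[i][:, :k],
                               np.ones((runs[i].shape[0], 1))])
                coef, *_ = np.linalg.lstsq(x, runs[i][:, signal_mask],
                                           rcond=None)
                # Keep only the task betas; noise betas do not transfer.
                betas.append(coef[:designs[i].shape[1]])
            beta = np.mean(betas, axis=0)

            # Predict the held-out run from the task design alone.
            target = runs[test][:, signal_mask]
            target = target - target.mean(axis=0)
            pred = designs[test] @ beta
            pred = pred - pred.mean(axis=0)
            sse += ((target - pred) ** 2).sum()
            sst += (target ** 2).sum()
        scores.append(1.0 - sse / sst)
    return int(np.argmax(scores)), scores
```

The held-out run is predicted from the task betas only, so adding noise PCs helps only if it improves the task estimates. That is also why the scheme needs a task model to predict against.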

I want to use it for quality control - without the need to know what model people will want to fit to the data. That’s why a cross-validation scheme would not work.

I found an interesting bootstrapping solution,
but I’m afraid it would be prohibitively slow for my application (multiband data, looking at artefacts that are partially outside of the brain…). There is some work to speed it up, but it involves temporal filtering and downsampling, which I would like to avoid (because I’m interested in describing raw data).

On a more practical note, I’m not aware of any Python-based methods for estimating the optimal number of ICA components.

No, nilearn doesn’t provide BIC or other variants. We could, it’s pretty easy to implement, but I don’t believe at all that it provides a useful or meaningful number. Even the authors of the original papers using these methods no longer set the dimensionality in such a way. ICAsso or bootstrap are more interesting approaches, but they are costly.

My rule of thumb is that the more the data, the larger the number, starting with 20 to 30 components for single-subject data, and moving up to one or two hundred for large groups. But that really depends on how you’re going to use the results of the ICA. Also, I don’t want to code that rule of thumb in nilearn in any way, because then people are going to start using it as if it was something correct or optimal, while it really is not.

Thanks for your reply.

In my application, I am trying to visualize noise components derived from raw data. As you can imagine, the optimal number of components will depend on the length of the scan, TR, type of task and nature of artefacts. I don’t think that setting the number of components to a fixed value in this application would be the best way to go. A data-driven heuristic (even if noisy or biased) would be more informative.

Do you know of any example implementation of MDL for assessing the optimal number of components (in Python)?

I agree that a good choice of the number of components will depend on the richness of the data. I am not at all convinced that MDL is optimal in any way. And when you say MDL, you really mean Melodic’s implementation of MDL, and not the textbook MDL, which isn’t applicable to fMRI. I’ve seen Melodic blow up on some data, giving a number of components as large as the data. I don’t think that you’ll find the MDL that you’re thinking about in anything other than Melodic. It’s not very principled, nor does it solve any problem well, so it has enjoyed limited success.

What I would do these days is stick to a very simple heuristic that is costless, easy to code, and easy to understand. Mind you, this is not what I would have replied years ago: in the original CanICA paper, we had a sophisticated bootstrap approach that worked quite well, but was terribly costly. It’s just that, these days, I believe that selecting the number of components is an imperfect art, that heuristics will get things wrong, so we might as well have simple ones.

I can suggest either taking a number of components related to the log of the number of data points (that’s like a cheap AIC/BIC), or taking enough components to explain a given fraction of the variance, e.g. 90%.
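A minimal sketch of both heuristics in Python, assuming the data are already in a 2D array of shape (n_timepoints, n_voxels), for instance from a nilearn masker. The function names are made up for this example, and the `scale` constant in the log rule is an arbitrary placeholder, not a recommended value.

```python
import numpy as np


def n_components_for_variance(data, fraction=0.90):
    """Smallest number of principal components explaining `fraction`
    of the total variance of `data` (n_timepoints, n_voxels)."""
    centered = data - data.mean(axis=0)
    # Squared singular values are proportional to per-component variance.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    # First index where the cumulative fraction reaches the target.
    n = int(np.searchsorted(cumulative, fraction)) + 1
    # Guard against round-off when fraction is at or very near 1.0.
    return min(n, len(cumulative))


def n_components_from_log(n_timepoints, scale=5.0):
    """Log rule: grows slowly with the amount of data.
    `scale` is an illustrative constant, to be tuned by the user."""
    return max(1, int(round(scale * np.log(n_timepoints))))
```

Either integer could then be passed as `n_components` when constructing CanICA. Neither number should be read as optimal; they are cheap, data-dependent starting points.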