I agree that a good choice of the number of components will depend on the richness of the data. I am not at all convinced that MDL is optimal in any way. And when you say MDL, you really mean Melodic's implementation of MDL, not the textbook MDL, which isn't applicable to fMRI. I've seen Melodic blow up on some data, giving a number of components as large as the data. I don't think that you'll find the MDL that you're thinking about in anything other than Melodic. It's not very principled, nor does it solve any problem well, so it has enjoyed limited success.
What I would do these days is stick to a very simple heuristic that is costless, easy to code, and easy to understand. Mind you, this is not what I would have replied years ago: in the original CanICA paper, we had a sophisticated bootstrap approach that worked quite well, but was terribly costly. It's just that, these days, I believe that selecting the number of components is an imperfect art, that heuristics will get things wrong, so we might as well have simple ones.
I can suggest two options: either take a number of components related to the log of the number of data points (that's like a cheap AIC/BIC), or take enough components to explain a fixed fraction of the variance, e.g. 90%.
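
To make the two heuristics concrete, here is a minimal sketch in plain Python. The function names, the `scale` factor, and the rounding rule are my own assumptions for illustration; I only said the number should be "related to" the log, so the exact relation is up to you. The second function assumes you already have the singular values of your (centered) data, e.g. from a PCA/SVD step.

```python
import math


def n_components_from_log(n_samples, scale=1.0):
    """Heuristic 1 (assumed form): scale * log(n_samples), rounded, at least 1.

    `scale` is a hypothetical tuning knob, not something prescribed above.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return max(1, int(round(scale * math.log(n_samples))))


def n_components_for_variance(singular_values, fraction=0.90):
    """Heuristic 2: smallest k whose leading components reach `fraction`
    of the total variance (variance of a component = singular value squared).
    """
    # Sort variances from largest to smallest so we keep the leading components.
    variances = sorted((s ** 2 for s in singular_values), reverse=True)
    total = sum(variances)
    if total == 0:
        raise ValueError("need at least one non-zero singular value")
    cumulative = 0.0
    for k, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative >= fraction * total:
            return k
    # Only reached through floating-point rounding: keep everything.
    return len(variances)


if __name__ == "__main__":
    print(n_components_from_log(1000))
    print(n_components_for_variance([3.0, 2.0, 1.0, 1.0], fraction=0.90))
```

If you use scikit-learn, its `PCA` also accepts a float `n_components` between 0 and 1 to select the number of components by explained variance, which saves you from coding the second heuristic yourself.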