General questions about TDT toolbox function

Hello TDT users,

I have some general questions regarding TDT toolbox function!

In my project, I have a slight imbalance of data in each run. To give you a better picture of what my data looks like, I have attached a sample data that I created.

So, my questions are:

  1. I want to make a decoding design matrix to conduct leave-one-chunk-out cross-validation, so I am planning to use function make_design_cv . In this case, what happens if I set = ‘ok’ in my script? (If I do not set this in my script, it doesn’t run because the data is imbalanced.) Because I am not exactly sure what this setting does, I am having trouble deciding whether using this process is valid for my data.

  2. If the above design matrix is not valid for my data, I think I should use the function make_design_boot_cv because you’ve mentioned that when creating decoding design matrix, use function make_design_boot_cv to preserve balance if there is the imbalance of data in each run (in your paper). Is there any criteria to use this function , and is it more appropriate to use function make_design_boot_cv than to use function make_design_cv for my data?

  3. Lastly, If I use make_design_boot_cv for my data, I should set n_boot for iterations(steps). Here, are there any standard criteria for setting the n_boot? If I am correct, the result would be more accurate as the n_boot value increases, but it would also take longer to produce the result. How can I be sure if the current n_boot value is optimal for my data or not? So, I want to know what n_boot value is sufficient for my data.

Any feedback would be great, Thanks!

~ Taehyun Yoo


Hi Taehyun,

Thanks for your questions!

Re 1: Is it valid to use leave-one-chunk-out with unbalanced data?

I think it’s ok, depending on the classifier, but in your case this slight imbalance may still lead to a preference of the classifier for classes A and B. You may deal with this by using AUC_minus_chance instead of accuracy_minus_chance. Alternatively, I think the correlation_classifier is not as sensitive to imbalances, although I’m currently not sure it can deal with multiclass approaches.

Re 2: What is the point of make_design_boot_cv

Yes, this is the approach we would recommend, even though it takes longer.

Re 3: How many iterations?

I think you can live with very few since the imbalance is rather small. So probably n_boot = 10 should be enough, or even n_boot = 5. Note that this is not bootstrapping but subsampling (we used the wrong term for this when creating the function, bootstrapping would be sampling with replacement, but subsampling is without replacement).

Hope that helps!

P.S.: I’m assuming you are using trialwise decoding, right? For runwise decoding, you wouldn’t have such imbalances, since you would have one beta per condition per run, and everything would run super fast. The imbalances are so small that this shouldn’t affect the estimability of the betas.

While the results would have rather large steps in terms of possible accuracy values, if you smooth your searchlight results they would become closer to a normal distribution. Might be worth a try.

Dear Martin,

Thanks for your fast and kind answer! This is exactly what I wanted, so it is really helpful :slight_smile:
Yes, I am using trialwise decoding (for correct trials), so I will try it as soon as possible!!
Thank you once again.

~Taehyun Yoo

1 Like