TDT: Factors rather than each condition as my labels

Dear all,
I have a 2x3 repeated-measures design, so there are two factors (difficulty and distractor type) and a total of six conditions in my design matrix. For decoding, I would like to use my main factors as the labels rather than all six conditions. Is there a way to do this with the decoding toolbox functions without changing my design matrix in SPM?

Thanks in advance,
Hilal

Hi Hilal,

Yes. For the automatic selection of betas (which is the convenient way), you can use asterisks or regular expressions. In essence, when calling decoding_describe_data, you would have to define a string search that would allow you to find both / all three conditions. For example, if your conditions are called 'difficulty level 1' and 'difficulty level 2', then you could just provide 'difficulty level*' for label1. If they are called very differently, you can use regular expressions: start the labelname with regexp:, followed by the regular expression that defines all conditions. In the simplest case, this could be 'regexp:(difficult)|(easy)'.
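To get a feel for what such patterns select, the matching can be tried out in plain MATLAB before calling decoding_describe_data. The sketch below does not use TDT's own matching code; it only mimics the idea with regexptranslate and regexp. The condition names are made up for illustration, and the 'regexp:' prefix is left out because it is only meaningful to TDT.

```matlab
% Made-up condition names for illustration; replace them with your own.
names_a = {'difficulty level 1', 'difficulty level 2', 'distractor A'};
names_b = {'difficult', 'easy', 'baseline'};

% Wildcard: translate 'difficulty level*' into a regular expression and
% require that it matches the whole name.
wild_pattern = ['^' regexptranslate('wildcard', 'difficulty level*') '$'];
hit_a = ~cellfun(@isempty, regexp(names_a, wild_pattern, 'once'));

% Regular expression: the pattern that would follow 'regexp:' in TDT.
hit_b = ~cellfun(@isempty, regexp(names_b, '(difficult)|(easy)', 'once'));

disp(names_a(hit_a));   % names selected by the wildcard
disp(names_b(hit_b));   % names selected by the regular expression
```

Note that an unanchored expression such as '(difficult)|(easy)' would also match a name like 'difficulty level 1', so it is worth checking the selected names before running the analysis.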

For the manual selection of betas, you would just have to figure out which beta goes with each condition, select them accordingly and assign them labels (see decoding_tutorial.m).
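As a rough sketch of the manual route, the beta indices can be computed from the layout of the design matrix. Everything specific here is an assumption: eight runs, six condition regressors followed by six motion regressors in each run, session constants at the very end, SPM12-style file names (beta_0001.nii), and a placeholder directory. The fields cfg.files.name, cfg.files.label and cfg.files.chunk are the ones TDT reads; please verify the index arithmetic against your own SPM.mat.

```matlab
beta_loc   = fullfile('path', 'to', 'first_level');  % placeholder directory
n_runs     = 8;                    % assumed number of runs
n_cond     = 6;                    % assumed condition regressors per run
n_motion   = 6;                    % assumed motion regressors per run
cond_label = [1 1 1 -1 -1 -1];     % e.g. three "easy" and three "hard" conditions

n_files = n_runs * n_cond;
files   = cell(n_files, 1);
labels  = zeros(n_files, 1);
chunks  = zeros(n_files, 1);

k = 0;
for run = 1:n_runs
    for c = 1:n_cond
        k = k + 1;
        % Position of this condition's beta in the full design matrix
        beta_idx  = (run - 1) * (n_cond + n_motion) + c;
        files{k}  = fullfile(beta_loc, sprintf('beta_%04d.nii', beta_idx));
        labels(k) = cond_label(c);
        chunks(k) = run;           % one chunk per run for cross-validation
    end
end

% In a real script, cfg would first be created with decoding_defaults.
cfg.files.name  = files;
cfg.files.label = labels;
cfg.files.chunk = chunks;
```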

All that said, please note that with factorial designs you need to make sure that your conditions remain balanced even during cross-validation, else you risk getting false above-chance or below-chance accuracies. The reason is that cross-validation may destroy the nicely balanced factorial design: the classifier will then prefer conditions that are more frequent in the training data over conditions that are more frequent in the test data, which leads to below-chance accuracies. More details can be found here and here (Fig 3 bottom).
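One simple sanity check is to count, for every training set, how many beta images fall into each cell of the factorial design. The sketch below is plain MATLAB and assumes a 2x3 design with one beta per condition per run, eight runs, and leave-one-run-out cross-validation. It only counts images; it says nothing about the number of trials behind each regressor or about collinearity, which matter as well.

```matlab
n_runs     = 8;                                   % assumed number of runs
difficulty = repmat([1 1 1 2 2 2], 1, n_runs);    % factor 1, two levels
distractor = repmat([1 2 3 1 2 3], 1, n_runs);    % factor 2, three levels
chunk      = kron(1:n_runs, ones(1, 6));          % run number of each beta

balanced = false(1, n_runs);
for fold = 1:n_runs
    train = chunk ~= fold;                        % leave one run out
    % 2x3 table of training images per design cell
    counts = accumarray([difficulty(train)' distractor(train)'], 1, [2 3]);
    balanced(fold) = all(counts(:) == counts(1));
    fprintf('Fold %d: training counts per cell = %s\n', fold, mat2str(counts));
end
```

If any entry of balanced is false, the training data of that fold no longer reflect the balanced design and the resulting accuracies should be treated with caution.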

If you are worried that this might affect your results, consider using cross-validated MANOVA, which just allows you to run the same type of analysis you are used to in your GLM but using a searchlight approach. The result is d, an unbiased estimate of the pattern discriminability for your contrast of interest that you can feed into a second-level analysis.

Happy decoding!
Martin

Dear Martin,
Thanks a lot for your quick and detailed response! If you don’t mind, I’d like to ask a more detailed question.
Because I am a beginner at decoding, and to make sure I understand what you’ve written, I would like to ask one easy and three considerably more difficult follow-up questions based on decoding_tutorial.m.
My current model is as follows:
Factors: Difficulty (Easy, Hard); Distractor Type (A, B, NoDist) (Note: the number of NoDist trials equals the sum of the distractor A and B trials)
Design matrix: Easy-A, Easy-B, Easy-NoDist, Hard-A, Hard-B, Hard-NoDist + motion regressors, repeated over 8 runs
For the decoding…
regressor_names = design_from_spm(beta_loc); → here I get the regressor names directly from the SPM.mat file
(Q1.) Then, if I want to classify the images based only on difficulty (so that my labels would be ‘Easy’ and ‘Hard’), what input should I give to decoding_describe_data? Should it be as follows?
cfg = decoding_describe_data(cfg,{'Easy-A','Easy-B','Easy-NoDist','Hard-A','Hard-B','Hard-NoDist'},[1 1 1 -1 -1 -1],regressor_names,beta_loc);
(Q2.) For the distractor type (distractor vs. no distractor), should it be like this?
cfg = decoding_describe_data(cfg,{'Easy-A','Easy-B','Easy-NoDist','Hard-A','Hard-B','Hard-NoDist'},[0.5 0.5 -1 0.5 0.5 -1],regressor_names,beta_loc); → should I create the labels like a contrast, or should they be like [1 1 -1 1 1 -1]?
(Q3.) A pairwise classification for the distractors A and B?
cfg = decoding_describe_data(cfg,{'Easy-A','Easy-B','Easy-NoDist','Hard-A','Hard-B','Hard-NoDist'},[1 -1 0 1 -1 0],regressor_names,beta_loc); ← Here I would actually like to exclude the NoDist condition, which is why I wrote ‘0’ as if it were a contrast vector
(Q4.) And lastly, in order to classify the images based on all three distractor types:
cfg = decoding_describe_data(cfg,{'Easy-A','Easy-B','Easy-NoDist','Hard-A','Hard-B','Hard-NoDist'},[1 2 3 1 2 3],regressor_names,beta_loc); ← And here, I am not sure about the labelling at all

Thank you so much for taking the time, and I am sorry if I have mixed things up, if my questions are too basic, or if you would rather not answer such specific questions.

Kind regards,
Hilal

Hi Hilal,

Q1/Q2: If you want to classify easy vs. hard, your call might work (I can’t check right now, so just inspect the design afterwards). For distractor vs. no distractor it doesn’t work that easily, because the design is imbalanced. In general, you also need to take into account how many trials go into each regressor and whether the regressors differ in collinearity within each run (e.g. collinearity of A vs. B is larger than nodist). Check the papers I referenced for the reason, focusing on the examples where decoding is possible even though the conditions differ only in their estimability and not in the estimates themselves.

Assuming the number of trials in A, B, and NoDist is equal and the collinearity is comparable, two possibilities come to mind: either classify A vs. nodist and B vs. nodist separately and average the results, or classify A and B together vs. nodist and use AUC as the output metric, which can deal better with imbalanced data (though I am not sure how it would cope with such a strong imbalance). The labels are only dummy-coded for classification, i.e. they do not act like contrast weights; any two distinct integers would work (even -121 and 124023).
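Why AUC is less affected by imbalance than accuracy can be seen in a toy example in plain MATLAB. The labels and decision values below are made up, and the pairwise AUC formula is a generic textbook version, not TDT's implementation. The example uses a degenerate classifier that gives every sample the same decision value and therefore always predicts the majority class.

```matlab
% Made-up data: 8 "distractor" samples (label 1), 4 "no distractor" (label -1)
labels = repmat([1 1 -1], 1, 4);
scores = ones(size(labels));        % identical decision value for all samples
predicted = ones(size(labels));     % so the majority class is always predicted

% Accuracy rewards the majority guess
accuracy = mean(predicted == labels);

% AUC: fraction of (positive, negative) pairs ranked correctly, ties count half
pos = scores(labels == 1);
neg = scores(labels == -1);
diffs = bsxfun(@minus, pos(:), neg(:)');
auc = mean(diffs(:) > 0) + 0.5 * mean(diffs(:) == 0);

fprintf('accuracy = %.3f, AUC = %.3f\n', accuracy, auc);
```

Here the accuracy looks well above 50% although the classifier has learned nothing, whereas the AUC stays at its chance level of 0.5.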

If you are unsure about the design, then instead of decoding, use an encoding approach which is not sensitive to differences in the variability. In TDT, that could be crossnobis (we have a template for that). If you want to make full use of your factorial design, then consider CV-MANOVA.

Q3: Just leave out nodist, i.e. do not list those regressors at all instead of assigning them a 0. But again, this assumes an equal number of trials in A and B.

Q4: Yes, that works.
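Put together, the label definitions discussed in this thread might look roughly as follows. This is an untested sketch, not a verified recipe: beta_loc is a placeholder path, the condition names are assumed to match those stored in SPM.mat, and whether the wildcards select what you expect should be checked by inspecting the design afterwards. It needs TDT on the MATLAB path and an estimated first-level model.

```matlab
cfg = decoding_defaults;                        % TDT default configuration
beta_loc = fullfile('path', 'to', 'first_level');   % placeholder directory
regressor_names = design_from_spm(beta_loc);

% Q1: Easy vs. Hard, pooling over distractor type via wildcards
cfg_q1 = decoding_describe_data(cfg, {'Easy-*', 'Hard-*'}, [1 -1], ...
    regressor_names, beta_loc);

% Q3: A vs. B, with the NoDist regressors simply not listed
cfg_q3 = decoding_describe_data(cfg, ...
    {'Easy-A', 'Hard-A', 'Easy-B', 'Hard-B'}, [1 1 -1 -1], ...
    regressor_names, beta_loc);

% Q4: three-class problem A vs. B vs. NoDist
cfg_q4 = decoding_describe_data(cfg, ...
    {'Easy-A', 'Easy-B', 'Easy-NoDist', 'Hard-A', 'Hard-B', 'Hard-NoDist'}, ...
    [1 2 3 1 2 3], regressor_names, beta_loc);

% Leave-one-run-out design for Q1, then inspect it before decoding
cfg_q1.design = make_design_cv(cfg_q1);
plot_design(cfg_q1);
```

The label values themselves are arbitrary class codes, so [1 -1] could be replaced by any two distinct integers.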

Please take the time to read the two papers I referenced; otherwise this might come back to haunt you later. It’s really worth the time.

Best,
Martin

Thank you so much for taking the time to answer my question!