TDT: options to run crossnobis classification?

Hi there,

I've never done classification before, but I'm currently using your TDT toolbox, and I really like the many options and the flexibility it provides! However, I have quite a few (basic?) methods questions, specifically regarding the many settings that exist, i.e., what they do, how to interpret them, and how they differ. As I said, I'm new to classification, so please bear with me and/or refer me to literature where applicable!

I would like to run a classification using crossnobis, and although the script runs, I feel like I don't understand very well what exactly is being done, or what would happen if I changed some settings. So I'd mainly like to get some more background information, if possible.
Specifically, I'm currently running a 2-class classification based on decoding_template_crossnobis (i.e., using make_design_similarity_cv and setting cfg.results.output to {'other_meandistance', 'accuracy_minus_chance'}; see the snippet below). However, the accuracy output doesn't throw an error but simply returns only NaNs. At first I thought the script would use the t-values as a way to classify (and hence would also be able to return an accuracy), but it seems that is not the case?
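Concretely, the relevant lines in my script look roughly like this (everything else is left as in the template):

% relevant settings, everything else as in decoding_template_crossnobis
cfg.design = make_design_similarity_cv(cfg);
cfg.results.output = {'other_meandistance', 'accuracy_minus_chance'};
results = decoding(cfg);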

But do I indeed (only) get t-values as output, currently? Because that is the most important output I want. More specifically: the mean distance per voxel (here, only the unique values, averaged across cross-validation iterations), expressed as a t-value? So the t-value then reflects how well the 2 classes can be classified/distinguished, correct? And since it returns both positive and negative values: are both informative (i.e., reflecting a categorization as labelname1 or labelname2, respectively), or not?

Or, if I do not get t-values: how should I set the output to get them? I understood you can also use RSA_beta or other_average. Specifically for RSA_beta, how does the beta-value differ from the t-value? Actually, I get an error when trying to use RSA_beta as output (Unrecognized function or variable 'transres_RSA_beta'.). Is that because I'm running a classification instead of an RSA? In which case would/could you use which output measure? And are these the 'only' meaningful output options, or are there others? Again, just for me to understand what's possible…

As for cfg.decoding.software, I can set it to 'distance' or 'similarity', where the former averages across data with the same label and the latter does not. But how exactly are those averaged/non-averaged values used in the classification, and what do they represent? The average dissimilarity vs. a vector of similarity values from which a t-value is obtained, or…? And when would you choose which? Here, too, I've tried keeping everything the same except setting software to similarity, but that throws an error:
Error using cveuclidean2 (line 16)
Size of both inputs needs to match.

And a more general question: how exactly does crossnobis differ from LDA? Would LDA (unlike crossnobis) return a classification accuracy? I've been trying to run LDA by setting cfg.decoding.software to 'lda', but that returns an error about dot indexing, so I haven't been able to compare the crossnobis and LDA outputs. Does something else in the code need to be changed as well for LDA to run, e.g., the model_parameters?

The error itself:
Dot indexing is not supported for variables of this type.
Error in ldatrain (line 51)
switch lower(param.shrinkage)
Error in lda_train (line 9)
model = ldatrain(labels_train,data_train,cfg.decoding.train.classification.model_parameters);
Error in decoding (line 535)
model = cfg.decoding.fhandle_train(labels_train,data_train,cfg);
Error in decoding_template_lda_tryout (line 239)
results = decoding(cfg);

And my last (general) question: I'm currently running a classification, so when running a second-level model, should I always use a permutation test, or might a simple t-test be OK if I use other_meandistance as the output measure instead of an accuracy measure that's bounded between 0 and 1?

Thanks so much in advance!

Best,
Iris

Hi Iris,

Thanks for your message. There is a lot going on here, so I’ll try to respond to everything.

Crossnobis is a distance-based approach that computes the (cross-validated) distance between two distributions. In essence, it's quite similar to LDA (see below). Using accuracy as an output measure doesn't make sense in this context, which is also why accuracy_minus_chance only returns NaNs. Hence, I'd stick to the defaults.

There are the so-called LD-c and LD-t. The former gives you the linear discriminant contrast, which is simply the cross-validated distance between the two distributions after whitening the data. The latter is a newer variant that also takes the variability of the prediction into account and scales the distance accordingly. LD-c has very nice theoretical properties, and LD-t turned out to be only very slightly superior, so we did not implement it because we didn't think it was well enough established.
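To make that concrete, here is a minimal, self-contained sketch of the cross-validated (Mahalanobis) distance between two conditions A and B across two independent partitions (e.g., runs). This is illustrative MATLAB only, not TDT code, and all variable names are made up:

% Illustrative sketch of the crossnobis / LD-c idea (not TDT code)
rng(1);
P = 50;                                   % number of voxels
nTrials = 20;                             % trials per condition and partition
signal = 0.5 * randn(1, P);               % true pattern difference A vs. B

% simulate two independent partitions (implicit expansion, R2016b+)
A1 = randn(nTrials, P) + signal;  B1 = randn(nTrials, P);
A2 = randn(nTrials, P) + signal;  B2 = randn(nTrials, P);

delta1 = mean(A1) - mean(B1);             % 1 x P difference, partition 1
delta2 = mean(A2) - mean(B2);             % 1 x P difference, partition 2

% noise covariance from partition 1, shrunk toward a scaled identity so
% that it is invertible (the default shrinkage in the toolbox is lw2)
res    = [A1 - mean(A1); B1 - mean(B1)];
Sigma  = cov(res);
lambda = 0.2;                             % illustrative, fixed shrinkage amount
Sigma  = (1 - lambda) * Sigma + lambda * mean(diag(Sigma)) * eye(P);

% cross-validated distance: delta1 * inv(Sigma) * delta2', normalized by P
d = delta1 / Sigma * delta2' / P;
fprintf('cross-validated distance: %.3f\n', d);

Because the two difference estimates come from independent data, the noise bias cancels, and the distance is zero on average when there is no real difference. That is also why negative values occur and are nothing to worry about: they are noise around zero, not evidence for one class or the other.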

I'd also not go into RSA_beta, since that is related to running multiple regression models on several RDMs, which is a little more involved.

distance is used for computing distances between e.g. 2 or more conditions, whereas similarity should return a similarity matrix. RSA_beta and related approaches are used for actually running RSA. In other words, think of distance as just another approach to tell you about the information content. I hope this makes sense.
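Concretely, for what you describe, I would keep the relevant template settings, i.e. something like:

cfg.decoding.software = 'distance';
cfg.results.output    = {'other_meandistance'};  % drop accuracy_minus_chance: accuracy is undefined for distances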

In general, though, I totally agree with you that the options for RSA are not as intuitive as the rest of the toolbox. This really grew organically, and extensive and simplified RSA support is something I would like to add to the new version of the toolbox.

This is a longer topic that I think I might have covered in the past. It depends on what you would like to do with the test. If you would like to get an idea of whether the effect is significant in your sample, then you can run a t-test. However, the results would not necessarily generalize to the population, because they could be driven by a small subset of individuals. The reason is that for accuracies and related distance metrics, the random-effects test collapses to a fixed-effects test. For details, see Allefeld et al. on "valid population inference".

Looks like I missed this question. LDA gives you the same result as LD-c, but the result comes in the form of a classification and a distance to a hyperplane. However, running LDA with continuous output (e.g., distance) and LD-c need not give the same result, because LDA can have a different bias term, whereas for LD-c the bias term does not matter. This matters, for example, when the distribution means are the same but the variances differ between the classes: for LD-c the result would be "zero distance" on average, while LDA can yield very high classification accuracies. For details, see Görgen et al. (same analysis approach) or Hebart et al. (deconstructing multivariate decoding).
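If you want to see this effect for yourself, here is a deliberately simple 1-D toy simulation (illustrative only, not TDT code; a plain threshold stands in for a classifier with a freely placed bias term):

% Equal means, unequal variances: cross-validated distance vs. accuracy
rng(3);
n  = 2000;
xA = 1 * randn(n, 1);                     % class A: N(0, 1)
xB = 3 * randn(n, 1);                     % class B: N(0, 9), same mean

% 1-D crossnobis analogue on two independent halves: ~0 on average
d1 = mean(xA(1:n/2))     - mean(xB(1:n/2));
d2 = mean(xA(n/2+1:end)) - mean(xB(n/2+1:end));
fprintf('cross-validated distance: %.4f\n', d1 * d2);

% threshold with a bias placed between the two densities: above chance
t   = 1.5;
acc = (mean(xA < t) + mean(xB >= t)) / 2;
fprintf('classification accuracy: %.2f\n', acc);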

Best,
Martin

Dear Martin,

Thanks so much for your extensive response; very helpful! I’ll read up on the mentioned references as well.

But now I've also got some questions about shrinkage, because I got LDA to work by manually setting param.shrinkage (see below). I've tried none, pinv, and lw2, basically to see what that does to the results, and it actually matters quite a bit! I also apply shrinkage when using crossnobis, where the shrinkage method is set to lw2 by default, but e.g. lw also works.
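In case others run into the same dot indexing error: from the trace it looks like ldatrain expects the model parameters to be a struct with a shrinkage field, so this is what made it run for me (no guarantee this is the intended way):

cfg.decoding.train.classification.model_parameters = struct('shrinkage', 'lw2');  % also works with 'none' or 'pinv'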

Could you please tell me how the different shrinkage methods (e.g., pinv, lw, lw2, and oas) differ in how they regularize the variance/covariance matrix? Are they different ways to induce sparsity? And how do I know which method is optimal?
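For context, my current (possibly wrong) understanding is that shrinkage mixes the sample covariance with a structured target, roughly like this (purely illustrative, not TDT code):

% Shrinkage as a convex combination of sample covariance and a target
rng(2);
X = randn(30, 100);                   % fewer samples than voxels -> S is singular
S = cov(X);
T = mean(diag(S)) * eye(size(S, 1));  % target: scaled identity
lambda = 0.3;                         % Ledoit-Wolf / OAS would estimate this from the data
Sigma_hat = (1 - lambda) * S + lambda * T;
fprintf('rank(S) = %d, rank(Sigma_hat) = %d\n', rank(S), rank(Sigma_hat));

Is that roughly right?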

And do these shrinkage methods always regularize only to the extent that is required, i.e., so that the matrix becomes invertible? Because in that case, could the amount of shrinkage differ per participant? And how could I know how much shrinkage was applied, or is that not really relevant?

And is it even required to apply shrinkage if you only receive a warning about singularity, or are there good reasons to always apply it?

A reference to get me up to speed on this would also be greatly appreciated!

Thanks so much.

Best,
Iris