Negative Euclidean distances after cross-validation for RSA


#1

Hi all,

I’m conducting my first RSA (via the Decoding Toolbox), and have just run the decoding_template_crossnobis_filled.m script to cross-validate and normalise my results (as suggested by Walther et al., 2016, NeuroImage). However, I get negative distances in my matrix, and even though I’ve read that this is possible due to Euclidean distances not being mathematical distances per se, I’m a bit confused about how to interpret such results.

More specifically:

  1. Is the direction (positive/negative) interpretable?
    and following from this;
  2. Can you directly compare positive values to negative values in a meaningful way?

Any help or insight would be much appreciated!

Best,
Haeme


#2

Hi @haeme,

cool and important question!

I have never used this functionality within TDT, but I assume it works comparably to the respective RSA toolbox functions? (BTW, could you maybe add tags to all your future posts on neurostars.org? This allows for better documentation, indexing and searches, and points experts who will most likely be able to help you in the direction of your question. For example, useful tags for your question would be: TDT, RSA, cross-validation.)

I assume you have already read Alex Walther’s paper on the reliability of dissimilarity measures? There is also a great introductory tutorial and a new preprint that may shed some light on your question. Full disclosure: I’m by no means an expert on this. Basically, the occurrence of negative distances is not related to the Euclidean distance per se, but to its cross-validation. Because you cross-validate between runs, and assuming that the noise is independent between them, the expected distance between two conditions is zero if their patterns differ only by noise (this is also related to the multivariate noise normalization). As mentioned in the tutorial, especially very small distances can sometimes become negative, which overall is “an inevitable characteristic of an unbiased estimator”. So, if you cross-validate across runs and then average across folds, chances are that your RDM estimates will contain negative values on the diagonal and off-diagonal.
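To make the point about unbiased estimators concrete, here is a minimal simulation sketch. It is not how TDT or the RSA toolbox computes crossnobis distances: it skips noise normalization entirely and uses a plain cross-validated squared Euclidean distance, with independent Gaussian noise per run. All function names and parameter values are hypothetical and chosen only for illustration. Both conditions share the same true pattern, so the true distance is zero.

```python
import random
import statistics


def crossval_sq_distance(a_runs, b_runs):
    """Mean over run pairs (i != j) of the inner product <a_i - b_i, a_j - b_j>."""
    n_runs = len(a_runs)
    diffs = [[a - b for a, b in zip(a_runs[r], b_runs[r])] for r in range(n_runs)]
    products = []
    for i in range(n_runs):
        for j in range(n_runs):
            if i != j:
                products.append(sum(x * y for x, y in zip(diffs[i], diffs[j])))
    return statistics.mean(products)


def naive_sq_distance(a_runs, b_runs):
    """Squared Euclidean distance between run-averaged patterns (no cross-validation)."""
    n_runs = len(a_runs)
    n_vox = len(a_runs[0])
    mean_diff = [
        sum(a_runs[r][v] - b_runs[r][v] for r in range(n_runs)) / n_runs
        for v in range(n_vox)
    ]
    return sum(d * d for d in mean_diff)


def simulate(n_sim=2000, n_runs=4, n_vox=20, seed=0):
    """Simulate two conditions with identical true patterns (true distance = 0)."""
    rng = random.Random(seed)
    cv, naive = [], []
    for _ in range(n_sim):
        # Assumption: unit-variance Gaussian noise, independent across runs and voxels.
        a = [[rng.gauss(0, 1) for _ in range(n_vox)] for _ in range(n_runs)]
        b = [[rng.gauss(0, 1) for _ in range(n_vox)] for _ in range(n_runs)]
        cv.append(crossval_sq_distance(a, b))
        naive.append(naive_sq_distance(a, b))
    return cv, naive


cv, naive = simulate()
frac_negative = sum(d < 0 for d in cv) / len(cv)
print("mean cross-validated distance:", statistics.mean(cv))
print("fraction of negative cross-validated distances:", frac_negative)
print("mean non-cross-validated distance:", statistics.mean(naive))
```

Under these assumptions, the cross-validated estimates should scatter around zero, with roughly half of them negative, while the non-cross-validated distance can never be negative and is therefore biased upwards.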

Interpretation is another thing and I’m definitely not skilled enough to provide any advice on that. Regarding your second question: do you mean applying inferences directly on the matrix and its included distances or comparing those matrices with models?

I’m gonna include @Martin in this post and hope he has time to have a look, as he will be able to provide a way more helpful answer and more insights.

HTH, best, Peer


#3

thanks for bringing me in, @PeerHerholz, I think you already did a fantastic job at explaining!

The direction can be interpretable if you are dealing with a typical “cross-classification” situation, but for regular crossnobis, it’s the same as decoding: It can happen by chance, but it shouldn’t really happen. If it happens, it indicates that you are dealing with very low SNR or with some sort of confound. Anything in your design that is partially correlated with your effects of interest can cause spurious above-chance or below-chance results, and specifically below-chance results if that correlation is present only in some folds. The easiest way to figure this out is to see how global this trend is: if you find it pretty much everywhere, chances are you have a design confound. For dealing with it, I would recommend this paper and this paper.
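One rough way to quantify how widespread the negative values are within a single RDM is to count the share of negative off-diagonal entries. This is only a sketch: the helper name `negative_fraction` is hypothetical, the RDM is assumed to be a square matrix stored as nested lists, and the example values are made up. "Global" could equally refer to many searchlights or regions, in which case the same count would be repeated per location.

```python
def negative_fraction(rdm):
    """Return the fraction of off-diagonal RDM entries that are negative."""
    n = len(rdm)
    off_diag = [rdm[i][j] for i in range(n) for j in range(n) if i != j]
    negatives = [v for v in off_diag if v < 0]
    return len(negatives) / len(off_diag)


# Made-up 4x4 symmetric RDM, for illustration only.
example_rdm = [
    [0.0, 0.8, -0.1, 1.2],
    [0.8, 0.0, 0.5, -0.05],
    [-0.1, 0.5, 0.0, 0.9],
    [1.2, -0.05, 0.9, 0.0],
]

frac = negative_fraction(example_rdm)
print("fraction of negative off-diagonal entries:", frac)
```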

Kind of. If you treat it as a statistical estimation issue, yes. If the negative effects are really big or if it is a confound, then not really.

Best,
Martin


#4

Thank you @PeerHerholz and @Martin for your super helpful replies and I totally agree that these values could indicate a potential confound in my data. However, as the effect (of negative values) is not global across the whole matrix, I’m hoping that there may be other interpretations of the current results so will start by reading the papers you’ve linked in your replies. Thanks again for your help!