Minimal data for a reasonable hyperalignment or shared response model

Does anyone know how much film-viewing data are necessary to fit a good functional alignment model, both in terms of number of subjects and amount of data per subject?

I’m specifically interested in fitting a shared response model or running hyperalignment on the film-viewing task in the Queensland Twin Adolescent Brain (QTAB) dataset. The task is 5 minutes and 11 seconds long with a TR of 0.8 s (so roughly 389 volumes per subject), and there are ~300 subjects with the task.

Hi Taylor!

I recently worked on something that could partially answer your question.
It might not be exactly what you are looking for, as I did not use a Shared Response Model (SRM) but another functional alignment method based on optimal transport (which I would argue is methodologically close to hyperalignment). Moreover, the dataset is not strictly speaking movie-watching, but more comparable to “clip-watching” (i.e. watching long sequences of 10-second video clips).

In short, it seems that if you rely solely on functional data, 5 minutes is probably not enough to derive anatomically plausible alignments (i.e. some voxels will be matched together even though they are very distant from one another on the cortex). However, with more data (about two hours), there is already enough signal to align some functional areas not only in the occipital lobe, but also in the parietal and temporal lobes. Nevertheless, it seems that alignments which are not anatomically plausible could still be useful depending on the downstream task.
Note that I was computing alignments between pairs of subjects, so I was not leveraging information from the entire group of subjects (which you could in part do with an SRM, although I am not sure it would work).
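To make the pairwise idea concrete, here is a toy sketch of matching voxels between two subjects by the similarity of their time courses on a shared stimulus. This is only a crude stand-in for what I actually did: it computes a hard one-to-one assignment, whereas the optimal-transport method produces soft transport plans that also account for cortical geometry. All names here are illustrative, not the fugw API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_voxels(x_a, x_b):
    """Toy one-to-one voxel matching between two subjects.

    x_a, x_b: (n_voxels, n_timepoints) arrays recorded on the same stimulus.
    Returns an index array such that voxel i of subject A is matched to
    voxel match[i] of subject B.
    """
    # z-score each voxel's time course so that dot products are correlations
    za = (x_a - x_a.mean(1, keepdims=True)) / x_a.std(1, keepdims=True)
    zb = (x_b - x_b.mean(1, keepdims=True)) / x_b.std(1, keepdims=True)
    corr = za @ zb.T / x_a.shape[1]          # (n_a, n_b) correlation matrix
    row, col = linear_sum_assignment(-corr)  # maximise total correlation
    return col
```

With short acquisitions the correlation matrix is noisy, which is exactly why very distant voxels can end up matched: nothing in this purely functional cost keeps the assignment anatomically plausible.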

You can find more in this pre-print: [2312.06467] Aligning brain functions boosts the decoding of visual semantics in novel subjects
For context, I was using functional alignment to study how it can help transfer brain decoders trained on one individual to another individual. Figure 4 is probably the most relevant to you.


@alexisthual that’s extremely helpful, thanks! Your tests using different amounts of data paint a pretty clear picture.

Obviously, I wish there were an easy tradeoff between the amount of subject-level data and the number of subjects, but I can see why that wouldn’t be the case with functional alignment. It’s good to at least know that this isn’t an analysis I can apply to the QTAB dataset.

Honestly, I would not want to discourage you from running this sort of analysis on the QTAB dataset haha. I think the results I pointed to are relevant for subject-to-subject alignments, but they might not hold for datasets with a large number of participants.

Actually, I’d be very curious to see if an SRM can pick up something interesting in this data regime (loads of subjects, little data per individual). I’d also be curious as to whether an optimal transport barycenter could help (something along these lines maybe: https://github.com/alexisthual/fugw/blob/58cdfee03f39e6ae6f66b701489071af24664391/tests/mappings/test_barycenter.py#L34)
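For anyone curious what the SRM side of this would look like, here is a minimal numpy-only sketch of the deterministic shared-response idea: each subject i gets an orthonormal basis W_i such that X_i ≈ W_i S for a shared response S, fit by alternating averaging and orthogonal Procrustes updates. This is a hand-rolled illustration, not the brainiak SRM implementation (which fits a probabilistic model), and the function names are my own.

```python
import numpy as np

def fit_shared_response(datasets, k, n_iter=20, seed=0):
    """Toy deterministic shared-response fit: X_i ~ W_i @ S, W_i^T W_i = I.

    datasets: list of (n_voxels_i, n_timepoints) arrays, same timepoints,
              all recorded on the same stimulus.
    k: dimensionality of the shared space.
    Returns (list of W_i bases, shared response S of shape (k, n_timepoints)).
    """
    rng = np.random.default_rng(seed)
    # random orthonormal initialisation of each subject's basis
    ws = [np.linalg.qr(rng.standard_normal((x.shape[0], k)))[0]
          for x in datasets]
    for _ in range(n_iter):
        # shared response: average of the back-projected subject data
        s = np.mean([w.T @ x for w, x in zip(ws, datasets)], axis=0)
        # orthogonal Procrustes update of each basis
        ws = []
        for x in datasets:
            u, _, vt = np.linalg.svd(x @ s.T, full_matrices=False)
            ws.append(u @ vt)
    s = np.mean([w.T @ x for w, x in zip(ws, datasets)], axis=0)
    return ws, s
```

The interesting empirical question for QTAB is whether ~300 subjects' worth of averaging in the S update compensates for having only ~5 minutes of data per subject.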