TDT: how to perform RSA-GLM and obtain beta values?

Hi everyone,

I really appreciate the authors for creating the wonderful decoding toolbox, TDT.
Let me ask how to perform a GLM with representational similarity analysis (RSA-GLM) in TDT.
I would be very happy if you could give me some tips.

I would like to obtain beta values for three factors (behavioral dissimilarity matrices) regressed against the neural (dis)similarity matrix across trials, for each participant.
I believe that what I want to do is very similar to this previous post by Phil: RSA on individual behavioral ratings
But I could not find how to correctly perform RSA-GLM in TDT.

My current settings are like this (under version 3.994):
cfg.decoding.software = 'similarity';
cfg.decoding.method = 'classification';
cfg.decoding.train.classification.model_parameters = 'zcorr';
cfg.results.output = {'rsa_beta', 'other'}; % I believe the 'other' output values here correspond to Z-correlations across trials
cfg.design = make_design_rsa(cfg);

This script runs successfully, and I obtained rsa_beta values for each dissimilarity matrix as well as a neural similarity matrix for each participant.
However, I am not sure whether cfg.decoding.software = 'similarity' is correct here, or whether cfg.decoding.method should be 'regression' instead.
If it should be 'regression', what parameters should be chosen for cfg.decoding.software?

I hope you will be able to provide the information.

Thank you!
Ryuhei


Hi Ryuhei,

In general, the RSA-GLM option is really quite advanced because it's so flexible, so I would just be cautious. I haven't used it myself in the last 4-5 years, and to be honest, it's not written in a very intuitive way (sorry about that!).

I would try the following:

Put in your three RSMs:
cfg.files.components.matrix = {matrix1, matrix2, matrix3};
The following selects all cells in the RSM for the comparison (it automatically just selects the lower triangular part though, so no need to worry):
cfg.files.components.index = (1:numel(matrix1))'; % this would allow you to select indices to include.

If that is in your code, everything else looks fine.

Now the catch is that I think it would be better to use distances rather than similarities for the comparisons, and potentially even cross-validated distances to get estimates unbiased by noise. I think 'cveuclidean2' would give you the cross-validated squared Euclidean distance. This means you might need to use cfg.design = make_design_rsa_cv(cfg); for this to work.
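
Off the top of my head, the cross-validated setup might look roughly like this (untested sketch; please check that the option names match your TDT version):

cfg.decoding.software = 'similarity';
cfg.decoding.method = 'classification';
cfg.decoding.train.classification.model_parameters = 'cveuclidean2'; % cross-validated squared Euclidean distance
cfg.design = make_design_rsa_cv(cfg); % cross-validated RSA design; folds come from cfg.files.chunk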

As I said, there are so many options that it becomes a little complicated at some point. :slight_smile: The alternative would be not to run this as a GLM but instead as three separate similarity analyses.

Hope this helps!
Martin

Hi Martin,

I really appreciate your helpful comments!
I am happy to confirm that my code is fine, including cfg.files.components.
Separate similarity analyses might be nice, too. Thanks for the helpful suggestion.

Cross-validated squared distance sounds nice. I would be happy if you could give me some more tips about this procedure.

My current design is very similar to the previous post by Phil: RSA on individual behavioral ratings.
I have neural data from 6 sessions. Half of the 80 items (40 items) appeared in sessions 1, 3, and 5; the other half appeared in sessions 2, 4, and 6. Therefore, the output neural (dis)similarity matrix has a size of 240 x 240.
Each behavioral RDM in the RSA-GLM contains the differences between ratings across the 80 items (size: 80 x 80). These ratings were obtained in a task outside the scanner (not during the fMRI task).

To get cross-validated output values, I need to arrange the decoding labels correctly for cross-validation, and I am wondering how to do this.
I believe that in normal RSA, labels are meaningless, so settings are like this:
labels(1:2:length(labelnames)) = -1;
labels(2:2:length(labelnames)) = 1;
(* labelnames: 80 items)
But such arbitrary labeling does not seem appropriate for cross-validated RSA.

I look forward to hearing from you.
Thank you!
Ryuhei

Hi Ryuhei,

I think what you are trying to do is really sophisticated. Essentially, if I understand correctly, you would like to create a similarity matrix that focuses on single trials and integrates across runs and then run multiple regression on it, all in a cross-validated framework, with a 240x240 matrix. I honestly think this analysis is not going to work. Single trials in an event-related design are likely to be correlated, and data within sessions will likely be more similar than between sessions. Unless you have 6 sessions and not 6 runs, results are likely going to be very noisy with a size of 240x240 and cross-validation. But beyond that, I think cross-validation will likely turn out to be very difficult, because based on your description you would have to do it with items from the same run, and cross-validation within a run doesn't work easily. Is your goal to relate the behavioral measures of similarity of a pair to the neural measure of similarity of the same pair? Why not just model an 80x80 matrix?

I think getting such a sophisticated analysis set up is not going to be easy. In any case, you just need to set cfg.files.chunk and then determine for each file in cfg.files.name how you would like to split it. For example, since 40 of your 80 items appeared in session 1, 3, and 5, it might make sense to make sessions 1 and 2 one chunk, then sessions 3 and 4 another, and then sessions 5 and 6 another. This would, however, only work for an 80x80 matrix.
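
For instance (untested sketch; here session is a hypothetical vector giving, for each entry of cfg.files.name, the session it came from):

chunk = zeros(size(session));
chunk(ismember(session, [1 2])) = 1; % sessions 1 and 2 form one chunk
chunk(ismember(session, [3 4])) = 2; % sessions 3 and 4 another
chunk(ismember(session, [5 6])) = 3; % sessions 5 and 6 another
cfg.files.chunk = chunk;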

If you would like to stick to the 240x240 approach, I would suggest not using cross-validation and not using the GLM approach, but classical RSA.

Hope this helps,
Martin

Hi Martin,

Thanks for the quick and helpful reply!

Yes, that is right. And I am sorry for confusing you about the neural matrix model.

Before performing TDT, I obtained an SPM.mat containing the model for each participant, which included 80 item onset-timing regressors (item01-80); thus, I think the neural model matrix is 80x80, not 240x240 (i.e., I set regressors as item01, item02, ... item80, not item01-session1, item01-session3, item01-session5, item02-session2, ...).
In TDT, following the template files, I set the cfg information like this:
regressor_names = design_from_spm(beta_loc); % extract beta names and corresponding session information from SPM.mat
labelnames = {'item01', 'item02', ... 'item80'};
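
(For completeness, the rest roughly follows the template like this; the alternating -1/1 labels are arbitrary, as in my earlier post above:)

labels = zeros(1, length(labelnames));
labels(1:2:end) = -1; % arbitrary labels; RSA does not use them as classes
labels(2:2:end) = 1;
cfg = decoding_describe_data(cfg, labelnames, labels, regressor_names, beta_loc);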

The SPM.mat file has session regressors (sessions 1-6), so the size of the output neural similarity matrix is 240x240, not 80x80.
I think this includes similarity values between the same items in different sessions, such as item01 in session 1 vs. item01 in session 3.

I am sorry if my explanation is confusing...
I believe that this is the 80x80 approach, not 240x240, but just let me know if I am wrong.
Following your suggestions, if the current procedure really is using an 80x80 matrix model, setting chunks should allow me to perform cross-validated RSA.

Many thanks!
Ryuhei

Hi Martin,

I have two additional questions on how to obtain beta values in the RSA-GLM.

First, I wonder how I should appropriately model the 80x80 matrix.
I actually could do it without SPM's session structure. In more detail, I generated the SPM.mat design matrix for each participant in a "concatenated" manner; that is, in a single session I included all EPIs obtained across the 6 sessions, plus a hand-made session regressor for each EPI (e.g., an EPI obtained in session 3 was coded as [0, 0, 1, 0, 0, 0]).
By doing this, I obtained 80 item beta images (e.g., Sn(1) item01, Sn(1) item02, ...), each modeling 3 events. This matches the size of the 80x80 behavioral RDMs, so TDT worked.
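
(Roughly, I built the hand-made session regressors like this; nscans is a hypothetical vector holding the number of EPIs in each of the 6 sessions:)

nsess = 6;
sess_of_scan = repelem(1:nsess, nscans); % session index of each concatenated EPI
R = zeros(numel(sess_of_scan), nsess); % one indicator column per session
R(sub2ind(size(R), 1:numel(sess_of_scan), sess_of_scan)) = 1;
% R then goes in as multiple regressors of the single-session SPM model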

However, when I generate the design matrix in the normal way (with EPIs assigned to separate sessions) and obtain 240 beta images (e.g., Sn(1) item01, Sn(3) item03, ...), it does not work because the size is not consistent with the 80x80 behavioral RDMs.

Is there an appropriate way to do this, or is the concatenated design unavoidable?
Actually, the output rsa_beta values were very similar both ways; I would just like to know whether this approach is valid.

Second, I am concerned about whether the output rsa_beta values are correct for what I would like to obtain.
To check, I calculated beta values from the output neural similarities (obtained via output = 'other') with the R function lm: lm(neural ~ f1 + f2 + f3). I expected to replicate the results, but they were totally different from the rsa_beta values.
I checked the implemented script named 'transres_rsa_beta.m', but I could not figure it out.
I would be happy if you could tell me how the rsa_beta values are calculated in TDT.

I am really sorry for the long questions.
I look forward to hearing from you.

Thank you!
Ryuhei

Dear Ryuhei,

Thanks for checking. The beta values are not standardized, so that could explain the difference, but I'm not sure; I don't know what lm does internally. Another option is that you passed not the lower triangular matrix but the entire matrix. Yet another is that the output is the similarity matrix but you ran the regression on the dissimilarity matrix (or vice versa).

As I said, the option is very advanced and might not be very stable when a lot of other things are done or tried.

Feel free to send me an email with your code and I could try to figure it out. Alternatively, for now, perhaps use the output 'other' and run the regression manually.

Best,
Martin

Hi Martin,

Thanks for the kind reply!
I would like to send you an email with my code later.

Best regards,
Ryuhei

Hi Martin,

I found the reason why the rsa_beta values were not consistent with the beta values estimated by the lm function in R.

As I mentioned, I originally set up the model in R as lm(neural ~ f1 + f2 + f3).
This model automatically includes an intercept term, which produced beta values inconsistent with the rsa_beta values.
When I instead set up the model as lm(neural ~ f1 + f2 + f3 - 1), which excludes the intercept term, the output values were fully consistent with the rsa_beta values.
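
(In MATLAB terms, with f1, f2, f3, and neural as column vectors of the vectorized lower-triangular RDM entries, the two models correspond to:)

beta_noint = [f1 f2 f3] \ neural;              % no intercept: matches rsa_beta
beta_int = [ones(size(f1)) f1 f2 f3] \ neural; % with intercept: R's default lm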

I would be happy to know why the rsa_beta values are calculated in a model without an intercept term. Is this usually recommended for RSA-GLM, or is it for greater flexibility of the code?
I also would like to know how I can include the intercept term in TDT.

Best regards,
Ryuhei

Hi Ryuhei,

I'm glad to hear you found the source of the difference! Ah, an intercept term is not included by default, that is correct. The reason is that the code is not only used for calculating betas, and sometimes users might not want to include an intercept. But indeed, it can make sense. In the future, we will make this more explicit in the documentation.

If you want to include an intercept, you just add another matrix that consists only of ones.
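
For example, assuming matrix1, matrix2, and matrix3 are your three model RDMs from before:

cfg.files.components.matrix = {ones(size(matrix1)), matrix1, matrix2, matrix3};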

Hope that helps,
Martin

Hi Martin,

Thank you for the quick reply!
I understand the reason why an intercept term is not included by default and how to include it.

I really appreciate your help!
Ryuhei


Hi Martin,

It's been a long time. Thanks to your great toolbox, I'm really enjoying decoding analyses.

I have an additional question about how to set up the RSA-GLM using the dataset that I mentioned previously (please see my posts above from Jun '20 for the detailed task design).

Now I am trying to construct a stimulus-by-stimulus RDM (size = 80 x 80) instead of a trial-by-trial RDM (size = 240 x 240). The design matrix, generated in SPM, has 6 sessions of data with item-by-item event regressors (i.e., onset = presentation timing of the stimulus).

In TDT, following the template files, I passed the SPM.mat location as beta_loc. For the other settings, please see my previous posts from Jun '20. When I put 80 x 80 behavioral RDMs (differences in ratings between stimuli) into cfg.files.components, TDT returns an error due to mismatched sizes between the neural and behavioral matrices. It works when I use 240 x 240 behavioral RDMs (differences in ratings between trials instead of stimuli). So, for now, I can only get rsa_betas for the trial-by-trial design. I'm wondering how I can get rsa_betas for the stimulus-by-stimulus design (if I understand correctly, this procedure would return beta values averaged across runs; is this less sensitive to nuisance variance from different sessions, as you mentioned in your post from Jun '20?). I guess an appropriate cfg.files.chunk setting would be the key, but I could not figure it out.

I look forward to hearing from you.
Thanks a lot!
Ryuhei

Hi Ryuhei,

Just use a model from SPM as input that consists of 80 conditions rather than 240. If you rerun the first-level model that way, it should work.

Best,
Martin

Hi Martin,

I appreciate your quick reply.
I'm using a model created by SPM that consists of 80 conditions rather than 240 trial-wise ones. However, there are now 240 beta image files, one per item per session (e.g., 'beta_0001.nii' corresponding to 'Sn(1) stimulus_01' and 'beta_0095.nii' to 'Sn(3) stimulus_01', which I found in 'regressor_names.m'). This seems to be because I set the item regressors separately for each session ('Data & Design' > 'Subject/Session' in SPM's Specify 1st-level). Otherwise, the model would not include session regressors.
When I pass this SPM.mat as beta_loc in TDT, it returns a trial-by-trial neural RDM (size = 240 x 240), rather than an item-by-item one (size = 80 x 80).
Let me know if my procedure for item-by-item RSA is wrong.

I'm thinking about another way to achieve an item-by-item analysis. Before using SPM.mat in TDT, I guess I can create a t-statistic map for each stimulus, averaged across sessions, using the ImCalc function in SPM (please see also p. 4, '2.2 Representational Similarity Analysis', in the CosMoMVPA toolbox paper by Oosterhof et al., 2016: https://www.frontiersin.org/articles/10.3389/fninf.2016.00027/full; this is exactly what I would like to try in the current model RSA). Then I would pass those T-map images to TDT based on 'decoding_template_nobetas.m', which would calculate rsa_betas against the manually supplied T maps.

I would be very happy if you could give me some tips on that.

Best,
Ryuhei

Hi Ryuhei,

I think it's not trial-by-trial but perhaps session-by-session, given you have 3 sessions?

As you said, averaging stimuli across sessions would do the trick. You could use t-stats, which amounts to univariate noise normalization. Alternatively, you could scale your averaged betas using the noise covariance. But I'd probably first get the standard analysis running before playing around with such things.

Best of luck!
Martin

Hi Martin,

Thanks for the helpful comments.

I think it's not trial-by-trial but perhaps session-by-session, given you have 3 sessions?

As I said in the previous posts, my design is pretty tricky, but anyway, I have 6 sessions. So as you said, it would be a session-by-session RSM.

Following your advice, I'm going to try using t-stats as an averaged index. I think I can achieve this as follows: (1) create and estimate the design matrix (1st-level specify and estimate in SPM), (2) use the Contrast Manager to get a weighted map for each item (weight '1' for that item's regressors and '0' for the others), (3) run the RSA-GLM in TDT using the T-map images generated by the Contrast Manager.
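
(Here is a rough, untested sketch of step (2) as an SPM12 batch; model_dir and the 'stimulus_XX' regressor naming are placeholders for my actual paths and names:)

load(fullfile(model_dir, 'SPM.mat')); % estimated first-level model
matlabbatch{1}.spm.stats.con.spmmat = {fullfile(model_dir, 'SPM.mat')};
for i = 1:80
    name = sprintf('stimulus_%02d', i);
    hits = contains(SPM.xX.name, name); % this item's regressor in every session
    matlabbatch{1}.spm.stats.con.consess{i}.tcon.name = name;
    matlabbatch{1}.spm.stats.con.consess{i}.tcon.weights = hits / sum(hits); % average across sessions
    matlabbatch{1}.spm.stats.con.consess{i}.tcon.sessrep = 'none';
end
matlabbatch{1}.spm.stats.con.delete = 1; % clear any existing contrasts
spm_jobman('run', matlabbatch);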

Thanks a lot!
Ryuhei

Hi Martin,

Thanks for this very helpful thread. I noticed that the selected lower triangle includes the diagonal - is there a reason for that? We're using Spearman correlation within the dataRDM (no cross-validation), so including the diagonal tends to inflate the beta values (as we have perfect prediction there). Obviously it depends on what you do next with them! But I wondered if there is a good reason to include the diagonal that I might have missed?

Also, a beginner question (sorry) - can the toolbox return other metrics from the RSA-GLM besides betas, e.g., model fit/total variance explained? I see the implementation in transres_rsa_beta only calculates betas.

Many thanks,
Alex

Hi Alex,

I'll look into this. For classical RSA there is no reason to include the diagonal (since it's only a measure of reliability, or in the worst case just adds a positive bias), but for regression there might be. But I'll look into this for sure. It's been a while since I wrote the code. :wink:
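
In the meantime, a possible workaround (untested sketch, assuming cfg.files.components.index takes linear indices into the RDM, as described earlier in this thread) would be to select the strictly lower triangle yourself:

n = size(matrix1, 1); % matrix1: one of your model RDMs
mask = tril(true(n), -1); % strictly lower triangle, diagonal excluded
cfg.files.components.index = find(mask);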

I have several functions for correlations here that might be useful, but I haven't done unit testing on them so far. Happy for you to try them though.

Best,
Martin