RDM with task regression residuals with RSA Toolbox 3.0?

Kind-of long time lurker, and finally a first time poster with a question I haven’t seen explicitly addressed (or at least not with the key words I’ve used). :smile:

Reaching out to ask for any advice on using representational similarity analysis (RSA) with RSA Toolbox 3.0. I have 4 conditions to be compared across all subjects, with conditions pulled from 2 task types. For a toy example to illustrate the problem, both of my tasks involve social and non-social conditions, but differ in terms of visual features (e.g., one task uses photos to represent sociality, the other uses avatars). This would leave four types of trial: task1_social (photos), task1_nonsocial (photos), task2_social (avatars), task2_nonsocial (avatars).

As I see it, there are two types of similarity here, one of which I am interested in assessing (social vs. non-social, across and within tasks) and one that I am interested in controlling for (photos vs. avatars, across tasks). Basically, I would like to determine when and how to “regress” task out, in order to examine my primary comparison of interest (in this example, social vs. non-social).

My current pipeline involves:
Dissimilarity matrix values generated using rsatoolbox.rdm.calc_rdm (method=correlation) on the raw, zstat values, computed by running the pairwise correlations between all four conditions (task1_social (photos), task1_nonsocial (photos), task2_social (avatars), task2_nonsocial (avatars))

> def create_rdm_dict(participant_ids, df, order):
>     rdms_dict = {}
>     for participant_id in participant_ids:
>         wide_df = pivot_wide_format(df, participant_id)
>         wide_df = wide_df[wide_df['Cond'].str.endswith('_diff')]
>         wide_df['Cond'] = pd.Categorical(wide_df['Cond'], categories=order, ordered=True)
>         wide_df = wide_df.sort_values('Cond').reset_index(drop=True)
>         wide_array = wide_df.drop(columns='Cond').to_numpy()
>         voxel_array = [i for i in range(1, wide_array.shape[1] + 1)]
>         cond_array = wide_df['Cond'].to_numpy()
>         data = rsatoolbox.data.Dataset(
>             wide_array,
>             channel_descriptors={'voxel': voxel_array},
>             obs_descriptors={'condition': cond_array}
>         )
>         rdm = rsatoolbox.rdm.calc_rdm(data, method='correlation')
>         rdms_dict[participant_id] = rdm
>     return rdms_dict

Example output:

> {1001: rsatoolbox.rdm.RDMs(
> dissimilarity_measure = 
> correlation
> dissimilarities = 
> [[0.41419128 1.37773521 1.14586877 1.27610213 1.05142896 0.57960723]]
> descriptors = 
> {}
> rdm_descriptors = 
> {'index': [0]}
> pattern_descriptors = 
> {'condition': array(['task1_social(photos)', 'task1_nonsocial(photos)', 'task2_social(avatars)', 'task2_nonsocial(avatars)'], dtype=object)}
  1. Regress out the task variance from the dissimilarity vectors
> def regress_out_task(rdm, regressor, conditions):
>     dissimilarities = rdm.dissimilarities[0]
>     num_conditions = len(conditions)
>     dissimilarities_matrix = np.zeros((num_conditions, num_conditions))
>     upper_triangle_indices = np.triu_indices(num_conditions, k=1)
>     dissimilarities_matrix[upper_triangle_indices] = dissimilarities
>     dissimilarities_vector = dissimilarities_matrix[upper_triangle_indices]
>     model = LinearRegression()
>     model.fit(regressor.reshape(-1, 1), dissimilarities_vector)
>     residuals = dissimilarities_vector - model.predict(regressor.reshape(-1, 1))
>     new_rdm_matrix = np.zeros_like(dissimilarities_matrix)
>     new_rdm_matrix[upper_triangle_indices] = residuals
>     new_rdm_matrix = new_rdm_matrix + new_rdm_matrix.T
>     return new_rdm_matrix
> 
> # Apply regression to all RDMs
> def apply_regression_to_rdms(rdms_dict, regressor, conditions):
>     for participant_id in rdms_dict.keys():
>         rdm = rdms_dict[participant_id]
>         new_rdm_matrix = regress_out_task(rdm, regressor, conditions)
>         rdms_dict[participant_id].dissimilarities[0] = new_rdm_matrix[np.triu_indices_from(new_rdm_matrix, k=1)]
>     return rdms_dict

Example output (where s hits the fan)

> {1001: rsatoolbox.rdm.RDMs(
> dissimilarity_measure = 
> correlation
> dissimilarities = 
> [[-0.08270798  0.16495144 -0.066915    0.06331836 -0.16135481  0.08270798]]
> descriptors = 
> {}
> rdm_descriptors = 
> {'index': [0]}
> pattern_descriptors = {'condition': array('task1_social(photos)', 'task1_nonsocial(photos)', 'task2_social(avatars)', 'task2_nonsocial(avatars)', dtype=object)}
  1. Compare these task-regressed dissimilarity vectors/matrices to hypothetical RDMs for the psychological construct
> def compare_hypotheses(mean_rdm, h1, h2):
>     """Compare the mean RDM (of all participants) to two hypotheses and return p-values."""
>     # Flatten the upper triangle of the RDM and hypotheses
>     mean_rdm_vector = mean_rdm[np.triu_indices_from(mean_rdm, k=1)]
>     h1_vector = h1[np.triu_indices_from(h1, k=1)]
>     h2_vector = h2[np.triu_indices_from(h2, k=1)]
>     
>     # Perform linear regression for H1
>     model_h1 = LinearRegression()
>     model_h1.fit(h1_vector.reshape(-1, 1), mean_rdm_vector)
>     predicted_h1 = model_h1.predict(h3_vector.reshape(-1, 1))
>     
>     # Diagnostic: Inspect predictions and coefficients for H1
>     print("Predicted H1:", predicted_h1)
>     print("H1 Coefficient:", model_h1.coef_)
> 
>     _, p_val_h1 = wilcoxon(mean_rdm_vector, predicted_h1)
>     
>     # Perform linear regression for H2
>     model_h2 = LinearRegression()
>     model_h2.fit(h2_vector.reshape(-1, 1), mean_rdm_vector)
>     predicted_h2 = model_h2.predict(h2_vector.reshape(-1, 1))
>     
>     # Inspect predictions and coefficients for H2
>     print("Predicted H2:", predicted_h2)
>     print("H2 Coefficient:", model_h2.coef_)
> 
>     _, p_val_h2 = wilcoxon(mean_rdm_vector, predicted_h2)
>     
>     return p_val_h1, p_val_h2

Step 3 results seem pointless to share here given the negative RDM values after step 2, but I can if it helps, and feedback on process is welcome!

As you can see, the current way that I am regressing out the task variable results in RDMs taking on negative values. I figure this is because task variance should be regressed prior to creating the RDM, and that the regression I am running pulls the beta values for the new rdms_dict (in step 2), when it should be pulling the residuals. These residuals should then be used as the DVs for my hypothetical RDMs. Can I pull the residual values from the regression on my raw data’s dissimilarity matrix, and then create a RDM of these residuals? Also toying with somehow using partial RDMs. Thank you in advance for any guidance!