Kind-of long time lurker, and finally a first time poster with a question I haven’t seen explicitly addressed (or at least not with the key words I’ve used).
Reaching out to ask for any advice on using representational similarity analysis (RSA) with RSA Toolbox 3.0. I have 4 conditions to be compared across all subjects, with conditions pulled from 2 task types. For a toy example to illustrate the problem, both of my tasks involve social and non-social conditions, but differ in terms of visual features (e.g., one task uses photos to represent sociality, the other uses avatars). This would leave four types of trial: task1_social (photos), task1_nonsocial (photos), task2_social (avatars), task2_nonsocial (avatars).
As I see it, there are two types of similarity here, one of which I am interested in assessing (social vs. non-social, across and within tasks) and one that I am interested in controlling for (photos vs. avatars, across tasks). Basically, I would like to determine when and how to “regress” task out, in order to examine my primary comparison of interest (in this example, social vs. non-social).
My current pipeline involves:
Dissimilarity matrix values generated using rsatoolbox.rdm.calc_rdm (method=correlation) on the raw, zstat values, computed by running the pairwise correlations between all four conditions (task1_social (photos), task1_nonsocial (photos), task2_social (avatars), task2_nonsocial (avatars))
> def create_rdm_dict(participant_ids, df, order):
> rdms_dict = {}
> for participant_id in participant_ids:
> wide_df = pivot_wide_format(df, participant_id)
> wide_df = wide_df[wide_df['Cond'].str.endswith('_diff')]
> wide_df['Cond'] = pd.Categorical(wide_df['Cond'], categories=order, ordered=True)
> wide_df = wide_df.sort_values('Cond').reset_index(drop=True)
> wide_array = wide_df.drop(columns='Cond').to_numpy()
> voxel_array = [i for i in range(1, wide_array.shape[1] + 1)]
> cond_array = wide_df['Cond'].to_numpy()
> data = rsatoolbox.data.Dataset(
> wide_array,
> channel_descriptors={'voxel': voxel_array},
> obs_descriptors={'condition': cond_array}
> )
> rdm = rsatoolbox.rdm.calc_rdm(data, method='correlation')
> rdms_dict[participant_id] = rdm
> return rdms_dict
Example output:
> {1001: rsatoolbox.rdm.RDMs(
> dissimilarity_measure =
> correlation
> dissimilarities =
> [[0.41419128 1.37773521 1.14586877 1.27610213 1.05142896 0.57960723]]
> descriptors =
> {}
> rdm_descriptors =
> {'index': [0]}
> pattern_descriptors =
> {'condition': array(['task1_social(photos)', 'task1_nonsocial(photos)', 'task2_social(avatars)', 'task2_nonsocial(avatars)'], dtype=object)}
- Regress out the task variance from the dissimilarity vectors
> def regress_out_task(rdm, regressor, conditions):
> dissimilarities = rdm.dissimilarities[0]
> num_conditions = len(conditions)
> dissimilarities_matrix = np.zeros((num_conditions, num_conditions))
> upper_triangle_indices = np.triu_indices(num_conditions, k=1)
> dissimilarities_matrix[upper_triangle_indices] = dissimilarities
> dissimilarities_vector = dissimilarities_matrix[upper_triangle_indices]
> model = LinearRegression()
> model.fit(regressor.reshape(-1, 1), dissimilarities_vector)
> residuals = dissimilarities_vector - model.predict(regressor.reshape(-1, 1))
> new_rdm_matrix = np.zeros_like(dissimilarities_matrix)
> new_rdm_matrix[upper_triangle_indices] = residuals
> new_rdm_matrix = new_rdm_matrix + new_rdm_matrix.T
> return new_rdm_matrix
>
> # Apply regression to all RDMs
> def apply_regression_to_rdms(rdms_dict, regressor, conditions):
> for participant_id in rdms_dict.keys():
> rdm = rdms_dict[participant_id]
> new_rdm_matrix = regress_out_task(rdm, regressor, conditions)
> rdms_dict[participant_id].dissimilarities[0] = new_rdm_matrix[np.triu_indices_from(new_rdm_matrix, k=1)]
> return rdms_dict
Example output (where s hits the fan)
> {1001: rsatoolbox.rdm.RDMs(
> dissimilarity_measure =
> correlation
> dissimilarities =
> [[-0.08270798 0.16495144 -0.066915 0.06331836 -0.16135481 0.08270798]]
> descriptors =
> {}
> rdm_descriptors =
> {'index': [0]}
> pattern_descriptors = {'condition': array('task1_social(photos)', 'task1_nonsocial(photos)', 'task2_social(avatars)', 'task2_nonsocial(avatars)', dtype=object)}
- Compare these task-regressed dissimilarity vectors/matrices to hypothetical RDMs for the psychological construct
> def compare_hypotheses(mean_rdm, h1, h2):
> """Compare the mean RDM (of all participants) to two hypotheses and return p-values."""
> # Flatten the upper triangle of the RDM and hypotheses
> mean_rdm_vector = mean_rdm[np.triu_indices_from(mean_rdm, k=1)]
> h1_vector = h1[np.triu_indices_from(h1, k=1)]
> h2_vector = h2[np.triu_indices_from(h2, k=1)]
>
> # Perform linear regression for H1
> model_h1 = LinearRegression()
> model_h1.fit(h1_vector.reshape(-1, 1), mean_rdm_vector)
> predicted_h1 = model_h1.predict(h3_vector.reshape(-1, 1))
>
> # Diagnostic: Inspect predictions and coefficients for H1
> print("Predicted H1:", predicted_h1)
> print("H1 Coefficient:", model_h1.coef_)
>
> _, p_val_h1 = wilcoxon(mean_rdm_vector, predicted_h1)
>
> # Perform linear regression for H2
> model_h2 = LinearRegression()
> model_h2.fit(h2_vector.reshape(-1, 1), mean_rdm_vector)
> predicted_h2 = model_h2.predict(h2_vector.reshape(-1, 1))
>
> # Inspect predictions and coefficients for H2
> print("Predicted H2:", predicted_h2)
> print("H2 Coefficient:", model_h2.coef_)
>
> _, p_val_h2 = wilcoxon(mean_rdm_vector, predicted_h2)
>
> return p_val_h1, p_val_h2
Step 3 results seem pointless to share here given the negative RDM values after step 2, but I can if it helps, and feedback on process is welcome!
As you can see, the current way that I am regressing out the task variable results in RDMs taking on negative values. I figure this is because task variance should be regressed prior to creating the RDM, and that the regression I am running pulls the beta values for the new rdms_dict (in step 2), when it should be pulling the residuals. These residuals should then be used as the DVs for my hypothetical RDMs. Can I pull the residual values from the regression on my raw data’s dissimilarity matrix, and then create a RDM of these residuals? Also toying with somehow using partial RDMs. Thank you in advance for any guidance!