Summary of what happened:
I have a pandas DataFrame containing thousands of features related to functional connectivity between pairs of brain regions. I want to perform a statistical test to assess the relationship between these features and the age variable (continuous outcome). Could the Permuted OLS method from the Nilearn library be used in this situation? Or does it require any specific data formatting to be applied correctly?
If this method is not suitable for this analysis, what would be the most appropriate statistical approach to infer the relationship between functional connectivity and age, considering the large number of features?
Command used (and if a helper script was used, a link to the helper script or the command generated)
Example for visualization:
Suppose I have a DataFrame where each row represents a subject and each column represents a measure of connectivity between two brain regions. Additionally, I have a column with the age of each subject:
import pandas as pd
import numpy as np
from nilearn.mass_univariate import permuted_ols

# Creating a fictional DataFrame with functional connectivity and age
np.random.seed(42)
data = {
    "subject_id": [f"sub_{i}" for i in range(1, 6)],
    "age": np.random.randint(20, 70, size=5),  # Y: Age
    "PFC_Amygdala": np.random.rand(5),  # X: Connectivity
    "PFC_Hippocampus": np.random.rand(5),  # X: Connectivity
    "PFC_Thalamus": np.random.rand(5),  # X: Connectivity
    "Amygdala_Hippocampus": np.random.rand(5),  # X: Connectivity
    "Amygdala_Thalamus": np.random.rand(5),  # X: Connectivity
}
df = pd.DataFrame(data)

# Defining X as the functional connectivity columns and Y as the age variable
X = df.drop(columns=["subject_id", "age"])  # X: Connectivity
y = df["age"]  # Y: Age

# Age is the tested variable, the connectivity columns are the targets.
# Assuming the tuple-style output: -log10 FWE-corrected p-values first, then t-scores.
neg_log_pvals, t_stats, _ = permuted_ols(y.to_numpy(), X.to_numpy(), n_perm=5000)
Given this format, would it be possible to apply Permuted OLS from Nilearn (nilearn.mass_univariate.permuted_ols) to test the relationship between age and the functional connectivity measures? If not, which statistical method would be more appropriate for this analysis, considering the high dimensionality of the data?
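To make clear what kind of test I have in mind, here is a minimal NumPy-only sketch of a max-statistic permutation test, which is my understanding of the general idea behind a permuted OLS with family-wise error correction. It is not Nilearn's implementation: the function name max_t_permutation_test, the synthetic data, and the planted effect in the first column are all made up for illustration, and it assumes a single regressor (age) with no confounds.

```python
import numpy as np


def max_t_permutation_test(age, features, n_perm=1000, seed=0):
    """Hypothetical helper: two-sided max-|t| permutation test.

    age      : array of shape (n_subjects,)
    features : array of shape (n_subjects, n_features)
    Returns the observed t-score per feature and FWE-corrected p-values.
    """
    rng = np.random.default_rng(seed)
    age = np.asarray(age, dtype=float)
    features = np.asarray(features, dtype=float)
    n_subjects = features.shape[0]

    # Center the features once; permuting age does not change them.
    feat_c = features - features.mean(axis=0)
    feat_norm = np.linalg.norm(feat_c, axis=0)

    def t_scores(a):
        # Pearson r between age and every feature, converted to a t-score
        # with n - 2 degrees of freedom (simple regression with intercept).
        a_c = a - a.mean()
        r = (a_c @ feat_c) / (np.linalg.norm(a_c) * feat_norm)
        return r * np.sqrt((n_subjects - 2) / (1.0 - r**2))

    observed = t_scores(age)

    # Null distribution of the maximum |t| across all features.
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        max_null[i] = np.abs(t_scores(rng.permutation(age))).max()

    # Corrected p-value: share of permutations whose max |t| reaches
    # the observed |t| of each feature (counting the observed data once).
    exceed = (max_null[:, None] >= np.abs(observed)[None, :]).sum(axis=0)
    p_fwe = (1 + exceed) / (1 + n_perm)
    return observed, p_fwe


# Synthetic example: 40 subjects, 50 connectivity features, one true effect.
rng = np.random.default_rng(42)
n_subjects, n_features = 40, 50
age = rng.integers(20, 70, size=n_subjects).astype(float)
connectivity = rng.standard_normal((n_subjects, n_features))
connectivity[:, 0] += 0.1 * (age - age.mean())  # planted age effect

t_obs, p_fwe = max_t_permutation_test(age, connectivity, n_perm=500)
print("features with corrected p < 0.05:", np.flatnonzero(p_fwe < 0.05))
```

Because each permutation keeps only the largest statistic over all features, the resulting p-values are corrected for the number of features tested, which is the property I am after with thousands of connectivity columns.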
Environment (Docker, Singularity / Apptainer, custom installation):
Python