First-level analysis for a continuous variable

Hi everyone,

I’m new to fMRI analysis and I’m working on my first dataset. I’d like to test whether a continuous self-report score is associated with the BOLD signal.

I’m using Nilearn’s first-level GLM. The typical workflow uses make_first_level_design_matrix(), which expects an events table with a trial_type column to define conditions/contrasts. In my case, however, I don’t have discrete experimental conditions; instead, I have a continuous rating/score [per trial], and I want to model its relationship with the BOLD response.
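For context, the events table that make_first_level_design_matrix normally expects looks like this (toy values; onset, duration, and trial_type are Nilearn's column conventions):

```python
import pandas as pd

# A minimal events table in the format Nilearn expects:
# one row per event, onset/duration in seconds, and a
# categorical trial_type column (values here are made up).
events = pd.DataFrame({
    "onset": [3.0, 9.0, 15.0, 21.0],
    "duration": [1.0, 1.0, 1.0, 1.0],
    "trial_type": ["faces", "houses", "faces", "houses"],
})
print(events["trial_type"].unique())  # each condition becomes a design-matrix column
```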

From what I understand, this should not be treated as a parametric modulator. I found suggestions to use compute_regressor() to build such a regressor manually, but I’m not sure whether my implementation is correct or whether there is a more idiomatic way in Nilearn.

    func_img = image.load_img(func_file, dtype=np.float32)
    beh_data = pd.read_csv(beh_file, sep="\t")
    confound_data = pd.read_csv(confound_file, sep="\t")
    n_volumes = func_img.shape[-1]
    frame_times = np.arange(n_volumes)  # NB: assumes TR = 1 s; use np.arange(n_volumes) * TR otherwise
    
    beh_data.rename(columns={"ConfTime": "onset"}, inplace=True)
    beh_data["duration"] = 0
    behavior_vars = ["ConfRating", "Correct", "Difficulty"]
    regressors_dict = {}
    for var_name in behavior_vars:      
        regressor, reg_names = glm.first_level.compute_regressor(
            exp_condition=(
                beh_data["onset"].values.astype(float),
                beh_data["duration"].values.astype(float),
                beh_data[var_name].values.astype(float),
            ),
            hrf_model="spm",
            frame_times=frame_times,
            oversampling=100
        )
        regressors_dict[var_name] = regressor.flatten()
    reg_df = pd.DataFrame(data=regressors_dict, index=frame_times)
    combined_regs = pd.concat([confound_data[confound_cols], reg_df], axis=1)
    combined_reg_names = confound_cols + behavior_vars
    design_matrix = glm.first_level.make_first_level_design_matrix(
        frame_times=frame_times,
        events=None,
        hrf_model="spm",
        oversampling=100,
        drift_model=None,
        add_regs=combined_regs,
        add_reg_names=combined_reg_names
    )

Just to be sure: have you checked this example that shows how to build a parametric modulation design matrix in Nilearn:

Thank you for your reply. Before writing the script, I consulted the Nilearn tutorial on design matrices, as well as discussions within the neurostars community regarding parametric modulation.

As I understand it, parametric modulation is used to model how neural responses within a given experimental condition vary as a function of a continuous numerical variable (e.g., reward trials modulated by reward magnitude, relative to control trials). This approach is not well-suited for studies that focus exclusively on reward values without a corresponding non-reward or control condition.
In other words, I intend for my design matrix to use the parametric modulator as the primary independent variable, without including trial_type as a distinct regressor.

This approach is not well-suited for studies that focus exclusively on reward values without a corresponding non-reward or control condition.

That sounds strange to me. Can you elaborate?

Also, you are still using compute_regressor with beh_data[var_name] as the modulation, which is what make_first_level_design_matrix does under the hood anyway.

My understanding of parametric modulation is that a continuous modulating variable W modulates the BOLD signal intensity associated with a categorical independent variable A. For example, compared with a control condition, painful stimulation elicits stronger activation in the ACC—a typical effect driven by the categorical variable of experimental condition (pain vs. control). In this framework, parametric modulation can be illustrated by subjective pain ratings: incorporating participants’ self-reported pain scores helps better distinguish and refine the BOLD response to pain relative to the control condition (i.e., treatment × rating → BOLD).

However, my current focus is the main effect of pain rating itself (rating → BOLD) rather than treatment. This effect cannot be directly represented in the trial_type column of the design matrix, as trial_type is typically used for condition contrasts rather than continuous regression.

If you include a “rating” trial_type, that includes the continuous value as modulation, I think that you obtain the design you would like to have.
Sorry if I misunderstood.
HTH,
Bertrand
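A minimal sketch of that suggestion (the optional modulation column of the events table is a real Nilearn feature; the onset and ConfRating values below are made up):

```python
import pandas as pd

# Sketch: turn the behavioural table into an events table with a
# single "rating" condition whose per-trial amplitude is ConfRating.
beh_data = pd.DataFrame({   # stand-in for pd.read_csv(beh_file, sep="\t")
    "onset": [4.0, 12.0, 20.0],
    "ConfRating": [2.0, 5.0, 3.0],
})
events = pd.DataFrame({
    "onset": beh_data["onset"],
    "duration": 0.0,                       # event-related (stick) regressor
    "trial_type": "rating",                # one condition for all trials
    "modulation": beh_data["ConfRating"],  # continuous per-trial amplitude
})
# Passing this events table to make_first_level_design_matrix() yields one
# "rating" column scaled by ConfRating (consider mean-centering it first).
```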

This seems to be consistent with my approach using compute_regressor()?

        regressor, reg_names = glm.first_level.compute_regressor(
            exp_condition=(
                beh_data["onset"].values.astype(float),
                beh_data["duration"].values.astype(float),
                beh_data[var_name].values.astype(float),
            ),
            hrf_model="spm",
            frame_times=frame_times,
            oversampling=100
        )

Can you send me one of the beh_file files you are reading from?

I think that your approach may model each event several times, but it would be easier to check if I don’t have to mock up data.

Thank you for your patient reply and selfless help. The behavioral data is attached.

beh_data.txt (1.2 KB)


Thanks, let me have a look and get back to you.

Here I am trying to use your code (slightly adjusted for the data) to generate a design matrix:

from nilearn import glm
from nilearn.plotting import plot_design_matrix, plot_design_matrix_correlation, show
import pandas as pd
import numpy as np
from pathlib import Path

beh_file = Path(__file__).parent / "beh_data.tsv"

# pretending we got a TR of 1 second
n_volumes = 410

frame_times = np.arange(n_volumes)

beh_data = pd.read_csv(beh_file, sep="\t")

behavior_vars = ["ConfRating", "Correct", "Difficulty"]
regressors_dict = {}
for var_name in behavior_vars:      
    regressor, reg_names = glm.first_level.compute_regressor(
        exp_condition=(
            beh_data["onset"].values.astype(float),
            beh_data["duration"].values.astype(float),
            beh_data[var_name].values.astype(float),
        ),
        hrf_model="spm",
        frame_times=frame_times,
        oversampling=100
    )
    regressors_dict[var_name] = regressor.flatten()

reg_df = pd.DataFrame(data=regressors_dict, index=frame_times)
combined_regs = reg_df
combined_reg_names = behavior_vars

design_matrix = glm.first_level.make_first_level_design_matrix(
    frame_times=frame_times,
    events=None,
    hrf_model="spm",
    oversampling=100,
    drift_model=None,
    add_regs=combined_regs,
    add_reg_names=combined_reg_names
)


plot_design_matrix(design_matrix)

plot_design_matrix_correlation(design_matrix)

show()

The figures generated:

With your approach, almost all events are modeled 3 times (once per behavioural variable), which leads to high correlation between your regressors. That is not good.

If it is unclear why this is a problem, have a look at the excellent series of videos (and articles) by Jeanette Mumford about this: https://www.youtube.com/watch?v=-2N5aCawArM&list=PLB2iAtgpI4YGoXjV-Ar_xuRIxvND9Q3WJ

Also note that your ‘Correct’ column only contains 1 and 0, so if you use it as a parametric modulator you are just left with the trials that are correct (those with a 1 in the Correct column): so you probably DO NOT want to use Correct as a parametric modulator.

I don’t have much context to go by but my hunch would be to:

  • first divide your trials into 2 categories: correct and incorrect
  • see in general what correlation you have between your ConfRating and Difficulty: because if they are highly correlated, you may be better off just using one of them as a parametric modulator.
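That correlation check is a one-liner on the behavioural table (toy values below; the column names are the ones discussed above):

```python
import pandas as pd

# Toy stand-in for the real behavioural table.
beh_data = pd.DataFrame({
    "ConfRating": [4, 3, 5, 2, 4, 1],
    "Difficulty": [1, 2, 1, 3, 2, 3],
})
# Pearson correlation between the two candidate modulators.
r = beh_data["ConfRating"].corr(beh_data["Difficulty"])
print(f"ConfRating vs Difficulty: r = {r:.2f}")
```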

pinging @bthirion to double check that I am not spouting complete :ox: :poop: here

I can just add that the correlation between ConfRating and Difficulty is intrinsic to the question you’re asking. It’s not a matter of implementation. Maybe centering these values could help a bit though.
HTH,
Bertrand
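A sketch of the centering mentioned above (for a single run this is just subtracting each column's mean; toy values, real column names):

```python
import pandas as pd

beh_data = pd.DataFrame({  # toy stand-in for the real table
    "ConfRating": [4.0, 2.0, 3.0, 5.0],
    "Difficulty": [1.0, 3.0, 2.0, 2.0],
})
# Mean-center each modulator so its regressor is decoupled from the
# average event response.
for col in ["ConfRating", "Difficulty"]:
    beh_data[col + "_c"] = beh_data[col] - beh_data[col].mean()
print(beh_data[["ConfRating_c", "Difficulty_c"]].mean())  # ~0 after centering
```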

What if I only want to use Correct and Difficulty as control variables (like head motion), and explore the relationship between ConfRating and the BOLD signal? The protocol you provided seems to model the difference in BOLD signal between correct and incorrect responses.

not quite

you can decide to put your correct and incorrect events in different regressors but still look at their average effect together


from nilearn import glm
from nilearn.plotting import plot_design_matrix, plot_design_matrix_correlation, show
import pandas as pd
import numpy as np
from pathlib import Path

beh_file = Path(__file__).parent / "beh_data.tsv"

# pretending we got a TR of 1 second
n_volumes = 410

frame_times = np.arange(n_volumes)

design_matrix = glm.first_level.make_first_level_design_matrix(
    frame_times=frame_times,
    events=pd.read_csv(beh_file, sep="\t"),  # events must be a DataFrame
    hrf_model="spm",
    drift_model=None,
)

plot_design_matrix(design_matrix)

beh_data = pd.read_csv(beh_file, sep="\t")

beh_data["trial_type"] = "correct"
beh_data.loc[beh_data["Correct"] == 0, "trial_type"] = "incorrect"

design_matrix = glm.first_level.make_first_level_design_matrix(
    frame_times=frame_times,
    events=beh_data,
    hrf_model="spm",
    drift_model=None,
)

plot_design_matrix(design_matrix)

show()

You can use a [1, 0] contrast on the following design, where all your events (correct and incorrect) are in one regressor:

And it will give you the same as a [1, 1, 0] contrast in this one, where correct and incorrect are in different regressors:
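The reason the two designs agree is that HRF convolution is linear: convolving the correct-trial and incorrect-trial stick functions separately and summing gives the same time course as convolving their sum. A numpy-only sketch (toy HRF and onsets, not Nilearn's SPM HRF):

```python
import numpy as np

n = 60
hrf = np.exp(-np.arange(10) / 2.0)                      # toy HRF kernel

correct = np.zeros(n)
correct[[5, 25, 45]] = 1.0                              # toy correct onsets
incorrect = np.zeros(n)
incorrect[[15, 35]] = 1.0                               # toy incorrect onsets

conv = lambda s: np.convolve(s, hrf)[:n]                # convolve, crop to run length

# Linearity: sum of separately convolved regressors == convolved sum.
lhs = conv(correct) + conv(incorrect)
rhs = conv(correct + incorrect)
print(np.allclose(lhs, rhs))  # True
```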

My research topic still focuses on the role of the continuous variable ConfRating and its interaction effect with Correct. I followed bthirion’s suggestion and centered ConfRating within each participant. Then I recoded Correct as -1 and 1, and calculated the interaction between the two. The specific code is as follows:

    beh_data.rename(columns={"RespTime": "onset"}, inplace=True)
    beh_data["Correct"] = 2 * beh_data["Correct"] - 1
    beh_data["Interaction"] = beh_data["Correct"] * beh_data["ConfRating"]
    beh_data["duration"] = 0
    behavior_vars = ["ConfRating", "Correct", "Interaction"]
    regressors_dict = {}
    for var_name in behavior_vars:      
        regressor, reg_names = glm.first_level.compute_regressor(
            exp_condition=(
                beh_data["onset"].values.astype(float),
                beh_data["duration"].values.astype(float),
                beh_data[var_name].values.astype(float),
            ),
            hrf_model="spm",
            frame_times=frame_times,
            oversampling=100
        )
        regressors_dict[var_name] = regressor.flatten()
    reg_df = pd.DataFrame(data=regressors_dict, index=frame_times)
    combined_regs = pd.concat([confound_data[confound_cols], reg_df], axis=1)
    combined_reg_names = confound_cols + behavior_vars
    design_matrix = glm.first_level.make_first_level_design_matrix(
        frame_times=frame_times,
        events=None,
        hrf_model=None,
        oversampling=100,
        drift_model=None,
        add_regs=combined_regs,
        add_reg_names=combined_reg_names
    )
    ###### skip some lines....
    fmri_glm = glm.first_level.FirstLevelModel(
        verbose=1,
        high_pass=0.01,
        noise_model='ar1',
        minimize_memory=True,
        n_jobs=n_cpu,
        mask_img=Mask,
    )
    fmri_glm_fit = fmri_glm.fit(
        run_imgs=func_img,
        design_matrices=design_matrix
    )

    z_map = fmri_glm_fit.compute_contrast(
        contrast_def="Interaction",
        stat_type="t",
        output_type="z_score"
    )
    z_map_filename = os.path.join(
        FirstLevel_path,
        f"sub-{sub_id}_zmap.nii.gz"
    )
    z_map.to_filename(z_map_filename)

The interaction term should reflect the ConfRating*Correct interaction effect.
Design is as follows:
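As a sanity check on the -1/1 coding (a toy behavioural-level simulation, not an fMRI fit): with Correct coded -1/1 and ConfRating centered, the interaction beta equals half the difference between the ConfRating slopes of correct and incorrect trials.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
rating = rng.normal(0, 1, n)                    # centered ConfRating
correct = rng.choice([-1.0, 1.0], size=n)       # Correct recoded to -1/1

# Simulated response: slope 0.5 on correct trials, 0.1 on incorrect trials.
slope = np.where(correct == 1, 0.5, 0.1)
y = slope * rating + rng.normal(0, 0.01, n)

# Fit intercept + main effects + interaction by least squares.
X = np.column_stack([np.ones(n), rating, correct, rating * correct])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[3])  # ~ (0.5 - 0.1) / 2 = 0.2
```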