Adapt a nipype first-level analysis so that it can also be run locally via SPM

I downloaded the HCP data set and want to run a first-level analysis on it using SPM (to create beta images). I know that, in principle, I could rely only on the /tmp files that are created during workflow execution (and deleted after the workflow has run). However, I would like to tweak the pipeline so that the SPM.mat file and the corresponding images are saved locally, and so that other users who don’t want to use nipype can simply run the estimation with their bare-metal SPM installation, the respective NIFTI files, and the SPM.mat file. For this, all file paths have to point to locations on my local machine, not to the temporary tmp/ files. Is there a way to save the SPM.mat file so that all its settings point to my output file:

/output/952863/first_level_pipe_nback/952863.nii?

Here’s my data structure (only one dummy subject). People with experience with the HCP dataset should be familiar with it (the listing shows folders and files):

\DATA
└───952863
    ├───.xdlm
    │       952863_3T_tfMRI_WM_preproc.json
    │
    ├───MNINonLinear
    │   └───Results
    │       ├───tfMRI_WM
    │       │       prepare_level2_feat_analysis.sh
    │       │       tfMRI_WM_hp200_s4_level2.fsf
    │       │
    │       ├───tfMRI_WM_LR
    │       │   │   brainmask_fs.2.nii.gz
    │       │   │   Movement_AbsoluteRMS.txt
    │       │   │   Movement_AbsoluteRMS_mean.txt
    │       │   │   Movement_Regressors.txt
    │       │   │   Movement_Regressors_dt.txt
    │       │   │   Movement_RelativeRMS.txt
    │       │   │   Movement_RelativeRMS_mean.txt
    │       │   │   PhaseOne_gdc_dc.nii.gz
    │       │   │   PhaseTwo_gdc_dc.nii.gz
    │       │   │   REC_run2_TAB.txt
    │       │   │   SBRef_dc.nii.gz
    │       │   │   tfMRI_WM_LR.L.native.func.gii
    │       │   │   tfMRI_WM_LR.nii.gz
    │       │   │   tfMRI_WM_LR.R.native.func.gii
    │       │   │   tfMRI_WM_LR_Atlas.dtseries.nii
    │       │   │   tfMRI_WM_LR_Atlas_MSMAll.dtseries.nii
    │       │   │   tfMRI_WM_LR_hp200_s4_level1.fsf
    │       │   │   tfMRI_WM_LR_Jacobian.nii.gz
    │       │   │   tfMRI_WM_LR_Physio_log.txt
    │       │   │   tfMRI_WM_LR_SBRef.nii.gz
    │       │   │   WM_run2_TAB.txt
    │       │   │
    │       │   ├───EVs
    │       │   │       0bk_body.txt
    │       │   │       0bk_cor.txt
    │       │   │       0bk_err.txt
    │       │   │       0bk_faces.txt
    │       │   │       0bk_nlr.txt
    │       │   │       0bk_places.txt
    │       │   │       0bk_tools.txt
    │       │   │       2bk_body.txt
    │       │   │       2bk_cor.txt
    │       │   │       2bk_err.txt
    │       │   │       2bk_faces.txt
    │       │   │       2bk_nlr.txt
    │       │   │       2bk_places.txt
    │       │   │       2bk_tools.txt
    │       │   │       all_bk_cor.txt
    │       │   │       all_bk_err.txt
    │       │   │       Sync.txt
    │       │   │       WM_Stats.csv
    │       │   │
    │       │   └───RibbonVolumeToSurfaceMapping
    │       │           goodvoxels.nii.gz
    │       │
    │       └───tfMRI_WM_RL
    │           │   brainmask_fs.2.nii.gz
    │           │   Movement_AbsoluteRMS.txt
    │           │   Movement_AbsoluteRMS_mean.txt
    │           │   Movement_Regressors.txt
    │           │   Movement_Regressors_dt.txt
    │           │   Movement_RelativeRMS.txt
    │           │   Movement_RelativeRMS_mean.txt
    │           │   PhaseOne_gdc_dc.nii.gz
    │           │   PhaseTwo_gdc_dc.nii.gz
    │           │   REC_run1_TAB.txt
    │           │   SBRef_dc.nii.gz
    │           │   tfMRI_WM_RL.L.native.func.gii
    │           │   tfMRI_WM_RL.nii.gz
    │           │   tfMRI_WM_RL.R.native.func.gii
    │           │   tfMRI_WM_RL_Atlas.dtseries.nii
    │           │   tfMRI_WM_RL_Atlas_MSMAll.dtseries.nii
    │           │   tfMRI_WM_RL_hp200_s4_level1.fsf
    │           │   tfMRI_WM_RL_Jacobian.nii.gz
    │           │   tfMRI_WM_RL_Physio_log.txt
    │           │   tfMRI_WM_RL_SBRef.nii.gz
    │           │   WM_run1_TAB.txt
    │           │
    │           ├───EVs
    │           │       0bk_body.txt
    │           │       0bk_cor.txt
    │           │       0bk_err.txt
    │           │       0bk_faces.txt
    │           │       0bk_nlr.txt
    │           │       0bk_places.txt
    │           │       0bk_tools.txt
    │           │       2bk_body.txt
    │           │       2bk_cor.txt
    │           │       2bk_err.txt
    │           │       2bk_faces.txt
    │           │       2bk_nlr.txt
    │           │       2bk_places.txt
    │           │       2bk_tools.txt
    │           │       all_bk_cor.txt
    │           │       all_bk_err.txt
    │           │       Sync.txt
    │           │       WM_Stats.csv
    │           │
    │           └───RibbonVolumeToSurfaceMapping
    │                   goodvoxels.nii.gz
    │
    ├───release-notes
    │       tfMRI_WM_preproc.txt
    │
    └───T1w
        └───Results
            ├───tfMRI_WM_LR
            │       PhaseOne_gdc_dc.nii.gz
            │       PhaseTwo_gdc_dc.nii.gz
            │       SBRef_dc.nii.gz
            │
            └───tfMRI_WM_RL
                    PhaseOne_gdc_dc.nii.gz
                    PhaseTwo_gdc_dc.nii.gz
                    SBRef_dc.nii.gz

And here’s the code I have so far (heavily adapted from Michael Notter’s nipype tutorial; mostly dummy code that only serves to illustrate the problem):

Code

import pandas as pd
from nipype.interfaces.utility import Rename
from nipype.interfaces.base import Bunch
from nipype.interfaces.io import SelectFiles, DataSink
from nipype import Workflow,Node,MapNode
from nipype.interfaces.utility import Function, IdentityInterface
from nipype.algorithms.misc import Gunzip
from os.path import join as opj

from nipype.interfaces.spm import Level1Design, EstimateModel, EstimateContrast
from nipype.algorithms.modelgen import SpecifySPMModel

# set data directory
data_dir = '/data/'

# take only this one subject
subject_list = ['952863']

# create a node that iterates over the subject(s)
subject_iterator = Node(IdentityInterface(fields=["subject_id"]),name="subject_iterator")
subject_iterator.iterables = [("subject_id",subject_list)]

# create a node that selects only the LR-n-back NIFTI file based on a template
templates = {'func':'{subject_id}/MNINonLinear/Results/tfMRI_WM_LR/tfMRI_WM_LR.nii.gz'}
selectfiles = Node(SelectFiles(templates,base_directory=data_dir),name='selectfiles')

# create a function that grabs the event files from a run and converts them to a Bunch object
# NOTE! We have to import Bunch and pandas again because nipype Function nodes run in closed environments
def get_nback_task_info(subject_id):
    
    import pandas as pd
    from nipype.interfaces.base import Bunch

    # create a path to the directory where the event files are stored based on the input subject id
    subject_dir = f"/data/{subject_id}/MNINonLinear/Results/tfMRI_WM_LR/EVs/"

    # set up the names for all different event files
    ev_txt_list = ['0bk_faces.txt','0bk_places.txt','0bk_body.txt','0bk_tools.txt',
                   '2bk_faces.txt','2bk_places.txt','2bk_body.txt','2bk_tools.txt']

    # provide a list of names for all different event files
    ev_names = ['0bk_faces','0bk_places','0bk_body','0bk_tools',
                '2bk_faces','2bk_places','2bk_body','2bk_tools']

    # initialize an empty list where the upcoming data frames will be stored
    ev_dfs_list = []

    for idx,ev_txt in enumerate(ev_txt_list):

        # read in the .txt file as pandas data frame
        ev_df = pd.read_table(subject_dir + ev_txt, names=['onset','duration','amplitude'])

        # add a column 'trial_type' to give a description
        ev_df['trial_type'] = ev_names[idx]

        # add df to event list
        ev_dfs_list.append(ev_df)

    # concatenate all dfs
    ev_df = pd.concat(ev_dfs_list,axis=0)
    
    # the output of our function will be passed over to SpecifyModel
    # (https://nipype.readthedocs.io/en/1.6.0/api/generated/nipype.algorithms.modelgen.html#specifymodel)
    # This class requires the input to be a Bunch object, so we have to convert the dataframe to a Bunch

    conditions = []
    onsets = []
    durations = []

    for name,group in ev_df.groupby('trial_type'):
        conditions.append(name)
        onsets.append(group['onset'].tolist()) 
        durations.append(group['duration'].tolist())

    # the Bunch object can contain additional fields such as regressors (e.g. motion regressors?) that can be used to specify the model
    nback_info = [Bunch(conditions=conditions,
                          onsets=onsets,
                          durations=durations,
                          #amplitudes=None,
                          #tmod=None,
                          #pmod=None,
                          #regressor_names=None,
                          #regressors=None
                         )]
    
    return nback_info


# define a Node that takes the function above as input
nback_info = Node(Function(input_names=['subject_id'],
                          output_names=['nback_info'],
                          function=get_nback_task_info),
                  name='nback_info')

# create a node that unzips the original file (this is necessary for SPM to run)
gunzip = Node(Gunzip(),name='gunzip')

# create another node that dynamically changes the output filename based on the subject id
rename = Node(Rename(format_string="%(subject_id)s.nii"),name='rename_func_files')

# define a DataSink node where the (manipulated) files should be stored
# NOTE: The boolean parameterization = False ensures that the files are directly saved
# in the datasink folder. Otherwise a subfolder like _subject_id_952863 would be created for each subject.
datasink = Node(DataSink(base_directory='/output',parameterization=False),name="datasink")

# Define SPM First Level Model

# SpecifyModel - Generates SPM-specific Model
modelspec = Node(SpecifySPMModel(concatenate_runs=False,
                                 input_units='secs',
                                 output_units='secs',
                                 time_repetition=0.720,
                                 high_pass_filter_cutoff=200),
                 name="modelspec")

# Level1Design - Generates an SPM design matrix
level1design = Node(Level1Design(bases={'hrf': {'derivs': [1, 0]}},
                                 timing_units='secs',
                                 interscan_interval=0.720,
                                 model_serial_correlations='FAST'),
                    name="level1design")

# define workflow
wf = Workflow(name='spm_first_level_analysis_nback')

# pass the subject id over to selectfiles to select one file for that subject
wf.connect(subject_iterator,'subject_id',selectfiles,'subject_id')

# pass the subject id over to nback info to get the task information for that subject
wf.connect(subject_iterator,'subject_id',nback_info,'subject_id')

# pass the nback info to the modelspec node
wf.connect(nback_info,'nback_info',modelspec,'subject_info')

# pass the session info that is generated by SpecifySPM Model over to level-1 design node
wf.connect(modelspec,'session_info',level1design,'session_info')

# pass the file over to the Unzip Node to unzip it
# then pass the unzipped file over to the rename node
# also pass the subject id over to the rename node so that it knows how to rename the file using the subject_id
wf.connect(selectfiles,'func',gunzip,'in_file')
wf.connect(gunzip,'out_file',rename,'in_file')
wf.connect(subject_iterator,'subject_id',rename,'subject_id')

# pass the unzipped, renamed file over to modelspec
wf.connect(rename,'out_file',modelspec,'functional_runs')

# Pass the unzipped, renamed file and the SPM.mat file to the datasink
# NOTE: Theoretically one doesn't even need this 'indirect' path through the datasink;
# the file could also be passed directly to the SPM model. But other users who don't
# want to use nipype need to be able to run the SPM.mat files, so we save a hard copy
# of the files in each subject's directory.
wf.connect(subject_iterator,'subject_id',datasink,'container')
wf.connect(rename,'out_file',datasink,'first_level_pipe_nback.@unzipped_renamed_file')
wf.connect(level1design,'spm_mat_file',datasink,'first_level_pipe_nback.@spm_mat_file')

# Run the workflow
wf_results = wf.run()

Here’s the output graph

# Create 1st-level analysis output graph
wf.write_graph(graph2use='colored', format='png', simple_form=True)
from IPython.display import Image
Image(filename='/home/neuro/nipype_tutorial/graph.png')

Have a look at the output

This looks the way I want it to.

\OUTPUT
└───952863
    └───first_level_pipe_nback
            952863.nii
            SPM.mat

Problem

Looking at the modelspec node (node_list[2] below), one can see why this can’t be run by SPM locally: SPM of course has no access to a file 'scans': '/tmp/tmpgim1n61i/spm_first_level_analysis_nback/_subject_id_955465/modelspec/955465.nii' that no longer exists:

node_list = list(wf_results.nodes())
node_list
[spm_first_level_analysis_nback.selectfiles.a0,
spm_first_level_analysis_nback.nback_info.a0,
spm_first_level_analysis_nback.modelspec.a0,
spm_first_level_analysis_nback.level1design.a0,
spm_first_level_analysis_nback.gunzip.a0,
spm_first_level_analysis_nback.rename_func_files.a0,
spm_first_level_analysis_nback.datasink.a0]
node_list[2].result.outputs
session_info = [{'cond': [{'name': '0bk_body', 'onset': [36.145], 'duration': [27.5]}, {'name': '0bk_faces', 'onset': [221.975], 'duration': [27.5]}, {'name': '0bk_places', 'onset': [250.257], 'duration': [27.5]}, {'name': '0bk_tools', 'onset': [107.48899999999999], 'duration': [27.5]}, {'name': '2bk_body', 'onset': [150.565], 'duration': [27.5]}, {'name': '2bk_faces', 'onset': [79.26100000000001], 'duration': [27.5]}, {'name': '2bk_places', 'onset': [178.7], 'duration': [27.5]}, {'name': '2bk_tools', 'onset': [7.997000000000001], 'duration': [27.5]}], 'hpf': 200.0, 'regress': [], 'scans': '/tmp/tmpgim1n61i/spm_first_level_analysis_nback/_subject_id_955465/modelspec/955465.nii'}]

It seems that the only way to create an SPM.mat file via nipype that is also executable locally is to change its SPM.xY.P field (and possibly other fields) post hoc?

Answering my own question here (so I cannot guarantee that the following is completely right): it is possible to save all the files that are created during the workflow to your local hard drive; for that, you have to define a base_dir for your workflow (see the sketch after the quoted note below).

For background, have a look at this Jupyter notebook from Michael Notter’s nipype tutorial, where it says:

The specification of base_dir is very important (and is why we needed to use absolute paths above) because otherwise all the outputs would be saved somewhere in the temporary files. Unlike interfaces, which by default spit out results to the local directly, the Workflow engine executes things off in its own directory hierarchy.
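
Concretely, a minimal sketch of what that means for the workflow above (the /output path is just an illustration; any persistent local directory would do):

from nipype import Workflow

# give the workflow a persistent working directory instead of a temporary one
wf = Workflow(name='spm_first_level_analysis_nback',
              base_dir='/output')

# ... define and connect the nodes exactly as above ...

# node working directories (including level1design/SPM.mat) now end up under
# /output/spm_first_level_analysis_nback/ instead of /tmp/tmpXXXXXX/
wf_results = wf.run()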

However, as long as you’re using Docker or Singularity and mount your local directories to directories within the container, you will always end up with paths in your SPM.mat file(s) that point to directories inside the container. So the only way to solve this would be either not to use Singularity or Docker but a bare-metal installation of SPM + nipype (see this post for more information on that), or (though I cannot guarantee that I haven’t overlooked something here) to use Docker or Singularity and tweak attributes such as SPM.xY.P (and possibly more?) post hoc.
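
For the post-hoc route, here is a small sketch (not part of the pipeline above) to at least check which scan paths ended up in the SPM.mat before handing it over; it assumes scipy is available and that the SPM.mat was not saved in the -v7.3 MAT-file format (which scipy.io cannot read):

from scipy.io import loadmat

# load the SPM.mat written by Level1Design (path is just the example output from above)
spm = loadmat('/output/952863/first_level_pipe_nback/SPM.mat',
              struct_as_record=False, squeeze_me=True)['SPM']

# SPM.xY.P lists the scan files that the estimation will try to open;
# if these still point into /tmp or into the container, a bare-metal
# SPM run on another machine will fail
print(spm.xY.P)

Rewriting these paths from Python and saving the struct back with scipy tends to mangle the MATLAB structures, so the actual replacement is probably better done from within MATLAB (if I’m not mistaken, SPM ships a helper called spm_changepath for exactly this kind of prefix substitution); I only use Python here to see what would need to change.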