Select files Node

Hello community,

I’m writing a pipeline using Nipype to preprocess (animal) MRI data. My study paradigm is the following: a single subject undergoes a set of tasks throughout a single MR session. The subject’s BIDS folder contains a single anatomical scan and multiple functional scans of different tasks (task_id). To process the functional data separately for each task_id, I iterate over them using IdentityInterface from nipype.interfaces.utility. But there is a caveat: for every iteration over task_id for the functional scans, the pipeline repeats the processing steps for the anatomical data as well (even though it’s the same anatomical data for the subject). So the same anatomical T2 goes through bias correction as many times as there are elements in the task_id list.

For now my solution is to separate the data sources by creating separate nodes for anatomical and functional scans, but that is far from an elegant solution.

Is there a better way to avoid such a waste of computational time?

You probably want to split your anatomical and functional preprocessing into separate workflows. Something like:

from nipype import Node, Workflow
from nipype.interfaces.utility import IdentityInterface

workflow = Workflow(name='preproc_wf')

# functional_files is assumed to be a list of BOLD file paths
inputnode = Node(IdentityInterface(fields=['anatomical_file', 'functional_file']),
                 name='inputnode')
inputnode.iterables = ('functional_file', functional_files)

# Assume we have functions to generate subworkflows
anat_wf = init_anat_wf(...)
func_wf = init_func_wf(...)

workflow.connect([
    (inputnode, anat_wf, [('anatomical_file', 'inputnode.in_file')]),
    (inputnode, func_wf, [('functional_file', 'inputnode.in_file')]),
    (anat_wf, func_wf, [('outputnode.out_file', 'inputnode.anatomical_ref')]),
])

The only issue is that in Nipype 1.x, there’s no way to dynamically create an iterable based on the output of a node. You might consider using Pydra, which can split over outputs computed at workflow runtime.


Note that I’m not 100% positive about the behavior of iterables. But if iterating over functional_file causes anat_wf to be iterated over as well, you can always split the sources into multiple IdentityInterfaces and only provide an iterable to one of them. For example:

# Separate source nodes: only func_src carries the iterable,
# so anat_wf is not expanded once per functional file
anat_src = Node(IdentityInterface(fields=['anatomical_file']), name='anat_src')

func_src = Node(IdentityInterface(fields=['functional_file']), name='func_src')
func_src.iterables = ('functional_file', functional_files)

# Assume we have functions to generate subworkflows
anat_wf = init_anat_wf(...)
func_wf = init_func_wf(...)

workflow.connect([
    (anat_src, anat_wf, [('anatomical_file', 'inputnode.in_file')]),
    (func_src, func_wf, [('functional_file', 'inputnode.in_file')]),
    (anat_wf, func_wf, [('outputnode.out_file', 'inputnode.anatomical_ref')]),
])

Yet another option, which is what fMRIPrep does, is to explicitly create a separate workflow for each BOLD file. (We do this because the metadata for each BOLD file can differ and cause us to construct the workflow differently.)

# Assume we have functions to generate subworkflows
anat_wf = init_anat_wf(...)

for idx, functional_file in enumerate(functional_files):
    # Pass each BOLD file to its own subworkflow, and give each subworkflow
    # a unique name so they do not clash inside the parent workflow
    func_wf = init_func_wf(functional_file, ..., name=f'func_wf_{idx}')

    workflow.connect([
        (anat_wf, func_wf, [('outputnode.out_file', 'inputnode.anatomical_ref')]),
    ])
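
In all of the above, init_anat_wf and init_func_wf are placeholders for factory functions that return workflows exposing an inputnode and an outputnode. Purely as a hypothetical illustration (not your actual pipeline), a minimal anatomical factory built around the N4 bias correction step you mention might look like:

from nipype import Node, Workflow
from nipype.interfaces.utility import IdentityInterface
from nipype.interfaces.ants import N4BiasFieldCorrection

def init_anat_wf(name='anat_wf'):
    """Hypothetical anatomical subworkflow: bias-correct the T2 and expose the result."""
    wf = Workflow(name=name)
    inputnode = Node(IdentityInterface(fields=['in_file']), name='inputnode')
    outputnode = Node(IdentityInterface(fields=['out_file']), name='outputnode')
    n4 = Node(N4BiasFieldCorrection(dimension=3), name='n4')
    wf.connect([
        (inputnode, n4, [('in_file', 'input_image')]),
        (n4, outputnode, [('output_image', 'out_file')]),
    ])
    return wf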

Thank you for your explanatory and well-written reply. I’m also trying to implement the first option with separate workflows. The second solution you provided is what I use now, separate nodes for anatomical and functional scans, and it does the job. The last option is good for the human field, but I think it’s a bit of an overkill for the animal one. We have around 16 repetitions of the same task (16 BOLD scans of a single ‘task_id’) and 4-8 tasks for a single study (~8-9 hours). I need to create a single study EPI template for co-registration from all of the EPIs and then average BOLD within each ‘task_id’ group. So for me it’s easier to think in terms of ‘task_id’ rather than individual BOLD files.
I have been looking at Pydra for a while. In your opinion, how difficult is it to transfer a pipeline from Nipype to Pydra, and what do you see as the key benefits?

There are a few significant advantages to Pydra:

  1. A single model for applying nodes across arrays of inputs that combines the advantages of MapNode (dynamic, can be based on the results of a previous node) and iterables (does not recombine the results of each sub-node immediately, reducing synchronization bottlenecks). The combination allows a lot of dynamic logic to move out of workflow construction and into the workflow itself (see the sketch after this list).
  2. Elimination of the Node/Interface distinction. In practice, this just led to excessive boilerplate.
  3. More “pythonic”. With Python types and dataclasses becoming a standard part of the language, the need for a custom typing package like traits is largely eliminated. We also have tooling to convert plain Python functions into Pydra tasks by decorating with @pydra.mark.task, eliminating a lot of boilerplate for pure Python tasks.
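
To make the split/combine model from point 1 and the @pydra.mark.task decorator from point 3 concrete, here is a minimal sketch (the function and the input values are made up for illustration, and it assumes a recent Pydra 0.x release):

import pydra

# A plain Python function becomes a Pydra task via the decorator
@pydra.mark.task
def double(x: int) -> int:
    return 2 * x

# split() maps the task over the input list; combine() gathers the
# per-element results back together when a downstream step needs them
task = double().split("x", x=[1, 2, 3]).combine("x")

with pydra.Submitter(plugin="cf") as sub:
    sub(task)

print(task.result())  # three Result objects with outputs 2, 4 and 6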

The main disadvantage is that it does not have over a decade of accumulated interfaces that you can drop into your workflows. While people are working on task packages (see the pydra-* repositories in the nipype GitHub organization), it will take time. As a shortcut, I did write a wrapper that will convert a Nipype 1 interface into a Pydra task: nipype/pydra-nipype1 (tools for importing nipype1 interfaces into Pydra).
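
Following the pattern in that repository’s README, wrapping an existing interface looks roughly like the sketch below (the FSL Smooth interface and the input path are just placeholders, and FSL needs to be installed for it to run):

from nipype.interfaces import fsl
from pydra.tasks.nipype1.utils import Nipype1Task

# Wrap a Nipype 1 interface so it can be used as a Pydra task
smooth = Nipype1Task(fsl.Smooth())
smooth.inputs.fwhm = 4.0
smooth.inputs.in_file = 'sub-01_task-rest_bold.nii.gz'  # placeholder path
result = smooth()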

The main disadvantage of Nipype 1 is that there is no funded development effort behind it. It is still usable, but the majority of future work will be in Pydra.


Got it, thank you! Will try to move towards pydra.