Nipype recalculates preprocessing

I have a workflow that contains preprocessing and level1 analysis, based loosely on the fmri_fsl_reuse.py example. While debugging an issue with the level1 analysis, I noticed that the preprocessing is also recalculated. This is a problem because my preprocessing workflow is very computationally intensive (it includes slice-by-slice motion registration).

I have the following settings:

from nipype import config
config.enable_debug_mode()

# redundant with above...
level1_workflow.config['execution'] = {'stop_on_first_crash': True,
                                       'remove_unnecessary_outputs': False,
                                       'keep_inputs': True,
                                       'hash_method': 'content'}

level1_workflow.write_graph()
level1_workflow.run()

where level1_workflow includes the preprocessing workflow.

Is it typical for the early parts of a Nipype workflow to be recalculated? Does anyone have hints on how to prevent the workflow from recalculating the preprocessing? Should I run preprocessing separately from the level1 analysis?

@Jonathan_R_Williford
The preprocessing is imported within that script.

A little info on how nipype handles rerunning:

Nipype will always search for the node’s working directory when rerunning. If it isn’t found, the node is run in full. If it is found, Nipype compares a hash value for each input provided - by default, this hash is computed from the file’s size and modification time. This method is much quicker than hashing the full file content and will generally catch any changes.

If every calculation performed during a workflow is handled through nodes, the default hashing isn’t likely to run into problems. However, one of nipype’s features is connection string functions (like this one), which pass data from a node, through a helper function, to another node. These helper functions skip the hash checking that nodes go through, so they are always rerun at runtime. If a helper function just selects the nth index of a list, rerunning the workflow will always return the same file. But if a helper function writes data out to a file (as we saw in this post), the modification time of the file will change, causing nipype to treat the newly produced file as a changed input. This can be avoided by either A) assigning the connection string function to its own node or B) setting the hash method config setting to content. I recommend the former, since depending on the size of the files you are working with, content hashing may take quite a bit of time.
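
For example, option A could look something like this (a minimal sketch - the helper, node names, and fields are made up for illustration):

from nipype import Node, Workflow
from nipype.interfaces.utility import Function, IdentityInterface

def pickfirst(in_list):
    """Helper that would otherwise live inside the connection."""
    return in_list[0]

# wrap the helper in its own node so its result is hash-checked on
# rerun like any other node output, instead of being re-executed every time
pick = Node(Function(input_names=['in_list'],
                     output_names=['out'],
                     function=pickfirst),
            name='pickfirst')

# stand-ins for real upstream/downstream nodes
source = Node(IdentityInterface(fields=['out_files']), name='source')
sink = Node(IdentityInterface(fields=['in_file']), name='sink')

wf = Workflow(name='example', base_dir='/tmp')
wf.connect([(source, pick, [('out_files', 'in_list')]),
            (pick, sink, [('out', 'in_file')])])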

@mgxd Thank you very much for your reply.

I do have a string helper function that might have caused issues. I did turn on the content hash method, because the cost of hashing should be much less than the cost of recomputation (a 400-volume image took about 4 hours to preprocess :slight_smile:). For now, I’m planning to run the model fitting separately while debugging.
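
For reference, content hashing can be enabled either globally through the config object or per workflow (the latter form also appears later in this thread; level1_workflow is the workflow from my earlier post):

from nipype import config

# globally, for everything run in this process
config.set('execution', 'hash_method', 'content')

# or per workflow
level1_workflow.config['execution'] = {'hash_method': 'content'}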

@mgxd, can you provide the latest links for the ones below?

However, one of nipype’s features is connection string functions (like this one)

setting the hash method config

Both of them appear broken on my end.

sure

example of helper function within connection:
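
Something along these lines - pickfirst, workflow, source, and sink are hypothetical names:

def pickfirst(in_list):
    return in_list[0]

# the helper runs inside the connection itself, so it is re-executed
# on every run and its output is never hash-checked
workflow.connect([(source, sink, [(('out_files', pickfirst), 'in_file')])])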

Nipype config options:
https://miykael.github.io/nipype_tutorial/notebooks/basic_execution_configuration.html

Unfortunately, I am still unable to run the 1st_level analysis using the already-created preproc outputs. Just like @Jonathan_R_Williford noted, preproc happens again during the 1st_level analysis. I already opened a similar issue on the nipype GitHub, but haven’t gotten any support yet.

To reiterate, see the symbolic workflows below:

wf1 = Workflow(name='preproc', base_dir='/tmp')  # working dir becomes /tmp/preproc
wf1.connect([(node1, node2, [('output', 'input')]),
             (node2, node3, [('output', 'input')])])
wf1.run()

wf2 = Workflow(name='1st_level', base_dir='/tmp')  # working dir becomes /tmp/1st_level
wf2.config['execution'] = {'hash_method': 'content'}
wf2.connect([(wf1, node4, [('node2.output', 'input')])])
wf2.run()

Now looking at @mgxd’s explanation:

Nipype will always search for the node’s working directory when rerunning. If it isn’t found, the node is run in full. If it is found, Nipype compares a hash value for each input provided

It may be because the working directories for wf2 and wf1 are different. Since wf2 doesn’t know about any of wf1’s hash files, it is bound to redo wf1.
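
If so, one workaround might be to decouple the two workflows entirely and pick up the cached preproc outputs with a SelectFiles node instead of connecting wf1 directly (a minimal sketch - the template path is hypothetical and depends on the actual output layout):

from nipype import Node
from nipype.interfaces.io import SelectFiles

# grab the already-computed preproc result from wf1's working directory
templates = {'preprocessed': 'preproc/node2/*.nii.gz'}
select = Node(SelectFiles(templates, base_directory='/tmp'),
              name='selectfiles')

wf2.connect([(select, node4, [('preprocessed', 'input')])])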

Any more advice for me?

@Jonathan_R_Williford, do you have your workflow publicly available so I can take a look at it to understand how you solved this problem?