I am interested in using nipype to set up my analysis pipeline and datalad, especially `datalad run`, for provenance tracking (and version control). However, I’m not sure how to go about this and would appreciate your opinions. If someone has tried (successfully or not) to do this, please share your experience too.
As far as I understand it, for `datalad run` to work well (especially on a cluster with parallel execution), one has to specify the input and output files.
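For concreteness, this is roughly what I mean by specifying inputs and outputs, using datalad’s Python API rather than the command line (the file names and the `bet` call are just made-up examples):

```python
import datalad.api as dl

# Declare up front what the command reads and writes, so DataLad can
# get/unlock exactly those files and record them in the run record.
dl.run(
    "bet sub-01/anat/sub-01_T1w.nii.gz derivatives/bet/sub-01_brain.nii.gz",
    dataset=".",
    inputs=["sub-01/anat/sub-01_T1w.nii.gz"],
    outputs=["derivatives/bet/sub-01_brain.nii.gz"],
    message="Skull-strip sub-01",
)
```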
I think there are two conceptually different ways of producing that mapping:
Option A: Wrap all nipype calls with `datalad run`, providing the mapping to inputs/outputs “manually”.
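A minimal sketch of what such a wrapper could look like for a single interface (the FSL BET example and all paths are made up; in practice something like this would have to be written for every node):

```python
import datalad.api as dl
from nipype.interfaces import fsl

# Build the interface as usual, but instead of running it through nipype,
# hand its command line to `datalad run` together with a manually
# maintained input/output mapping.
bet = fsl.BET(
    in_file="sub-01/anat/sub-01_T1w.nii.gz",
    out_file="derivatives/bet/sub-01_brain.nii.gz",
)

dl.run(
    bet.cmdline,
    dataset=".",
    inputs=[bet.inputs.in_file],
    outputs=[bet.inputs.out_file],
    message="Option A: wrap a single nipype interface call",
)
```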
Option B: Have nipype call `datalad run` itself, leveraging its internals to pass on inputs/outputs. This could be implemented via a plugin-like framework (analogous to the execution plugins) for provenance tracking.
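To illustrate the direction I have in mind, here is a purely hypothetical sketch (nothing like `run_node_with_datalad` exists in nipype; a real plugin would hook into the workflow engine and would also have to solve the problem that output paths are generally only known after a node has run):

```python
import os
import datalad.api as dl
from nipype import Node
from nipype.interfaces import fsl

def run_node_with_datalad(node, outputs, dataset="."):
    """Hypothetical hook: let nipype's own bookkeeping supply the inputs,
    then record the execution with `datalad run`."""
    # Crude heuristic for illustration: treat every input trait value that
    # points at an existing file as a datalad input.
    inputs = [v for v in node.inputs.get().values()
              if isinstance(v, str) and os.path.exists(v)]
    dl.run(
        node.interface.cmdline,   # works for command-line interfaces
        dataset=dataset,
        inputs=inputs,
        outputs=outputs,          # would ultimately need to come from nipype, too
        message=f"nipype node: {node.name}",
    )

bet_node = Node(
    fsl.BET(in_file="sub-01/anat/sub-01_T1w.nii.gz",
            out_file="derivatives/bet/sub-01_brain.nii.gz"),
    name="bet",
)
run_node_with_datalad(bet_node, outputs=["derivatives/bet/sub-01_brain.nii.gz"])
```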
Option A has the advantage that I can start right away, without having to change anything in either nipype or datalad. But I might need to create a (potentially not so) small set of helper scripts to derive the inputs/outputs for `datalad run`.
However, option B seems superior to me. It would leverage what is already there to the greatest degree (the bookkeeping done in nipype), while hiding the actual provenance tracking from the “end user”, since it would be handled by datalad. Is there any issue I am not seeing that would fundamentally block implementing this?
And the next question goes to the devs of datalad & nipype: have you considered other options? Are they more promising? Are there any avenues that should be avoided (for example because of upcoming changes)?