ipyparallel plugin unable to distribute over multiple machines

Hi, I want to run Nipype in a distributed fashion on our cluster using ipyparallel.

I can run the processing in parallel on a single machine using MultiProc, ipcluster, or ipcontroller + ipengines. However, when I run the engines on different machines, with ipcontroller on a host machine and the ipengines elsewhere, it fails. The engines register with the controller successfully, and the DataGrabber step starts on all machines. But my workflow nodes apparently cannot find the temporary files that the previous nodes output. The files seem to be in a /tmp directory, i.e. local to one of the machines. I don’t understand why I cannot change the directories (I tried base_dir), nor why the steps for a given file are not all run on the same machine. That is, first_step[file1]->second_step[file1]->last_step[file1] should all run on the same machine, right? In that case they would have access to the same /tmp directory.

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsv83oyqc/preprocessing/_subject_id_ADNI_136_S_0426/DataGrabber/result_DataGrabber.pklz'

Any help will be appreciated!

To be clear, your working directory is on a shared filesystem that all ipengines have access to using the same path? And you are setting that path in the workflow.base_dir?

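from nipype.pipeline import engine as pe
# Workflow requires a name; 'preprocessing' matches the directory in the error above
wf = pe.Workflow(name='preprocessing', base_dir='/shared_fs/nipype_scratch')
...
wf.run(...)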
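One way to confirm the first question (that every engine sees the working directory under the same path) is a small probe run on each host. The following is only a sketch: `probe_shared_dir` is a hypothetical helper, not part of Nipype or ipyparallel, and `/shared_fs/nipype_scratch` is the example path from above.

```python
def probe_shared_dir(path):
    """Report whether `path` exists and is writable on the current host.

    Hypothetical helper for illustration. Imports live inside the function
    so that it stays self-contained if it is shipped to a remote engine.
    """
    import os
    import socket
    import tempfile

    report = {
        "host": socket.gethostname(),
        "path": path,
        "exists": os.path.isdir(path),
        "writable": False,
    }
    if report["exists"]:
        try:
            # Create and immediately discard a throwaway file to test write access.
            with tempfile.NamedTemporaryFile(dir=path, prefix=".probe_"):
                pass
            report["writable"] = True
        except OSError:
            pass
    return report
```

Calling `probe_shared_dir('/shared_fs/nipype_scratch')` on each machine should report `exists` and `writable` as true everywhere. With ipyparallel, something along the lines of `Client()[:].apply_sync(probe_shared_dir, path)` should collect one report per engine, assuming the controller is reachable from where you run it.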

Thank you! I had only set a base directory for each node, rather than the workflow itself. Setting it in the workflow did the trick :)

Great! Yeah, base_dir only has an effect on things that are run directly. So setting it on a Node makes sense if you’re going to run that Node as a one-off. But if you run a workflow, it will create its own base_dir (in a temporary directory, by default), and then reset the base_dir of every node to a subdirectory that matches the hierarchy of the workflow graph itself.
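To make that concrete, here is a simplified model of how a node's working directory ends up being laid out when it runs inside a workflow. This is not Nipype's actual code: `node_workdir` and its parameters are hypothetical, and the layout is only inferred from the path in the error message above.

```python
import os
import tempfile


def node_workdir(workflow_name, node_name, workflow_base_dir=None,
                 node_base_dir=None, param_dir=None):
    """Simplified model of where a node runs inside a workflow.

    Hypothetical illustration, not Nipype's implementation.
    `node_base_dir` is accepted but deliberately ignored, mirroring the
    behaviour described above: the workflow's base_dir wins.
    """
    if workflow_base_dir is None:
        # No workflow base_dir: fall back to a temporary directory, which
        # is local to whichever machine happens to create it.
        workflow_base_dir = tempfile.mkdtemp()
    parts = [workflow_base_dir, workflow_name]
    if param_dir:
        # e.g. one subdirectory per iterated parameter value
        parts.append(param_dir)
    parts.append(node_name)
    return os.path.join(*parts)


shared = node_workdir(
    "preprocessing", "DataGrabber",
    workflow_base_dir="/shared_fs/nipype_scratch",
    node_base_dir="/some/per_node/dir",  # has no effect in this model
    param_dir="_subject_id_ADNI_136_S_0426",
)
print(shared)
```

In this model, leaving `workflow_base_dir` unset yields a path under the local temporary directory, which is the situation behind the FileNotFoundError in the original question.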

Ah that makes sense. Good to know!