Nipype: exponential number of join nodes?

If I have a leave-one-subject-out iterable, I have a join node that grabs the results and stacks them. Great.
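For concreteness, the single-iterable case I have in mind looks roughly like this (the node names, the placeholder subject list, and the fit_fold stub are mine, not anything prescribed by nipype):

import nipype.pipeline.engine as pe
import nipype.interfaces.utility as niu

subject_list = ['sub-01', 'sub-02', 'sub-03']  # placeholder list

# iterable node: the graph is expanded once per left-out subject
loso = pe.Node(niu.IdentityInterface(fields=['subject_left_out']),
               name='loso')
loso.iterables = ('subject_left_out', subject_list)

def fit_fold(subject_left_out):
    # train on everyone except subject_left_out and predict them (stub)
    return 0.0

clf = pe.Node(niu.Function(input_names=['subject_left_out'],
                           output_names=['y_hat'],
                           function=fit_fold),
              name='clf')

# the JoinNode collapses the expanded branches back into one list
stack = pe.JoinNode(niu.IdentityInterface(fields=['y_hats']),
                    joinsource='loso', joinfield='y_hats',
                    name='stack')

wf = pe.Workflow(name='loso_wf')
wf.connect(loso, 'subject_left_out', clf, 'subject_left_out')
wf.connect(clf, 'y_hat', stack, 'y_hats')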

Now, let’s say I have a leave-one-subject-out iterable, and that goes to two different machine learning algorithms (extra trees and ridge regression). Then I have two join nodes, join_et and join_ridge, that grab the results. Great.

Now, let’s say I have two feature selection methods before going into the machine learning algorithms:
FS1: takes all features
FS2: takes top correlating features

Pipeline looks like:

FS1 -> Clf_et
FS1 -> Clf_ridge

FS2 -> Clf_et
FS2 -> Clf_ridge

Question: do I need four (4) join nodes, one for the output of each of these chains? Or is there some magical expansion that will let me avoid creating four separate node_join_Xyz = pe.Node( …) statements in my code, with four separate sets of connect statements?

Now say I also add a preprocessing stage with two options:
PP1 -> FS1
PP2 -> FS1

Then, I have 8 total “paths” through the data. Does that mean I now need 8 separate join nodes?
e.g.
PP1 -> FS1 -> Clf_et
PP1 -> FS1 -> Clf_ridge
PP1 -> FS2 -> …
(and so on, for all eight combinations)

It seems like I’m gaining from nipype’s implicit expansion and losing when I need to join things back up.

Circling back to the beginning: perhaps I only actually need one join node, instead of join_et and join_ridge, for my current framework?

Without going into iterables optimization, the simplest approach is to have a single IdentityInterface node that has iterables for each of the dimensions:

node.iterables = [('subject_left_out', [list of subjects]),
                  ('feature_selection_procedure', [list of steps]),
                  ('learning_algorithm', [list of algorithms])]

This creates a cube of combinations, and the output from this node can be used to drive different Function nodes.

At the end, you then join these back.
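Roughly like this, as a sketch (the info / big_node / join_node names, the placeholder lists, and the run_one_cell stub are illustrative only):

import nipype.pipeline.engine as pe
import nipype.interfaces.utility as niu

subject_list = ['sub-01', 'sub-02', 'sub-03']  # placeholder list

info = pe.Node(niu.IdentityInterface(
    fields=['subject_left_out', 'feature_selection', 'learning_algorithm']),
    name='info')
info.iterables = [('subject_left_out', subject_list),
                  ('feature_selection', ['all_features', 'top_corr']),
                  ('learning_algorithm', ['extra_trees', 'ridge'])]

def run_one_cell(subject_left_out, feature_selection, learning_algorithm):
    # select features, fit the chosen algorithm on the training subjects,
    # and predict the left-out subject (stub)
    return 0.0

big_node = pe.Node(niu.Function(
    input_names=['subject_left_out', 'feature_selection',
                 'learning_algorithm'],
    output_names=['y_hat'],
    function=run_one_cell), name='big_node')

# a single JoinNode collapses the whole cube back into one list
join_node = pe.JoinNode(niu.IdentityInterface(fields=['y_hats']),
                        joinsource='info', joinfield='y_hats',
                        name='join_node')

wf = pe.Workflow(name='loso_cube')
wf.connect([(info, big_node,
             [('subject_left_out', 'subject_left_out'),
              ('feature_selection', 'feature_selection'),
              ('learning_algorithm', 'learning_algorithm')]),
            (big_node, join_node, [('y_hat', 'y_hats')])])

Because all three fields are iterables on the same node, a JoinNode with joinsource='info' collects every expanded branch, i.e. the whole cube, in one place.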

This seems to imply:

  1. Only one join node is needed
  2. That join node will have to do some fancy reapportioning/consolidation, though maybe some of that will be taken care of depending on how it does the join
  3. Pseudocode:

     node_iterables.iterables = cube_of_params  # the (name, values) pairs above

     # one expanded copy of bignode runs per cell of the cube
     def bignode(subject_left_out, feature_selection, learning_algorithm):
         ...
         return y_hat_for_subj_x_fs_y_learnalg_z

     connect(node_iterables, big_node)
     connect(big_node, join_node)

join_node then returns y_hat_cube (see the sketch below for what bignode itself might do).
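For example, the body of bignode might dispatch on the iterable values, something like this (the random placeholder data, the k=100 cutoff, and the use of scikit-learn regressors here are just assumptions for illustration; classifiers would slot in the same way):

def run_one_cell(subject_left_out, feature_selection, learning_algorithm):
    # imports live inside the function because nipype ships the
    # function source to each expanded node
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor
    from sklearn.linear_model import Ridge
    from sklearn.feature_selection import SelectKBest, f_regression

    # placeholder data; in practice, load the real features/targets here
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 500))
    y = rng.normal(size=30)
    subject_ids = np.repeat(['sub-01', 'sub-02', 'sub-03'], 10)

    train = subject_ids != subject_left_out
    test = ~train

    if feature_selection == 'top_corr':  # FS2: top correlating features
        selector = SelectKBest(f_regression, k=100).fit(X[train], y[train])
        X = selector.transform(X)
    # 'all_features' (FS1) keeps every column

    model = {'extra_trees': ExtraTreesRegressor(),
             'ridge': Ridge()}[learning_algorithm]
    model.fit(X[train], y[train])
    return model.predict(X[test])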

For the moment, I would recommend trying it with a simple example. There may be bugs to resolve. However, in the next-generation engine (sometime next year), this will be explicitly taken care of.

While this works in that it runs, no clever joining appears to happen at the join node; it just mashes all the inputs together. So the “source” from which each input to the join node comes (subject x, fs y, learning algorithm z) must also be passed out of the big node, not just the y_hat value itself. Then the join node can use that information to slice things up.
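Concretely, the workaround I’m picturing extends the sketch above: the per-cell function returns its identifying parameters alongside y_hat, and the join node gets a small Function interface that zips the flat joined lists back into a dict keyed by (subject, fs, alg). The assemble_cube helper, the field names, and the placeholder lists below are just one way to do it, not anything built into nipype:

import nipype.pipeline.engine as pe
import nipype.interfaces.utility as niu

subject_list = ['sub-01', 'sub-02', 'sub-03']  # placeholder list

info = pe.Node(niu.IdentityInterface(
    fields=['subject_left_out', 'feature_selection', 'learning_algorithm']),
    name='info')
info.iterables = [('subject_left_out', subject_list),
                  ('feature_selection', ['all_features', 'top_corr']),
                  ('learning_algorithm', ['extra_trees', 'ridge'])]

def run_one_cell(subject_left_out, feature_selection, learning_algorithm):
    y_hat = 0.0  # placeholder: fit/predict as in the earlier sketch
    # pass the identifying parameters back out alongside the prediction
    return y_hat, subject_left_out, feature_selection, learning_algorithm

big_node = pe.Node(niu.Function(
    input_names=['subject_left_out', 'feature_selection',
                 'learning_algorithm'],
    output_names=['y_hat', 'subject_left_out', 'feature_selection',
                  'learning_algorithm'],
    function=run_one_cell), name='big_node')

def assemble_cube(y_hats, subjects, fs_methods, algorithms):
    # zip the flat joined lists back into a dict keyed by cube coordinates
    return {(s, f, a): y
            for y, s, f, a in zip(y_hats, subjects, fs_methods, algorithms)}

join_node = pe.JoinNode(niu.Function(
    input_names=['y_hats', 'subjects', 'fs_methods', 'algorithms'],
    output_names=['y_hat_cube'],
    function=assemble_cube),
    joinsource='info',
    joinfield=['y_hats', 'subjects', 'fs_methods', 'algorithms'],
    name='join_node')

wf = pe.Workflow(name='loso_cube')
wf.connect([(info, big_node,
             [('subject_left_out', 'subject_left_out'),
              ('feature_selection', 'feature_selection'),
              ('learning_algorithm', 'learning_algorithm')]),
            (big_node, join_node,
             [('y_hat', 'y_hats'),
              ('subject_left_out', 'subjects'),
              ('feature_selection', 'fs_methods'),
              ('learning_algorithm', 'algorithms')])])

Because the join just concatenates whatever each expanded branch sends it, carrying the coordinates through big_node is what lets assemble_cube slice the flat lists back into the cube.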

Can you please file an issue in nipype for this?