Parallelization of the TDT Toolbox?

Hi TDT experts,

I run the famous TDT toolbox (v3.996) on a cluster equipped with MATLAB 2016a and 32 cores.
My analysis is a searchlight analysis (140,434 searchlights), performing 3 decoding steps for 180 files.

I use the following commands (I have 4 conditions, which should elicit a graded BOLD response):
cfg = decoding_describe_data(cfg,{labelname1 labelname2 labelname3 labelname4},[-1.5 -0.5 0.5 1.5],regressor_names,beta_loc);
cfg.decoding.method = 'regression'; % choose this for regression
cfg.results.output = 'zcorr';

The problem is: it takes many days to run…

Did I make a mistake somewhere?
Is it possible to accelerate the processing by parallelizing the TDT toolbox?
If so, how should I proceed?

Thank you very much in advance for your help.
Best regards,
Jean-Luc

Hi Jean-Luc,

Sorry for the delay, I missed this message and have been traveling.

The slowdown can be related to several factors. The two changes most likely to improve processing speed are (1) scaling your data (e.g. a z-transform with the estimation option 'all'; see help decoding_scale_data) and (2) setting the cost parameter to a smaller value. The latter typically helps a lot without affecting performance; I would try c = 0.001. You can do this by setting
cfg.decoding.train.regression.model_parameters = '-s 4 -t 0 -c 0.001 -n 0.5 -b 0 -q';
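
For (1), the scaling settings should look something like this (see help decoding_scale_data for the details and other options):

% z-transform each voxel, with parameters estimated across all samples at once
cfg.scale.method = 'z';
cfg.scale.estimation = 'all';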

Best,
Martin

Dear Martin,

Thank you very much for your answer!
I am currently trying the smaller cost parameter: the software seems to run much faster. I’ll see what the different results look like.

But I’m still wondering whether it’s possible to parallelize the TDT toolbox on a cluster.
Thank you again for your help.

All the best,
Jean-Luc

Hi Jean-Luc,

Sorry I didn’t reply to that part of your question. There are several ways you can parallelize TDT:
(1) use Matlab’s parfor (not recommended unless there is no other possibility)
(2) use multiple instances of Matlab (recommended if possible)
(3) use the Matlab compiler

In general, I do not recommend using Matlab’s parfor; instead, I would just run multiple instances of Matlab. If, however, there is a licensing issue, then parfor should work. If multiple Matlab sessions are possible, I would call them from the command line with matlab -singleCompThread & (without the & if you are using Windows). Since libsvm doesn’t use multithreading anyway, this just makes sure there is no cross-talk between the different Matlab sessions, i.e. it will speed everything up. If neither of these options works and you have the Matlab compiler, you can also compile your code. A guide is provided in prepare_tdt_compiler.m.
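
To make that concrete, launching the sessions from a Unix shell could look something like this (my_decoding_script is just a placeholder for your own analysis script, which would read iter to pick its subset of the work):

matlab -singleCompThread -nodesktop -r "iter = 1; my_decoding_script; exit" &
matlab -singleCompThread -nodesktop -r "iter = 2; my_decoding_script; exit" &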

Now, if your question is what you could parallelize, then I would say either subjects, or searchlights within subject. Parallelizing across subjects should be quite straightforward. Within subject, you can run a different subset of searchlights in each job and then combine them, e.g. using SPM’s imcalc. First, figure out how many searchlights there are by starting to run the code without parallelization (alternatively, you can just overestimate the number). Say it reports 49,000 and you have 8 computing cores. Then you would want to run 7 parallel iterations (keeping one core free). The relevant code would be

iter = 1;
n_sl = 49000;
n_split = 7;
splitsize = ceil(n_sl/n_split);
cfg.searchlight.subset = (iter-1)*splitsize + (1:splitsize); % with iter = 1, this runs the first 7000 searchlights

Now you can change iter for each iteration or make it a (par)for loop or even a function with the input iter.
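
A minimal function version could look like this (run_subset and my_decoding_cfg are just placeholder names; the cfg setup is whatever you already have):

function results = run_subset(iter)
% Sketch: run one block of searchlights and save it under its own name
cfg = my_decoding_cfg(); % placeholder: build your cfg as usual
n_sl = 49000; % total number of searchlights (an overestimate is fine)
n_split = 7;
splitsize = ceil(n_sl/n_split);
cfg.searchlight.subset = (iter-1)*splitsize + (1:splitsize);
cfg.results.resultsname = cellstr(['iteration_' num2str(iter)]); % one results file per block
results = decoding(cfg);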

Hope that helps!
Martin

I know that this is resurrecting an old thread, but I am working with some very large 7T data (~400k voxels in the mask) and have written a couple of implementations that I thought might be helpful to others. Obviously they’re written for me rather than for general use, and as such contain some absolute paths, but this should be easy to deal with.

The first is a simple wrapper that uses a Matlab parfor to parallelise by searchlight, as Martin suggests, then recombines and saves the results at the end. Write your configuration script in the normal way, making sure to load your residuals into a misc structure, but then where you would normally call:

results = decoding(cfg,[],misc);

instead call

pool = gcp('nocreate'); % handle to the current parallel pool, if any
if isempty(pool)
    pool = parpool; % no pool open yet: start one with the default profile
end
num_workers = pool.NumWorkers;
num_searchlights = size(misc.residuals,2);
searchlights_per_worker = ceil(num_searchlights/num_workers); % divide the task up between the workers
results = cell(1,num_workers); % preallocate sliced output for parfor
parfor crun = 1:num_workers
    results{crun} = decoding_parallel_wrapper(cfg,misc,searchlights_per_worker,crun);
end
all_results = results{1};
for crun = 2:num_workers
    all_results.decoding_subindex = [all_results.decoding_subindex; results{crun}.decoding_subindex];
    all_results.other_average.output(results{crun}.decoding_subindex) = results{crun}.other_average.output(results{crun}.decoding_subindex);
end
results = all_results;
disp('Crossnobis on the whole brain complete, saving results, note this could take some time')
save(fullfile(cfg.results.dir,'res_other_average.mat'),'results','-v7.3')
assert(sum(cellfun(@isempty,all_results.other_average.output))==0,'Results output not completely filled despite completion of the parallel loop - please check')
delete(fullfile(cfg.results.dir,'parallel_loop*.mat'))

The decoding_parallel_wrapper function can be found here (7T_pilot_analysis/decoding_parallel_wrapper.m at master · thomascope/7T_pilot_analysis · GitHub). It is a very simple wrapper that sets the searchlight bounds and temporary resultsname for saving.

function results = decoding_parallel_wrapper(cfg,misc,searchlights_per_worker,worker_number)
cfg.searchlight.subset = ((worker_number-1)*searchlights_per_worker)+1 : worker_number*searchlights_per_worker;
cfg.results.resultsname = cellstr(['parallel_loop_' num2str(worker_number)]);
addpath /group/language/data/thomascope/spm12_fil_r6906/ % Your SPM path for the workers
spm('ver'); % Needed or sometimes the decoding toolbox complains in parallel that SPM is not initialised.
results = decoding(cfg,[],misc);
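
One caveat: because searchlights_per_worker is rounded up with ceil, the last worker’s range can run past the true number of searchlights. If that causes trouble, the upper bound could be clamped, e.g.:

last_sl = min(worker_number*searchlights_per_worker, size(misc.residuals,2)); % total searchlight count
cfg.searchlight.subset = ((worker_number-1)*searchlights_per_worker)+1 : last_sl;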

The second is a method for downsampling your searchlight space without downsampling or losing the input data, producing a .nii file of the correct dimensions at the end. The resulting searchlight volume is of lower resolution by the downsampling factor, but the output data are significantly smaller and the decoding runs significantly more quickly. A downsampling factor of 2 results in 2^3 = 8 times fewer searchlight locations.
The script can be found here 7T_pilot_analysis/TDTCrossnobisAnalysis_1Subj.m at master · thomascope/7T_pilot_analysis · GitHub and can be put within a parfor of its own to be parallelised by subject. It includes an example script to make a simple effect map.
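
For anyone who just wants the flavour of the downsampling idea without reading the linked script, here is a rough sketch (not the linked implementation itself, and assuming TDT’s searchlight indices follow the mask voxels in find() order; the mask path is a placeholder):

% Keep only searchlight centres that lie on a coarser voxel grid
ds = 2; % downsampling factor; 2 gives 2^3 = 8 times fewer searchlights
maskvol = spm_read_vols(spm_vol('mask.nii')) > 0; % your analysis mask (placeholder path)
[x,y,z] = ind2sub(size(maskvol), find(maskvol)); % voxel coordinates of each candidate centre
keep = mod(x,ds)==0 & mod(y,ds)==0 & mod(z,ds)==0; % centres on the coarse grid
cfg.searchlight.subset = find(keep);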
