TDT with perfusion data with 2 classes (pre/post)

Dear TDT users

I have a question about whether the design is set up correctly and how to perform permutations. I have 146 subjects with CBF (perfusion) maps from two different conditions, and I want to classify these two conditions with TDT (the maps are normalized to MNI space and corrected for global CBF values).

I started from the decoding_template_nobetas.m and decoding_template_between_subjects.m scripts and defined the following:

path_to_group = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/data_dir';
group1_files = spm_select('FPList', path_to_group, '^.*pre_gm04_res3.nii$'); % Condition 1
group2_files = spm_select('FPList', path_to_group, '^.*post_gm04_res3.nii$'); % Condition 2

data_files = [cellstr(group1_files); cellstr(group2_files)];

cfg = decoding_defaults();
cfg.analysis = 'searchlight';
cfg.results.overwrite = 1;

cfg.verbose = 2;

cfg.classification.method = 'libsvm';
cfg.decoding.method = 'classification_kernel';

cfg.labelname = {'Post', 'Pre'};

label_all = [ones(146,1); -ones(146,1)]; % first 146 post, then 146 pre
cfg.labels = label_all;

cfg.results_dir = output_dir;

cfg.scale.method = 'min0max1';
cfg.scale.estimation = 'all';

cfg.files.label = label_all;

cfg.files.chunk = [(1:146) (1:146)]; % i.e., each subject has their own chunk number

cfg.design.unbalanced_data = 'ok'; % unclear why the script gave an error, since there are 146 images for both conditions

cfg.files.mask = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/masks/GM_thresh04_res3_bin.nii';
cfg.files.name = data_files;

% using make_design_boot_cv compared to make_design_cv yielded totally different results
%cfg.design = make_design_boot_cv(cfg,100,1); %
cfg.design = make_design_cv(cfg); % I decided to use this approach because it works with fixed pairs (leave one subject/pair out)
cfg.results.output = {'AUC_minus_chance','accuracy_minus_chance','balanced_accuracy'};

%cfg.results_dir = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_dir';
cfg.results.dir = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_dir';

display_design(cfg);
results = decoding(cfg);

Then I tried to run a permutation test using the examples from make_design_permutation:

cfg = rmfield(cfg,'design'); % this is needed if you previously set cfg.design
cfg.design.function.name = 'make_design_cv';
n_perms = 1000; % pick a reasonable number, the function might compute fewer if fewer are available
combine = 0;
designs = make_design_permutation(cfg,n_perms,combine);
for i_perm = 1:n_perms
    cfg.design = designs{i_perm};
    cfg.results.dir = ['/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_norm/perm' sprintf('%04d',i_perm)];
    decoding(cfg); % run permutation
end

However, after one day only 6 permutations were finished. How can I adjust the script to speed up the process? It's not clear to me how to run it in parallel mode (parallel pool in MATLAB). Should I run several make_design_permutation calls and combine the results later?
When every subject is its own chunk (as in this design), what is the best solution?

Moreover, I was wondering about negative values in res_AUC_minus_chance and res_accuracy_minus_chance maps (with values less than -25).

Best,
Ralf

Hi Ralf,

It seems like you are primarily asking about speeding up your analysis. The good news: You don’t have to do leave-one-pair-out. Instead, I would suggest doing 5-fold cross-validation which will dramatically speed up your analysis. Essentially, you want to chunk people together in an 80-20 split, i.e., make it 5 chunks.

If you are, however, unsure whether that will lead to an unbalanced analysis, you could also select a random 80-20 split for each run and repeat this as many times as you like, e.g., 20 or 30 times. However, for maximal speed I would go with 5-fold CV.
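
In code, the chunk assignment for the 5-fold version could look roughly like this (just a sketch, assuming the files are ordered as in your script with all post images first and then all pre images; the variable names are only for illustration):

% Sketch: assign each subject to one of 5 chunks so that both images of a
% subject always end up in the same fold (adapt to your file ordering).
n_subjects = 146;
subj_chunk = repmat(1:5, 1, ceil(n_subjects/5));
subj_chunk = subj_chunk(1:n_subjects)'; % one chunk number per subject
cfg.files.chunk = [subj_chunk; subj_chunk]; % first all post images, then all pre images
cfg.design = make_design_cv(cfg); % leave-one-chunk-out = 5-fold CV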

Values below chance are normal and expected when running this kind of analysis and will occur by chance. If they are systematic, that indicates systematic discrepancies in your data that lead to imbalances. But that's rather unlikely.

Hope this helps!
Martin

Hi Martin,

Thanks for your fast reply. That means for the decoding I can use the procedure with one chunk per subject, but for the permutation I should create chunks with an 80/20 distribution (5-fold). Let's say we have 145 subjects with pre-post pairs (first 145 pre and then 145 post); then I would create a variable cfg.files.chunk = [ones(1,29) 2*ones(1,116) ones(1,29) 2*ones(1,116)] and run the permutation according to your examples.

cfg = rmfield(cfg,'design'); % this is needed if you previously set cfg.design
cfg.files.chunk = [ones(1,29) 2*ones(1,116) ones(1,29) 2*ones(1,116)];
cfg.design.function.name = 'make_design_cv';
n_perms = 100; % pick a reasonable number, the function might compute fewer if fewer are available; I used 100 for testing
combine = 0;
designs = make_design_permutation(cfg,n_perms,combine);
for i_perm = 1:n_perms
    cfg.design = designs{i_perm};
    cfg.results.dir = ['/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_norm/perm' sprintf('%04d',i_perm)];
    decoding(cfg); % run permutation
end

But I'm not sure whether this is correct with the chunks, because the 20/80 split is always the same (see the attached design that popped up after the first and second permutation).
Or should I make
cfg.files.chunk = [ones(1,29) 2*ones(1,29) 3*ones(1,29) 4*ones(1,29) 5*ones(1,29) ones(1,29) 2*ones(1,29) 3*ones(1,29) 4*ones(1,29) 5*ones(1,29)];?

Should I create, let's say, 20-30 random distributions with 5 chunks, run the permutation, and then combine them somehow later?
How should I proceed for the statistics?

Best,
Ralf


Hi Ralf,

Sorry, I wasn't quite clear. You want to keep the procedure the same for the regular run and for permutation testing, and only shuffle the assignment of labels to data. There is one catch: since you have two labels per participant, you actually want to shuffle the labels within subject. I would suggest a two-step procedure.

First, you create your permutation design. This will give you the right labels for cfg_perm.design.label. However, for cfg_perm.design.train and cfg_perm.design.test, I would create a new design and, for simplicity, just choose a single 80/20 split. Then, for this design, which I will call cfg, you take the first column of cfg.design.train and cfg.design.test, replicate it 1000 times, and use that to replace the original cfg_perm design fields (depending on the exact design it might not be just the first column but several columns; just make sure that there is a direct match).
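
A rough sketch of how that two-step idea could look (this is only an illustration, not tested on your data; check the resulting design with display_design and adjust which columns you take if the sizes don't match):

% Sketch: fixed 80/20 split for train/test, labels taken from the permutation designs.
% Assumes cfg.files.chunk already groups the subjects into 5 folds (see the chunk sketch above).
cfg_single = cfg;
cfg_single.design = make_design_cv(cfg_single); % the single 80/20 split
train_col = cfg_single.design.train(:,1); % first column of that split
test_col = cfg_single.design.test(:,1);

cfg_perm = rmfield(cfg,'design'); % start fresh for the permutations
cfg_perm.design.function.name = 'make_design_cv';
n_perms = 1000;
designs = make_design_permutation(cfg_perm, n_perms, 0);

for i_perm = 1:n_perms
    cfg_perm.design = designs{i_perm}; % keep only the permuted labels from this design
    n_cols = size(cfg_perm.design.label, 2);
    cfg_perm.design.train = repmat(train_col, 1, n_cols); % same fixed split in every column
    cfg_perm.design.test = repmat(test_col, 1, n_cols);
    cfg_perm.results.dir = fullfile(cfg.results.dir, sprintf('perm%04d', i_perm));
    decoding(cfg_perm);
end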

I hope this makes sense!

Alternatively, you can also manually create the shuffling. But then you want to make sure that the shuffling is only done within subject (because that respects exchangeability) and that the chunks remain the same.
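
For illustration, such a manual within-subject shuffle could look roughly like this (a sketch, assuming the two images of each subject are adjacent in cfg.files.name; adapt the indexing if your files are ordered differently):

% Sketch: randomly swap the pre/post labels within each subject while leaving
% cfg.files.chunk untouched; label_perm would then go into the design's label
% field for one permutation run.
n_subjects = 146;
label_perm = cfg.files.label;
flip = rand(n_subjects, 1) > 0.5; % which subjects get their pair swapped
for s = find(flip)'
    idx = (s-1)*2 + (1:2); % the two rows belonging to subject s
    label_perm(idx) = label_perm(idx([2 1])); % swap pre/post for this subject
end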

Hope this helps!
Martin

Hi Martin,

With some help from ChatGPT :wink: I created the following script.
Could you check if the code is OK so far? The 5-fold permutation is now done manually, with the permutation always applied to the same samples (see designs perm1 and perm2).

clear all
clc
%addpath(genpath('/opt_prg/spm_tool_12/decoding_toolbox_3999I/decoding_toolbox'));
path_to_group = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/data_dir_norm';

output_dir = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_par';
cd(output_dir);

% Define input files (NIfTI images)

group1_files = spm_select('FPList', path_to_group, '^.*pre_gm_norm.nii$'); % Condition 1
group2_files = spm_select('FPList', path_to_group, '^.*post_gm_norm.nii$'); % Condition 2

% Set decoding configuration
cfg = decoding_defaults();
% Combine into one dataset
data_files = [cellstr(group1_files); cellstr(group2_files)]; % check reversed order
data_files = sort(data_files); % sort by subject: post first, then pre for each subject

cfg.files.name = data_files;

% Create labels (1 for condition 1, -1 for condition 2)
cfg.files.labelname = {'Post','Pre'};
cfg.files.label = repmat([1; -1], 145, 1); % alternating post/pre labels, matching the sorted file order
cfg.files.chunk = repelem(1:145, 2)'; % one chunk per subject

% Define the directory where the preprocessed NIfTI images are stored

cfg.results_dir = output_dir;

cfg.scale.method = 'min0max1';
cfg.scale.estimation = 'all';

%cfg.analysis = 'classification'; % Binary classification
cfg.analysis = 'searchlight'; % Searchlight analysis (binary classification)
cfg.results.overwrite = 1; % Overwrite existing results

%cfg.decoding.method = 'classification';
cfg.decoding.method = 'classification_kernel';
% print all messages
cfg.verbose = 2;
% Specify classifier
cfg.classification.method = 'libsvm'; % SVM classifier (can be 'lda', 'svm', 'libsvm', etc.)

cfg.files.mask = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/Pronto_class/masks/GM_thresh04_res3_bin.nii';

cfg.results.dir = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_par';
cfg.design = make_design_cv(cfg);
cfg.design.unbalanced_data = 'ok';

cfg.results.output = {'accuracy_minus_chance','AUC_minus_chance','balanced_accuracy'};

results = decoding(cfg);

% create permutation labels

n_perm = 1000;
n_subjects = 145;

perm_labels = zeros(2 * n_subjects, n_perm);

for p = 1:n_perm
    flip = rand(n_subjects,1) > 0.5; % randomly choose whether to flip each subject's pair
    for s = 1:n_subjects
        orig_labels = [1; -1];
        if flip(s)
            orig_labels = -orig_labels; % flip labels within this subject
        end
        perm_labels(2*s-1:2*s, p) = orig_labels;
    end
end

% set up a train/test design for the permutations
cfg_perm = cfg; % copy your original cfg

% manual assignment of the 80/20 train/test split

n_subjects = 145;
n_samples = 2 * n_subjects;
n_perm = 1000;

% (1) Create a single 80/20 train/test split manually
design_train = false(n_samples, 1);
design_test = false(n_samples, 1);

% Random split of subjects
rng(42); % for reproducibility
rand_subjects = randperm(n_subjects);
n_train_subj = round(0.8 * n_subjects); % ~116 subjects
train_subj = rand_subjects(1:n_train_subj);
test_subj = rand_subjects(n_train_subj+1:end);

for s = 1:n_subjects
    idx = (s-1)*2 + (1:2); % indices for this subject's 2 samples
    if ismember(s, train_subj)
        design_train(idx) = true;
    else
        design_test(idx) = true;
    end
end

% (2) Replicate train/test for each permutation
cfg_perm = cfg; % copy the original cfg with data info etc.
cfg_perm.results.dir = '/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_par/results/permutation';
cfg_perm.results.output = {'accuracy'};
cfg_perm.decoding.method = 'classification';

cfg_perm.design.train = repmat(design_train, 1, n_perm);
cfg_perm.design.test = repmat(design_test, 1, n_perm);

%
% Replicate train/test masks for use in loop
cfg_train_mask = repmat(design_train, 1, n_perm);
cfg_test_mask = repmat(design_test, 1, n_perm);


% === Prepare ===
results_perm = cell(n_perm, 1);
perm_accuracies = zeros(n_perm, 1); % preallocate for speed

for p = 1:n_perm
    cfg_perm_p = cfg; % make a separate config for each worker

    cfg_perm_p.results.dir = fullfile('/server/fo2-22/data/CBF_all_data/CBF_post_minus_pre/TDT_class/output_par/results/permutation/', sprintf('perm_%04d', p));
    cfg_perm_p.design.label = perm_labels(:, p);
    cfg_perm_p.design.train = cfg_train_mask(:, p);
    cfg_perm_p.design.test = cfg_test_mask(:, p);

    result = decoding(cfg_perm_p); % run decoding

    results_perm{p} = result; % store full result
    perm_accuracies(p) = mean(results_perm{p}.accuracy_minus_chance.output); % mean across searchlight centers

end

save('results/permutation/permutation_accuracies.mat', 'perm_accuracies');

original_accuracy = mean(results.accuracy_minus_chance.output); % mean across searchlight centers, to match perm_accuracies
p_empirical = mean(perm_accuracies >= original_accuracy);
fprintf('Empirical p-value = %.4f\n', p_empirical);


Any improvements or suggestions?
Best,
Ralf