TDT PCA: no results.accuracy_minus_chance

Hi all,

I use TDT in MATLAB, and I don't understand why the file "accuracy_minus_chance.mat" is not created when I add the 3 lines of PCA.

Here is my TDT configuration code:

cfg = decoding_defaults;
cfg.analysis = 'ROI'; % 'wholebrain' but limited to the ROI using masks
cfg.software = 'SPM12';
%cfg.results.output = {'accuracy_minus_chance'}; % 'accuracy_minus_chance' by default
cfg.results.dir = outdir;
cfg.results.filestart = ['perm' sprintf('%04d',0)];
%cfg.results.write = 0; % Set to 0 to avoid saving output files by default
cfg.results.overwrite = 1; % overwrite allowed
%cfg.plot_design = 0; % Cross-validation diagrams are neither generated nor saved
cfg.verbose = 0;
cfg.plot_selected_voxels = 0; % 0: no plotting, 1: every step, 2: every second step, 100: every hundredth step...

cfg.files.name = dtFiles; % a 1xn array of file names of the selected subject
cfg.files.label = fileLabels; % an nx1 array
cfg.files.chunk = fileChunks; % an 80x1 vector of labels, for each subject
cfg.files.mask = fileMasks; % Select the corresponding ROI masks

cfg.design = make_design_cv(cfg); % Automatic creation of the leave-one-run-out cross validation design
%% Random Splitting (RS) Design

% This creates a design where all combinations of ntrain trials are integrated in the design
% LOO: ntrain = ntrial-1
% Matrices train and test of size = [ntrial x length(labels)] x [nchunk]
% Here [40 (trials: 20x2runs) x 2(conditions: sweet or water)] x 40 (chunks) = 80x40
nsplit = nchoosek(ntrial, ntrain); % All combinations of ntrain trials
% C = nchoosek(1:ntrial, ntrain); % All combinations of ntrain trials, array exceeds maximum array size preference in this case
if nsplit<nchunk
    disp("Rechoose nchunk");
end
train_run = zeros(ntrial, nchunk); % Initialize the train matrix for each run
for i = 1:nchunk
    train_trials = randperm(ntrial, ntrain); % Trials used for training
    train_run(train_trials, i) = 1; % Set the corresponding entries to 1
end
train = [train_run; train_run];
test = -train; % All trials belong to train or test

% Affectation in the cfg.design structure
cfg.design.label = repmat(fileLabels, 1, nchunk);
cfg.design.set = repmat(cfg.design.set(:,1), 1, nchunk);
cfg.design.test = test;
cfg.design.train = train;
cfg.design.function.name = mfilename; % Put current script name
cfg.design.function.ver = date; % Put current date

Below are the 3 lines of PCA I want to add.
I don't know exactly how to set feature_transformation.critical_value, so I just set it to 0.
But with these lines it no longer generates "accuracy_minus_chance.mat".

cfg.feature_transformation.method = 'PCA';
cfg.feature_transformation.estimation = 'all';
cfg.feature_transformation.critical_value = 0; % not sure what to use here, so set to 0 for now

And also, I would like to run the permutations after PCA, but it seems that after adding the PCA, none of the permutations are performed.

I am confused because everything was fine until the three lines of code about PCA were added. Thanks in advance for the responses.

SSS

Hi SSS,

I'm sorry, I don't know what could have caused your issue, but your design looks a little weird. For train and test you want to set all values to 1 that are included and all values to 0 that are excluded. Also, I would recommend against an actual leave-one-out design, since this will be unbalanced in the labels. Instead, I would use a leave-one-pair-out design (i.e. leaving out one trial per class).
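To illustrate what a balanced leave-one-pair-out design with 0/1 entries could look like, here is a sketch (variable names are assumptions, adapt to your data; this assumes `labels` holds one entry per trial and both classes have the same number of trials):

```matlab
% Sketch: on each step, one trial of class A and one of class B are left
% out for testing; every entry is 0 (excluded) or 1 (included).
idxA = find(labels == 1);                 % trial indices of class A
idxB = find(labels == 2);                 % trial indices of class B
npairs = min(numel(idxA), numel(idxB));   % one design step per pair
ntrial = numel(labels);
train = ones(ntrial, npairs);
test  = zeros(ntrial, npairs);
for i = 1:npairs
    out = [idxA(i) idxB(i)];              % the left-out pair
    train(out, i) = 0;
    test(out, i)  = 1;
end
```

This keeps the training labels balanced on every step, unlike a plain leave-one-trial-out scheme.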

If you could let me know about warnings you get, that would probably help resolve this issue.

The critical value is usually percent variance explained (values between 0 and 1).
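For example, if you wanted to keep the components explaining 90% of the variance, that would be (value chosen only for illustration):

```matlab
cfg.feature_transformation.critical_value = 0.9; % keep components explaining 90% of the variance
```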

Best,
Martin

Hi Martin,
Thanks for your answer. Here is the error I get when the code is:

3 lines of PCA
results = decoding(cfg);

Even though I add
cfg.scale.method = 'mean';
it does not work.

Thanks again.
SSS

I'm not entirely sure, but I think you need to add
cfg.feature_transformation.scale.method, since this is the method used for scaling within the feature transformation. Not all settings are inherited.

It looks as if we weren’t catching this potential error when people don’t provide this field, sorry about this!

I guess you could just set
cfg.feature_transformation.scale = cfg.scale;
but this has to be set after you set cfg.scale.
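Putting this together, the PCA block might then read as follows (a sketch; the estimation setting and the critical value are assumptions to verify against your own analysis):

```matlab
cfg.scale.method = 'mean';                       % set cfg.scale first...
cfg.scale.estimation = 'all';                    % (assumed; pick the estimation you need)
cfg.feature_transformation.method = 'PCA';
cfg.feature_transformation.estimation = 'all';
cfg.feature_transformation.critical_value = 0.9; % e.g. keep 90% explained variance
cfg.feature_transformation.scale = cfg.scale;    % ...then copy it into the transformation
```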

I hope this does the trick for you.

Best,
Martin

Hi Martin,
Thanks a lot for your solution! Now, after a few tests, I notice that it works, but not with parallel computing: I get the correct output every time when I don't use parallel computing.
I am confused because I changed nothing else; it should work.
Here are the codes :

%% Permutation
        combine = 0; % Should be 0 because combine mode does not work with this design
        ns = size(cfg.design.train,2); % Number of design steps
        if new_perms
            [~,perms] = make_design_permutation(cfg,n_perms,combine); % Leave the same design but change the labels by permutation
            save('perms10000_rs.mat','perms');
        else
            %load('perms10000.mat'); % 80x10000
            load('perms10.mat'); % 80x10
        end

        outjob_filename = ['perm_acc_' subject '_' roiName '_dt' sprintf('%02d', dt) '.txt'];
        outjob_filename = join(outjob_filename, '');
        outjob_filepath = fullfile(outdir, outjob_filename);




%%%% parallel computing
        fh = str2func(fhname); % Convert string to function handle
        if run_on_worker

            % Disable warnings temporarily
            %warnState = warning('off', 'all');
            job = batch(clust, fh, 0, {cfg,perms,outjob_filepath}); % Run MATLAB decoding function on worker
            %job = batch(clust, fh, 0, {cfg,perms},"AdditionalPaths",{'/home/jsun/Data/final/MVPA/BoMa08_12dt','/home/jsun/Data/final/Masks_Lobe'}); % Run MATLAB decoding function on worker

            %warning(warnState);
        else
            vacc = fh(cfg,perms,outjob_filepath); % Run function as usual
            plot(vacc,'o-r');
            title(sprintf('Roi(%d) - dt(%f)',iroi,dt));
        end
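When a batch job silently produces nothing, it helps to inspect what actually happened on the worker. These are standard Parallel Computing Toolbox calls (please verify against your MATLAB release):

```matlab
wait(job);                        % block until the batch job finishes
diary(job);                       % print the worker's command-window output
if ~isempty(job.Tasks(1).Error)   % inspect any error thrown on the worker
    disp(getReport(job.Tasks(1).Error));
end
```

If the output file is missing only in the parallel case, also check that outdir, the data files, and the masks are reachable from the worker (e.g. via "AdditionalPaths", as in your commented-out line).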

In the function "decoding_time_RandomSplitting":

%% Run the decoding analysis and generate results
np = size(perms,2); % Number of permutations
vacc = zeros(np+1,1);
[results, ~, passed_data] = decoding(cfg);
vacc(1) = results.accuracy_minus_chance.output; % Accuracy without permutation

ns = size(cfg.design.label,2);
for ip = 1:np
    cfg.design = rmfield(cfg.design, 'label');
    cfg.design.label = repmat(perms(:, ip), 1, ns);
    cfg.results.filestart = ['perm' sprintf('%04d',ip)];
    results = decoding(cfg, passed_data); % run permutation
    vacc(ip+1) = results.accuracy_minus_chance.output; % Accuracy with permutation
end

id = fopen(outjob_filepath, 'w');
for ip = 1:np+1
    fprintf(id,'perm(%d) acc(%f)\n',ip-1,vacc(ip));
end
fclose(id);
end

Thank you!

Hi Martin,

I reorganized my thoughts. I now have data divided into two categories; each category consists of 100 individuals and 500 features. I want to reduce the dimensionality of the data. Since this is supervised learning, is PCA a good choice? PCA does not consider class information and only maximizes the data variance, whereas my purpose is to find the features that best distinguish the two categories.
In TDT especially, is there a better method?

Thanks very much,
SSS

Hi SSS,

I would just leave it to your classifier to figure out the importance of each feature. This typically works pretty well.

You could try more laborious methods like recursive feature elimination. I just wanted to let you know that we haven't tested all possible options heavily, since this is quite time consuming. But the basic options were tested and work as expected.
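If you do want to try feature selection inside TDT, the toolbox provides a cfg.feature_selection mechanism. A minimal sketch from memory of the TDT template scripts (the exact field names are an assumption here, please verify against decoding_feature_selection and the shipped templates):

```matlab
% Sketch only -- field names recalled from the TDT templates, please verify:
cfg.feature_selection.method = 'filter'; % univariate filter on the training data...
cfg.feature_selection.filter = 'F';      % ...ranking features by F-score
cfg.feature_selection.n_vox  = 100;      % keep the 100 best-ranked features
```

Unlike PCA, this kind of filter uses the class labels (on the training data only), which is closer to what you describe wanting.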

Best,
Martin