TDT with LDA classifier

Hi TDT experts,

I tried the decoding tutorial to set up a cross-classification analysis (adding z-scaling and embedded RFE) with the default libsvm classifier, and it all worked fine. However, when I changed from 'libsvm' to cfg.decoding.software = 'lda'; I got the error message below. Presumably I need to change some default parameters to make them suitable for LDA?

Dot indexing is not supported for variables of this type.
Error in ldatrain (line 51)
switch lower(param.shrinkage)

Error in lda_train (line 9)
model = ldatrain(labels_train,data_train,cfg.decoding.train.classification.model_parameters);

Error in RFE (line 34)
model = feval(cfg.feature_selection.decoding.fhandle_train,labels_train,data_train(:,ranks),cfg.feature_selection);

Error in feature_selection_embedded (line 80)
ranks = feval(cfg.feature_selection.embedded_func,cfg,ranks,final_n_vox,iteration,data_scaled,labels);

Error in decoding_feature_selection (line 235)
[fs_index,n_vox_steps,output] = feature_selection_embedded(cfg,labels,data_scaled,n_vox,nested_n_vox,i_train);

Error in decoding (line 474)
[fs_index,fs_results,previous_fs_data] = decoding_feature_selection(cfg,fs_data);

Error in TDT_ROI (line 115)
[results, cfg] = decoding(cfg);

Hi Fredrik,

Yes, this is a little difficult to catch automatically: the defaults are set for libsvm, so when switching to lda you need to change them yourself. You are asked to provide parameters specific to lda. Type help ldatrain in your command window and you will see the required options. You could then replace the defaults with

cfg.decoding.train.classification.model_parameters.shrinkage = 'lw2';

which would then apply Ledoit-Wolf shrinkage but shrinking towards the variances, not a unit matrix.
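
In code, the whole switch would look something like this (a minimal sketch; I'm assuming the libsvm default is a parameter string that needs clearing first, which is also what the dot-indexing error suggests, and help ldatrain lists the actual options):

cfg.decoding.software = 'lda';
cfg.decoding.train.classification.model_parameters = struct; % clear the libsvm default (a parameter string) first
cfg.decoding.train.classification.model_parameters.shrinkage = 'lw2'; % Ledoit-Wolf, shrinking towards the variances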

Now, things are a little more complicated for your specific analysis. So far, I had not implemented RFE for anything other than libsvm. It's really difficult to write such general methods for these deeply nested structures while maintaining a high level of computational performance. However, I just made a version, which I'll upload sometime soon, that should work with lda. If you send me an email, I'll send you the function to replace. I think this would work for most of the tools we have implemented, but it won't work for more than 2 classes.

Best,
Martin

Thanks Martin,
the updated feature selection code worked with LDA! However, I can only get it to work when n_vox is a single value, not when it is a range or 'automatic'. Most likely I am missing something simple in the setup, but it did not work when I tried the example in decoding_feature_selection.

I have tried:
cfg.feature_selection.method = 'embedded';
cfg.feature_selection.embedded = 'RFE';
cfg.feature_selection.direction = 'backward';
cfg.feature_selection.n_vox = 'automatic'; % also tried [5 10 25 50 75 100];
cfg.feature_selection.nested_n_vox = 'automatic'; % also tried 5:100;

Inside cfg.feature_selection.design.function I have name = 'make_design_xclass_cv' and ver = 'v20140107'.

My error message:
Error using feature_selection_embedded>run_nest (line 107)
Could not create design for nested cross-validation. Need correct information in field 'cfg.feature_selection.design.function'!

Error in feature_selection_embedded (line 47)
[n_vox_selected,nested_output,cfg.feature_selection.design.msg] = run_nest(cfg,data_scaled,i_train,n_vox,nested_n_vox); %#ok % determine optimal number of features

Error in decoding_feature_selection (line 235)
[fs_index,n_vox_steps,output] = feature_selection_embedded(cfg,labels,data_scaled,n_vox,nested_n_vox,i_train);

Error in decoding (line 474)
[fs_index,fs_results,previous_fs_data] = decoding_feature_selection(cfg,fs_data);

Error in TDT_ROI (line 118)
[results, cfg] = decoding(cfg);

So the error seems to occur because the design function is set up for cross-classification; changing the embedded design to 'make_design_cv' fixed it.
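
In code, the fix was this one line (a minimal sketch using the field names from my config above):

cfg.feature_selection.design.function.name = 'make_design_cv'; % instead of 'make_design_xclass_cv' for the nested loop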

I guess the use of feature selection can be tricky with cross-classification in general, since the features are selected based on one class, and the features that are optimal within that class might not be optimal for the other class; they could even be based on the very properties or confounds one is trying to get away from with the cross-classification. Do you have any general thoughts or tips on optimizing feature selection for cross-classification?

Could one, for example, do cross-validated cross-classification with embedded RFE to automatically select the optimal number of features? That is, in each iteration, train across class1 and class2 on all except one withheld run to determine the optimal features, use those features for that iteration's cross-classification (i.e., train on all but one class1 run and test on the withheld class2 run), and then repeat while withholding and testing on a new run each time. Or would this be double dipping somehow?
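
Schematically, per withheld run it would be something like this (pseudocode only, with made-up helper names, not actual TDT syntax):

for r = 1:n_runs % withhold run r in this iteration
    train_idx = setdiff(1:n_runs, r);
    % nested step: determine the optimal features from the training runs of BOTH classes
    feats = rfe_select(data_class1(train_idx), data_class2(train_idx)); % hypothetical helper
    % outer step: train on the class1 training runs, test on the withheld class2 run
    model = train_classifier(data_class1(train_idx), feats); % hypothetical helper
    acc(r) = test_classifier(model, data_class2(r), feats); % hypothetical helper
end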

Hi Fredrik,

My general recommendation is actually not to use feature selection, because we usually don't have enough data to do it in a stable way. If you do use feature selection, use something stable or based on external data (e.g. a functional contrast) and use heuristics, i.e. some of the 'filter' methods. Those should always work well, because there is usually no way to overfit to the training set.
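
For illustration, a filter-based setup could look something like this (a sketch only; treat the filter name and the external-image fields as assumptions and check decoding_feature_selection for the actual options):

cfg.feature_selection.method = 'filter';
cfg.feature_selection.filter = 'F'; % e.g. a univariate F-statistic filter (assumption)
cfg.feature_selection.n_vox = 100; % fixed heuristic rather than nested optimization
% or base the selection on external data, e.g. a functional contrast:
% cfg.feature_selection.filter = 'external'; % assumption
% cfg.feature_selection.external_fname = {'my_contrast_image.nii'}; % hypothetical file name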

Cross-classification doesn't work for the feature selection part, because feature selection is based on the training data only; hence, there is no data to cross-classify to. For this, you have no choice but to use cross-validation or random subsampling. Now, there is the exception where you "sacrifice" some of the data you would like to cross-classify to in order to optimize your internal loop. But that's a little unusual and will likely not work, since the optimization in the feature selection would be based on a very small test set.