Model comparison of machine learning models

Dear all,
I’m struggling with a statistical problem in machine learning:

I’m using event-related fMRI responses (voxel activations) evoked by graded painful stimulations as features. The targets are the scores (labels) of different pain-related questionnaires (one measures pain intensity; others measure the quality and unpleasantness of pain, fear of pain, etc.).

So, I’ve computed a Multiple Kernel Learning (MKL) prediction model for each questionnaire, which returns measures of model performance (the correlation between predicted and true labels, and the mean squared error) as well as brain region weights.
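For concreteness, here is a minimal sketch of what one of these models looks like. This is a hypothetical stand-in, not my actual pipeline: I use kernel ridge regression from scikit-learn instead of MKL, and the data shapes are made up:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5000))  # subjects x voxel features (made-up shapes)
y = rng.standard_normal(40)          # questionnaire scores (labels)

# Stand-in model: kernel ridge regression instead of the actual MKL pipeline
model = KernelRidge(kernel="linear", alpha=1.0)
y_pred = cross_val_predict(model, X, y, cv=10)

r, _ = pearsonr(y, y_pred)        # correlation between predicted and true labels
mse = np.mean((y - y_pred) ** 2)  # mean squared error
print(f"r = {r:.3f}, MSE = {mse:.3f}")
```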

Using 10,000 permutations per model (shuffling the questionnaire scores) and hyperparameter optimization, each questionnaire model was significant (p < 0.05), meaning that the scores of all three questionnaires were predictable from brain activity. Interestingly, however, the models had different underlying neural predictors, characterized by different region weights. Model complexity was comparable across models, as all of them selected C = 100 as the best hyperparameter.
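The permutation test itself looks roughly like this (again a simplified sketch with the same stand-in model; it omits the hyperparameter optimization step):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_predict

def cv_correlation(X, y, cv=10):
    """Cross-validated correlation between true and predicted labels."""
    y_pred = cross_val_predict(KernelRidge(kernel="linear", alpha=1.0), X, y, cv=cv)
    return pearsonr(y, y_pred)[0]

def permutation_pvalue(X, y, n_perm=10_000, seed=0):
    """p-value: fraction of label permutations that match or beat the observed r."""
    rng = np.random.default_rng(seed)
    r_obs = cv_correlation(X, y)
    null = np.array([cv_correlation(X, rng.permutation(y)) for _ in range(n_perm)])
    # +1 in numerator and denominator so the p-value is never exactly zero
    return (np.sum(null >= r_obs) + 1) / (n_perm + 1)
```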

Now, if I want to compare the different pain-related questionnaires with respect to model performance (r and MSE) and their underlying neural predictors (region weights), do I have to correct for multiple comparisons?
I ask because, in this case, the significance of each model (p < 0.05) was determined using 10,000 permutations, so the probability that the significance of any one of these models is driven by a false positive is very low... right? Why correct for multiple comparisons here?
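For illustration, a standard correction across the three model p-values would look like the snippet below (Holm's method via statsmodels, with made-up p-values); whether something like this is actually needed here is exactly my question:

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.012, 0.031, 0.044]  # made-up permutation p-values, one per questionnaire
reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method="holm")
print(p_corrected, reject)
```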

Many thanks for your feedback and comments on this!
Best, mike
