TDT input data scaling: per run?

Yinan_Cao · May 19, 2020, 10:02pm

Hi,

I was just wondering whether the scaling is done per voxel (dimension) across the samples within each run, or across all samples in the training set. Let’s say I have fMRI BOLD estimates, one beta per condition per run, and I have 3 runs. Thus, in a leave-one-run-out cross-validated SVM analysis, my training data consist of 4 samples (training data = 2 runs, with 2 conditions each). Should I scale the data independently for each run in the first place (this results in only two possible values, -0.7071 and +0.7071, in every dimension), or should I scale the data across the 4 training samples (assuming that I don’t scale my test data)? Any general comments or suggestions on scaling for SVMs are very welcome as well!
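To make the two options concrete, here is a minimal NumPy sketch. It is not TDT code. The random data, the `zscore` helper, and the number of voxels are hypothetical, and it assumes z-scoring with the sample standard deviation (n-1 in the denominator), which is what the ±0.7071 values imply.

```python
import numpy as np

def zscore(x):
    """Z-score each column (voxel) using the sample standard deviation (ddof=1)."""
    return (x - x.mean(axis=0, keepdims=True)) / x.std(axis=0, ddof=1, keepdims=True)

rng = np.random.default_rng(0)
n_voxels = 5  # hypothetical number of voxels

# Hypothetical training set: 2 runs x 2 conditions = 4 samples (rows).
train = rng.normal(size=(4, n_voxels))
runs = np.array([1, 1, 2, 2])  # run label of each sample

# Option 1: scale each run separately (only 2 samples per run and voxel).
per_run = np.empty_like(train)
for r in np.unique(runs):
    per_run[runs == r] = zscore(train[runs == r])

# Option 2: scale across all 4 training samples at once.
across = zscore(train)

print(np.round(np.abs(per_run), 4))  # every entry is 0.7071
print(np.round(across, 4))           # values differ between voxels and samples
```

With only two samples per run, per-run scaling maps every voxel to ±1/√2 and keeps only the sign of the difference between conditions within a run. Scaling across the four training samples keeps the relative magnitudes.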

Many thanks!
Best,
Yinan

Martin · May 20, 2020, 1:10pm

Hi Yinan,

You can choose what kind of scaling you would like to use, but only ‘separate’ would scale by chunk (i.e. by run).

I only use scaling if I feed in unscaled data (betas are usually standardized, i.e. scaled, anyway) or if decoding is really slow for some reason. Otherwise, I haven’t really seen much of a benefit myself.

Best,
Martin

Yinan_Cao · May 20, 2020, 6:15pm

Thank you so much for your prompt reply, Martin! I just checked with real data: scaling doesn’t change the decoding results much; if anything, raw betas actually work better.