Hi Tamir,
Great you like our toolbox! Scaling is used mostly to make the range of the data more “normal” for the classifier and can speed up things significantly depending on what classifier you use. I’d check out the help file for decoding_scale_data
.
Also, this is from our paper on TDT.l and goes into a bit more detail. Have a read, and in case you have other questions, perhaps we have covered it!
Hope this helps,
Martin
Scaling
Scaling is the process of adjusting the range of data which enters the classifier. This can be done to bring data to a range which improves the computational efficiency of the classifier (for example LIBSVM recommends scaling all data to be between 0 and 1). It can, however, also be used to change the relative contribution of individual features or individual samples or to remove the influence of the mean spatial pattern (Misaki et al., 2010; but see Garrido et al., 2013) which might affect classification performance. Scaling is also known as normalization, but we prefer the term scaling to distinguish it from another meaning of the term “normalization” which is commonly used in the MRI community to refer to spatial warping of images.
Typically, row scaling is used, i.e., scaling across samples within a given feature. Although scaling can theoretically improve decoding performance, for some data sets it may not have any influence (Misaki et al., 2010). Practically, scaling often has little or no influence on decoding performance when beta images or z-transformed data are passed, because this data already represents a scaled form of the raw images which is scaled relative to each run, rather than to all training data. However, scaling may still speed-up classification.
TDT allows a number of different settings: Either all data are scaled in advance (in TDT: “all”), which is only valid when scaling carries no information about class membership that influences test data, or scaling is carried out on training data only and these estimated scaling parameters are then applied to the test data (in TDT: “across”). The typically used scaling methods which have also been implemented in TDT are min0-max1 scaling or z-transformation. Min-max scaling scales all data to a range of 0 and 1, while z-transformation transforms data by removing the mean and dividing by the standard deviation. In addition to scaling data to a specified range, cut-off values can be provided for outlier reduction (Seymour et al., 2009). With this setting, all values larger than the upper cut-off are reduced to this limit, and all values smaller than the lower cut-off are set to this value. In TDT, these approaches can be combined with outlier reduction.
Example call:
cfg.scale.method = ’across’;
% scaling estimated on training data and
applied to test data
cfg.scale.estimation = ’z’;
% z-transformation as scaling approach
cfg.scale.cutoff = [-3 3]; % all values
> 3 are set = 3 (here: 3 standard
deviations, because data is
z-transformed)