In my project, I use nested cross-validation for a regression task: an external cross-validation to evaluate model performance, and an internal cross-validation to tune the hyperparameters and select the best model.
For the external cross-validation I use StratifiedKFold(k=10), and for the internal one I use StratifiedShuffleSplit … But I have a problem: in the external cross-validation I always get results like [0.32, -0.12, -0.34, -0.8, …] (the first fold with a positive R² score and all the rest with negative scores).
I have tried it many times and I always get the same problem …
Help please, thank you very much.
StratifiedKFold is deterministic, so it is expected that you obtain the same result when you repeat the experiment.
Actually, I’d rather do the converse: StratifiedKFold for the inner loop and StratifiedShuffleSplit for the outer loop, or even StratifiedShuffleSplit for both.
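A minimal sketch of that suggested layout, assuming a synthetic regression dataset and a Ridge model stand in for the actual pipeline (since regression targets are continuous, the plain KFold/ShuffleSplit variants are used here in place of their stratified counterparts, which scikit-learn reserves for classification labels):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, ShuffleSplit, cross_val_score

# Stand-in data; replace with your own X, y.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Inner loop: KFold to tune hyperparameters.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=inner_cv)

# Outer loop: ShuffleSplit to estimate generalization performance.
outer_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print(scores)  # one R² per outer split
```

Fixing `random_state` makes the splits reproducible; dropping it (or varying it) shows how much the scores depend on the particular partition.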
I personally prefer and recommend repeated holdout CV (also called shuffle split, etc.), both for ease of implementation and for reuse in post-hoc statistics, etc.
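For illustration, a sketch of repeated holdout on synthetic data (names and model are placeholders, not the actual setup): each repetition yields one score, and the resulting array is directly usable for post-hoc statistics such as confidence intervals or paired comparisons.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = make_regression(n_samples=150, n_features=5, noise=5.0, random_state=1)

# Repeated holdout: 30 independent random train/test splits.
cv = ShuffleSplit(n_splits=30, test_size=0.25, random_state=1)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

# One R² per repetition -- a sample you can summarize or test on.
mean, sd = scores.mean(), scores.std(ddof=1)
print(f"R² = {mean:.3f} ± {1.96 * sd / np.sqrt(len(scores)):.3f}")
```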
What you describe is difficult to follow well enough to give a precise answer… sharing your code would make it easier to see what’s going wrong.
Also, if this is a simple model comparison, I’d recommend using existing tools when possible to avoid such mistakes, for example: github.com/raamana/neuropredict (disclosure: I am the dev) … the developer version allows you to run regression …
Also take a look at my slides on cross-validation, in case they help clarify anything:
hello, thank you very much @bthirion
hello, thank you very much @raamana
FYI: This dev version is now merged into master and released as v0.6: https://github.com/raamana/neuropredict/releases/tag/0.6
You can install the latest version with:

```
pip uninstall neuropredict
pip install -U neuropredict
```