Creation of non-finite values during partial least squares (SVD) analysis and cross-validation

hechlera · June 20, 2022, 3:01pm

Hi everyone,

I have a very persistent problem with a partial least squares analysis which, I suspect, may be a more general issue when running and validating PCAs or SVDs. I am using the pyls Python library (GitHub - rmarkello/pyls: A Python implementation of Partial Least Squares (PLS) decomposition) to run a behavioral PLS that relates relative changes in 360 ROIs (Glasser 2016 parcellation) to two behavioral variables from my experiment. The error occurs when pyls calls scipy and sklearn to validate the results of the SVD (see the traceback snippet below). Maybe I am missing something general here that is not specific to behavioral PLS.

Even though I thoroughly cleaned my data of any non-finite entries and, as a last step, even of zeros, I still get a ValueError at some point, either during bootstrapping or during cross-validation of the results.
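A check of this kind can be sketched as follows. This is only an illustration, not the code actually used for the attached files: the helper name `summarize_finiteness` and the toy array are hypothetical, and the data is assumed to be a 2-D NumPy array with one column per ROI or behavioral variable. Besides NaN and infinity, it also lists zero-variance columns, since those pass a finiteness check but produce 0/0 when z-scored.

```python
import numpy as np


def summarize_finiteness(arr):
    """Count non-finite entries and list zero-variance columns of a 2-D array.

    Hypothetical helper for illustration; it is not part of pyls or sklearn.
    """
    arr = np.asarray(arr, dtype=float)
    n_nan = int(np.isnan(arr).sum())
    n_inf = int(np.isinf(arr).sum())
    # A column whose standard deviation is exactly 0 is constant.
    # Columns containing NaN give a NaN std and are therefore not listed here.
    with np.errstate(invalid="ignore"):
        constant_columns = np.flatnonzero(arr.std(axis=0) == 0).tolist()
    return {"nan": n_nan, "inf": n_inf, "constant_columns": constant_columns}


# Toy data: column 1 is constant, column 2 contains one NaN.
X_toy = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 2.0, np.nan],
    [3.0, 2.0, 7.0],
])
report = summarize_finiteness(X_toy)
print(report)
```

Running the same helper on the full ROI and behavioral arrays would show whether any column is constant even though every entry is finite.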

My data is attached.
roi_values.txt (103.9 KB)
behavioral_values.txt (517 Bytes)

Snippet of traceback:

~/.conda/envs/pr_andre_env/lib/python3.9/site-packages/pyls-0.0.1-py3.9.egg/pyls/types/behavioral.py in _single_crossval(self, X, Y, inds, groups, seed)
166 # calculate r & r-squared from comp of rescaled test & true values
167 r_scores = compute.efficient_corr(Y_test, Y_pred)
--> 168 r2_scores = r2_score(Y_test, Y_pred, multioutput='raw_values')
169
170 return r_scores, r2_scores

~/.conda/envs/pr_andre_env/lib/python3.9/site-packages/sklearn/metrics/_regression.py in r2_score(y_true, y_pred, sample_weight, multioutput)
787 -3.0
788 """
--> 789 y_type, y_true, y_pred, multioutput = _check_reg_targets(
790 y_true, y_pred, multioutput
791 )

~/.conda/envs/pr_andre_env/lib/python3.9/site-packages/sklearn/metrics/_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
94 check_consistent_length(y_true, y_pred)
95 y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
--> 96 y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
97
98 if y_true.ndim == 1:

~/.conda/envs/pr_andre_env/lib/python3.9/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
798
799 if force_all_finite:
--> 800 _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
801
802 if ensure_min_samples > 0:

~/.conda/envs/pr_andre_env/lib/python3.9/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
112 ):
113 type_err = "infinity" if allow_nan else "NaN, infinity"
--> 114 raise ValueError(
115 msg_err.format(
116 type_err, msg_dtype if msg_dtype is not None else X.dtype

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
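Note that in the traceback above the check on y_true (line 95 of _regression.py) passes and the error is raised for y_pred (line 96), so the non-finite values are in the predicted values rather than in the test data itself. One way fully finite input can still yield NaN is when a variable has zero variance within a resampled or held-out subset and is then z-scored. The sketch below only illustrates that mechanism with made-up data; whether this is what happens inside pyls for the attached files is an assumption, not something the traceback confirms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical behavioral matrix: 10 subjects x 2 variables, all finite.
Y_toy = rng.normal(size=(10, 2))
# Assumption for illustration: the second variable happens to be constant
# in the rows that end up in one resampled subset.
Y_toy[:5, 1] = 1.0

subset = Y_toy[:5]

# Z-scoring divides by the per-column standard deviation; for the constant
# column this is 0 / 0, which evaluates to NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    z = (subset - subset.mean(axis=0)) / subset.std(axis=0)

print("full data finite:      ", bool(np.isfinite(Y_toy).all()))
print("z-scored subset finite:", bool(np.isfinite(z).all()))
```

If something like this is the cause, checking the standard deviation of each column within every bootstrap or cross-validation split, rather than only on the full data, should reveal it.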

I would be glad for any help. I have been debugging for many, many hours, but I am running out of ideas.

Thanks so much,

André