NaNs when running CanICA

Hi everyone,
I am using CanICA to build a parcellation of part of an insect (Drosophila) brain. Recently I keep running into two kinds of errors.
The first seems to be related to joblib, and it seems I can prevent it by using only a single job (which obviously slows everything down enormously). The second is fastica crashing because of NaNs.

My data is quite big; I'm not sure whether this is related. It's 512*256*54*60 (x*y*z*t) * 16 'subjects'. I am running this on a node with 64 cores & 500GB RAM. I usually use 2x downsampling.

Details:

  1. I started running my code on a cluster and keep getting crashes with SystemError: error return without exception set. Some googling seems to point to a joblib bug and suggests reducing the number of parallel threads, but even after reducing from 32 to 3 it keeps crashing.

  2. Running with 1 job takes forever and then keeps resulting in ValueError: array must not contain infs or NaNs. I had this before when running on my local machine, and repeating the run over and over would eventually yield a successful one. I tried removing all NaNs & Infs from the input data, without success. Turning off normalization (the input data is DF/F) seemed to help at some point but doesn't help anymore.
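For reference, the kind of input check described in point 2 can be sketched as below. This is only an illustration in plain NumPy: the function name `summarize_problem_voxels` and the (timepoints x voxels) layout are assumptions, not part of nilearn. It also counts constant voxels, since a time series with zero standard deviation can turn into NaN or Inf under a naive normalization; whether that plays any role here is a guess.

```python
import numpy as np


def summarize_problem_voxels(data):
    """Count voxels that could cause trouble downstream.

    `data` is assumed to be a 2D array of shape (n_timepoints, n_voxels).
    Returns a dict with the number of voxels containing NaN/Inf and the
    number of fully finite voxels whose time series is constant.
    """
    data = np.asarray(data, dtype=float)

    # A voxel is "finite" only if every time point is finite.
    finite = np.isfinite(data).all(axis=0)

    # Only compute the standard deviation where it is well defined.
    constant = np.zeros(data.shape[1], dtype=bool)
    constant[finite] = data[:, finite].std(axis=0) == 0

    return {
        "n_nonfinite": int((~finite).sum()),
        "n_constant": int(constant.sum()),
    }


if __name__ == "__main__":
    toy = np.array([[1.0, 2.0, np.nan, 5.0],
                    [1.0, 3.0, 0.0, np.inf],
                    [1.0, 4.0, 1.0, 7.0]])
    print(summarize_problem_voxels(toy))
```

Running this per subject before the fit would at least rule the input data in or out as the source of the non-finite values.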

Here is the full error log regarding the NaN issue:

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
WARNING: The directory '/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
WARNING: The directory '/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
  WARNING: The scripts f2py, f2py2 and f2py2.7 are installed in '/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts nib-dicomfs, nib-diff, nib-ls, nib-nifti-dx, nib-tck2trk, nib-trk2tck and parrec2nii are installed in '/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script skivi is installed in '/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
/.local/lib/python2.7/site-packages/nilearn/__init__.py:73: DeprecationWarning: Python2 support is deprecated and will be removed in the next release. Consider switching to Python 3.6 or 3.7.
  _python_deprecation_warnings()
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
/.local/lib/python2.7/site-packages/sklearn/decomposition/fastica_.py:121: ConvergenceWarning: FastICA did not converge. Consider increasing tolerance or the maximum number of iterations.
  ConvergenceWarning)
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed: 174.4min remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 349.8min remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed: 525.8min remaining:    0.0s
/.local/lib/python2.7/site-packages/sklearn/decomposition/fastica_.py:61: RuntimeWarning: invalid value encountered in sqrt
  return np.dot(np.dot(u * (1. / np.sqrt(s)), u.T), W)
Traceback (most recent call last):
  File "mindPeek_parcellation_ica__cluster_2states_1.py", line 78, in <module>
    canica.fit(files_list)
  File "/.local/lib/python2.7/site-packages/nilearn/decomposition/base.py", line 413, in fit
    self._raw_fit(data)
  File "/.local/lib/python2.7/site-packages/nilearn/decomposition/canica.py", line 245, in _raw_fit
    self._unmix_components(components)
  File "/.local/lib/python2.7/site-packages/nilearn/decomposition/canica.py", line 197, in _unmix_components
    for seed in seeds)
  File "/.local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 924, in __call__
    while self.dispatch_one_batch(iterator):
  File "/.local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/.local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/.local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/.local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/.local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/.local/lib/python2.7/site-packages/sklearn/externals/joblib/memory.py", line 355, in __call__
    return self.func(*args, **kwargs)
  File "/.local/lib/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 344, in fastica
    W, n_iter = _ica_par(X1, **kwargs)
  File "/.local/lib/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 111, in _ica_par
    - g_wtx[:, np.newaxis] * W)
  File "/.local/lib/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 58, in _sym_decorrelation
    s, u = linalg.eigh(np.dot(W, W.T))
  File "/.local/lib/python2.7/site-packages/scipy/linalg/decomp.py", line 374, in eigh
    a1 = _asarray_validated(a, check_finite=check_finite)
  File "/.local/lib/python2.7/site-packages/scipy/_lib/_util.py", line 239, in _asarray_validated
    a = toarray(a)
  File "/.local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 498, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

Looking at your traceback, the NaNs appear in the FastICA code. I think that this happens when the number of components is too high relative to the data.
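To make the mechanism concrete, here is a minimal sketch of the symmetric decorrelation step that shows up in the traceback (`eigh` of `W W^T`, then `1 / sqrt(s)`). If `W` is close to rank-deficient, some eigenvalues `s` can come out as zero or slightly negative, and the square root then produces Inf or NaN, which matches the "invalid value encountered in sqrt" warning. The `eps` floor is an addition for illustration only; it is not what the installed scikit-learn code in the traceback does.

```python
import numpy as np


def sym_decorrelation(W, eps=None):
    """Return (W W^T)^(-1/2) W, as in the step shown in the traceback.

    If `eps` is given (hypothetical guard, for illustration), eigenvalues
    are floored at `eps` before taking the square root, so a nearly
    rank-deficient W no longer yields NaN or Inf.
    """
    s, u = np.linalg.eigh(W @ W.T)
    if eps is not None:
        s = np.clip(s, eps, None)
    # Without the floor, zero or negative eigenvalues give Inf or NaN here.
    with np.errstate(invalid="ignore", divide="ignore"):
        return (u * (1.0 / np.sqrt(s))) @ u.T @ W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 5))
    out = sym_decorrelation(W)
    # For a well-conditioned W the rows come out orthonormal.
    print(np.allclose(out @ out.T, np.eye(3)))

    # Two identical rows: W W^T has a (numerically) zero eigenvalue.
    W_bad = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    print(np.isfinite(sym_decorrelation(W_bad, eps=1e-12)).all())
```

With many components and comparatively little independent signal, the unmixing matrix is more likely to end up in this degenerate regime.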

What number of components are you using?

As a side note, it would help if you pasted the traceback as a preformatted block, to preserve the line breaks.

Another comment: I notice that you are using Python 2, which is no longer maintained (Python 3 has been out for ten years). New versions of nilearn, bringing improvements or bug fixes, will not run under Python 2.

Thanks for the suggestions.

Indeed, it happened while I was playing with a high number of components. I was using 400 after having previously run it with 200. How would I determine the maximum number of components I can use in the group ICA?
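One rough way to get at least an upper bound is to look at the numerical rank of the data matrix: asking for more components than there are non-negligible singular values cannot add independent signal. The sketch below is a generic NumPy heuristic, not a nilearn feature; the function name `effective_rank`, the relative tolerance, and the (samples x voxels) layout are all assumptions.

```python
import numpy as np


def effective_rank(data, rel_tol=1e-6):
    """Count singular values above `rel_tol` times the largest one.

    `data` is assumed to be a 2D array of shape (n_samples, n_voxels).
    The column means are removed first, as a PCA would do.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0, keepdims=True)
    sv = np.linalg.svd(centered, compute_uv=False)
    if sv.size == 0 or sv[0] == 0:
        return 0
    return int(np.sum(sv > rel_tol * sv[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 20 samples that are mixtures of only 3 underlying sources.
    sources = rng.standard_normal((3, 50))
    mixing = rng.standard_normal((20, 3))
    print(effective_rank(mixing @ sources))
```

On data of the size described above this would have to be run on a downsampled or already reduced matrix, and the result depends on the chosen tolerance, so it is a sanity check rather than a rule.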

I managed to reformat the post, thanks.

Also, I am aware of the Python 2 issue. I am still quite new to Python and have no idea why I started using Python 2. Changing my code is on my to-do list.

After running it in Python 3 it seems much more stable, yet the error still happens sometimes. For most of my data it was sufficient to re-run the code several times until it eventually finished.
Still, I have one combination of data that does not run…
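The re-running can be automated with something like the sketch below. It is only a workaround, not a fix for the underlying problem. The helper `fit_with_retries` and the `make_estimator` factory are hypothetical names; the only thing assumed about the estimator is that it has a `fit` method and raises `ValueError` on failure, as in the traceback above.

```python
def fit_with_retries(make_estimator, data, seeds=(0, 1, 2, 3, 4)):
    """Try fitting with several seeds and return the first success.

    `make_estimator(seed)` must return a fresh, unfitted estimator.
    Returns a tuple (fitted_estimator, seed_that_worked).
    """
    last_error = None
    for seed in seeds:
        estimator = make_estimator(seed)
        try:
            estimator.fit(data)
        except ValueError as err:
            # e.g. "array must not contain infs or NaNs"
            last_error = err
            continue
        return estimator, seed
    raise RuntimeError("fit failed for all seeds") from last_error


if __name__ == "__main__":
    class FlakyEstimator:
        """Stand-in that fails for the first two seeds."""

        def __init__(self, seed):
            self.seed = seed

        def fit(self, data):
            if self.seed < 2:
                raise ValueError("array must not contain infs or NaNs")
            return self

    est, seed = fit_with_retries(FlakyEstimator, data=[1, 2, 3])
    print(seed)
```

With nilearn, the factory could be something like `lambda seed: CanICA(n_components=200, random_state=seed)`; the value 200 is just the setting mentioned earlier in the thread.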