Issue: fitting nimare.decode.continuous.CorrelationDecoder to Neurosynth dataset

Hi Taylor,

Sorry for the delay in my response, and thanks for looking into this! That was indeed the problem causing the zeros in my meta-analysis images. There was a typo in the code from this response, which I was following: both label_positive_dset and label_negative_dset were assigned the same value, and I must have missed it when writing up my script.

Below is my script replicating your local test. With these changes, fitting the decoder takes ~10 min for a single term, compared to the ~30 min needed when using the CorrelationDecoder functions.

import os
import nimare

# Save meta-analytic maps to an output directory
data_dir = os.path.abspath("nimare_ns_data/")
out_dir = f"{data_dir}/meta-analyses_maps/"
label = "terms_abstract_tfidf__adolescence"

dataset = nimare.dataset.Dataset.load(os.path.join(data_dir, "neurosynth_dataset.pkl.gz"))
# dataset.update_path(os.path.join(data_dir, "study_maps/")) # is this needed?

# Initialize the Estimator
meta_estimator = nimare.meta.cbma.mkda.MKDAChi2()

# Split studies into those with and without the label
label_positive_ids = dataset.get_studies_by_label(label, 0.001)
label_negative_ids = list(set(dataset.ids) - set(label_positive_ids))

label_positive_dset = dataset.slice(label_positive_ids)
label_negative_dset = dataset.slice(label_negative_ids)
meta_result = meta_estimator.fit(label_positive_dset, label_negative_dset)
meta_result.save_maps(output_dir=out_dir, prefix=label)

A couple of follow-up questions. First, I noticed that when I added the code for pre-generating the MA maps after initializing the estimator (as you suggest in this response), the script threw a MemoryError during the subsequent meta_estimator.fit() call:

# Pre-generate MA maps to speed things up
kernel_transformer = meta_estimator.kernel_transformer
dataset = kernel_transformer.transform(dataset, return_type="dataset")
dataset.save("neurosynth_with_ma.pkl.gz")

Similarly, when I ran the script loading the saved dataset with MA maps instead of neurosynth_dataset.pkl.gz, i.e. dataset = nimare.dataset.Dataset.load("neurosynth_with_ma.pkl.gz"), I got the same error:

ERROR:nimare.meta.cbma.mkda:_fit failed, removing None
ERROR:nimare.meta.cbma.mkda:_fit failed, removing None
Traceback (most recent call last):
  File "nimare_fit_decoder_fast.py", line 56, in <module>
    meta_result = meta_estimator.fit(label_positive_dset, label_negative_dset)
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nimare/meta/cbma/base.py", line 719, in fit
    maps = self._fit(dataset1, dataset2)
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nimare/utils.py", line 473, in memmap_context
    return function(self, *args, **kwargs)
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nimare/meta/cbma/mkda.py", line 231, in _fit
    ma_maps2 = self._collect_ma_maps(
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nimare/meta/cbma/base.py", line 168, in _collect_ma_maps
    ma_maps = self.masker.transform(self.inputs_[maps_key])
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/input_data/base_masker.py", line 185, in transform
    return self.transform_single_imgs(imgs, confounds)
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/input_data/nifti_masker.py", line 443, in transform_single_imgs
    data = self._cache(filter_and_mask,
  File "/scinet/niagara/software/2019b/opt/base/python/3.8.5/lib/python3.8/site-packages/joblib/memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/input_data/nifti_masker.py", line 68, in filter_and_mask
    data, affine = filter_and_extract(imgs, _ExtractionFunctor(mask_img_),
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/input_data/base_masker.py", line 99, in filter_and_extract
    region_signals, aux = cache(extraction_function, memory,
  File "/scinet/niagara/software/2019b/opt/base/python/3.8.5/lib/python3.8/site-packages/joblib/memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/input_data/nifti_masker.py", line 30, in __call__
    return(masking.apply_mask(imgs, self.mask_img_,
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/masking.py", line 759, in apply_mask
    return _apply_mask_fmri(imgs, mask_img, dtype=dtype,
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/masking.py", line 803, in _apply_mask_fmri
    series = _utils.as_ndarray(series, dtype=dtype, order="C",
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/_utils/numpy_conversions.py", line 110, in as_ndarray
    ret = _asarray(arr, dtype=dtype, order=order)
  File "/gpfs/fs1/home/m/mchakrav/daialy/.virtualenvs/nimare_venv/lib/python3.8/site-packages/nilearn/_utils/numpy_conversions.py", line 27, in _asarray
    ret = np.asarray(arr, dtype=dtype, order=order)
  File "/scinet/niagara/software/2019b/opt/base/python/3.8.5/lib/python3.8/site-packages/numpy-1.19.2-py3.8-linux-x86_64.egg/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
MemoryError: Unable to allocate 95.0 GiB for an array with shape (91, 109, 91, 14130) and data type int64
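
As a sanity check (my own arithmetic, not anything from NiMARE), the failing allocation matches holding every MA map in memory as a dense int64 array over the full 2-mm grid; since the kernel maps are binary, a uint8 representation would be 8x smaller:

```python
import numpy as np

# The shape from the MemoryError: full (91, 109, 91) grid x 14,130 studies.
shape = (91, 109, 91, 14130)
n_elements = int(np.prod(shape, dtype=np.int64))

# int64 costs 8 bytes per element; uint8 would cost 1.
gib_int64 = n_elements * 8 / 2**30
gib_uint8 = n_elements * 1 / 2**30
print(f"int64: {gib_int64:.1f} GiB, uint8: {gib_uint8:.1f} GiB")
# → int64: 95.0 GiB, uint8: 11.9 GiB
```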

So, it seems that loading all of the MA maps into memory at once creates issues. Am I correct, then, that in my current script (the first code block) the MA maps are simply regenerated every time the script is run? Based on my tests on the cluster, not pre-generating the MA maps seems to be the only way the code runs to completion without memory errors. Not sure if I’m missing something.
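
In case it clarifies what I was picturing, here is a toy plain-NumPy sketch (hypothetical, not NiMARE's or nilearn's actual code) of masking the study maps in fixed-size chunks, so the full unmasked float array is never materialized at once:

```python
import numpy as np

# Toy stand-ins for a boolean brain mask and binary per-study MA maps.
rng = np.random.default_rng(0)
n_studies, n_voxels = 100, 1000
mask = rng.random(n_voxels) > 0.5
maps = (rng.random((n_studies, n_voxels)) > 0.9).astype(np.uint8)

# Mask and cast one chunk at a time: each chunk is reduced to its
# in-mask voxels before stacking, so the dense unmasked float array
# for all studies never exists in memory simultaneously.
chunk_size = 25
chunks = [
    maps[i:i + chunk_size, mask].astype(np.float32)
    for i in range(0, n_studies, chunk_size)
]
masked = np.vstack(chunks)  # shape: (n_studies, n_voxels_in_mask)
print(masked.shape)
```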

Also, I was wondering when it is necessary to run dataset.update_path(data_dir) after loading the dataset. In my script above, the line can apparently be omitted without errors, and even when it is included there are no MA file outputs. Meanwhile, when using the CorrelationDecoder functions, the MA maps seem to be generated every time decoder.fit(dataset) is run, even if data_dir is already populated. Is this the expected behaviour?

Thanks as always for your time, and apologies for all my questions!

Best,
Alyssa