Trouble extracting data from a NiMARE Dataset object using slice() method

dimitarnivanov · January 27, 2023, 11:30am

Summary of what happened:

I am struggling to get the coordinates for four Neurosynth studies that I want to analyze. I downloaded the full Neurosynth database and then called the slice() method with the study IDs, stored in a 1D NumPy array. When I print the new dataset, it says it includes four experiments, but when I try to view their IDs, coordinates, etc., those attributes are empty.

Command used (and if a helper script was used, a link to the helper script or the command generated):

import numpy as np
ids = np.array([21568643,26610651,11087002,18510438])

neuro_dset = neurosynth_dset.slice(ids)
neuro_dset.save(os.path.join(out_dir, "sliced_neurosynth.pkl.gz"))

neuro_dset.coordinates.head()

Version:

The version was Python 3.8.10

Environment (Docker, Singularity, custom installation):

I did everything in Google Colab because for some reason I could not install nimare on my machine.

Data formatted according to a validatable standard? Please provide the output of the validator:

I used NiMARE’s documentation’s recommended way of extracting the data from Neurosynth and converting it into a .pkl.gz dataset:

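import os
from pprint import pprint

from nimare.extract import fetch_neurosynth
from nimare.io import convert_neurosynth_to_dataset

out_dir = os.path.abspath("/example_data/")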
os.makedirs(out_dir, exist_ok=True)

files = fetch_neurosynth(
    data_dir=out_dir,
    version="7",
    overwrite=False,
    source="abstract",
    vocab="terms",
)
# Note that the files are saved to a new folder within "out_dir" named "neurosynth".
pprint(files)
neurosynth_db = files[0]

neurosynth_dset = convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db["coordinates"],
    metadata_file=neurosynth_db["metadata"],
    annotations_files=neurosynth_db["features"],
)
neurosynth_dset.save(os.path.join(out_dir, "neurosynth_dataset.pkl.gz"))

Relevant log outputs (up to 20 lines):

Dataset(4 experiments, space='mni152_2mm')

Screenshots / relevant information:

[Screenshot: output]

tsalo · January 27, 2023, 12:42pm

Can you try using the full Neurosynth IDs as strings?

ids = ["21568643-1", "26610651-1", "11087002-1", "18510438-1"]
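For context, here is a minimal sketch of why the integer IDs select nothing. It assumes that NiMARE stores each Neurosynth entry under an ID of the form "<PMID>-<contrast>" and that the converter labels the single contrast per study as 1, which is what the IDs above suggest. The helpers to_dataset_ids and find_missing are hypothetical names for illustration, not part of NiMARE, and available_ids is a stand-in for neurosynth_dset.ids so that the snippet runs without NiMARE installed.

```python
def to_dataset_ids(pmids, contrast="1"):
    """Turn bare PubMed IDs into '<pmid>-<contrast>' strings (assumed ID format)."""
    return [f"{int(pmid)}-{contrast}" for pmid in pmids]


def find_missing(requested, available):
    """Return the requested IDs that do not appear among the available IDs."""
    available = set(available)
    return [id_ for id_ in requested if id_ not in available]


pmids = [21568643, 26610651, 11087002, 18510438]
ids = to_dataset_ids(pmids)

# Stand-in for neurosynth_dset.ids; with NiMARE you would pass that attribute instead.
available_ids = ["21568643-1", "26610651-1", "11087002-1", "18510438-1"]

# Bare PMIDs never match the suffixed string IDs, so a slice on them comes back empty.
print(find_missing([str(p) for p in pmids], available_ids))
# The suffixed IDs all match.
print(find_missing(ids, available_ids))
```

Running a check like find_missing against the Dataset's own IDs before calling slice() makes it obvious when the requested IDs are in the wrong format.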

dimitarnivanov · January 27, 2023, 1:25pm

Hi Taylor! Yes, that worked. Thank you so much.

dimitarnivanov · January 28, 2023, 2:01pm

Hi Taylor,
On another note, when I try to initialize a meta-analysis object and fit it to my data, I get a KeyError: 'sample_size'. Although ALEKernel is already the default, I tried creating the kernel object explicitly, like so:

ale = nimare.meta.kernel.ALEKernel()

and then passing it as a keyword argument:

meta1 = nimare.meta.cbma.ale.ALE(kernel_transformer=ale, null_model="approximate")

but it did not change anything.

I also tried modifying the kernel object itself by passing different values for the sample_size argument:

ale = nimare.meta.kernel.ALEKernel(sample_size=None)

but I am still getting an error.

I am not sure where exactly I should define the sample size. I tried the same code on a bigger dataset, because I assumed four studies might be too few, but the same error showed up.

tsalo · January 28, 2023, 3:07pm

The ALE kernel’s FWHM depends on the study’s sample size, so the ALEKernel object will attempt to extract experiment-wise sample sizes from the Dataset, which explains the KeyError. Given that you’re using the Neurosynth dataset, which doesn’t have sample size information, you can set a constant sample size to use across studies. That value can’t be None, because the kernel still needs some way to determine the appropriate FWHM, but it can be a reasonable sample size for the literature (e.g., 20 or 30). In that situation, you can just do what you did, except provide a number instead of None.

kernel = nimare.meta.kernel.ALEKernel(sample_size=20)
meta1 = nimare.meta.cbma.ALE(kernel_transformer=kernel, null_model="approximate")
meta_results1 = meta1.fit(test_dset)

Also, it’s a minor thing, but in the screenshot you shared, you called the Estimator object meta1, but you ran meta.fit, so I worry that you were fitting an old object, rather than the new one you provided the kernel to.
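To give a feel for how the constant sample size affects the kernel, here is a rough sketch of the sample-size-to-FWHM relationship. It is an assumption on my part, not a copy of NiMARE's code: the function name ale_fwhm_mm is hypothetical, and the default constants (5.7 mm between-template and 11.6 mm between-subject uncertainty) are my recollection of the empirical estimates used in the ALE literature, so check them against the NiMARE source before relying on the numbers.

```python
import math


def ale_fwhm_mm(sample_size, template_ed=5.7, subject_ed=11.6):
    """Approximate ALE kernel FWHM in mm for a given sample size.

    template_ed and subject_ed are assumed mean Euclidean distances (mm)
    describing between-template and between-subject spatial uncertainty.
    """
    scale = 2.0 * math.sqrt(2.0 / math.pi)
    uncertain_templates = template_ed / scale
    # Between-subject uncertainty shrinks with the square root of the sample size.
    uncertain_subjects = (subject_ed / scale) / math.sqrt(sample_size)
    combined_variance = uncertain_templates**2 + uncertain_subjects**2
    # Convert the combined Gaussian sigma to a full width at half maximum.
    return math.sqrt(8.0 * math.log(2.0) * combined_variance)


for n in (10, 20, 30):
    print(f"n={n}: FWHM ~ {ale_fwhm_mm(n):.2f} mm")
```

Under these assumptions, larger sample sizes give a narrower kernel, and the difference between 20 and 30 subjects is small, which is why either value is a reasonable constant.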

dimitarnivanov · January 28, 2023, 3:16pm

Thank you, the solution worked. The screenshot was to showcase the mistake; I just played with some test datasets and so I changed some variable names before running it. I appreciate the reminder.