Reproduce Neurosynth Term-based Decoder with NiMARE

Summary of what happened:

Hello all,

I am trying to reproduce the term-based cognitive decoder from Neurosynth/NeuroVault with the NiMARE package in Python. I followed this example to fetch the Neurosynth dataset and this example to apply the correlation decoder (please see my script below).

Although I am able to calculate the similarity scores (i.e., correlation coefficients) between my own image and the term maps in the dataset, I noticed that the values are inconsistent with the outputs of the Neurosynth term-based decoder. In addition, the NiMARE decoder outputs more than 3,000 terms, while there are only 1,298 terms on Neurosynth. I searched the forum, but could only find discussions about topic-based decoding.

I therefore wonder where the inconsistency might come from. I would really appreciate all the help!

Command used (and if a helper script was used, a link to the helper script or the command generated):

# load package
import os
from pprint import pprint

from nimare.extract import fetch_neurosynth
from nimare.io import convert_neurosynth_to_dataset
from nimare.decode.continuous import CorrelationDecoder
from nimare.meta.cbma import mkda

# download the Neurosynth dataset
decoder_data_dir = 'decoder_data'  # set to your preferred data directory
files = fetch_neurosynth(
    data_dir=decoder_data_dir,
    version='7',
    overwrite=False,
    source='abstract',
    vocab='terms',
)
# Note that the files are saved to a new folder within "decoder_data_dir" named "neurosynth".
pprint(files)
neurosynth_db = files[0]

# convert the Neurosynth database to a NiMARE dataset file
neurosynth_dset = convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db['coordinates'],
    metadata_file=neurosynth_db['metadata'],
    annotations_files=neurosynth_db['features'],
)
neurosynth_dset.save(os.path.join(decoder_data_dir, 'neurosynth_dataset.pkl.gz'))

# fit/train the decoder
decoder = CorrelationDecoder(
    frequency_threshold=0.001,
    meta_estimator=mkda.MKDAChi2,
    target_image='z_desc-association',
    n_cores=8,
)

decoder.fit(neurosynth_dset)

# save the trained decoder for future use
decoder.save(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# load the decoder
decoder = CorrelationDecoder.load(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# decode the focal image (target_file_path points to your statistical map)
similarity_score_df = decoder.transform(target_file_path)

Version:

Python = v3.12.3
NiMARE = v0.6.2

Environment (Docker, Singularity / Apptainer, custom installation):

NiMARE package was installed in a conda environment with

Hi @b03701209!

The Neurosynth website uses an opinionated list of words it deemed useful (not data-driven). Here is the filter list (the first column is the word; the second column is 0 or 1, indicating whether the word is kept): https://raw.githubusercontent.com/neurosynth/neurosynth-web/refs/heads/master/data/assets/analysis_filter_list.txt
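To illustrate the two-column format, here is a small parsing sketch on an inline excerpt; note that the header names "term" and "keep" are assumptions and should be checked against the actual file:

```python
import io

import pandas as pd

# Hypothetical excerpt of analysis_filter_list.txt; the real file is
# tab-separated with one row per vocabulary word.
sample = "term\tkeep\naction\t1\nabilities\t0\nmemory\t1\n"
df = pd.read_csv(io.StringIO(sample), sep="\t")

# Keep only the words flagged with 1.
kept = df.loc[df["keep"] == 1, "term"].tolist()
# kept == ["action", "memory"]
```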

This should work (warning: untested):

# load package
import os
from pprint import pprint

import pandas as pd
from nimare.extract import fetch_neurosynth
from nimare.io import convert_neurosynth_to_dataset
from nimare.decode.continuous import CorrelationDecoder
from nimare.meta.cbma import mkda

# download the Neurosynth dataset
decoder_data_dir = 'decoder_data'  # set to your preferred data directory
files = fetch_neurosynth(
    data_dir=decoder_data_dir,
    version='7',
    overwrite=False,
    source='abstract',
    vocab='terms',
)
# Note that the files are saved to a new folder within "decoder_data_dir" named "neurosynth".
pprint(files)
neurosynth_db = files[0]

# convert the Neurosynth database to a NiMARE dataset file
neurosynth_dset = convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db['coordinates'],
    metadata_file=neurosynth_db['metadata'],
    annotations_files=neurosynth_db['features'],
)
neurosynth_dset.save(os.path.join(decoder_data_dir, 'neurosynth_dataset.pkl.gz'))

# Load the filtered feature list from neurosynth-web
feature_url = "https://raw.githubusercontent.com/neurosynth/neurosynth-web/refs/heads/master/data/assets/analysis_filter_list.txt"
features_df = pd.read_csv(feature_url, sep="\t")
# Keep only features marked with "keep" == 1
selected_features = features_df[features_df["keep"] == 1]["term"].tolist()
print(f"Selected {len(selected_features)} features for decoder training")

# fit/train the decoder with reduced feature set
decoder = CorrelationDecoder(
    frequency_threshold=0.001,
    features=selected_features,
    meta_estimator=mkda.MKDAChi2,
    target_image='z_desc-association',
    n_cores=8,
)

decoder.fit(neurosynth_dset)

# save the trained decoder for future use
decoder.save(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# load the decoder
decoder = CorrelationDecoder.load(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# decode the focal image (target_file_path points to your statistical map)
similarity_score_df = decoder.transform(target_file_path)
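Once transform returns the scores, the top terms can be ranked. A small pandas sketch with a made-up DataFrame standing in for the decoder output (the actual column and index names may differ by NiMARE version):

```python
import pandas as pd

# Stand-in for the decoder output: one correlation per term.
scores = pd.DataFrame(
    {"r": [0.31, 0.72, -0.05]},
    index=["memory", "language", "pain"],
)

# Sort by correlation to see the best-matching terms first.
top_terms = scores.sort_values("r", ascending=False).head(2)
# top_terms lists "language" then "memory"
```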

Best,
James

Hello James,

Many thanks for the information!

I have tested with the opinionated list and now get a similar number of terms. However, the similarity scores (and even the ranking of the terms) remain inconsistent with those from the Neurosynth website.

Do you have any ideas about plausible reasons (e.g., different versions of the training dataset)?

Best regards,
Ting

Hi Ting,

Some of the analytical choices are different:

- The default kernel radius for MKDAChi2 is 6 mm on Neurosynth, but 10 mm in NiMARE.
- Neurosynth zeroes out voxels that aren't supported by at least 1% of the studies; NiMARE does not currently do that.
- There are a couple of p-value calculation/rounding differences in the results.

The kernel size and the min-studies masking likely account for the greatest differences between the two, but I wouldn't say one approach is more "correct" than the other.
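The 1%-of-studies masking could be approximated after the fact if you have a per-voxel count of contributing studies. A minimal NumPy sketch; the inputs and the helper function here are illustrative, not NiMARE API:

```python
import numpy as np

def mask_low_support(meta_map, study_counts, n_studies, min_frac=0.01):
    """Zero out voxels reported by fewer than `min_frac` of all studies.

    Approximates Neurosynth's masking step; NiMARE does not apply it.
    """
    support = study_counts / n_studies
    return np.where(support >= min_frac, meta_map, 0.0)

# Toy example: 10,000 studies, three voxels.
vals = np.array([1.2, -0.5, 2.0])
counts = np.array([3, 150, 200])  # studies reporting each voxel
masked = mask_low_support(vals, counts, n_studies=10_000)
# the first voxel (0.03% support) is zeroed; the others are kept
```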

Let me know if you have more questions! (and/or what your ultimate goal is to see what solution is more aligned with what you’re trying to accomplish.)
James
