Reproduce Neurosynth Term-based Decoder with NiMARE

Summary of what happened:

Hello all,

I am trying to reproduce the term-based cognitive decoder used on Neurosynth/NeuroVault with the NiMARE package in Python. I followed this example to fetch the Neurosynth dataset and this example to apply the correlation decoder (please see my script below).

Although I am able to compute the similarity scores (i.e., correlation coefficients) between my own image and the term maps in the dataset, the values are inconsistent with the output of the Neurosynth term-based decoder. In addition, the NiMARE decoder outputs more than 3000 terms, whereas Neurosynth has only 1298. I searched the forum, but could only find discussions of topic-based decoding.

I therefore wonder where the inconsistency might come from. I would really appreciate all the help!

Command used (and if a helper script was used, a link to the helper script or the command generated):

# load package
import os
from pprint import pprint

from nimare.extract import download_abstracts, fetch_neuroquery, fetch_neurosynth
from nimare.io import convert_neurosynth_to_dataset
from nimare.decode.continuous import CorrelationDecoder
from nimare.meta.cbma import mkda

# NOTE: "decoder_data_dir" below is a local output directory defined elsewhere

# download neurosynth dataset
files = fetch_neurosynth(
    data_dir=decoder_data_dir,
    version='7',
    overwrite=False,
    source='abstract',
    vocab='terms',
)
# Note that the files are saved to a new folder within "decoder_data_dir" named "neurosynth".
pprint(files)
neurosynth_db = files[0]

# convert the Neurosynth database to a NiMARE Dataset object
neurosynth_dset = convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db['coordinates'],
    metadata_file=neurosynth_db['metadata'],
    annotations_files=neurosynth_db['features'],
)
neurosynth_dset.save(os.path.join(decoder_data_dir, 'neurosynth_dataset.pkl.gz'))

# fit/train the decoder
decoder = CorrelationDecoder(
    frequency_threshold=0.001,
    meta_estimator=mkda.MKDAChi2,
    target_image='z_desc-association',
    n_cores=8,
)

decoder.fit(neurosynth_dset)

# save the trained decoder for future use
decoder.save(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# load the decoder
decoder = CorrelationDecoder.load(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# decode the focal image
similarity_score_df = decoder.transform(target_file_path)

Version:

python = v3.12.3
NiMARE = v0.6.2

Environment (Docker, Singularity / Apptainer, custom installation):

NiMARE package was installed in a conda environment with

Hi @b03701209!

The Neurosynth website uses a curated list of words it deemed useful (not data-driven). Here is the filter list (the first column is the word, the second column is a 0 or 1 indicating whether the word is kept): https://raw.githubusercontent.com/neurosynth/neurosynth-web/refs/heads/master/data/assets/analysis_filter_list.txt
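For illustration, here is a minimal, self-contained sketch of parsing that two-column format. It uses a small mock excerpt rather than the real file, and assumes the columns are named "term" and "keep" (the same assumption the script below makes):

```python
import io

import pandas as pd

# Mock excerpt mimicking the tab-separated, two-column filter-list format
# (hypothetical header names; check the real analysis_filter_list.txt)
mock_filter = io.StringIO(
    "term\tkeep\n"
    "anxiety\t1\n"
    "the\t0\n"
    "memory\t1\n"
)
df = pd.read_csv(mock_filter, sep="\t")

# Keep only the terms flagged with 1
kept = df[df["keep"] == 1]["term"].tolist()
print(kept)  # ['anxiety', 'memory']
```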

This should work (warning: untested):

# load package
import os
from pprint import pprint

import pandas as pd
from nimare.extract import download_abstracts, fetch_neuroquery, fetch_neurosynth
from nimare.io import convert_neurosynth_to_dataset
from nimare.decode.continuous import CorrelationDecoder
from nimare.meta.cbma import mkda

# download neurosynth dataset
files = fetch_neurosynth(
    data_dir=decoder_data_dir,
    version='7',
    overwrite=False,
    source='abstract',
    vocab='terms',
)
# Note that the files are saved to a new folder within "decoder_data_dir" named "neurosynth".
pprint(files)
neurosynth_db = files[0]

# convert the Neurosynth database to a NiMARE Dataset object
neurosynth_dset = convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db['coordinates'],
    metadata_file=neurosynth_db['metadata'],
    annotations_files=neurosynth_db['features'],
)
neurosynth_dset.save(os.path.join(decoder_data_dir, 'neurosynth_dataset.pkl.gz'))

# Load the filtered feature list from neurosynth-web
feature_url = "https://raw.githubusercontent.com/neurosynth/neurosynth-web/refs/heads/master/data/assets/analysis_filter_list.txt"
features_df = pd.read_csv(feature_url, sep="\t")
# Keep only features marked with "keep" == 1
selected_features = features_df[features_df["keep"] == 1]["term"].tolist()
print(f"Selected {len(selected_features)} features for decoder training")

# fit/train the decoder with reduced feature set
decoder = CorrelationDecoder(
    frequency_threshold=0.001,
    features=selected_features,
    meta_estimator=mkda.MKDAChi2,
    target_image='z_desc-association',
    n_cores=8,
)

decoder.fit(neurosynth_dset)

# save the trained decoder for future use
decoder.save(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# load the decoder
decoder = CorrelationDecoder.load(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# decode the focal image
similarity_score_df = decoder.transform(target_file_path)
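Once `transform` returns, you can rank terms by their correlation with your image. A minimal sketch using a mock DataFrame (the actual column name of the decoder output may differ across NiMARE versions, so this sorts by the first column to stay version-agnostic):

```python
import pandas as pd

# Mock decoder output: index = term, one column of correlation values
mock_scores = pd.DataFrame(
    {"r": [0.12, 0.45, -0.03]},
    index=["anxiety", "memory", "motor"],
)

# Sort by the first (and only) column, highest correlation first
top = mock_scores.sort_values(by=mock_scores.columns[0], ascending=False)
print(top.head(2).index.tolist())  # ['memory', 'anxiety']
```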

Best,
James