I am trying to reproduce the term-based cognitive decoder on Neurosynth/NeuroVault with the NiMARE package in Python. I followed this example to fetch the Neurosynth dataset and this example to apply the correlation decoder (see my script below).
Although I am able to calculate similarity scores (i.e., correlation coefficients) between my own image and the term maps in the dataset, I noticed that the values are inconsistent with the outputs of the Neurosynth term-based decoder. In addition, the NiMARE decoder outputs more than 3000 terms, while there are only 1298 terms on Neurosynth. I searched the forum, but could only find discussions about topic-based decoding.
I therefore wonder where the inconsistency might come from. I would really appreciate any help!
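For reference, my understanding is that the correlation decoder essentially ranks terms by the Pearson correlation between the input image's voxel values and each term's meta-analytic map. A toy numpy sketch of that idea (made-up data and term names, not the real Neurosynth maps):

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 1000

# Hypothetical data: one input image and meta-analytic maps for three terms
input_img = rng.normal(size=n_voxels)
term_maps = {term: rng.normal(size=n_voxels) for term in ["memory", "pain", "reward"]}

# Pearson correlation between the input image and each term map,
# which is what the decoder's similarity scores represent conceptually
scores = {term: np.corrcoef(input_img, tmap)[0, 1] for term, tmap in term_maps.items()}

# Terms ranked from most to least similar
ranked = sorted(scores, key=scores.get, reverse=True)
```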
Command used (and if a helper script was used, a link to the helper script or the command generated):
```python
# load packages
import os
from pprint import pprint

from nimare.extract import download_abstracts, fetch_neuroquery, fetch_neurosynth
from nimare.io import convert_neurosynth_to_dataset
from nimare.decode.continuous import CorrelationDecoder
from nimare.meta.cbma import mkda

# download the Neurosynth dataset
# ("decoder_data_dir" is a path defined earlier in my script)
files = fetch_neurosynth(
    data_dir=decoder_data_dir,
    version='7',
    overwrite=False,
    source='abstract',
    vocab='terms',
)
# Note that the files are saved to a new folder within "decoder_data_dir" named "neurosynth".
pprint(files)
neurosynth_db = files[0]

# convert the Neurosynth database to a NiMARE dataset file
neurosynth_dset = convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db['coordinates'],
    metadata_file=neurosynth_db['metadata'],
    annotations_files=neurosynth_db['features'],
)
neurosynth_dset.save(os.path.join(decoder_data_dir, 'neurosynth_dataset.pkl.gz'))

# fit/train the decoder
decoder = CorrelationDecoder(
    frequency_threshold=0.001,
    meta_estimator=mkda.MKDAChi2,
    target_image='z_desc-association',
    n_cores=8,
)
decoder.fit(neurosynth_dset)

# save the trained decoder for future use
decoder.save(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# load the decoder
decoder = CorrelationDecoder.load(os.path.join(decoder_data_dir, 'neurosynth_dataset_decoder.pkl'))

# decode the focal image ("target_file_path" is defined elsewhere)
similarity_score_df = decoder.transform(target_file_path)
```
I have tested with the opinionated term list and now get a similar number of terms. However, the similarity scores (and even the ranking of the terms) remain inconsistent with the ones from the Neurosynth website.
Do you have any ideas about plausible reasons (e.g., a different version of the training dataset)?
The default kernel size for MKDAChi2 is 6 mm in Neurosynth, but 10 mm in NiMARE.
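To give a sense of scale: at 2 mm isotropic resolution, a 10 mm sphere covers roughly four times as many voxels as a 6 mm sphere, so each reported focus is smeared much more widely. A quick sketch (plain numpy, just counting voxels inside an MKDA-style binary sphere; the helper function is illustrative, not a NiMARE API):

```python
import numpy as np

def sphere_voxel_count(radius_mm, voxel_size_mm=2.0):
    """Count voxels whose centers fall within a sphere of the given radius,
    the way an MKDA-style binary kernel marks voxels around a focus.
    (Illustrative helper, not a NiMARE function.)"""
    r_vox = int(np.ceil(radius_mm / voxel_size_mm))
    grid = np.arange(-r_vox, r_vox + 1) * voxel_size_mm
    x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
    return int(np.sum(x**2 + y**2 + z**2 <= radius_mm**2))

n6 = sphere_voxel_count(6)    # Neurosynth's default radius
n10 = sphere_voxel_count(10)  # NiMARE's default radius
# n10 is roughly 4x n6, so every focus influences far more voxels
```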
Neurosynth sets voxels to zero that aren't supported by at least 1% of the studies; NiMARE does not currently do that.
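A rough sketch of that masking step (toy numpy data with illustrative variable names, not actual Neurosynth or NiMARE code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_voxels = 200, 500

# Toy binary maps: True where a study reports activation near a voxel
activation = rng.random((n_studies, n_voxels)) < 0.02

# Fraction of studies supporting each voxel
support = activation.sum(axis=0) / n_studies

# Stand-in for a meta-analytic statistic map
meta_map = activation.mean(axis=0)

# Neurosynth-style step: zero out voxels supported by fewer than 1% of studies
masked_map = np.where(support >= 0.01, meta_map, 0.0)
```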
And there are a couple of p-value calculation/rounding differences in the results.
The kernel size and min_studies differences likely lead to the greatest discrepancies between the two, but I wouldn't say one approach is more "correct" than the other.
Let me know if you have more questions! (And/or what your ultimate goal is, so we can see which solution is most aligned with what you're trying to accomplish.)
James