Neurosynth topic-based decoding via NiMARE

Hi again Taylor and/or others,

Our group has used the Neurosynth decoding script in the past to run topic-based decoding. We generally use the 50-topic file (https://github.com/neurosynth/neurosynth-web/blob/master/data/topics/analyses/v4-topics-50.txt). However, I am running into issues with this code, and after a little searching I found a reply from Tal Yarkoni saying this package is no longer maintained (https://github.com/neurosynth/neurosynth/issues/96).

Does NiMARE allow reproducing this topic-based analysis?

Thanks!

Have you tried downloading the topic file using nimare.extract.fetch_neurosynth? Specifically, you should be able to use vocab="LDA50" and version="7" to download the most recent version of Neurosynth with the 50-topic model included as the features.

In order to faithfully reproduce the original meta-analyses, you would need to use a threshold of 0.05 instead of the default 0.001 in Dataset.get_studies_by_label(). I found that out in How to replicate Neurosynth meta-analysis in NiMARE? - #7 by tsalo.
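As a toy illustration of what that thresholding does (this is *not* NiMARE's internal code; the column name, study IDs, and weights below are all invented), selecting studies by label amounts to filtering the annotations table on that label's weight:

```python
import pandas as pd

# Toy annotations table standing in for Dataset.annotations
# (the label name and all values are invented for illustration).
annotations = pd.DataFrame(
    {
        "id": ["pmid-1", "pmid-2", "pmid-3"],
        "LDA50__topic_7": [0.12, 0.001, 0.06],
    }
)

# Equivalent in spirit to Dataset.get_studies_by_label with a 0.05 threshold:
# keep only studies whose topic weight meets the cutoff.
label = "LDA50__topic_7"
selected_ids = annotations.loc[annotations[label] >= 0.05, "id"].tolist()
print(selected_ids)  # only pmid-1 and pmid-3 pass the 0.05 cutoff
```

With the TFIDF default of 0.001, nearly every study with any mention of the term would pass; 0.05 is the stricter cutoff appropriate for topic weights.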

Hi Taylor,

Thanks for this info. I have successfully downloaded the v7 50-topic LDA file, but it only contains 50 lines with the topics, whereas the older version we used contains an "id" header in addition to the values associated with the topics (https://raw.githubusercontent.com/neurosynth/neurosynth-web/master/data/topics/analyses/v4-topics-50.txt). The code you refer to in that other thread therefore works with the v4 file, but not the v7. Is there a v7 file I missed that is organized similarly?

Using the code you provided in that other thread, I was able to merge the 50 v4 topics into the Neurosynth dataset. However, I am not exactly sure what the next steps should be. I apologize in advance: I am not a seasoned coder, especially not with Python, so it is a bit difficult for me to follow. Here is my code, in case it helps:

Creation of the neurosynth dataset, as indicated on the NiMARE website:

```python
import os
from pprint import pprint

import nimare

out_dir = os.path.abspath("/Users/m246120/Desktop/dAD_BPR/Decoding/neurosynth/")
os.makedirs(out_dir, exist_ok=True)

files = nimare.extract.fetch_neurosynth(
    path=out_dir,
    version="7",
    overwrite=False,
    source="abstract",
    vocab="terms",
)
pprint(files)
neurosynth_db = files[0]

neurosynth_dset = nimare.io.convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db["coordinates"],
    metadata_file=neurosynth_db["metadata"],
    annotations_files=neurosynth_db["features"],
)
neurosynth_dset.save(os.path.join(out_dir, "neurosynth_dataset.pkl.gz"))
print(neurosynth_dset)

neurosynth_dset = nimare.extract.download_abstracts(neurosynth_dset, "corriveau-lecavalier.nick@mayo.edu")
neurosynth_dset.save(os.path.join(out_dir, "neurosynth_dataset_with_abstracts.pkl.gz"))
```

```python
# Reproduction of the LDA50 Neurosynth library.
# First, load the dataset as dset.
import gzip
import pickle

with gzip.open("/Users/m246120/Desktop/dAD_BPR/Decoding/neurosynth/neurosynth_dataset_with_abstracts.pkl.gz") as f:
    dset = pickle.load(f)
```

```python
import pandas as pd

# Read in the topic file, rename the ID column, and
# prepend a prefix to the topic names.
df = pd.read_table("/Users/m246120/Desktop/dAD_BPR/Decoding/neurosynth/data-neurosynth_version-4_vocab-LDA50_vocabulary.txt")
topic_names = [c for c in df.columns if c.startswith("topic")]
topics_renamed = {t: "Neurosynth_LDA__" + t for t in topic_names}
topics_renamed["id"] = "study_id"
df = df.rename(columns=topics_renamed)
```

```python
# Change the data type of the study_id column so it can be merged.
df["study_id"] = df["study_id"].astype(str)

# Merge the topic dataframe into the annotations dataframe.
new_annotations = dset.annotations.merge(
    df,
    how="inner",
    left_on="study_id",
    right_on="study_id",
)
dset.annotations = new_annotations
```

```python
# The topic file only contains ~10k studies,
# so we must reduce the dataset to match.
new_ids = new_annotations["id"].tolist()
dset = dset.slice(new_ids)
```
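(For what it's worth, here is a toy version of that merge-then-slice step with invented study IDs, just to show the effect on the annotations:)

```python
import pandas as pd

# Toy stand-ins for dset.annotations and the topic file; IDs are invented.
annotations = pd.DataFrame(
    {"id": ["s1-1", "s2-1", "s3-1"], "study_id": ["s1", "s2", "s3"]}
)
topics = pd.DataFrame(
    {"study_id": ["s1", "s3"], "Neurosynth_LDA__topic_1": [0.2, 0.7]}
)

# An inner merge keeps only studies present in BOTH tables, which is why
# the dataset shrinks to the ~10k studies covered by the topic file.
merged = annotations.merge(topics, how="inner", on="study_id")
new_ids = merged["id"].tolist()
print(new_ids)  # "s2-1" drops out because "s2" is absent from the topic file
```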

What would come next? I am not sure what you mean by the Dataset.get_studies_by_label() function, as I have never included that in my previous models. Do I need to run a new model and then the decoder? Would decoding be similar to the GCLDA model? Such as:

```python
# Run the decoder
decoded_df, _ = decode.continuous.gclda_decode_map(model, img_eb1)
decoded_df.sort_values(by="Weight", ascending=True).head(50)
```

Thanks for your help again.

You don’t need all of those steps. You can just fetch the LDA50 features directly and feed them into the conversion function, just like the default TFIDF weights.

Yes, the format of the Neurosynth and NeuroQuery files was changed in order to (1) minimize the space used by the database and (2) use the same convention for both. However, NiMARE can work with these new files just fine.

Take a look at this:

```python
import os
from pprint import pprint

import nimare

out_dir = os.path.abspath("/Users/m246120/Desktop/dAD_BPR/Decoding/")
os.makedirs(out_dir, exist_ok=True)

# Fetch Neurosynth with *just* the LDA50 features
files = nimare.extract.fetch_neurosynth(
    data_dir=out_dir,  # version 0.0.10 switched to a data directory
    version="7",
    overwrite=False,
    source="abstract",
    vocab="LDA50",  # Note the difference here
)
neurosynth_db = files[0]
pprint(neurosynth_db)
# Note the "keys" file. That has the top 30 words for each topic.
# It *doesn't* go in the Dataset at all, though.

# Get the Dataset object
neurosynth_dset = nimare.io.convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db["coordinates"],
    metadata_file=neurosynth_db["metadata"],
    annotations_files=neurosynth_db["features"],
)
```

From there you can use whichever decoder best fits your analysis. No need to run a model.
The GCLDA method (gclda_decode_map) won't work, since it relies on probability distributions that are available in a GCLDA model but not an LDA one; the other decoders should work, though.

If you want to decode an unthresholded map, you can use the CorrelationDecoder, but you should be aware that it’s currently very slow, and it’s generally easier to just loop through the features and run the meta-analyses with your own script, as discussed in Issue: fitting nimare.decode.continuous.CorrelationDecoder to Neurosynth dataset. You can find the template for such a script in that topic.
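To give a sense of what the CorrelationDecoder is doing under the hood, here is a minimal sketch with fake data (the "maps" are random 1-D arrays standing in for flattened, masked brain images; this is an illustration of the idea, not NiMARE's implementation):

```python
import numpy as np

# Fake data: a target map and three feature "meta-analytic maps",
# all random arrays rather than real brain images.
rng = np.random.default_rng(0)
target_map = rng.normal(size=1000)
feature_maps = {f"topic_{i}": rng.normal(size=1000) for i in range(3)}

# Correlate the target map's voxel values with each feature's map.
results = {
    name: float(np.corrcoef(target_map, fmap)[0, 1])
    for name, fmap in feature_maps.items()
}

# Features are then ranked by correlation with the input map.
ranked = sorted(results, key=results.get, reverse=True)
```

The slow part in practice is generating the per-feature meta-analytic maps, which is why looping through features with your own script (caching each map) can be more convenient than refitting the decoder.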

The CorrelationDecoder has a parameter, frequency_threshold, that determines how studies are divided into those that are "about" a feature and those that are not. When you run a meta-analysis on a larger Dataset like Neurosynth, you will want to separate the studies into those that are "positive" for (about) the feature and those that are "negative". For Neurosynth's standard TFIDF values, the default threshold is 0.001, based on the original Neurosynth code. However, for the LDA topic-model weights, the default in Neurosynth's code is 0.05 (i.e., frequency_threshold=0.05).
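In other words, the threshold simply splits the studies into two groups before the meta-analysis. A toy example (study IDs and weights invented):

```python
# Toy split of studies by topic weight; all values are invented.
weights = {"s1": 0.30, "s2": 0.01, "s3": 0.07, "s4": 0.00}
threshold = 0.05  # Neurosynth's default for LDA topic weights

# Studies at or above the threshold are treated as "about" the topic;
# the rest form the comparison group.
positive = sorted(k for k, v in weights.items() if v >= threshold)
negative = sorted(k for k, v in weights.items() if v < threshold)
print(positive, negative)  # ['s1', 's3'] ['s2', 's4']
```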

I cannot thank you enough, Taylor. The code is presently running, fingers crossed it works.
