How to replicate Neurosynth meta-analysis in NiMARE?

Xinhui_Li · June 1, 2021, 8:36pm

Hi,

I am trying to use NiMARE to replicate the analysis in this notebook (gradient_analysis/05_metaanalysis_neurosynth.ipynb at master · NeuroanatomyAndConnectivity/gradient_analysis · GitHub). I was able to convert the Neurosynth “database.txt” and “features.txt” files to a NiMARE Dataset object and decode the ROIs with this dataset. However, I am not sure how to add the information from the topics file “v3-topics-50.txt” (the second-to-last code block in the notebook above) to the NiMARE Dataset object. Is there a way to decode ROIs with topics?

Thanks,
Xinhui

tsalo · June 1, 2021, 9:05pm

This will require some custom code:

import nimare
import pandas as pd

# First, load the dataset as `dset`

# Read in the topic file, rename the ID column, and
# prepend a prefix to the topic names
df = pd.read_table("v3-topics-50.txt")
topic_names = [c for c in df.columns if c.startswith("topic")]
topics_renamed = {t: "Neurosynth_LDA__" + t for t in topic_names}
topics_renamed["id"] = "study_id"
df = df.rename(columns=topics_renamed)

# Change the data type for the study_id column so it can be merged
df['study_id'] = df['study_id'].astype(str)

# Merge the topic dataframe into the annotations dataframe
new_annotations = dset.annotations.merge(df, how="inner", on="study_id")
dset.annotations = new_annotations

# The topic file only contains ~10k studies,
# so we must reduce the dataset to match
new_ids = new_annotations["id"].tolist()
dset = dset.slice(new_ids)

Then you can run the decoder using the “Neurosynth_LDA” feature group instead of the “Neurosynth_TFIDF” one.
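
To see why the code above casts study_id to a string and uses an inner join, here is a toy sketch with made-up data. The two small DataFrames are hypothetical stand-ins for dset.annotations and the topic file, not real Neurosynth content; only the pandas calls mirror the code above.

```python
import pandas as pd

# Hypothetical stand-in for dset.annotations: study_id is stored as a string
annotations = pd.DataFrame(
    {
        "id": ["101-1", "102-1", "103-1"],
        "study_id": ["101", "102", "103"],
        "Neurosynth_TFIDF__pain": [0.0, 0.12, 0.03],
    }
)

# Hypothetical stand-in for the topic file: integer IDs, fewer studies
topics = pd.DataFrame({"id": [101, 103], "topic0": [0.20, 0.01]})

# Rename as in the code above: prefix the topic columns, rename the ID column
renamed = {c: "Neurosynth_LDA__" + c for c in topics.columns if c.startswith("topic")}
renamed["id"] = "study_id"
topics = topics.rename(columns=renamed)

# Without this cast the key columns have different types (strings vs integers)
topics["study_id"] = topics["study_id"].astype(str)

# The inner join keeps only studies present in both tables
merged = annotations.merge(topics, how="inner", on="study_id")
kept_ids = merged["id"].tolist()
print(kept_ids)
```

In this toy case study 102 has no topic row, so it drops out of the merged table, which is why the real Dataset has to be sliced down to the surviving IDs afterwards.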

I hope that helps!

Xinhui_Li · June 1, 2021, 9:47pm

Hi Taylor,

I appreciate your quick response! Your code block solved my problem. You’re amazing!

Best,
Xinhui

Peng_Ren · July 17, 2021, 1:40pm

Dear Taylor,

I have one more question. I already asked how to decode continuous maps with labels in another post (Examples for decoding unthresholded continuous map), but I haven’t figured out how to get images for these topics to use in continuous-map correlation. Should I just use the same code with the new dset? Could you give me some suggestions?

Best wishes,
Peng

tsalo · July 17, 2021, 3:24pm

Hi Peng,

I’m not sure I understand what your blocker is. Is the issue finding the topic features in order to run meta-analyses? If so, they’re located here. For the most recent version of the database, you would use v5-topics.tar.gz, and within that you can select the topic model with your preferred number of topics (50, 100, 200, or 400).

Once you’ve downloaded the topic annotations of your choice, you can create a NiMARE Dataset with them (or add them to an existing one with the code from earlier in this thread). You can definitely run a discrete (ROI-based) decoder on topic annotations. Just treat them like Neurosynth’s standard TF-IDF annotations.

Does that help?

Best,
Taylor

Peng_Ren · July 18, 2021, 3:22pm

Dear Taylor,

Sorry for the confusion. I am interested in decoding unthresholded maps rather than discrete maps. Let me put it this way:

#### label-based decoding ####
import nimare

# `dataset` is the NiMARE Dataset loaded beforehand

out_dir = "/Decoding_Meta/decoding/neurosynth/"
# Initialize the Estimator
# You could use `low_memory=True` here if you want, but that will slow things down.
meta_estimator = nimare.meta.cbma.mkda.MKDAChi2()

# Pre-generate MA maps to speed things up
kernel_transformer = meta_estimator.kernel_transformer
dataset = kernel_transformer.transform(dataset, return_type="dataset")
dataset.save("neurosynth_with_ma.pkl.gz")

# Get features
labels = dataset.get_labels()
for label in labels:
    print("Processing {}".format(label), flush=True)
    label_positive_ids = dataset.get_studies_by_label(label, 0.001)
    label_negative_ids = list(set(dataset.ids) - set(label_positive_ids))
    # Require some minimum number of studies in each sample
    if (len(label_positive_ids) == 0) or (len(label_negative_ids) == 0):
        print("\tSkipping {}".format(label), flush=True)
        continue
    label_positive_dset = dataset.slice(label_positive_ids)
    label_negative_dset = dataset.slice(label_negative_ids)
    meta_result = meta_estimator.fit(label_positive_dset, label_negative_dset)
    meta_result.save_maps(output_dir=out_dir, prefix=label)

This is the code you suggested last time for generating Neurosynth_TFIDF NIfTI files for the labels of interest, which I later use to decode an unthresholded continuous map. There are two key inputs here: dataset and labels.
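
The loop above is meant to compare the studies that have a label against all remaining studies. A minimal sketch of that split, using plain Python lists in place of a real Dataset (the IDs and the split_ids helper are hypothetical, for illustration only):

```python
def split_ids(all_ids, positive_ids):
    """Return (positive, negative) ID lists; negative = every study not in positive."""
    positive = set(positive_ids)
    negative = [i for i in all_ids if i not in positive]
    return list(positive_ids), negative


# Made-up study IDs standing in for dataset.ids
all_ids = ["101-1", "102-1", "103-1", "104-1"]
label_positive_ids = ["102-1", "104-1"]

pos, neg = split_ids(all_ids, label_positive_ids)

# The two samples should be disjoint and together cover every study
disjoint = not (set(pos) & set(neg))
complete = (set(pos) | set(neg)) == set(all_ids)
print(neg, disjoint, complete)
```

Checking that the two ID lists are disjoint before slicing is a cheap way to catch the case where both samples are accidentally built from the same IDs.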

My question is whether anything changes when generating topic-specific NIfTI files (presumably Neurosynth_LDA files) for topic-based decoding of continuous maps, because I don’t know which labels to pass in the code here. As you mentioned in this post, the topics are added as additional features and a new dataset is produced. Should I pass in this new dataset and the Neurosynth_LDA labels?

Best wishes,
Peng

tsalo · July 19, 2021, 2:55pm

Hi Peng,

I think I understand now. There is no special decoding method for LDA topic model data (unlike GCLDA topic models), so you can use the same approach for decoding/correlating the Neurosynth_LDA labels.

The only difference you probably want to include is to use a threshold of 0.05, instead of the default 0.001, in get_studies_by_label, since that is the default for the topic model meta-analyses on the Neurosynth website.
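
To illustrate how the threshold changes which studies count as "positive" for a topic, here is a toy sketch. The studies_above helper and the weights are made up; it only mimics the idea of thresholding a label's weights (assuming a strict greater-than comparison) and is not NiMARE's implementation of get_studies_by_label.

```python
def studies_above(weights, threshold):
    """Return the IDs whose weight for one label exceeds the threshold."""
    return [study for study, w in weights.items() if w > threshold]


# Made-up loadings of four studies on one hypothetical topic
topic_weights = {"101-1": 0.0, "102-1": 0.004, "103-1": 0.03, "104-1": 0.21}

loose = studies_above(topic_weights, 0.001)   # threshold used for TF-IDF terms
strict = studies_above(topic_weights, 0.05)   # threshold suggested for topics
print(loose)
print(strict)
```

With the higher threshold, fewer studies are treated as belonging to the topic, so the positive sample is smaller and the negative sample larger.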

Once you have the topic model features added to the Dataset, you can select features that start with Neurosynth_LDA. Unfortunately, get_studies_by_label doesn’t have a feature_group parameter (I probably should add that), so you will need to filter the full set of labels. The following code should work:

# `dataset` is the Dataset instance with the topic features already merged in
all_features = dataset.get_labels()
lda_features = [f for f in all_features if f.startswith("Neurosynth_LDA__")]

Exactly!

Best,
Taylor

Peng_Ren · July 20, 2021, 3:29pm

Dear Taylor,

I appreciate your kind and professional help. It’s much clearer now!

Best wishes,
Peng