How to use NiMARE for decoding ALE results?

Hello everyone,

I am currently using NiMARE to decode my ALE results. For example, after completing the meta-analysis, I obtained an activation cluster in the frontal pole (x, y, z: -6, 58, 18). I created a spherical ROI with the following code:

from nilearn import datasets, plotting
from nilearn.maskers import nifti_spheres_masker
from nilearn.masking import _unmask_3d
import nibabel as nib
from nibabel import Nifti1Image

# Create a 10 mm spherical ROI centered on the peak coordinate
brain_mask = datasets.load_mni152_brain_mask()
_, A = nifti_spheres_masker._apply_mask_and_get_affinity(
    seeds=[(-6, 58, 18)],
    niimg=None,
    radius=10,
    allow_overlap=False,
    mask_img=brain_mask,
)

# Project the sphere back into 3D image space and save it
FPole_mask = _unmask_3d(
    X=A.toarray().flatten(),
    mask=brain_mask.get_fdata().astype(bool),
)
FPole_mask = Nifti1Image(FPole_mask, brain_mask.affine)
nib.save(FPole_mask, "FPole.nii.gz")

# Plot the result to make sure it makes sense
plotting.plot_roi("FPole.nii.gz")
plotting.show()

roi_img = nib.load("FPole.nii.gz")

Then, I used NiMARE to decode my image:

import nimare

# Fetch Neurosynth data (Note: This can take a while!)
databases = nimare.extract.fetch_neurosynth(data_dir='../data')[0]

# Convert to NiMARE dataset (Note: This can take a while!)
ds = nimare.io.convert_neurosynth_to_dataset(
    coordinates_file=databases['coordinates'],
    metadata_file=databases['metadata'],
    annotations_files=databases['features']
)

# Perform decoding
decoder = nimare.decode.discrete.ROIAssociationDecoder(roi_img)
decoder.fit(ds)
decoded_df = decoder.transform()
print(decoded_df.iloc[60:80, :].to_string())
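As a side note on inspecting the output: `iloc[60:80, :]` prints an arbitrary slice, whereas sorting by the correlation column surfaces the strongest associations first. A minimal sketch with hypothetical values standing in for the real `decoded_df` from `decoder.transform()`:

```python
import pandas as pd

# Hypothetical stand-in for the decoder output: a DataFrame indexed by
# feature name with an "r" column of ROI-map correlations.
decoded_df = pd.DataFrame(
    {"r": [0.02, 0.31, 0.18]},
    index=[
        "terms_abstract_tfidf__gyrus",
        "terms_abstract_tfidf__memory",
        "terms_abstract_tfidf__attention",
    ],
)
decoded_df.index.name = "feature"

# Sort so the strongest ROI-term associations come first
top = decoded_df.sort_values(by="r", ascending=False)
print(top.head(2))
```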

Plotted the decoding result with the following code:

import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Save decoding result
decoded_df.to_csv('FPole_decoded_output.csv', index=True)

# Generate word cloud
word_path = "FPole_decoded_output.csv"
data1 = pd.read_csv(word_path)
word_freq = dict(zip(data1['feature'], data1['r']))
wordcloud = WordCloud(width=800, height=400, max_font_size=100, background_color='white').generate_from_frequencies(word_freq)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

I have the following questions:

  1. Is my decoding approach correct? If it is correct, are there other visualization methods I can use?
  2. The results include all terms, but in my research, I am only interested in cognitive-related terms. I discovered that the Cognitive Atlas only includes cognitive vocabulary, but I don’t know how to replace the vocabulary in NiMARE. I also noticed that if I decrease the number of terms, the model’s fit worsens. For example, I don’t care about anatomical terms. How can I exclude these irrelevant terms from the results? How can this be achieved in Python?

Thank you very much for your help!


Hi @shanshan_zhu,

  1. Is my decoding approach correct? If it is correct, are there other visualization methods I can use?

It depends on your research question. If you have a specific hypothesis about the frontal pole, then yes, this is an appropriate decoding approach. If instead you want to see which terms load on the output of the ALE meta-analysis as a whole, I would suggest using the entire ALE statistical map as input for decoding. How you visualize the ROI is fine. If you want another option, you can use the coordinate directly and generate a sphere on a brain map with the plot_connectome function; just set the adjacency matrix to all zeros, like what we do here.

  2. The results include all terms, but in my research, I am only interested in cognitive-related terms. I discovered that the Cognitive Atlas only includes cognitive vocabulary, but I don’t know how to replace the vocabulary in NiMARE. I also noticed that if I decrease the number of terms, the model’s fit worsens. For example, I don’t care about anatomical terms. How can I exclude these irrelevant terms from the results? How can this be achieved in Python?

You can select all the cognitive atlas concepts and tasks like so:

import requests

# Cognitive Atlas API base URL
base_url = "https://www.cognitiveatlas.org/api/v-alpha"

# Endpoints for concepts, tasks, and disorders
concepts_endpoint = f"{base_url}/concept?format=json"
tasks_endpoint = f"{base_url}/task?format=json"
disorders_endpoint = f"{base_url}/disorder?format=json"


# Fetch concepts and tasks data
concepts_response = requests.get(concepts_endpoint)
tasks_response = requests.get(tasks_endpoint)
disorders_response = requests.get(disorders_endpoint)

# Extract names from the response data
concepts = concepts_response.json()
tasks = tasks_response.json()
disorders = disorders_response.json()

# Get the names of concepts and tasks
concept_names = [concept['name'] for concept in concepts]
task_names = [task['name'] for task in tasks]
disorder_names = [disorder['name'] for disorder in disorders]

cognitive_atlas_terms = concept_names + task_names + disorder_names

Then you can use those terms to filter the annotations in the Neurosynth dataset:

terms_to_keep = [term for term in ds.annotations.columns if term.split('__')[-1] in cognitive_atlas_terms]

ds.annotations = ds.annotations[terms_to_keep]
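To see what the suffix match above does, here is a self-contained toy sketch with made-up values standing in for ds.annotations: Neurosynth feature columns carry a "terms_abstract_tfidf__" prefix, and the part after the double underscore is the bare term that gets matched.

```python
import pandas as pd

# Toy stand-in for ds.annotations (hypothetical values)
annotations = pd.DataFrame({
    "terms_abstract_tfidf__memory": [0.12, 0.00],
    "terms_abstract_tfidf__gyrus": [0.05, 0.30],
    "terms_abstract_tfidf__attention": [0.00, 0.21],
})
cognitive_atlas_terms = ["memory", "attention"]

# Keep only features whose suffix is a Cognitive Atlas term
terms_to_keep = [
    col for col in annotations.columns
    if col.split("__")[-1] in cognitive_atlas_terms
]
print(terms_to_keep)
# → ['terms_abstract_tfidf__memory', 'terms_abstract_tfidf__attention']
```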

Then you can instantiate/fit/transform the decoder as you would.

The model fit will worsen as you reduce the number of terms/features (and improve as you add more). If we had a term for every word that appeared at least once in the corpus of all abstracts in the Neurosynth dataset, that would provide the best model fit; as you remove terms, the fit worsens. But if your goal is just to derive some terms related to a region of interest, then model fit is not that important.

Let me know if you have further questions!
James

P.S., I didn’t test all the code so apologies if there are syntax errors


Thank you for your assistance—it was incredibly helpful! I ran the code you provided, and although I encountered a minor error, I was able to resolve it by modifying the code as follows:

terms_to_keep = ['id'] + [term for term in ds.annotations.columns if term.split('__')[-1] in cognitive_atlas_terms]
ds.annotations = ds.annotations[terms_to_keep]

With this adjustment, the output was a DataFrame containing “feature” and “r” columns. However, I’m a bit unsure about how to interpret the results. Regarding the “r” column, I assume it represents the Pearson correlation coefficient. Does this correlation indicate the relationship between my input ROI and the meta-analytic functional map associated with the corresponding term?

Additionally, would it be feasible to extract each word (such as the “association” part in "terms_abstract_tfidf__association") along with its corresponding “r” value to create a word cloud? Is this a valid approach? Lastly, as you suggested, I’m currently working on using the entire ALE statistical map as input for decoding. Just to clarify, should this map remain unthresholded?

Once again, thank you for your guidance—I truly appreciate it!