Using neurosynth compose and annotations

I’ve set up a manually created study on Neurosynth Compose. Because of the nature of the question (glucose metabolism in neurodegeneration), multiple populations of interest can be included in a single study (and often in the same table). As a result, I’ve left the “sample size” metadata alone and instead have sample sizes for the populations of interest as annotations for the relevant analyses within a study.

I’m trying to run the actual analysis locally using NiMARE. I can successfully pull in the data. Is there a way to use the sample sizes that I have in the annotations to update the “sample_size” metadata for each analysis that is included?

Thanks

Tagging @jdkent and @JulioAPeraza at @tsalo's recommendation


Hi Adam!

Great questions.

You can follow this example for creating a NiMARE dataset:
https://nimare.readthedocs.io/en/stable/auto_examples/01_datasets/05_plot_nimads.html#sphx-glr-auto-examples-01-datasets-05-plot-nimads-py

(EDIT: there is a bug reading in annotations, I’m pushing a fix and will publish a release of NiMARE)

For translating the annotations to metadata, could you tell me more about the analysis plan on how you want to incorporate sample size into the analysis? Do you want to use ALE to adjust kernel size? For the annotations, are they saved in separate columns, like so (population1_sample_size; population2_sample_size)?

Thank you!
James

Hi James,

I was indeed intending to do ALE and to modulate the kernel size by the sample size. And yes, they are saved in separate columns, as “patient_samplesize” and “control_samplesize”.

I know I can extract them from the annotations, and I can even compute the total sample size per analysis. I just don’t know how to migrate those sample sizes to somewhere the ALE kernel estimator can use them. I know I could also set a fixed sample size, but given the very disparate sample sizes across studies (ranging from 12 to 200+), I’m not sure how to choose a sane fixed value.
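(For concreteness, here is a minimal stdlib sketch of that total-sample-size computation, using hypothetical annotation rows keyed by made-up analysis IDs; only the column names match the ones above:)

```python
# Hypothetical annotation rows keyed by analysis ID; the column names
# "patient_samplesize" and "control_samplesize" match the thread above,
# but the IDs and values are illustrative only.
annotations = {
    "analysis-1": {"patient_samplesize": 13, "control_samplesize": 15},
    "analysis-2": {"patient_samplesize": 6, "control_samplesize": 6},
}

# Total sample size per analysis = patients + controls
total_n = {
    aid: row["patient_samplesize"] + row["control_samplesize"]
    for aid, row in annotations.items()
}
print(total_n)  # {'analysis-1': 28, 'analysis-2': 12}
```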

Thanks for the clarifications!

It sounds like the only piece you still need is the metadata attribute of a NiMARE dataset.

If I have a nimare dataset assigned to the variable ds, then I would be able to access the sample_sizes information like so:

ds.metadata["sample_sizes"]

Once the values are in place, when you call ALE(), make sure that you do NOT pass either sample_size or fwhm as arguments to the kernel; the sample sizes inserted into the metadata dataframe will then be used automatically.

However, matching the contrast to the appropriate sample_size would be a bit annoying in a NiMARE dataset. I’m creating a new release that should make it easier to translate a list of sample sizes to a nimare dataset.

The process uses the NIMADS studyset, adding the sample size information there before translating it to a NiMARE dataset:

for study in studyset.studies:
    for analysis in study.analyses:
        analysis_annotation = list(analysis.annotations.values())[0]
        sample_sizes = [
            analysis_annotation['patient_samplesize'],
            analysis_annotation['control_samplesize'],
        ]
        analysis.metadata['sample_sizes'] = sample_sizes

That will be in release 0.4.2 (to be released tomorrow); you can keep track here:

Great. Looking forward to it.

Thanks for helping with this.

Here is the release! nimare==0.4.2

https://pypi.org/project/nimare/0.4.2/

Let me know if this works for your use case.

Hi James,

I can pull in the data and add the annotations + sample_sizes to the metadata as expected. However, if I slice the studyset (e.g., to focus on hypermetabolism or hypometabolism) and convert to a nimare dataset, things aren’t retained:

analysis_ids = []
for study in studyset.studies:
    for analysis in study.analyses:
        analysis_annotation = list(analysis.annotations.values())[0]
        if analysis_annotation['included']:
            analysis_ids.append(analysis.id)
            sample_sizes = [
                analysis_annotation['patient_samplesize'],
                analysis_annotation['control_samplesize'],
            ]
        else:
            sample_sizes = [0, 0]  # analyses of no current interest; not annotated, to be excluded
        analysis.metadata['sample_sizes'] = sample_sizes
nimare_dset = studyset.to_dataset()
nimare_dset.metadata.head()

Output:

0 	A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	PD_+_Dementia_vs_control 	A Comparison of Cerebral Glucose Metabolism in... 	European Journal of Neurology 	A Comparison of Cerebral Glucose Metabolism in... 	[13, 15]
...

But running the following after adding the sample sizes shows that the “sample_sizes” column isn’t retained:

target_studyset = studyset.slice(analyses=analysis_ids)
target_dset = target_studyset.to_dataset()
target_dset.metadata.head()

Gives:

A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	PD_+_Dementia_vs_control 	A Comparison of Cerebral Glucose Metabolism in... 	European Journal of Neurology 	A Comparison of Cerebral Glucose Metabolism in...

Additionally, slicing the studyset drops the annotations entirely:

Whole studyset

nimare_dset.annotations.head()

 	id 	study_id 	contrast_id 	AD 	PD 	ALS 	included 	patient_meanage 	control_samplesize 	patient_samplesize 	patient_hypometabolism 	patient_hypermetabolism 	control_meanage
0 	A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	PD_+_Dementia_vs_control 	False 	True 	False 	True 	73.4 	15 	13 	True 	None 	65.3
1 	A_Phase_I_Trial_of_Deep_Brain_Stimulation_of_M... 	A_Phase_I_Trial_of_Deep_Brain_Stimulation_of_M... 	AD_vs_control 	True 	False 	False 	True 	60.7 	6 	6 	True 	None 	68.5
2 	Abnormal_Regional_Brain_Function_in_Parkinson’... 	Abnormal_Regional_Brain_Function_in_Parkinson’... 	Table_1 	False 	True 	False 	True 	57.1 	24 	24 	None 	True 	57
3 	Activation_in_the_premotor_cortex_during_menta... 	Activation_in_the_premotor_cortex_during_menta... 	Table_2 	True 	False 	False 	True 	55.1 	6 	10 	True 	None 	56.8
4 	Activation_in_the_premotor_cortex_during_menta... 	Activation_in_the_premotor_cortex_during_menta... 	Table_3 	False 	False 	False 	False 	None 	None 	None 	None 	None 	None

Sliced studyset:

target_dset.annotations.head()

id 	study_id 	contrast_id
0 	A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	A_Comparison_of_Cerebral_Glucose_Metabolism_in... 	PD_+_Dementia_vs_control
1 	A_Phase_I_Trial_of_Deep_Brain_Stimulation_of_M... 	A_Phase_I_Trial_of_Deep_Brain_Stimulation_of_M... 	AD_vs_control
2 	Abnormal_Regional_Brain_Function_in_Parkinson’... 	Abnormal_Regional_Brain_Function_in_Parkinson’... 	Table_1
3 	Activation_in_the_premotor_cortex_during_menta... 	Activation_in_the_premotor_cortex_during_menta... 	Table_2
4 	Alternative_Normalization_Methods_Demonstrate_... 	Alternative_Normalization_Methods_Demonstrate_... 	PD_<_control

Solved… maybe a bit hacky but it grabs the studies I expect:

nimads_studyset = download_file("https://neurostore.org/api/studysets/<id>?nested=true")
nimads_annotation = download_file("https://neurostore.org/api/annotations/<annotation_id>")

studyset = Studyset(nimads_studyset, annotations=nimads_annotation)
annotation = Annotation(nimads_annotation, studyset)

analysis_ids = []
missing_keys = []
for n in annotation.notes:
    try:
        # Use `and` (not `&`): None values then short-circuit as falsy
        # instead of raising a TypeError.
        if n.note['patient_hypermetabolism'] and n.note['included']:
            analysis_ids.append(n.analysis.id)
    except KeyError:
        missing_keys.append(n.analysis.id)

print(len(analysis_ids))

# Filter notes to only include those where the analysis is in analysis_ids
annotation_dict = annotation.to_dict()
filtered_notes = [
    note for note in annotation_dict['notes'] if note['analysis'] in analysis_ids
]

# Create a new dictionary with the filtered notes
filtered_annotation = annotation_dict.copy()
filtered_annotation['notes'] = filtered_notes

#print(filtered_annotation)

studyset_dict = studyset.to_dict()
studyset_dict_filtered = studyset_dict.copy()

# Filter studies and analyses based on the analysis_ids
for study in studyset_dict_filtered['studies']:
    # Filter analyses for this study
    study['analyses'] = [analysis for analysis in study['analyses'] if analysis['id'] in analysis_ids]

# Create subset studyset to include only those studies of interest
target_studyset = Studyset(studyset_dict_filtered, annotations = filtered_annotation)

# Get sample sizes from annotations and add them to the metadata
for study in target_studyset.studies:
    for analysis in study.analyses:
        analysis_annotation = list(analysis.annotations.values())[0]
        if analysis_annotation['included']:
            # analysis_ids was already populated above; no need to append again
            sample_sizes = [
            sample_sizes = [
                int(analysis_annotation['patient_samplesize']),
                int(analysis_annotation['control_samplesize']),
            ]
        else:
            sample_sizes = [-1, -1]  # placeholder for excluded analyses
        print(sample_sizes)
        analysis.metadata['sample_sizes'] = sample_sizes
nimare_dset = target_studyset.to_dataset()

I have another release candidate, 0.5.0rc1, that should solve the slicing issue you experienced.

Let me know if that version fixes it.