I’ve set up a manually created study on Neurosynth Compose. Because of the nature of the question (glucose metabolism in neurodegeneration), a single study (and often a single table) can include multiple populations of interest. As a result, I’ve left the “sample size” metadata alone and instead recorded the sample sizes for the populations of interest as annotations on the relevant analyses within each study.
I’m trying to run the actual analysis locally using NiMARE. I can successfully pull in the data. Is there a way to use the sample sizes that I have in the annotations to update the “sample_size” metadata for each analysis that is included?
(EDIT: there is a bug reading in annotations, I’m pushing a fix and will publish a release of NiMARE)
For translating the annotations to metadata, could you tell me more about how you plan to incorporate sample size into the analysis? Do you want to use ALE and adjust the kernel size? And are the annotations saved in separate columns, like so (population1_sample_size; population2_sample_size)?
I was indeed intending to do ALE and to modulate the kernel size by the sample size. And yes they are saved in separate columns as “patient_samplesize” and “control_samplesize”.
I know I can extract them from the annotations, and I can even compute the total sample size per analysis. I just don’t know how to move those sample sizes somewhere the ALE kernel estimator can use them. I know I could also set a fixed sample size, but given very disparate sample sizes across studies (ranging from 12 to 200+), I’m not sure how to choose a sane fixed value.
It sounds like the only piece you want to know about is the metadata attribute of a nimare dataset.
If I have a nimare dataset assigned to the variable ds, then I would be able to access the sample_sizes information like so:
ds.metadata["sample_sizes"]
Once the values are in place, make sure that you DO NOT pass either sample_size or fwhm as arguments to the kernel when you call ALE(); the sample sizes inserted into the metadata dataframe will then be used automatically.
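To give a feel for why per-analysis sample sizes matter here, below is a minimal sketch of the sample-size-to-FWHM relationship commonly used for ALE kernels. It is not NiMARE's code: the function name `ale_fwhm_mm` is hypothetical, the 5.7 mm (between-template) and 11.6 mm (between-subject) uncertainty values are the ones I understand ALE implementations to use, and averaging the two group sizes into a single number is an assumption about how a `sample_sizes` list gets reduced.

```python
import math

# Assumed empirical uncertainty estimates (Euclidean distance, in mm).
ED_TEMPLATES_MM = 5.7   # between-template uncertainty
ED_SUBJECTS_MM = 11.6   # between-subject uncertainty

# Converts a mean Euclidean distance into the FWHM of a Gaussian.
_ED_TO_FWHM = math.sqrt(8 * math.log(2)) / (2 * math.sqrt(2 / math.pi))


def ale_fwhm_mm(sample_sizes):
    """Return an ALE kernel FWHM (mm) for one analysis.

    `sample_sizes` is a list such as [n_patients, n_controls]; the groups
    are averaged into one effective sample size (an assumption).
    """
    n = sum(sample_sizes) / len(sample_sizes)
    fwhm_templates = ED_TEMPLATES_MM * _ED_TO_FWHM
    # Between-subject uncertainty shrinks with the square root of n.
    fwhm_subjects = ED_SUBJECTS_MM * _ED_TO_FWHM / math.sqrt(n)
    return math.sqrt(fwhm_templates ** 2 + fwhm_subjects ** 2)


# Compare a small study, a study from this thread, and a large study.
for sizes in ([12, 12], [13, 15], [200, 200]):
    print(sizes, round(ale_fwhm_mm(sizes), 2))
```

Under these assumptions the kernel narrows as the sample grows, which is the behaviour a single fixed sample size would flatten out. Note also that a placeholder such as [0, 0] or [-1, -1] has no valid kernel width in this formula (it raises an error in the sketch), so analyses you want to exclude are probably better dropped than given dummy sizes.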
However, matching the contrast to the appropriate sample_size would be a bit annoying in a NiMARE dataset. I’m creating a new release that should make it easier to translate a list of sample sizes to a nimare dataset.
The process will be to use the NIMADS studyset and add the sample size information there before translating it to a NiMARE dataset:
for study in studyset.studies:
    for analysis in study.analyses:
        # Each analysis carries a dict of annotations; take the first (only) one
        analysis_annotation = list(analysis.annotations.values())[0]
        sample_sizes = [
            analysis_annotation['patient_samplesize'],
            analysis_annotation['control_samplesize'],
        ]
        # Stored in the analysis metadata so it carries over to the NiMARE dataset
        analysis.metadata['sample_sizes'] = sample_sizes
That will be in release 0.4.2 (to be released tomorrow); you can keep track here:
I can pull in the data and add the annotations + sample_sizes to the metadata as expected. However, if I slice the studyset (e.g., to focus on hypermetabolism or hypometabolism) and convert to a nimare dataset, things aren’t retained:
analysis_ids = []
for study in studyset.studies:
    for analysis in study.analyses:
        analysis_annotation = list(analysis.annotations.values())[0]
        if analysis_annotation['included']:
            analysis_ids.append(analysis.id)
            sample_sizes = [
                analysis_annotation['patient_samplesize'],
                analysis_annotation['control_samplesize'],
            ]
        else:
            # Analyses of no current interest that weren't annotated and are to be excluded
            sample_sizes = [0, 0]
        analysis.metadata['sample_sizes'] = sample_sizes

nimare_dset = studyset.to_dataset()
nimare_dset.metadata.head()
Output:
0 A_Comparison_of_Cerebral_Glucose_Metabolism_in... A_Comparison_of_Cerebral_Glucose_Metabolism_in... PD_+_Dementia_vs_control A Comparison of Cerebral Glucose Metabolism in... European Journal of Neurology A Comparison of Cerebral Glucose Metabolism in... [13, 15]
...
But running the following after adding the sample sizes doesn’t retain that “sample_sizes” column:
A_Comparison_of_Cerebral_Glucose_Metabolism_in... A_Comparison_of_Cerebral_Glucose_Metabolism_in... PD_+_Dementia_vs_control A Comparison of Cerebral Glucose Metabolism in... European Journal of Neurology A Comparison of Cerebral Glucose Metabolism in...
Solved… maybe a bit hacky but it grabs the studies I expect:
import copy

from nimare.nimads import Annotation, Studyset

# download_file is a helper defined elsewhere (not shown here)
nimads_studyset = download_file("https://neurostore.org/api/studysets/<id>?nested=true")
nimads_annotation = download_file("https://neurostore.org/api/annotations/<annotation_id>")
studyset = Studyset(nimads_studyset, annotations=nimads_annotation)
annotation = Annotation(nimads_annotation, studyset)

# Collect the analyses that are both included and flagged as hypermetabolism
analysis_ids = []
missing_keys = []
for n in annotation.notes:
    try:
        if n.note['patient_hypermetabolism'] & n.note['included']:
            analysis_ids.append(n.analysis.id)
    except (KeyError, TypeError):
        # Note lacks one of the keys, or holds a non-boolean value such as None
        missing_keys.append(n.analysis.id)
print(len(analysis_ids))

# Filter notes to only include those where the analysis is in analysis_ids
annotation_dict = annotation.to_dict()
filtered_notes = [
    note for note in annotation_dict['notes'] if note['analysis'] in analysis_ids
]

# Create a new dictionary with the filtered notes
filtered_annotation = annotation_dict.copy()
filtered_annotation['notes'] = filtered_notes
# print(filtered_annotation)

# Deep copy so that filtering the nested lists leaves studyset_dict untouched
studyset_dict = studyset.to_dict()
studyset_dict_filtered = copy.deepcopy(studyset_dict)

# Filter studies and analyses based on the analysis_ids
for study in studyset_dict_filtered['studies']:
    # Filter analyses for this study
    study['analyses'] = [
        analysis for analysis in study['analyses'] if analysis['id'] in analysis_ids
    ]

# Create subset studyset to include only those studies of interest
target_studyset = Studyset(studyset_dict_filtered, annotations=filtered_annotation)

# Get sample sizes from annotations and add them to the metadata
for study in target_studyset.studies:
    for analysis in study.analyses:
        analysis_annotation = list(analysis.annotations.values())[0]
        if analysis_annotation['included']:
            sample_sizes = [
                int(analysis_annotation['patient_samplesize']),
                int(analysis_annotation['control_samplesize']),
            ]
        else:
            sample_sizes = [-1, -1]
        print(sample_sizes)
        analysis.metadata['sample_sizes'] = sample_sizes

nimare_dset = target_studyset.to_dataset()
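For anyone who wants to reuse the filtering step, here is a minimal sketch of the same idea as a single function working on plain dictionaries, so it can be tried without NiMARE or network access. The function name `filter_nimads` and the toy data are hypothetical; the only structure assumed is what the code above already relies on (`studies` → `analyses` → `id` in the studyset dict, and `notes` → `analysis` in the annotation dict). Unlike the code above, it also drops studies that end up with no analyses, which is a design choice rather than something the format requires.

```python
import copy


def filter_nimads(studyset_dict, annotation_dict, keep_ids):
    """Return copies of both dicts restricted to the analyses in keep_ids."""
    keep = set(keep_ids)

    # Deep copies keep the caller's dictionaries unchanged.
    filtered_studyset = copy.deepcopy(studyset_dict)
    kept_studies = []
    for study in filtered_studyset["studies"]:
        study["analyses"] = [a for a in study["analyses"] if a["id"] in keep]
        if study["analyses"]:  # drop studies with nothing left (a choice)
            kept_studies.append(study)
    filtered_studyset["studies"] = kept_studies

    filtered_annotation = copy.deepcopy(annotation_dict)
    filtered_annotation["notes"] = [
        n for n in filtered_annotation["notes"] if n["analysis"] in keep
    ]
    return filtered_studyset, filtered_annotation


# Toy stand-ins for studyset.to_dict() and annotation.to_dict().
studyset_dict = {
    "id": "toy-studyset",
    "studies": [
        {"id": "s1", "analyses": [{"id": "a1"}, {"id": "a2"}]},
        {"id": "s2", "analyses": [{"id": "a3"}]},
    ],
}
annotation_dict = {
    "id": "toy-annotation",
    "notes": [
        {"analysis": "a1", "note": {"included": True, "patient_hypermetabolism": True}},
        {"analysis": "a2", "note": {"included": True, "patient_hypermetabolism": False}},
        {"analysis": "a3", "note": {"included": False}},
    ],
}

# .get() treats a missing key as "not selected" instead of raising.
keep_ids = [
    n["analysis"]
    for n in annotation_dict["notes"]
    if n["note"].get("included") and n["note"].get("patient_hypermetabolism")
]
small_studyset, small_annotation = filter_nimads(studyset_dict, annotation_dict, keep_ids)
print(keep_ids, [s["id"] for s in small_studyset["studies"]])
```

The two returned dictionaries could then be passed to Studyset(...) in the same way as studyset_dict_filtered and filtered_annotation above.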