I would like to know the best way to complement a Neurosynth-style term-based meta-analysis with individual studies.
With nimare.io.convert_neurosynth_to_dataset("database.txt", "features.txt") we are able to generate a NiMARE dataset.Dataset object, but it is unclear how one would add additional entries, as there does not appear to be any "add()" method for this datatype.
There's no way to add individual studies to a Dataset, but you can merge two Datasets with the new Dataset.merge() method. I haven't made a release since adding that method, though, so if you want to use it you'll need to install the main version from GitHub.
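For reference, the unreleased main branch can be installed straight from the GitHub repository with pip:

```shell
# Install the current main branch of NiMARE (includes Dataset.merge())
pip install git+https://github.com/neurostuff/NiMARE.git
```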
I think your best move would be to either (1) directly add the studies to the Neurosynth files (which might muddy the waters when it comes to describing your Dataset, unfortunately) or (2) create a Dataset for your added studies and use Dataset.merge() to combine them.
So, going with (2), would you recommend basically building a NiMARE-format JSON for my added studies and creating a Dataset from it?
Once I merge these, ideally all my studies would be in the same "annotation space," right? Am I correct to think that ranking my manually added studies within the Neurosynth annotation space would be a hassle? Should I instead run one of NiMARE's annotation functions on the final merged dataset?
That sounds like a good approach, although the JSON format may have changed in the past few years, so take a look at this test JSON when you build your own. Also be aware that we ultimately want to move toward the NIMADS specification for our JSONs (see Synchronize NiMARE json format with NIMADS specification · Issue #523 · neurostuff/NiMARE · GitHub), but I don't plan to tackle that for a little while.
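As a sketch of option (2): build a small coordinate-based JSON in the general layout NiMARE's test data uses, then create a Dataset from it and merge. The study ID, contrast ID, and field names below are illustrative, so verify them against the test JSON before relying on this layout; the NiMARE calls themselves are left as comments since they require the library.

```python
import json

# Minimal NiMARE-style coordinates JSON (illustrative IDs and field names;
# check the test JSON in the NiMARE repo before relying on this exact schema).
my_studies = {
    "smith2020": {  # hypothetical study ID
        "contrasts": {
            "1": {
                "coords": {
                    "space": "MNI",
                    "x": [-38.0, 42.0],
                    "y": [22.0, 18.0],
                    "z": [4.0, -6.0],
                },
                "metadata": {"sample_sizes": [25]},
            }
        }
    }
}

with open("my_studies.json", "w") as f:
    json.dump(my_studies, f, indent=2)

# With NiMARE installed, the rest would look roughly like (untested sketch):
# from nimare.dataset import Dataset
# my_dset = Dataset("my_studies.json")
# merged = my_dset.merge(neurosynth_dset)  # requires the GitHub main branch
```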
All of the annotations should be in the same Dataset.annotations table, and as long as you use the same column/feature names, the merge should work fine. However, Neurosynth label weights will probably not follow the same scale. For example, the standard TFIDF weights are scaled not only by the term counts in each study's abstract, but also by the counts for those terms across the whole Neurosynth database. Unless you plan to apply a threshold that would be fairly consistent across the two Datasets (e.g., the 0.001 threshold, which corresponds to the term appearing at least once in the abstract), I wouldn't consider the weights directly comparable.
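To make the scaling point concrete, here is a toy TFIDF computation in plain Python, using the common tf * log(N/df) formulation (which only approximates Neurosynth's actual pipeline; the numbers are made up): the identical abstract gets a different weight for the same term depending on which corpus it sits in.

```python
import math

def tfidf(term_count, doc_len, n_docs, doc_freq):
    """Toy TFIDF: term frequency scaled by inverse document frequency."""
    tf = term_count / doc_len
    idf = math.log(n_docs / doc_freq)
    return tf * idf

# Same abstract: "pain" appears 3 times in a 100-word abstract.
# Corpus A: "pain" appears in 5,000 of 14,000 abstracts (Neurosynth-scale).
weight_a = tfidf(3, 100, n_docs=14000, doc_freq=5000)
# Corpus B: "pain" appears in 10 of 40 abstracts (small manual corpus).
weight_b = tfidf(3, 100, n_docs=40, doc_freq=10)

print(weight_a, weight_b)  # different weights for the identical abstract
```

The term frequency is the same in both cases; only the corpus-wide document frequency changes, and that alone shifts the weight.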
Yeah, trying to scale your weights to match the scaling of Neurosynth’s term TFIDF weights or its topic weights would be very difficult, if not impossible.
If you want to directly compare TFIDF weights across the two Datasets, you might want to (1) download the abstracts for the Neurosynth corpus (NiMARE has a function for it), (2) generate term-count annotations, (3) merge with your Dataset, and then (4) apply the TFIDF transform to the combined counts.
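In outline, the steps above might look like this. The NiMARE calls in the comments are from memory and may differ by version; the count-merging and TFIDF step is shown concretely with a toy count * log(N/df) weighting in plain Python, where the key point is that one set of document frequencies is computed over the combined corpus.

```python
import math

# Steps (1)-(3) with NiMARE would look roughly like (untested, from memory):
# from nimare.extract import download_abstracts
# dset = download_abstracts(dset, email="you@example.org")
# ...generate term counts for both Datasets, then merge the count tables.

# Step (4): apply one TFIDF transform to the *combined* count table, so both
# corpora share the same document frequencies and the weights are comparable.
neurosynth_counts = [{"pain": 3, "memory": 0}, {"pain": 0, "memory": 2}]
my_counts = [{"pain": 1, "memory": 1}]
combined = neurosynth_counts + my_counts

n_docs = len(combined)
terms = sorted({t for doc in combined for t in doc})
doc_freq = {t: sum(1 for doc in combined if doc.get(t, 0) > 0) for t in terms}

# Toy TFIDF: raw count scaled by log(N/df), with df from the merged corpus.
tfidf = [
    {t: doc.get(t, 0) * math.log(n_docs / doc_freq[t]) for t in terms}
    for doc in combined
]
```

Because the inverse document frequencies come from the merged corpus, a given count maps to the same weight regardless of which Dataset the study came from.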
One final minor thing… we’re looking to support the NeuroQuery data in NiMARE (see Add fetcher/converter for NeuroQuery dataset · Issue #522 · neurostuff/NiMARE · GitHub), and that dataset is actually better than Neurosynth. It’s also automatically extracted, but Jérôme Dockès put a lot of work into improving the quality of the dataset. Just something to be aware of.