Hi Taylor,
Thanks for this info. I have successfully downloaded the v7 LDA 50-topic file, but it only contains 50 lines listing the topics, whereas the older version we used contains an "id" header plus per-study values for each topic (https://raw.githubusercontent.com/neurosynth/neurosynth-web/master/data/topics/analyses/v4-topics-50.txt). The code you are referring to from that other publication therefore works with the v4 file, but not with the v7. Is there a v7 file I missed that would be organized in the same way?
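In case it is relevant: I was wondering whether the per-study topic weights for v7 might come through the NiMARE fetcher itself rather than a separate text file. This is only a guess on my part, and I am not sure "LDA50" is an accepted vocab value, but I was imagining something like:

```python
import os
import nimare

out_dir = os.path.abspath("/Users/m246120/Desktop/dAD_BPR/Decoding/neurosynth/")

# Guess: ask the fetcher for the LDA50 vocabulary instead of "terms",
# hoping the per-study topic weights come back as annotation features.
# I am not sure "LDA50" is the exact value the fetcher expects.
files = nimare.extract.fetch_neurosynth(
    path=out_dir,
    version="7",
    overwrite=False,
    source="abstract",
    vocab="LDA50",
)
```

Would that, if it works, give a dataset whose annotations already contain the 50 topic columns, making the v4 merging below unnecessary?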
Using the code you provided in that other publication, I was able to merge the 50 v4 topics into the Neurosynth dataset. However, I am not exactly sure what the next steps should be. I apologize in advance: I am not a seasoned coder, especially not with Python, so it is a bit difficult for me to follow. Here is my code, in case it helps:
Creation of the neurosynth dataset, as indicated on the NiMARE website:
```python
import os
from pprint import pprint

import nimare

out_dir = os.path.abspath("/Users/m246120/Desktop/dAD_BPR/Decoding/neurosynth/")
os.makedirs(out_dir, exist_ok=True)

# Download the Neurosynth v7 database files
files = nimare.extract.fetch_neurosynth(
    path=out_dir,
    version="7",
    overwrite=False,
    source="abstract",
    vocab="terms",
)
pprint(files)
neurosynth_db = files[0]

# Convert the downloaded files into a NiMARE Dataset
neurosynth_dset = nimare.io.convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db["coordinates"],
    metadata_file=neurosynth_db["metadata"],
    annotations_files=neurosynth_db["features"],
)
neurosynth_dset.save(os.path.join(out_dir, "neurosynth_dataset.pkl.gz"))
print(neurosynth_dset)

# Add article abstracts to the Dataset
neurosynth_dset = nimare.extract.download_abstracts(neurosynth_dset, "corriveau-lecavalier.nick@mayo.edu")
neurosynth_dset.save(os.path.join(out_dir, "neurosynth_dataset_with_abstracts.pkl.gz"))
```
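As a side note, I assume the saved file could also be reloaded with NiMARE's own loader rather than gzip/pickle (I used gzip/pickle below only because that is what I know), for example:

```python
from nimare.dataset import Dataset

# Assumption on my part: Dataset.load should handle the compressed pickle directly
dset = Dataset.load(os.path.join(out_dir, "neurosynth_dataset_with_abstracts.pkl.gz"))
```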
Reproduction of the LDA50 Neurosynth topic set. First, load the dataset as dset:

```python
import gzip
import pickle

with gzip.open("/Users/m246120/Desktop/dAD_BPR/Decoding/neurosynth/neurosynth_dataset_with_abstracts.pkl.gz") as f:
    dset = pickle.load(f)
```
Read in the topic file, rename the ID column, and prepend a prefix to the topic names:

```python
import pandas as pd

df = pd.read_table("/Users/m246120/Desktop/dAD_BPR/Decoding/neurosynth/data-neurosynth_version-4_vocab-LDA50_vocabulary.txt")
topic_names = [c for c in df.columns if c.startswith("topic")]
topics_renamed = {t: "Neurosynth_LDA__" + t for t in topic_names}
topics_renamed["id"] = "study_id"
df = df.rename(columns=topics_renamed)
```
```python
# Change the data type of the study_id column so it can be merged
df["study_id"] = df["study_id"].astype(str)

# Merge the topic dataframe into the annotations dataframe
new_annotations = dset.annotations.merge(
    df,
    how="inner",
    left_on="study_id",
    right_on="study_id",
)
dset.annotations = new_annotations
```
```python
# The topic file only contains ~10k studies, so we must reduce the dataset to match
new_ids = new_annotations["id"].tolist()
dset = dset.slice(new_ids)
```
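Just to check my own understanding, I have been verifying the merge by listing the new topic columns (I am not sure this is the right check):

```python
# List the merged LDA topic columns to confirm the merge worked
topic_cols = [c for c in dset.annotations.columns if c.startswith("Neurosynth_LDA__")]
print(len(topic_cols), topic_cols[:5])
```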
What would come next? I am not sure what you mean by the dataset.get_studies_by_label() function, as I have never used it in my previous models. Do I need to run a new model first and then the decoder? Would decoding be similar to what I did for the GCLDA model? Something like:
```python
from nimare import decode

# Run the decoder (model and img_eb1 come from my earlier GCLDA workflow)
decoded_df, _ = decode.continuous.gclda_decode_map(model, img_eb1)
decoded_df.sort_values(by="Weight", ascending=True).head(50)
```
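Or, since the topics now live in the annotations rather than in a trained GCLDA model, I was wondering whether something like NiMARE's CorrelationDecoder would be the way to go instead. The snippet below is purely a sketch of what I have in mind; the parameters are my own assumptions, and img_eb1 is my input map from before:

```python
from nimare.decode.continuous import CorrelationDecoder

# Select the merged LDA topic columns (the prefix is the one I added above)
lda_features = [c for c in dset.annotations.columns if c.startswith("Neurosynth_LDA__")]

# Assumptions on my part: fit a correlation decoder restricted to those features,
# then correlate each topic's meta-analytic map with my unthresholded image
decoder = CorrelationDecoder(features=lda_features, frequency_threshold=0.001)
decoder.fit(dset)
decoded_df = decoder.transform(img_eb1)
decoded_df.head(50)
```

Is that roughly the idea, or did you have something else in mind with get_studies_by_label()?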
Thanks for your help again.