Dear Friends of Research,
I am working my way through the (excellent) workshop by Michael Notter and Peer Herholz (https://www.youtube.com/watch?v=4FVGn8vodkc). Now I want to apply the machine learning tutorial (workshop_pybrain/05b_machine_learning_nilearn.ipynb in the miykael/workshop_pybrain repository on GitHub) to a dataset of my own, but I am failing at creating the labels.
In Michael Notter’s example, the eyes are each opened for 4 volumes and closed for 4 volumes. A Numpy array is created that looks like this:
array(['closed', 'closed', 'closed', 'closed', 'open', 'open', 'open', 'closed', 'closed', 'closed', 'closed', ...])
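For a block design like this, such a per-volume label array can be built with numpy's `repeat` and `tile`. A minimal sketch (the number of closed/open cycles here is hypothetical, just to illustrate the construction):

```python
import numpy as np

# Per-volume labels for a block design: eyes closed for 4 volumes,
# then open for 4 volumes, with the cycle repeated over the run.
n_cycles = 2  # hypothetical number of closed/open cycles in the run
labels = np.tile(np.repeat(['closed', 'open'], 4), n_cycles)
# -> 4x 'closed', 4x 'open', 4x 'closed', 4x 'open'
```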
My experiment, on the other hand, consists of 4 runs of 211 volumes each, with a TR of 2 s. Each trial lasts 2.5 seconds in total (2.25 s stimulus presentation, 0.25 s ITI), and 167 trials were completed per run. There are 6 different conditions, presented in random order.
Based on the tutorial, it seems to me that labels must be assigned to each volume individually. Would the assumption be correct that ~1.25 volumes correspond to each trial? If so, does the overlap between trials and volumes pose a problem? Is my experiment analyzable in this form at all?
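To make the mismatch concrete, here is the timing arithmetic for the numbers given above (pure bookkeeping, no assumptions beyond the stated design):

```python
# Timing sanity check for the design described above.
tr = 2.0           # seconds per volume
n_volumes = 211    # volumes per run
trial_dur = 2.5    # 2.25 s stimulus + 0.25 s ITI
n_trials = 167     # trials per run

run_duration = n_volumes * tr          # 422.0 s of scanning per run
task_duration = n_trials * trial_dur   # 417.5 s of trials per run
volumes_per_trial = trial_dur / tr     # 1.25 volumes per trial
```

So each trial indeed spans 1.25 volumes, meaning trial boundaries and volume boundaries drift apart over the run rather than lining up one-to-one.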
Thanks for the help, and if any information is missing, I'll be happy to provide it!
Welcome to Neurostars and thank you for your post; it's great to have you here!
I’m very sorry for the late reply, but very happy to hear that you found our tutorials helpful!
That being said, the ML tutorial might actually not be the best example of common ML workflows applied to neuroimaging data, especially with regard to how labels are generated. The tutorial dataset we're using was picked because it's rather small and allows exploring different aspects. However, it's rather uncommon in that trials and volumes are precisely matched, enabling one label to be assigned per volume. You're right that this won't work for your (and most other) designs/paradigms, as trials will most certainly be distributed across several volumes. What folks do in these (and other) cases is to obtain beta images per run (or across runs) via a GLM and submit those to subsequent ML analyses. For example, if your experiment consisted of 8 runs within which 4 different auditory categories were presented, and you wanted to evaluate whether the voxel patterns of certain ROIs carry information regarding these conditions, you could submit your preprocessed data to a GLM to compute run-wise beta images for each condition (4 beta images per run, 32 in total) and then train a classifier to differentiate them (e.g. based on voxels within the auditory cortex), cross-validating based on a leave-n-out scheme. Nilearn has a lot of great tutorials on this. Nilearn now also supports GLMs, which removes the need to obtain beta images via other software packages.
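The decoding half of that workflow can be sketched with scikit-learn alone. The sketch below stands in for the real pipeline: the random matrix `X` plays the role of the 32 beta images (8 runs × 4 conditions, flattened ROI voxels), which in practice you would get from nilearn's `FirstLevelModel`. Everything here (feature count, condition names) is made up for illustration; the point is the run-wise leave-one-run-out cross-validation structure:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

n_runs = 8
conditions = ['cat1', 'cat2', 'cat3', 'cat4']  # hypothetical category names
n_voxels = 500                                 # hypothetical ROI size

# Stand-in for one beta image per condition per run -> 32 samples.
X = rng.standard_normal((n_runs * len(conditions), n_voxels))
y = np.tile(conditions, n_runs)                        # condition of each beta
runs = np.repeat(np.arange(n_runs), len(conditions))   # run of each beta

# Leave-one-run-out: train on 7 runs, test on the held-out run, 8 folds total.
scores = cross_val_score(LinearSVC(), X, y, groups=runs, cv=LeaveOneGroupOut())
# With random data, accuracy should hover around chance (0.25 for 4 classes).
```

Grouping by run (rather than shuffling volumes freely) matters because betas from the same run share noise, and mixing them across train and test folds inflates accuracy.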
If you have further questions, please don’t hesitate to ask!
HTH, cheers, Peer
It is almost a year later and I found your post really helpful. I'm new to MVPA, and trying to decode without perfect labels (which most tutorials assume) is tripping me up.
Would you have a suggestion for how to do this with limited runs? Our task has only two runs, so cross-validating across runs doesn't make much sense here, although it would be the ideal approach (as would having more than 2 runs of data).
Would a between-subjects approach be appropriate, i.e. cross-validating across subjects? The problem here is that there would be repeated measures (multiple images from the same subject) in the dataset. Perhaps I could run ML analyses on two datasets that each contain no more than one image per subject? Dataset 1 would have the condition-1 image for subject X and the condition-2 image for subject Y; dataset 2 would have the condition-2 image for subject X and the condition-1 image for subject Y.
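One way to handle the repeated-measures concern without splitting the data into two half-datasets is to keep all images together and group the cross-validation folds by subject, so that both images from a given subject always land on the same side of the train/test split. A sketch with scikit-learn's `LeaveOneGroupOut` (subject count, condition names, and feature size are all hypothetical placeholders for real per-subject beta images):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)

n_subjects = 10                     # hypothetical
conditions = ['cond1', 'cond2']     # hypothetical
n_voxels = 300                      # hypothetical

# Stand-in for one beta image per condition per subject -> 20 samples.
X = rng.standard_normal((n_subjects * len(conditions), n_voxels))
y = np.tile(conditions, n_subjects)
subjects = np.repeat(np.arange(n_subjects), len(conditions))

# Grouping by subject keeps both of a subject's images in the same fold,
# so no subject appears in both train and test (no repeated-measures leakage).
scores = cross_val_score(LinearSVC(), X, y,
                         groups=subjects, cv=LeaveOneGroupOut())
```

This gives one fold per held-out subject and uses all the data in every fold, rather than discarding half of it per analysis.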
Hopefully that makes sense! If I am completely misunderstanding how this works, please let me know!