PyMVPA Event-related Data Analysis

sajjad · December 13, 2017, 11:41pm

Hi All!

I’m trying to apply MVPA using PyMVPA (http://www.pymvpa.org/tutorial_eventrelated.html) on a dataset that I have, but have a problem with it. With the following function, we can fit a model of the hemodynamic response to all relevant stimulus conditions and later feed our classifier with the results of hrf modeling rather than raw BOLD data; time_attr is an attribute with a time stamp for each sample. Also, (‘targets’, ‘chunks’) indicates that we want a separate model for each stimulation condition (targets) for each run of our example dataset (chunks):

evds = fit_event_hrf_model(ds,
… events,
… time_attr=‘time_coords’,
… condition_attr=(‘targets’, ‘chunks’))

As I understand from this function, it still gets samples (raw BOLD volumes) as inputs and needs us to attribute time stamps to each sample. So what if I have jittered my data and events don’t start at the beginning of volumes? In this case, labeling samples isn’t really reasonable because one sample can contain data of two different conditions. I hope my question is clear. I’d appreciate it if anyone could help with this!

eknahm · December 15, 2017, 5:50am

The function will generate HRF models and fit them to the data you provide. For that to work you need two types of timing information. 1) The ‘time_coords’ in the dataset identify at which timepoints each BOLD volume was acquired. Unless you are doing event triggered sparse sampling scans this is independent of the timing of your stimulation event. This timing is 2) and is given via the list of ‘events’, where each even is a dict that has at least ‘onset’ and ‘duration’ as fields, each given in the time unit of your ‘time_coords’ attribute. Hence, it is perfectly fine to have events that are not synchronized with the TR.

The labeling of events is taken from the ‘events’ list. Any attribute(s) that are also in the dicts will be assigned as condition labels in the output dataset.

sajjad · January 10, 2018, 10:46pm

Hi Michael, Thank you again very much for your response to this post. It solved my issue! I just have a couple more questions, and would appreciate it if you could help:

At the beginning of running my code, I get the following two warnings and haven’t found any convincing explanation of what they mean:
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/integrate/quadpack.py:293: UserWarning: Extremely bad integrand behavior occurs at some points of the
integration interval.
WARNING: Detected incorrect (nan) scl_ fields. Resetting to scl_slope=1.0 and scl_inter=0.0
My TR is 2 seconds, but when I load my data and print time_coords (print fds.sa.time_coords) I get this: [ 0.00000000e+00 1.00000000e+00 2.00000000e+00 …, 1.48100000e+03 1.48200000e+03 1.48300000e+03] Do you know why this might have happened? In order to fix it, I just do: fds.sa.time_coords = fds.sa.time_coords * 2
As you see from the fit_event_hrf_model function that I mentioned in my first post, one parameter estimate sample for each target value for each chunk is produced. Doesn’t that give us very few data for cross-validation? What I’m doing instead is: condition_attr=(‘onset’,‘targets’, ‘chunks’)) so that there is one parameter estimate sample for each individual event; also to be clear, I’ve merged all my runs into one single NIFTI file which is what I first load using fmri_dataset function.

eknahm · January 12, 2018, 6:58am

Glad it helped!

re 1) The first warning I have no idea about. The second warning is NiBabel informing you that your NIfTI files have two fields set to illegal values (NaN) where the standard demands 1.0 and 0.0. In general this can be ignored, as both cases tend to indicate that no scaling of the data is desired

re 2) This information is taken from the pixdim[4] value in the NIfTI header. If this is wrong your NIfTI header does not really match the data. This should not happen with any standard pre-processing pipeline. I would recommend you take a look at which stage your files get butchered. If some piece is too sloppy, it might also handle other pieces of information inappropriately.

re 3) Yes, one estimate per condition per run is not much. On the other hand your alternative produces more estimates, but they are likely to be noisier, because you are basing the estimate on a lot less variance in your model (it will be mostly zero, except for one event). This is one example of the general trade off between number of training samples and noise reduction. I don’t think anyone can tell you what is best, as the optimal selection will heavily depend on the context and quality of you data.

HTH

sajjad · January 26, 2018, 12:00am

Thank you so much again for your help! Among the initial results that I’ve got, there are some under chance classification accuracies. Other people have suggested that preprocessing issues might be the main reason, so I wanted to check and see if I’m doing detrending and normalization in the right way; my data is already motion corrected.

First, just to clarify, I’ve merged all my runs into one NIFTI file which is loaded (and masked) with fmri_dataset function as ‘ds’ in my code. This will be what I give to fit_event_hrf_model function for modeling. Wouldn’t that be a problem because of baseline differences between runs? Does the function handle that? Or maybe since we do detrending before modeling, we don’t need to worry about it?
As I mentioned before, there is jittering in my data and rather than giving target and chunk labels to my samples, I do that directly to my events for classification purposes. So my ds doesn’t have target and chunk sample attributes, thus for detrending I do: poly_detrend(ds, polyord=1) which detrends the whole dataset at once. Is that a common approach? Or should I add chunks sample attributes and then do: poly_detrend(fds, polyord=1, chunks_attr=‘chunks’)? By the way, I’m doing detrending just before fit_event_hrf_model.
For normalization, after fit_event_hrf_model and just before classification, I do: zscore(evds, chunks_attr=None) like it is suggested on the link I mentioned before. Is this the best approach here, or can I in some ways normalize based on events with ‘rest’ targets? Thank you very much for your time and help!

Sajjad

sajjad · February 8, 2018, 6:20pm

Problems solved. If anyone has faced similar issues, please let me know and I’ll provide details.