fMRIPrep for BOLD time-series analysis

I want to run some time-series analysis on a BIDS-formatted dataset.
I looked at some tutorials, for example:
It seems that when performing time-series analysis (in contrast to GLM analysis), the data used is normally the raw bold.nii.gz, i.e. without applying fMRIPrep to it.

  1. Is this indeed the case? If so, why?
  2. Why are the datasets in nilearn.datasets stored in the sklearn.utils.Bunch format instead of BIDS? Let’s say I want to apply this tutorial to my dataset; is there a convenient way to transform it from BIDS to a Bunch object?
  1. I think naming the file bold.nii.gz was just a way of generalizing it to a functional time series. I would suggest using your fMRIPrep-ed data here. fMRIPrep does minimal preprocessing that improves data quality for most kinds of analyses (e.g. slice-timing and susceptibility correction if applicable, motion correction, and spatial normalization), and it outputs tons of regressors that can be used for further preprocessing depending on your specific application. For example, you may want to input some of the regressor columns (e.g. motion outliers, global signal) as confounds when you extract the time series (see here for an example).
  2. I am not familiar with the sklearn.utils.Bunch format, but BIDS is standardized enough that if you get your script working on one subject, it should be very manageable to script it for other subjects.
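To make the confound idea concrete, here is a minimal NumPy sketch of what regressing confounds out of a time series does conceptually: the motion regressors are fit to the voxel time series by least squares and the fitted part is subtracted. All data here are synthetic stand-ins; with real data, nilearn's maskers can do this for you via their `confounds` argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vols = 200
motion = rng.standard_normal((n_vols, 6))          # stand-in for 6 motion regressors
true_signal = np.sin(np.linspace(0, 20, n_vols))   # "neural" signal of interest
leak = motion @ rng.standard_normal(6) * 0.5       # motion artifact mixed into the voxel
voxel_ts = true_signal + leak + 0.1 * rng.standard_normal(n_vols)

# Regress the confounds out of the voxel time series (ordinary least squares)
X = np.column_stack([motion, np.ones(n_vols)])     # confounds + intercept
beta, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
cleaned = voxel_ts - X @ beta
```

After cleaning, the residual time series correlates much better with the underlying signal than the raw voxel did.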

Let me know if this doesn’t make sense.


@Steven Definitely makes sense, thanks! Using the post-fMRIPrep data is certainly preferable for me, especially since I want to work in MNI space rather than native space (I want to fetch data from the same ROI for all subjects).
Can you please explain what the confounds file in the link you sent is and why it is useful?
My goal is to take the time series from a specific voxel (e.g. in Broca’s area) and correlate it, per subject, with the input signal.
So, to my understanding, I need to use NiftiMasker on the post-fMRIPrep .nii file, right? Do you perhaps have an example of something similar? (Even just the pipeline for extracting one subject’s time series at a certain voxel; the correlation part is clearer to me.)


Confounds contain certain kinds of noise and signal to regress out of the time series. For example, you may want to regress out head motion parameters (rotation and translation along the x, y, and z axes). Some parameters, like aCompCor, help regress out physiological noise (e.g. from breathing or cardiac activity). Here is a good paper with recommendations for resting-state data: An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI - ScienceDirect.

Your confounds will be in the func folder as the confounds.tsv file. However, it contains many confounds, and you probably will not want to use all of them, or you will be left with very little signal. For choosing specific confounds, you can look at the paper I sent, or at papers doing analyses similar to yours to see what they did. In either case, you will need to save a new confounds.tsv containing just the columns you’re interested in. Again, since the fMRIPrep outputs are so standardized, scripting this should not be too hard; the pandas package for Python makes working with tsv/csv files manageable.
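For instance, selecting a subset of confound columns with pandas might look like this. The column names follow fMRIPrep's naming conventions; the values here are made up, and the file is simulated with an in-memory buffer:

```python
import io
import pandas as pd

# A tiny stand-in for an fMRIPrep confounds file (made-up values)
tsv = io.StringIO(
    "trans_x\ttrans_y\ttrans_z\trot_x\trot_y\trot_z\tglobal_signal\ta_comp_cor_00\n"
    "0.01\t0.02\t0.00\t0.001\t0.000\t0.002\t512.3\t0.10\n"
    "0.02\t0.01\t0.01\t0.000\t0.001\t0.001\t511.8\t-0.05\n"
)
confounds = pd.read_csv(tsv, sep="\t")

# Keep only the six rigid-body motion parameters
motion_cols = ["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z"]
selected = confounds[motion_cols]
selected.to_csv("confounds_subset.tsv", sep="\t", index=False)
```

With real data you would point `pd.read_csv` at the confounds file in the subject's func folder instead of the buffer.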


In regards to a specific tutorial, it isn’t quite clear to me what you are trying to do, but it sounds like a seed-to-voxel analysis (that is, using a Broca’s area seed and correlating it with every voxel in the brain) is what you are looking for. Nilearn: Statistical Analysis for NeuroImaging in Python — Machine learning for NeuroImaging Nilearn has several tutorials that you can search for.
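As a rough illustration of the seed-to-voxel idea on synthetic data (real data would come from a masker; here the "seed" is just the mean of a few arbitrary columns):

```python
import numpy as np

rng = np.random.default_rng(1)
n_vols, n_voxels = 150, 500
brain = rng.standard_normal((n_vols, n_voxels))   # synthetic whole-brain time series
seed = brain[:, :10].mean(axis=1)                 # pretend columns 0-9 form the seed ROI

# Correlate the seed with every voxel (vectorized Pearson r)
bz = (brain - brain.mean(axis=0)) / brain.std(axis=0)
sz = (seed - seed.mean()) / seed.std()
r_map = bz.T @ sz / n_vols                        # one correlation per voxel
```

The voxels inside the seed come out with the highest correlations, as expected; on real data the resulting map would be reshaped back into brain space for visualization.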

@Steven Thanks, but this is not exactly what I meant; let me be clearer:
Let’s say my input is a certain text with a score at each time point, for example a lexical complexity score. As a first step, I want to fetch Broca’s area BOLD activity and correlate it with the vector of lexical complexity scores, to see the relation between the BOLD activity at that voxel and the complexity score.

Does it make sense?

I get what you are trying to do now, and I think the tutorial I linked probably has a way to get the time series. However, I’d be concerned about the kind of analysis you described, because 1) fMRI has poor temporal resolution, and 2) the hemodynamic peak response is delayed relative to stimulus onset. Neither bodes well for correlating BOLD activity with a fast-moving signal like sentence complexity.

@Steven I thought about modelling it with a delay or offset. What do you think?
Assuming the hemodynamic response takes around 7 seconds, can’t I just “shift” the two signals to match each other?
In any case, I think I should still be able to compare correlations across regions, right? Meaning, if I compare activity in a relevant voxel v1 and an irrelevant one v2, I should be able to see that r(v1) > r(v2)?
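For what it's worth, the shifting idea can be sketched on toy data like this. The TR and delay values are assumptions, and the "BOLD" here is simply the stimulus delayed by a fixed lag plus noise, which sidesteps the HRF-shape issue discussed below:

```python
import numpy as np

rng = np.random.default_rng(2)
tr = 2.0                             # repetition time in seconds (an assumption)
delay_s = 6.0                        # assumed hemodynamic delay
shift = int(round(delay_s / tr))     # delay expressed in volumes

stim = rng.standard_normal(200)                                # toy complexity score per TR
bold = np.roll(stim, shift) + 0.5 * rng.standard_normal(200)   # toy BOLD lagging the stimulus

r_naive = np.corrcoef(stim, bold)[0, 1]                        # no lag correction: near zero
r_shifted = np.corrcoef(stim[:-shift], bold[shift:])[0, 1]     # stimulus shifted by the delay
```

A pure shift is the crudest possible correction; convolving the stimulus with a hemodynamic response function is the more standard approach.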

Even if you account for the delay, you still have the fact that a given voxel is only sampled once every TR (often 2-2.5 seconds). A lot of the data are interpolated and probably won’t be responsive to lexical complexity at the word level.

A better experimental design would be to use a set of scans presenting only high-lexical-score words, then another set using low-lexical-score words (ideally several blocks of each), and then contrast the two conditions. That, in my mind, would be the simplest way of saying something like “Broca’s activity increases with lexical complexity.”

All that being said, if you can find a paper that uses a method similar to what you propose, try adapting its methods.

@Steven OK, got it; good point, thanks.
What are some common time-series analyses in the field, besides what’s in the nilearn tutorials?
For example, let’s say I have a 10-second pause between sentences/paragraphs.
Could I build an LSTM classifier to predict sentence type (e.g. the complexity score of the entire sentence) based on the entire 20-30 seconds prior?

The most common time-series analysis in fMRI is probably functional connectivity, for which there are plenty of tutorials (see the first one I linked above) and toolboxes (see the CONN toolbox).
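At its simplest, functional connectivity is just the ROI-by-ROI correlation matrix of the time series, as in this sketch on synthetic data (with real data, a masker would extract the ROI time series first, and nilearn or CONN would handle this end to end):

```python
import numpy as np

rng = np.random.default_rng(5)
n_vols, n_rois = 180, 8
ts = rng.standard_normal((n_vols, n_rois))   # synthetic ROI time series (volumes x ROIs)
ts[:, 1] += 0.8 * ts[:, 0]                   # make ROIs 0 and 1 co-fluctuate

# Functional connectivity as the pairwise Pearson correlation matrix
fc = np.corrcoef(ts, rowvar=False)           # shape (n_rois, n_rois)
```

The artificially coupled pair shows up as a strong off-diagonal entry, which is exactly the kind of structure connectivity analyses look for.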

These days, you can build anything to predict anything, but it’s only worth spending the time if you expect the input and output to have some kind of relationship. It’s not exactly clear what you’re trying to do with the last method you mentioned, but I wouldn’t expect there to be much information relevant to a given trial before it starts (20-30 s prior).

@Steven Sorry, I was unclear.
Let’s say I have a 30-second sentence followed by a 15-second pause (so the hemodynamic response returns to baseline).
If I have X sentences with a high complexity score and X sentences with a low complexity score, I would assume it is possible to train a seq-to-one LSTM to predict the “sentence type” (e.g. complexity level) from those 30 seconds of activation. WDYT?

Sounds like multi-voxel pattern analysis (MVPA) may be what you are looking for, then. The idea is essentially to build a machine-learning classifier where the inputs are voxel activities (one input per voxel in a given ROI) and the output is a predicted condition (in your case, lexical complexity). You can compare the predictions with the actual condition labels and see whether the classifier is accurate at chance level (1 / #conditions) or above it, which would indicate some kind of predictive power in the ROI. Check out the PyMVPA BIDS App for an example program that can do this. GitHub - BIDS-Apps/PyMVPA

@Steven Thanks! I might implement it myself, since I don’t expect it to work as-is on my data (it is not divided into runs), and I would also like flexibility in choosing the model’s hyper-parameters and architecture, but it’s good to have a reference point!

@Steven Re your useful remark about poor temporal resolution: I did some research and found many papers in which stories are used to build encoding models, i.e. to predict the neural response from linguistic input, by respected researchers such as A. Huth, L. Wehbe, L. Beinborn, and more. Does using an encoding model basically handle this problem, since you aim to predict a response that already includes the HRF?

Can you send me links to the papers? Encoding models could work (MVPA is kind of an encoding model), but it will depend on what you are trying to predict. For example, this won’t allow you to predict word complexity at the word level, since that changes too quickly. But at the sentence or paragraph level it may be warranted (this is not my area of expertise, so I can’t say for sure).
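One common way encoding models handle the hemodynamics is to convolve the stimulus features with a canonical HRF before fitting, so the model predicts the sluggish, HRF-filtered response rather than an instantaneous one. Here is a rough sketch with a simplified double-gamma HRF; the TR, HRF parameters, and synthetic "voxel" are all assumptions for illustration:

```python
import numpy as np
from math import factorial

tr = 2.0
t = np.arange(0, 30, tr)
# Simplified double-gamma HRF (peak minus a small undershoot), normalized to sum to 1
hrf = t**5 * np.exp(-t) / factorial(5) - (t**15 * np.exp(-t) / factorial(15)) / 6.0
hrf /= hrf.sum()

rng = np.random.default_rng(4)
feature = rng.standard_normal(120)            # e.g. sentence-level complexity per TR
design = np.convolve(feature, hrf)[:120]      # predicted BOLD: feature convolved with HRF

# Simulate one voxel that tracks the feature, then recover the encoding weight
bold = 2.0 * design + 0.1 * rng.standard_normal(120)
X = np.column_stack([design, np.ones(120)])
beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
```

Because the regressor is already HRF-filtered, the fitted weight recovers the voxel's sensitivity to the feature without the model needing to resolve anything faster than the hemodynamics allow.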

@Steven sure, here are some links:

Sentence level is fine (I saw that they sometimes just sum the measure between scans of the same voxel, so with a 2-second separation, they take those 2 seconds and assign them the score).
I just can’t understand why, in an encoding model (or MVPA, as you said), the poor temporal resolution has no effect. Why is the problem you mentioned, of associating the BOLD signal with a specific input, no longer an issue?

I will take a look at the papers at some point, but poor temporal resolution does still have an effect, and it is why you would not be able to predict complexity at the word level (since it varies too fast).

@Steven So I understand that the segmentation (looking at the sentence level and representing it with a single value) accounts for the 2-second intervals between scans of the same voxel. But I don’t understand how encoding models account for the specific, unique shape of the HRF (what is usually handled by using a block design)?