Dataset quality from fmriprep report

root · February 9, 2024, 1:15pm

Hi everyone,

I’ve been working with an fMRI dataset where subjects were tasked with verifying the correctness of arithmetic equations. My subsequent analysis is aimed at exploring the relationship between the stimuli and brain activity.

For preprocessing, I used fmriprep with all default settings. However, the preprocessed data performs surprisingly poorly in my subsequent analysis, showing no significant difference from results obtained with randomized data. This is puzzling, especially considering that similar studies have reported much more promising findings.

Upon reviewing the fmriprep reports, I noticed that the tCompCor values are consistently high (above 0.6) for most subjects, suggesting potential issues with data quality. I’m concerned that this might be affecting the performance of my subsequent analyses. Given my relatively limited experience with direct data quality assessment in preprocessing reports, I’m unsure how to proceed.

I’ve attached the complete fmriprep reports for two of the subjects for reference (subject1001 subject1002). I would greatly appreciate any insights or advice on how to address this issue. Could the high tCompCor values be the root of the problem, and if so, how might I go about improving the quality of my preprocessed data? Any suggestions or feedback would be of great help.

Thank you in advance for your help!

Steven · February 9, 2024, 1:29pm

Hi @root, and welcome to neurostars!

The data look fine to me. tCompCor tends to have a high correlation with global signal, so what you’re seeing is not an anomaly. In fact, your data seem to have good (as in low) levels of motion.

Without knowing how you are doing subsequent analysis, it is hard to provide concrete recommendations on how to proceed.

Best,
Steven

root · February 9, 2024, 1:39pm

Thank you for your reply, @Steven!

Did you also check the other contents of the report? Does the dataset require additional processing?

Still, the poor performance in my subsequent analysis confuses me: the performance doesn’t vary much when I replace the fMRI frames with a completely randomized matrix of the same size. Do you have any leads on this?
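
To make the randomized comparison concrete, this kind of check can be sketched as a permutation test on the prediction correlation for one voxel. This is only a minimal sketch in plain Python, not my actual pipeline: `pearson` and `permutation_p_value` are illustrative names, and shuffling individual time points is a simplifying assumption that ignores the temporal autocorrelation of BOLD, so shuffling in blocks would be more conservative.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant series
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(predicted, observed, n_perm=1000, seed=0):
    """Compare the real correlation against a shuffled-data null.

    Returns the actual correlation and a one-sided p-value: the fraction
    of shuffles whose correlation is at least as large as the real one.
    """
    rng = random.Random(seed)
    actual = pearson(predicted, observed)
    shuffled = list(observed)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the time alignment (assumed null)
        if pearson(predicted, shuffled) >= actual:
            at_least_as_large += 1
    # +1 in numerator and denominator keeps the p-value strictly above zero
    return actual, (at_least_as_large + 1) / (n_perm + 1)
```

If the p-value stays large for nearly every voxel, the model is not doing better than chance, which is the situation I am describing.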

Thanks again for your help!

Steven · February 9, 2024, 1:43pm

Hi @root,

It is hard to know with the limited amount of information provided. How are you currently running your analysis and what is the hypothesis?

None of the other parts of the reports had me concerned either, although some parts are hard to gauge from the static PDF (the HTML has several alternating GIFs that show the correspondence between BOLD and T1 alignment, which I also tend to look for).

Best,
Steven

root · February 9, 2024, 1:57pm

Hi @Steven

Here’s the html for the two subjects: sub1001 sub1002

Best,
root

root · February 9, 2024, 2:02pm

Hi @Steven ,

As for my analysis, I’m trying to map the semantic features of the stimuli (e.g. word embeddings) across the cortex using voxel-wise modelling. In short, features of interest are first extracted from the stimuli, and then regression is used to determine how each feature modulates BOLD responses in each voxel. My hypothesis aligns with previous research: semantic models accurately predict BOLD responses in many brain areas that have previously been identified as the semantic system in the human brain. However, my result (mine) currently looks randomly scattered across the brain rather than concentrated as in previous work (tutorial). In the heatmaps, a lighter voxel indicates better regression performance.
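
For concreteness, the two steps described above can be sketched roughly as follows. This is a minimal NumPy sketch under assumptions, not the tutorial's code: it assumes ridge regression on time-delayed copies of the features (a common way to account for the haemodynamic lag), and the function names, the delays, and the `alpha` value are illustrative.

```python
import numpy as np


def make_delayed(features, delays=(1, 2, 3, 4)):
    """Stack time-shifted copies of the features, one copy per delay.

    features: array of shape (n_timepoints, n_features).
    Delays are in TRs; the default values are an assumed choice.
    """
    n_t, n_f = features.shape
    out = np.zeros((n_t, n_f * len(delays)))
    for i, d in enumerate(delays):
        # Row t of this block holds the features from time t - d
        out[d:, i * n_f:(i + 1) * n_f] = features[:n_t - d]
    return out


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights for all voxels at once.

    X: (n_timepoints, n_regressors), Y: (n_timepoints, n_voxels).
    Solves (X'X + alpha * I) W = X'Y for W.
    """
    n_r = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_r), X.T @ Y)


def voxelwise_correlation(Y_pred, Y_true):
    """Pearson r between predicted and measured BOLD, one value per voxel."""
    zp = (Y_pred - Y_pred.mean(0)) / (Y_pred.std(0) + 1e-12)
    zt = (Y_true - Y_true.mean(0)) / (Y_true.std(0) + 1e-12)
    return (zp * zt).mean(0)
```

The per-voxel correlation on held-out data is the kind of value plotted in the heatmaps; in practice `alpha` would be chosen by cross-validation rather than fixed.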

Any advice on what might be causing the discrepancy in analysis outcomes would be greatly appreciated.

Best,
root

Steven · February 9, 2024, 2:57pm

It is hard to know without seeing any code or knowing how fmri tasks differed between your analysis and the tutorial.

Best,
Steven

root · February 9, 2024, 3:34pm

Hi @Steven ,

I used the same code as the tutorial. The only two differences are:
1. The semantic features I use come from a language model instead of being self-constructed as in the tutorial (but the semantic features from the LM work just as well for the stimuli and fMRI in the tutorial).
2. The fMRI data in the tutorial were collected while subjects listened to hours of narrative stories, while my dataset was collected while subjects were shown arithmetic equations and asked to determine whether the equations were correct.

Also, the HTML for the dataset is here (I think I might have uploaded the wrong files earlier).

Thanks for your help!

Best,
root