Publishing an MRI dataset

Hi all,

We’re planning to release a large MRI dataset (functional, anatomical, and diffusion MRI, as well as physiology data) from ~1400 subjects in total. As the data is already in BIDS format, and to make it as easy as possible for people to work with, we plan to run it through FMRIPREP (and MRIQC). We were wondering what the “optimal” preprocessing parameters would be, given that we want to make this dataset as “user-friendly” as possible. At the moment, we are thinking of the following (a sketch of the corresponding FMRIPREP call follows the list):

  • Freesurfer reconstruction;
  • No slice-timing correction (TRs of the functional scans range from 0.75 to 2.2 s);
  • Output spaces: native, T1w, fsnative;
  • Native resampling grid;
  • SyN distortion correction for sessions without a fieldmap scan.
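
Concretely, we had a call along these lines in mind (a sketch only: flag names follow the FMRIPREP command-line documentation, but the exact options depend on the version we end up using, and all paths are placeholders):

    # Sketch of the planned FMRIPREP call, driven from Python for batching.
    # FreeSurfer reconstruction runs by default, so no extra flag is needed;
    # paths and the FreeSurfer license location are placeholders.
    import subprocess

    cmd = [
        "fmriprep", "/data/bids", "/data/bids/derivatives", "participant",
        "--output-spaces", "T1w", "fsnative",  # "func" would add the native BOLD grid as well
        "--ignore", "slicetiming",             # skip slice-timing correction
        "--use-syn-sdc",                       # SyN fallback for sessions without a fieldmap
        "--fs-license-file", "/opt/freesurfer/license.txt",
    ]
    subprocess.run(cmd, check=True)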

Additionally, we were thinking of running the (preprocessed) anatomical data through the FSL-VBM pipeline and the DWI data through the FSL TBSS pipeline (sketched below). Again, the idea is to make the data as “ready to analyze” as possible.
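
For the TBSS part, we would simply follow the standard FSL steps, along these lines (a sketch: script names are taken from the FSL TBSS user guide, it assumes dtifit has already produced per-subject FA maps, and the directory is a placeholder):

    # Sketch of the standard FSL TBSS steps, driven from Python.
    import glob
    import os
    import subprocess

    fa_dir = "/data/bids/derivatives/tbss"  # per-subject *_FA.nii.gz maps from dtifit
    fa_maps = [os.path.basename(p) for p in glob.glob(os.path.join(fa_dir, "*_FA.nii.gz"))]

    for cmd in (
        ["tbss_1_preproc", *fa_maps],   # erode FA maps and move them into FA/
        ["tbss_2_reg", "-T"],           # nonlinear registration to the FMRIB58_FA target
        ["tbss_3_postreg", "-S"],       # create the mean FA image and its skeleton
        ["tbss_4_prestats", "0.2"],     # project FA onto the skeleton (threshold 0.2)
    ):
        subprocess.run(cmd, check=True, cwd=fa_dir)  # the TBSS scripts expect to run here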

If anyone has ideas/recommendations/suggestions about how to “package” this dataset for publication, let me know!

Best,
Lukas (University of Amsterdam, Spinoza Centre for Neuroimaging)

Thanks for considering publishing your data - it’s a very noble thing to do!

If the dataset is in BIDS (and passes the validator), you are halfway there!

In terms of organizing the data, you should put the processed outputs in the /derivatives folder and arrange them in pipeline-specific subfolders, such as:

/derivatives/
   fmriprep/
   freesurfer/
   tbss/
   vbm/

FMRIPREP will produce the fmriprep and freesurfer folders. For transferring to OpenNeuro.org, considering the size of the dataset, I would recommend using the OpenNeuro CLI. You can run it on your server in a console, and it also allows you to update datasets (so, for example, you can upload the raw data first and later add the /derivatives folder and update the dataset).
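
A rough sketch of what that two-step upload could look like, driven from Python (the upload command and its --dataset option follow the OpenNeuro CLI documentation; ds00xxxx is a placeholder accession number):

    import subprocess

    # Initial upload of the raw BIDS dataset (this creates a new accession number).
    subprocess.run(["openneuro", "upload", "/data/bids"], check=True)

    # Later, after adding /derivatives: push an update to the existing dataset.
    subprocess.run(["openneuro", "upload", "--dataset", "ds00xxxx", "/data/bids"], check=True)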

As for parameters:

Freesurfer reconstruction: yes.

Slice-timing correction: opinions are divided. I personally think it does not make a huge difference.

Output spaces: I would do T1w, MNI152NLin2009cAsym, and fsaverage. There is no need for native/orig, since in FMRIPREP the T1w output has the same sampling grid as the original data and the same number of interpolations (FMRIPREP concatenates transformations, so only a single interpolation is applied).

Resampling grid: a native grid is the default in FMRIPREP for the T1w and MNI outputs. It saves space (and computational time), but it can cause problems if your input data have different dimensions. In that case, I would recommend calculating the highest resolution along each of the three dimensions across your input data and setting that manually as the resampling grid in the FMRIPREP options. This way you will not lose any precision, and all the outputs will have the same dimensions.
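
A minimal sketch of that calculation with nibabel (the glob pattern is a placeholder for the dataset’s actual layout):

    # Find the finest voxel size along each axis across all functional runs.
    import glob
    import nibabel as nib

    finest = [float("inf")] * 3
    for path in glob.glob("/data/bids/sub-*/ses-*/func/*_bold.nii.gz"):
        zooms = nib.load(path).header.get_zooms()[:3]  # voxel size in mm
        finest = [min(f, z) for f, z in zip(finest, zooms)]

    print("Finest voxel size (mm):", finest)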

SyN distortion correction: sounds like a good idea.

Processing such a large dataset will be a challenge, but even sharing the raw data will be a great contribution to the community.

I would consider fsaverage5, which has a vertex-to-voxel ratio closer to 1:1, whereas fsaverage is on the order of 3:1 (if I remember correctly); with fsaverage you get a higher spatial sampling rate, and thus larger files, without much gain. If other surfaces need to be targeted for some reason, mri_vol2surf can take the space-T1w files, internally sample them to fsnative, and then resample to whatever mesh is desired.

(That said, there’s no harm in targeting fsaverage except in terms of disk space and the computational cost of working with larger data.)
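
For example, something along these lines (a sketch: the flags follow the mri_vol2surf help, the file names and subject ID are placeholders, and SUBJECTS_DIR must point at the FreeSurfer derivatives):

    # Sample a space-T1w BOLD series onto fsaverage5 after the fact.
    import os
    import subprocess

    env = dict(os.environ, SUBJECTS_DIR="/data/bids/derivatives/freesurfer")
    subprocess.run([
        "mri_vol2surf",
        "--mov", "sub-01_task-rest_space-T1w_desc-preproc_bold.nii.gz",
        "--regheader", "sub-01",       # header-based registration to the subject's anat
        "--hemi", "lh",
        "--trgsubject", "fsaverage5",  # sampled via the subject's own surface first
        "--projfrac", "0.5",           # sample halfway through the cortical ribbon
        "--o", "lh.sub-01_task-rest_space-fsaverage5_bold.mgz",
    ], check=True, env=env)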

Thanks for the input, both! I’ll start working on it soon 🙂