Create a dataset from my own data

I am trying to follow this example:

and replicate it on my own data (in bids format).
For that, I need to create a Dataset from my own data.
Meaning, instead of the lines:

from nilearn import datasets
dataset = datasets.fetch_development_fmri(n_subjects=1)

It should be something like:

dataset = datasets.fetch_from_folder(path_to_bids)

How can I do this?
In all of the many published nilearn examples, I only saw how to use their pre-defined datasets.


Hi @orko,

The fetch functions provided in nilearn’s datasets return a dictionary that has file paths to the downloaded data, which are stored by default in nilearn_data in your home directory. These functions are convenient for downloading data, but don’t have anything special about them that is required by nilearn.

If you wanted something similar to the dictionary that these functions return, you would have to build your own for your own dataset.

For instance, if you had a BIDS dataset stored in ~/my_data/, you could use python’s glob module to return all files that match a pattern:

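import glob
import os
# glob does not expand '~', so resolve the home directory first
data_dir = os.path.expanduser('~/my_data')
dataset = {'func': sorted(glob.glob(data_dir + '/*/*/*bold.nii.gz')),
           'confounds': sorted(glob.glob(data_dir + '/*/*/*confounds.tsv'))}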

You could add more keys to the dictionary, depending on what your analysis needs.

Alternatively, you could look into PyBIDS, which provides various functions that let you query your data without having to spell out all sorts of file paths (assuming your data is perfectly in BIDS format).
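As a rough sketch of the PyBIDS route, the snippet below builds the same kind of dictionary by querying a BIDSLayout instead of globbing. The helper name dataset_from_bids and the choice of keys are made up for illustration, and it assumes a raw BIDS dataset with bold and T1w images stored as .nii.gz; adjust the filters to match your own files.

```python
def dataset_from_bids(bids_root):
    """Hypothetical helper: collect file paths from a BIDS folder with PyBIDS."""
    # Imported inside the function so that PyBIDS is only required when called
    from bids import BIDSLayout

    layout = BIDSLayout(bids_root)
    return {
        # return_type="file" gives plain path strings instead of BIDSFile objects
        "func": layout.get(suffix="bold", extension=".nii.gz", return_type="file"),
        "anat": layout.get(suffix="T1w", extension=".nii.gz", return_type="file"),
        "subjects": layout.get_subjects(),
    }
```

You would then pass, for example, dataset_from_bids('/path/to/bids')['func'][0] wherever the nilearn example uses dataset.func[0].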

In fact, the only thing you need in order to run this on your data is dataset.func,
so you could simply use

func_filename = 'path_to_your_data.nii.gz'

and run the whole thing without bothering with datasets.

@danjgale Thanks, the data is indeed perfectly in BIDS format - do you have an example for how to combine pyBIDS with nilearn’s Dataset & analysis?

Yes, but I prefer to have the Dataset object for compatibility with other functions as well.

Then you can create a sklearn Bunch manually by inputting your data. This is what the nilearn functions do.
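A minimal sketch of building such a Bunch by hand is shown below. The helper name dataset_from_folder is hypothetical (it is not a nilearn function), and the file patterns assume standard BIDS names ending in _bold.nii.gz and _T1w.nii.gz, with or without a session level.

```python
import glob
import os

from sklearn.utils import Bunch


def dataset_from_folder(bids_root):
    """Hypothetical helper: mimic a nilearn fetcher's return value for local data."""
    root = os.path.expanduser(bids_root)  # glob does not expand '~' by itself

    def find(pattern):
        # '**' with recursive=True also matches an optional ses-* level
        return sorted(glob.glob(os.path.join(root, "sub-*", "**", pattern),
                                recursive=True))

    return Bunch(func=find("*_bold.nii.gz"), anat=find("*_T1w.nii.gz"))
```

A Bunch is a dictionary whose keys can also be read as attributes, so dataset.func[0] and dataset['func'][0] both work, as with the datasets returned by the fetch functions.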

@bthirion Thanks, one more (perhaps basic) question: are the func and anat files in nilearn’s datasets (for example, the .nii used here) the raw files, or the ones generated after running fmriprep?

These are the files obtained by running fmriprep or some equivalent data analysis tool.
In particular, they are registered and resampled to MNI space.

@bthirion And haxby_dataset.anat[0] is the same subject’s brain_mask.nii.gz after preprocessing, right?

Yes, but I would advise you to take a standard mask in MNI space if you run a multi-subject study.
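One way to get such a standard mask, sketched here as an assumption rather than the only option, is the MNI152 brain mask bundled with nilearn. The helper name make_mni_masker is hypothetical, and the sketch assumes your functional images are already in MNI space (for example, fmriprep outputs).

```python
def make_mni_masker():
    """Hypothetical helper: a NiftiMasker built on nilearn's bundled MNI152 mask."""
    # Imported inside the function so that nilearn is only required when called
    from nilearn.datasets import load_mni152_brain_mask
    try:
        from nilearn.maskers import NiftiMasker  # recent nilearn versions
    except ImportError:
        from nilearn.input_data import NiftiMasker  # older nilearn versions

    mask_img = load_mni152_brain_mask()  # shipped with nilearn, no download
    return NiftiMasker(mask_img=mask_img)
```

Calling make_mni_masker().fit_transform(func_filename) would then extract the same voxels for every subject, which keeps the features comparable across subjects.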