Querying Acquisition Info and Metadata with pybids

I’d like to query the acquisition info (TR, TE, PED etc) and some meta-data for a given dataset in BIDS format. I think part of this functionality is already in pybids routines to generate a BIDSReport, but that returns a text meant for inclusion in the Methods section. I would like some sort of dictionary organized by modality, subject/session/run etc that can be programmatically traversed. Any suggestions to achieve this? Happy to help code up what’s missing or desirable.

cc @effigies @tal @yarikoptic

Here is our metadata extractor for datalad based on pybids: datalad-neuroimaging/bids.py at master · datalad/datalad-neuroimaging · GitHub

Here is what I see if I search the (somewhat outdated) metadata extracted this way from a good number of datasets on datasets.datalad.org:

$> datalad search --show-keys short 'bids\.(Repetition|EchoTime)'
 in  138 datasets
 has 342 unique values: 0; 0.00164; 0.00174; 0.00182; 0.00185; 0.00194; 0.00195; 0.00197; 0.00201; +333 values
 in  13 datasets
 has 8 unique values: 0; 0.00313; 0.004; 0.005; 0.00519; 0.00738; 0.00765; 0.01
 in  13 datasets
 has 9 unique values: 0.00302; 0.00492; 0.00519; 0.00559; 0.00646; 0.00746; 0.00765; 0.00984; 0.01246
 in  1 datasets
 has 4 unique values: 0.00373; <<[0.02763, 0.02763, 0.02763, 0.0276++3787 chars++763]>>; +2 values
 in  167 datasets
 has 292 unique values: 0.0025; 0.005224; 0.0064; 0.00668; 0.006872; 0.006884; 0.0068843; 0.0068997; +284 values


It should be fairly easy to achieve what you want using pybids already.

Each BIDSFile has its meta-data entries indexed as entities by default (you can initialize the layout to not index meta-data to speed things up).

You can query unique meta-data like this:

layout.get_RepetitionTime()
> [1.5, 2.3, 3.2, 2.53, 1.52]

And for a given BIDSFile you can see the meta-data easily:

layout.get(subject='106', extension='nii.gz')[0]
>>> <BIDSImageFile filename='/home/zorro/scratch/narratives/sub-106/anat/sub-106_T1w.nii.gz'>

layout.get(subject='106', extension='nii.gz')[0].entities
>>> {'DeviceSerialNumber': '45031', 'DwellTime': 8.2e-06, 'EchoTime': 0.00308, 'FlipAngle': 9, 'InstitutionAddress': 'Washington Rd, Princeton, NJ 08540, USA', 'InstitutionName': 'Princeton University', 'InstitutionalDepartmentName': 'Princeton Neuroscience Institute', 'InversionTime': 0.9, 'MagneticFieldStrength': 3, 'Manufacturer': 'Siemens', 'ManufacturersModelName': 'Skyra', 'ParallelReductionFactorInPlane': 2, 'ParallelReductionType': 'GRAPPA', 'PartialFourier': 1, 'PixelBandwidth': 240, 'PulseSequenceDetails': '%SiemensSeq%_tfl', 'PulseSequenceType': 'MPRAGE', 'ReceiveCoilActiveElements': 'HE1-4', 'ReceiveCoilName': 'HeadNeck_20', 'RepetitionTime': 2.3, 'ScanOptions': 'IR', 'ScanningSequence': 'GR_IR', 'SequenceName': '_tfl3d1_16ns', 'SequenceVariant': 'SK_SP_MP', 'SoftwareVersions': 'syngo_MR_D11', 'StationName': 'AWP45031', 'datatype': 'anat', 'extension': 'nii.gz', 'subject': '106', 'suffix': 'T1w'}

So it’s really just a matter of how exactly you want the meta-data organized, but it is already programmatically accessible from pybids.
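One possible way to organize it, as a minimal sketch: walk the files once and nest the selected parameters by datatype, subject, session, and run. The names `collect_params` and `collect_from_dataset` are hypothetical helpers, not part of pybids. The sketch assumes each file object exposes `get_entities()`, `get_metadata()`, and `filename`, as BIDSFile does in the pybids versions I am aware of.

```python
def collect_params(files, params):
    """Nest metadata as datatype -> subject -> session -> run -> filename -> {param: value}.

    `files` is any iterable of BIDSFile-like objects (assumption: they expose
    get_entities(), get_metadata() and .filename). Levels a file lacks, such
    as session or run, are keyed by None.
    """
    tree = {}
    for f in files:
        ents = f.get_entities()
        meta = f.get_metadata()
        values = {p: meta[p] for p in params if p in meta}
        if not values:
            continue  # nothing requested is defined for this file
        node = tree
        for level in ("datatype", "subject", "session", "run"):
            node = node.setdefault(ents.get(level), {})
        node[f.filename] = values
    return tree


def collect_from_dataset(root, params, **filters):
    """Hypothetical convenience wrapper: index `root` and collect `params`."""
    from bids import BIDSLayout  # lazy import, needs pybids installed

    layout = BIDSLayout(root)
    # filters are passed straight to layout.get(), e.g. datatype='func'
    return collect_params(layout.get(**filters), params)
```

With this, something like `collect_from_dataset(path, ['RepetitionTime', 'PhaseEncodingDirection'], datatype='func')` would return a plain nested dict that can be traversed or dumped to JSON.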

Thanks Yarik - will take a look and see how I can reuse or improve it for my needs! What I am looking for is something that’s purely python (no reliance on CLI/bash etc)

Thanks mate! I guess I haven’t come across the .get_RepetitionTime() in docs or examples. What would be ideal is if I can specify a modality, subject ID, and a parameter name (say PhaseEncodingDirection), and get the corresponding value. It could be layout.get_param_value('func', '106', 'PhaseEncodingDirection') or even for the entire dataset keyed-in by subject ID / session / run etc : layout.get_param_values('func', 'PhaseEncodingDirection')

The reason I would prefer to avoid dealing with individual BIDSFile objects is that I'd like to leave the traversal of the dataset to pybids, i.e. I specify modality / subject ID / session / run, and pybids constructs the URL for it, handling various exceptions etc. Does that make sense?

Ah, I see. Sometimes it’s necessary to deal with the files, since that’s how the meta-data is organized.

I suppose what you’d want is something like

layout.get_RepetitionTime()

but instead of just returning the unique set, for that to be keyed by other entities (subject, session, run).

That currently doesn’t exist but if you wanted to add that to pybids, it shouldn’t be that much work.

If you already wanted to, you could make a loop like this:

for subject in layout.get_subjects():
    for session in layout.get_sessions(subject=subject):
        for run in layout.get_runs(subject=subject, session=session):
            # unique TR values for this subject/session/run
            tr = layout.get_RepetitionTime(subject=subject, session=session, run=run)

That should give you what you want, but it’s going to be pretty slow since it’s not well optimized (lots of querying and taking sets internally).
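One caveat with the nested loop: if a dataset has no session (or run) level, the corresponding query returns an empty list and the inner body never executes. Here is a rough sketch that falls back to None for missing levels. `values_by_scan` is a hypothetical name, and it assumes `layout.get(return_type='id', target=...)` behaves the way the `get_<entity>` shortcuts do, which is my understanding of pybids but worth checking against your installed version.

```python
def values_by_scan(layout, param, **filters):
    """Map (subject, session, run) -> list of unique values of `param`.

    Levels the dataset lacks are reported as None rather than silently
    producing an empty result. `filters` (e.g. datatype='func') are passed
    through to every query.
    """
    result = {}
    for subject in layout.get(return_type="id", target="subject", **filters):
        by_sub = dict(filters, subject=subject)
        sessions = layout.get(return_type="id", target="session", **by_sub) or [None]
        for session in sessions:
            # only add the session filter when the level actually exists
            by_ses = by_sub if session is None else dict(by_sub, session=session)
            runs = layout.get(return_type="id", target="run", **by_ses) or [None]
            for run in runs:
                query = by_ses if run is None else dict(by_ses, run=run)
                result[(subject, session, run)] = layout.get(
                    return_type="id", target=param, **query
                )
    return result
```

Something like `values_by_scan(layout, 'PhaseEncodingDirection', datatype='func')` would then cover the whole dataset. Note that it still issues one query per level, so it is no faster than the loop above, and files without a run label are skipped whenever sibling files in the same session do have one.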

If you want, you could write something custom and better optimized by interfacing directly w/ the pybids SQL db, and add it to pybids as a helper function.

thanks - that’s what I was thinking too - I could brute-force it and build a dict myself, but it will likely be slow and disk I/O intensive!

Also, given my goals of protocol compliance etc, I have to consider this and perhaps work with DICOMS on XNAT directly? Just thinking out loud here.

To be clear, all of that querying in pybids goes through the internal db object, so it should be more efficient than re-reading the files from disk.

That said, pybids already has to crawl the entire BIDS dataset (disk I/O intensive upon BIDSLayout creation).

I see - that’s great then, where is the internal db documented? Yes, one traversal would be necessary and would not be a bottleneck in most circumstances.

It’s not, really, because it’s only used by developers (i.e. @tal and now @effigies and me) when adding new functionality.

If you’re familiar with db’s it shouldn’t be too difficult, although there’s a learning curve to contributing to pybids in general.

You could start out w/ my suggestion and see if the performance for that is decent enough as well

tell me about the learning curve! :slight_smile:

Truth be told, every time I try to use it for some of my needs (which might not be mainstream), I always have to reorient my mental model a bit. I tried to convey some of the difficulties to Tal at OHBM’18 in Singapore, but I failed to convince him of the need to develop a convenience routine that would roughly translate to a single call of the form:
ds = BIDSImport(ds_path_spec)
which would traverse everything that is in that BIDS folder and return a well-documented canonical class that we would work off of, instead of making a ton of calls around a layout whose config and form can vary depending on what’s underneath. This may not be possible for all use cases, but it should be possible for the common simple cases (which most are).

FWIW (just to make sure), DataLad is in Python and all CLI has Python interfaces.


sure, thanks Yarik! I am looking at pyxnat as well, and perhaps need to check xnat plugins also to see if they help with my goals.

What about the CuBIDS formerly known as BOND or something like that? Let me see if I can find it again…


Yep, there it is.

