I am currently trying to aggregate some subjects from openneuro with different dataset properties, for example by age, scan type, fieldmaps etc…
After cloning openneuro: datalad install ///openneuro
I search for specific filters (for example male 40yo) with: datalad -c datalad.search.index-egrep-documenttype=all search bids.subject.age:40 bids.subject.sex:male
How can I list all the values that exist for a given field? For example to list all manufacturers on openneuro, that could look like: datalad list bids.Manufacturer
How to check if some files exist for all the sub-datasets? For example to check for different fieldmaps (phase difference map): datalad exists *_magnitude1.json
Thank you for showing interest in datalad search. FWIW: because there is ongoing work on refactoring metadata storage etc., and there are numerous issues with openneuro datasets that complicate streamlining this process, I have stopped extracting/aggregating metadata for openneuro (openneuro itself doesn’t do that, so it was up to us, datalad, to do it). Now that I see that there is interest, I will try to find time to re-introduce extraction of metadata, hopefully within a week or two. Meanwhile, metadata will not be complete. Back to the specific questions:
How can I list all the values that exist for a given field?
$> datalad search --show-keys full 'bids.Manufacturer$'
bids.Manufacturer
in 128 datasets
has 20 unique values: 'ANT'; 'Agilent'; 'Biosemi'; 'Brain Vision'; 'Bruker BioSpin MRI GmbH'; 'Bruker'; 'CTF'; 'Elekta/Neuromag'; 'GE 3 Tesla MR750'; 'GE MEDICAL SYSTEMS'; 'GE'; 'General Electrics'; 'Neurofile NT'; 'Philips Medical Systems'; 'Philips'; 'SIEMENS '; 'SIEMENS'; 'Siemens'; 'g.tec'; 'gtec'
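Note that some of the values above differ only in case or trailing whitespace ('SIEMENS ', 'SIEMENS', 'Siemens'). If you want to fold such variants together, here is a minimal shell sketch. `normalize_values` is a hypothetical helper (not part of datalad), and it assumes you have already put the values into a stream with one value per line:

```shell
# Hypothetical helper, not a datalad command: collapse spelling variants of
# values read from stdin (one value per line).
# Leading/trailing whitespace is trimmed and case is ignored; the count in
# front of each line says how many raw values mapped to that normalized form.
normalize_values() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | tr '[:upper:]' '[:lower:]' \
        | sort \
        | uniq -c \
        | sort -rn
}

# A few of the values from the output above, fed in by hand for illustration.
printf '%s\n' 'SIEMENS ' 'SIEMENS' 'Siemens' 'GE' 'General Electrics' 'g.tec' 'gtec' \
    | normalize_values
```

This only merges values that are identical after trimming and lowercasing. Variants such as 'GE' vs 'General Electrics' or 'g.tec' vs 'gtec' would still need a manual mapping.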
How to check if some files exist for all the sub-datasets?
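One possible way to approach this with plain shell tools, as a rough sketch rather than a datalad feature: walk the immediate subdirectories of the superdataset and report whether any file name matches a pattern. `report_matches` is a hypothetical helper, and it assumes the sub-datasets are already installed locally; a sub-dataset that has not been installed will most likely be reported as `no`, because there are no files below it to match.

```shell
# Hypothetical helper, not a datalad command: for every immediate
# subdirectory of ROOT, print its name and "yes" or "no" depending on
# whether at least one file name below it matches PATTERN.
report_matches() {
    root=$1
    pattern=$2
    for ds in "$root"/*/; do
        [ -d "$ds" ] || continue
        name=$(basename "$ds")
        # find matches on the file name only, so annexed files whose
        # content has not been fetched are still counted.
        if find "$ds" -name "$pattern" -print | grep -q .; then
            printf '%s\tyes\n' "$name"
        else
            printf '%s\tno\n' "$name"
        fi
    done
}

# Example call (assumes the current directory is the openneuro superdataset):
# report_matches . '*_magnitude1.json'
```

Quote the pattern so that the shell passes it to `find` unexpanded.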
Perhaps in addition to the approach Yarik nicely described: I catalog several metadata fields for all of our public datasets in this Google Sheet. This could help you find datasets that match what you are looking for (e.g. ages, modalities).
@yarikoptic The commands that you showed will really help me, thank you for that. @franklin I will definitely check your Google Sheet, this is also really useful for the community, thank you!