Adding metadata to a file in DataLad

glatard · April 26, 2018, 8:17pm

I am trying to add key-value pairs to the metadata of a file in a DataLad dataset, using DataLad version 0.10.0.dev1, just cloned from the GitHub repo. It seems that the only available command to add metadata is “datalad aggregate-metadata” and that it requires specific metadata extractors. Am I right that I then need to write an extractor that gets the key-value pairs from my file and adds them to the metadata? If not, what is the way to go?

yarikoptic · April 28, 2018, 4:20am

If your metadata is not in one of the already known “metadata formats” that we support for specific kinds of files (BIDS for neuroimaging datasets, XMP for PDFs, fields within DICOMs, NIfTI header fields, etc.), then just use the git annex metadata command. Things are still in flux, but the metadata documentation and the search command documentation recently got an update, which might give more detail on the types of metadata and what to do with them.

Here is a set of commands for a basic example of working with metadata stored purely in git-annex, with comments describing each action:

datalad create /tmp/datata8
cd /tmp/datata8
echo 123 > 123; datalad add 123
git annex metadata --set field=value 123   # assign metadata field with value to a file
git annex metadata   # to view git-annex metadata "natively"
datalad aggregate-metadata      # to aggregate all (in this case not much yet) metadata
datalad -f json_pp metadata     # print known metadata
echo 124 > 124; datalad add 124  # one more file
git annex metadata --set newfield=newvalue 124  # with new metadata
datalad aggregate-metadata      # would need to be reaggregated
datalad -f json_pp metadata     # pretty print all aggregated metadata
# by default search would be only across dataset level metadata
# but we have nothing interesting there so let's index/search
# on both dataset and file level
# (config is just git style config which you could set in your
# ~/.gitconfig or local per repo or permanent per dataset in .datalad/config)
datalad -c datalad.search.index-egrep-documenttype=all search -f value
datalad -f json_pp -c datalad.search.index-egrep-documenttype=all search -f newvalue
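
Since the question was about several key-value pairs, here is a minimal sketch of a wrapper that attaches more than one pair to a file in a single call. The function name annex_set_pairs and the DRY_RUN switch are made up for this illustration and are not part of DataLad or git-annex. The sketch assumes that git annex metadata accepts --set more than once per invocation, which is worth checking against your git-annex version.

```shell
#!/bin/sh
# Hypothetical helper (not part of DataLad or git-annex):
# attach several key=value pairs to one annexed file.
# Usage: annex_set_pairs FILE key1=value1 [key2=value2 ...]
# Set DRY_RUN=1 to print the command instead of running it.
# Note: plain POSIX sh has no local variables, so "file" and "pair" are global.
annex_set_pairs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: annex_set_pairs FILE key=value [key=value ...]" >&2
        return 2
    fi
    file=$1
    shift

    # Reject anything that is not of the form key=value before doing any work.
    for pair in "$@"; do
        case "$pair" in
            ?*=*) ;;
            *) echo "not a key=value pair: $pair" >&2; return 1 ;;
        esac
    done

    # Rewrite the argument list from "k1=v1 k2=v2" to
    # "--set k1=v1 --set k2=v2". The list being looped over is expanded
    # once, up front, so changing "$@" inside the loop is safe.
    for pair in "$@"; do
        set -- "$@" --set "$pair"
        shift
    done

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "git annex metadata $* $file"
    else
        git annex metadata "$@" "$file"
    fi
}

# Example, commented out so that sourcing this file has no side effects:
#   cd /tmp/datata8
#   annex_set_pairs 123 field=value newfield=newvalue
#   datalad aggregate-metadata
```

As in the commands above, the aggregated metadata only reflects the new pairs after datalad aggregate-metadata has been run again.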

For our canonical distribution we would run the aggregation ourselves using all the extractors we have, so on the user side only the index would need to be built, according to the user’s preference (dataset level, file level, etc.).
Hope this helps somewhat.
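
To avoid passing -c on every search, the same setting can be stored in the dataset's .datalad/config, as mentioned in the comments of the example. The following is only a sketch: set_search_scope is a hypothetical helper name, and "all" is the only value taken from the commands above. The file is written with plain git config, since the config is git-style.

```shell
#!/bin/sh
# Hypothetical helper: record the search index scope in the dataset's
# own .datalad/config instead of passing it with -c on every call.
# $1: path to the dataset
# $2: document type to index (defaults to "all", the value used above)
set_search_scope() {
    dataset=$1
    scope=${2:-all}
    mkdir -p "$dataset/.datalad" &&
        git config --file "$dataset/.datalad/config" \
            datalad.search.index-egrep-documenttype "$scope"
}

# Example, commented out so that sourcing this file has no side effects:
#   set_search_scope /tmp/datata8
#   cd /tmp/datata8 && datalad search -f value
```

Because .datalad/config is part of the dataset, the change still has to be committed for it to travel with the dataset. Putting the same key into ~/.gitconfig with git config --global would instead make it a per-user default.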

glatard · May 1, 2018, 3:14pm

Thanks, that’s extremely useful! Is there a way to add metadata to a file that is not in the annex?