That’s a good question!
If I understand correctly, you are asking: “At the time of commit C in the parent dataset,
what did file F in subdataset S look like?”
Indeed, according to YODA it would be typical to provide the inputs from a subdataset. But since the parent dataset only stores the subdataset’s commit hash (and not the whole history of the subdataset’s files), we cannot directly query a previous version of a file in a subdataset from the parent.
So you would need to do it in 2 steps:
(1) Find out which revision of the subdataset was stored in the parent dataset at the time of the parent’s commit of interest. For example, when you called datalad run in the parent, what was the subdataset’s commit ID?
You can get this commit ID with the git command (source):
git rev-parse <parent_commit_hash>:<subds_rel_path>
where <subds_rel_path> is the relative path to the subdataset from the parent dataset.
Example:
$ git rev-parse 193d23d82b911:data/inputs/rawdata/bids
e4cda1cf6f5e9f9be9c3baf7697c81cca7661130
(2) Take the resulting commit ID and plug it into the git show command (same as in the simple case), executed in the subdataset. This will print out the contents of the file at the given revision.
Here is a snippet that glues together the two commands (you can copy it including the outer parentheses, which run it in a subshell, then replace the variables and run it in the shell):
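(
# Parent dataset and the parent commit of interest
dataset=.
parent_commit="193d23d82b9116e5247c4ffb46f534e2e37ac7c5"
# Subdataset path relative to the parent; file path relative to the subdataset
subds_path="data/inputs/rawdata/bids"
subds_file_path="./README.md"
# The inner rev-parse is expanded first, in the parent dataset;
# foreach-dataset then runs git show in the datasets selected by --contains
datalad -f disabled \
foreach-dataset -d "${dataset}" --contains "${subds_path}" \
git show "$(git -C "${dataset}" rev-parse "${parent_commit}":"${subds_path}")":"${subds_file_path}" | cat
)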
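For a single subdataset, the same two steps can probably be done with plain git as well. The following is only a minimal sketch: the function name show_subds_file_at and its argument order are made up for illustration, and it assumes the subdataset is installed locally and already contains the commit recorded by the parent.

```shell
# Hypothetical helper, not part of git or DataLad.
# Usage: show_subds_file_at <parent_dir> <parent_commit> <subds_rel_path> <file_in_subds>
show_subds_file_at() {
    # Step 1: <commit>:<path> resolves the gitlink entry of the parent's tree,
    # i.e. the subdataset commit that the parent recorded at that time.
    subds_commit=$(git -C "$1" rev-parse "$2:$3") || return 1

    # Step 2: ask the subdataset's own repository for the file at that commit.
    git -C "$1/$3" show "${subds_commit}:$4"
}
```

With the values from the example above, the call would be: show_subds_file_at . 193d23d82b911 data/inputs/rawdata/bids README.md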
The datalad foreach-dataset command is really good for iterating through subdatasets, in case you have inputs from several different subdatasets.
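When inputs come from several subdatasets, it may help to first see which commit the parent recorded for each of them. A rough sketch with plain git follows; the function name list_subds_commits_at is hypothetical, and it relies on subdatasets appearing in the parent's tree as gitlink entries (mode 160000, type commit).

```shell
# Hypothetical helper, not part of git or DataLad.
# Usage: list_subds_commits_at <parent_dir> <parent_commit>
# Prints one line per registered subdataset: <recorded_commit><TAB><path>
list_subds_commits_at() {
    # ls-tree prints "<mode> <type> <object><TAB><path>"; keep only gitlinks.
    git -C "$1" ls-tree -r "$2" |
        awk -F '\t' '$1 ~ /^160000 commit / { split($1, f, " "); print f[3] "\t" $2 }'
}
```

Each printed commit ID can then be used with git show inside the corresponding subdataset, as in step (2).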
Still, I wonder if there is a more DataLad-onic solution for this use case?