Hello community,
I am struggling with some very unexpected behaviour from datalad.
In a nutshell: I have a BIDS dataset on which I run the BIDS wrapper of pydeface, and datalad fails to detect that the anat image has changed.
To be more explicit:
I start with a clean, minimal BIDS dataset (sorry, the formatting isn't great in this post):
$ tree
.
├── dataset_description.json
├── participants.json
├── participants.tsv
├── README
└── sub-001
    └── anat
        ├── sub-001_T1w.json
        └── sub-001_T1w.nii.gz
Now I create a datalad repo:
$ datalad create --force -c text2git -D 'this is a test dataset'
(Note: I use the --force flag because the directory already contains files.)
Now I save those files into the datalad dataset:
$ datalad save -m 'first save' .
Let me note the size of the T1w image (this will be helpful later on). Since the file is a symlink into the annex, I first need to find where the link points to:
$ ls -lrt sub-001/anat/sub-001_T1w.nii.gz
and then, using the link target from that output, I can determine the actual size:
$ du -cs .git/annex/objects/Kx/pv/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz
10004 .git/annex/objects/Kx/pv/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz
10004 total
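For reference: the object file name above is its git-annex key, which for the MD5E backend encodes the size in bytes and the MD5 checksum of the content (MD5E-s<size>--<md5><extension>). The Python sketch below, with hypothetical helper names, pulls those fields out of the key. It also shows that 10243241 bytes is consistent with the 10004 reported by du, which counts 1 KiB units; the 4 KiB filesystem block size is an assumption, so the exact du figure may differ on other filesystems.

```python
import math
import re

# Key layout for git-annex's MD5E backend:
#   MD5E-s<size in bytes>--<32-char md5 hex digest><file extension>
KEY_RE = re.compile(r"^MD5E-s(?P<size>\d+)--(?P<md5>[0-9a-f]{32})(?P<ext>.*)$")


def parse_md5e_key(key: str) -> tuple[int, str, str]:
    """Split an MD5E key into (size_bytes, md5_hex, extension)."""
    m = KEY_RE.match(key)
    if m is None:
        raise ValueError(f"not an MD5E key: {key!r}")
    return int(m["size"]), m["md5"], m["ext"]


def kib_blocks(size_bytes: int, block_bytes: int = 4096) -> int:
    """Approximate du output in 1 KiB units, rounding up to whole
    filesystem blocks (assumes 4 KiB blocks, common but not universal)."""
    blocks = math.ceil(size_bytes / block_bytes)
    return blocks * block_bytes // 1024


size, md5, ext = parse_md5e_key(
    "MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz"
)
print(size, md5, ext)    # 10243241 5752b9e7c0a75ea4e0bb9d12194156f1 .nii.gz
print(kib_blocks(size))  # 10004
```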
To keep things clean, I'll create a branch to work on, although this is not strictly necessary to demonstrate the issue I am facing.
$ git checkout -b experiments
Now I'll run pydeface. But since I'd like to stay in the BIDS world, I am going to use the BIDS wrapper found in https://github.com/cbinyu/pydeface. I just built the docker image (v2.0.6 of the repo) with the tag "local/pydeface" (I ran something like "sudo docker build -t local/pydeface ." in my clone of the repo).
One property of that wrapper is that it simply overwrites the original T1w image. I suspect this may be the source of the issue, but I can't confirm it.
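One possible mechanism, sketched under assumptions: in a datalad dataset the T1w file is a (locked) git-annex symlink, so a tool that opens that path and rewrites it in place writes through the link into the annex object itself, while the symlink, which is all git records, stays byte-for-byte the same. Branches whose links point at the same object would then all see the new bytes. git-annex normally makes the object read-only, but unless an image sets a different user, processes in a Docker container run as root, which is likely not stopped by that permission. The Python sketch below uses plain temporary files, not a real dataset; the file names are hypothetical stand-ins. It needs a POSIX system for symlinks.

```python
import os
import tempfile
from pathlib import Path


def write_through_symlink_demo(root: Path) -> tuple[str, str, bytes]:
    """Overwrite a file through one symlink and inspect a second one.

    Hypothetical stand-ins: 'object.bin' plays the annex object;
    'experiments.nii.gz' and 'master.nii.gz' play the T1w symlink as
    checked out on each branch (both point at the same object).
    """
    obj = root / "object.bin"
    obj.write_bytes(b"original image")
    exp_link = root / "experiments.nii.gz"
    master_link = root / "master.nii.gz"
    exp_link.symlink_to(obj.name)      # relative links, as git-annex uses
    master_link.symlink_to(obj.name)

    before = os.readlink(exp_link)     # the link target is what git records
    with open(exp_link, "wb") as fh:   # open() follows the symlink...
        fh.write(b"defaced image")     # ...so this truncates and rewrites object.bin
    after = os.readlink(exp_link)

    return before, after, master_link.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    before, after, seen_on_master = write_through_symlink_demo(Path(tmp))
    print(before == after)   # True: the symlink itself is unchanged
    print(seen_on_master)    # b'defaced image': the other "branch" sees the new bytes
```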
So, let's run it. I have tried it two ways, with and without datalad run. The result is wrong in both cases, but let me give the details for the case without datalad run (just in case datalad run has a bug…):
$ sudo docker run -it --rm -v $PWD:/data local/pydeface /data /data/derivatives participant
After a while, the command finishes without any error, and I can check that the image has indeed been defaced. Moreover, the annex object at the very same path (same key) now has a different size:
$ du -cs .git/annex/objects/Kx/pv/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz
9480 .git/annex/objects/Kx/pv/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz/MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz
9480 total
And indeed, when I open the image, it is properly defaced. Now for the bug: when I run datalad save, no changes are detected. It outputs nothing at all (no error). It is as if, from datalad's perspective, nothing has changed.
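As far as I understand git-annex, the key is computed from the content when the file is saved, and for a locked file only the symlink is compared afterwards, so content changed in place under an existing key would go unnoticed until something re-verifies it. The real tool for that check is git annex fsck. To illustrate what such a check compares, here is a Python sketch (object_matches_key is a hypothetical name, and it only handles MD5E keys) that tests whether an object's size and MD5 still match its own key. For the object above, the size in the key (10243241 bytes) presumably no longer matches the new ~9480 KiB content, so this check would fail.

```python
import hashlib
import tempfile
from pathlib import Path


def object_matches_key(obj_path: Path) -> bool:
    """Return True if an MD5E annex object still matches the key in its name.

    Assumes the file name is the key, e.g.
    MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz
    """
    backend, rest = obj_path.name.split("-s", 1)
    if backend != "MD5E":
        raise ValueError(f"unsupported backend: {backend!r}")
    size_str, rest = rest.split("--", 1)
    expected_md5 = rest[:32]  # the extension follows the 32-char digest

    # Cheap check first: the key records the size in bytes.
    if obj_path.stat().st_size != int(size_str):
        return False
    # Then hash the content in 1 MiB chunks and compare digests.
    digest = hashlib.md5()
    with open(obj_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5


with tempfile.TemporaryDirectory() as tmp:
    obj = Path(tmp) / "MD5E-s5--5d41402abc4b2a76b9719d911017c592.txt"
    obj.write_bytes(b"hello")
    print(object_matches_key(obj))  # True
    obj.write_bytes(b"HELLO")       # same size, content changed in place
    print(object_matches_key(obj))  # False
```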
What's even worse is that when I switch back to the master branch, the link points to the same file as in the experiments branch… hence to the defaced image! And all this without any merge…
I have also experimented with a remote and so on: the problem persists. In particular, I can have a remote with the not-yet-defaced image and a local copy with the defaced image; pushing does nothing, and datalad thinks all is fine.
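The push behaviour fits the same picture, as far as I understand git-annex: transfers are decided per key, and git-annex keeps track of which remotes already hold which keys. Since the defaced bytes still live under the original key, the remote looks up to date already. Here is a deliberately simplified Python model of that decision; plan_push is a hypothetical name, and this is not datalad's or git-annex's actual code, which also consults location tracking information.

```python
def plan_push(local: dict[str, bytes], remote: dict[str, bytes]) -> list[str]:
    """Toy model of a key-based push: return the keys that would be sent.

    The decision looks only at key names, so content that was altered in
    place under an existing key is never compared with the remote copy.
    """
    return sorted(key for key in local if key not in remote)


key = "MD5E-s10243241--5752b9e7c0a75ea4e0bb9d12194156f1.nii.gz"
local = {key: b"defaced image"}    # object rewritten in place locally
remote = {key: b"original image"}  # remote still holds the original bytes
print(plan_push(local, remote))    # []: nothing is considered new
```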
There are still many things I could investigate, but I'd like to have your input on this… thanks, community!!!
PS: I am using datalad v0.15.3