Moving superdatasets to a new location

lukasvo76 · May 11, 2023, 7:49am

Hi Datalad team,

Our current /data disk on our Linux server is full, so we added new disks.

Adding those to the existing /data partition seems to carry some risks (according to our IT), and simply creating a /data2 partition with the new disks seems like a safer solution.

This would, however, require moving some superdatasets with their subdatasets (all of which have GIN siblings) from /data to /data2 (with otherwise identical paths). According to section 9.2.7 of the DataLad Handbook, this should be pretty straightforward with a simple mv command.

However, since this would be moving to a new disk/partition, I wanted to double check with the team whether the info from the handbook would equally apply here, rendering this a simple and safe approach?

Thanks a lot in advance!

Best wishes,

Lukas

cmo · May 11, 2023, 2:48pm

Hi Lukas. Usually mv transparently moves content between different mounted devices, provided the file system capabilities are sufficiently similar. I understand that you want to be careful, and I cannot guarantee that it would work in all configurations on different operating systems, so I would not suggest simply going ahead. I would at least try it first with a test dataset that is representative of your superdataset.

That being said, since you have both disks available, you could first copy the superdataset to the new partition and check its integrity. For example, if you want to copy a dataset /data/superset1 to /data2/superset1, the following commands might be used:

$ cd /data
$ tar -cf - superset1 | tar -C /data2 -xf -

Or if you would like to see what is extracted on /data2, replace the second command with:

$ tar -cf - superset1 | tar -C /data2 -xvf -

Afterwards, check the dataset in /data2/superset1. If everything is OK, it should be fine to remove the original dataset. I would feel more comfortable, though, getting confirmation about the deletion from another member of the datalad team. I have created a support issue here: "Move a superdataset between different devices via `mv`" (Issue #70 in psychoinformatics-de/knowledge-base on GitHub). You should get another opinion on the matter shortly.
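To make the "copy, then check" step a bit more concrete, here is a minimal sketch of the tar pipe wrapped in a helper, plus a plain content comparison. The function names copy_tree and compare_trees are made up for illustration, and adding -p on extraction (to keep permissions) is my assumption, not part of the commands above. Also note that annexed files whose content is not present locally are dangling symlinks, which diff will most likely report as errors, so treat a failing comparison as a prompt to look closer rather than as proof of a bad copy.

```shell
#!/bin/sh
# Sketch only: copy_tree and compare_trees are hypothetical helper names.

# copy_tree PARENT NAME TARGET_PARENT
# Copies PARENT/NAME to TARGET_PARENT/NAME through a tar pipe.
# -p asks the extracting tar to preserve permissions (an assumption on my part).
copy_tree() {
    (cd "$1" && tar -cf - "$2") | tar -C "$3" -xpf -
}

# compare_trees A B
# Returns 0 only if diff finds the same files with the same content in both trees.
compare_trees() {
    diff -r "$1" "$2" >/dev/null
}
```

With the paths from this thread, that would be called as `copy_tree /data superset1 /data2` followed by `compare_trees /data/superset1 /data2/superset1`. Beyond the raw comparison, running `git annex fsck` and `datalad status --recursive` inside the copy seems like a reasonable additional check before deleting anything, though I have not verified this on a setup like yours.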

lukasvo76 · May 11, 2023, 3:22pm

Thanks a lot @cmo!

Just double-checking here: is the idea to copy the superdataset and its content including subdatasets using this command, check it on the new location, and then datalad save -r from the superdataset in the new location?

Will keep an eye on the support issue too!

eknahm · May 15, 2023, 5:11am

I can only echo @cmo that, given sufficient similarity in file system capabilities and permission setup, a plain mv should be safe.

I would personally create a test dataset in the old location, then test-copy it with rsync (using --archive to preserve permissions as far as necessary/possible), and check whether things continue to operate as expected.

Once confirmed, move the datasets (mv), and then possibly place symlinks at the old locations pointing to the new ones, so that existing clones do not need to be fixed up.
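The move-then-symlink step could look roughly like the sketch below. The helper name move_and_link is made up for illustration, and the refusal to overwrite an existing target is an extra safety check of mine. Pass paths without a trailing slash, and use an absolute path for the new location so the symlink resolves from anywhere.

```shell
#!/bin/sh
# Sketch only: move_and_link is a hypothetical helper name.

# move_and_link OLD NEW
# Moves the directory OLD to NEW, then leaves a symlink at OLD pointing to NEW.
move_and_link() {
    old=$1
    new=$2
    # Refuse to touch anything if the target already exists.
    if [ -e "$new" ]; then
        echo "target $new already exists" >&2
        return 1
    fi
    # Across devices, mv copies the content and then removes the source.
    mv "$old" "$new" || return 1
    # Keep the old path usable for existing clones and scripts.
    ln -s "$new" "$old"
}
```

For the case in this thread, the call would be `move_and_link /data/superset1 /data2/superset1`. Whether the symlink is enough for all existing clones depends on how they reference the dataset, so it is worth checking one clone afterwards.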

lukasvo76 · May 15, 2023, 1:10pm

Thanks a lot @eknahm!