Exporting a DataLad dataset tree to Google Drive

Is there a good method to export a DataLad dataset tree directly to Google Drive?

The dataset is ~300 GB, and I’d rather not export and upload one large archive file, e.g. with datalad export-archive. Ideally, I’d like something like datalad-OSF’s “Use case 2: Export a human-readable dataset to OSF”, but for Google Drive.

Would I need something like git-annex-remote-googledrive or git-annex-remote-rclone to accomplish this?

I’m speaking without experience with DataLad + Google Drive specifically, but one of the two git-annex special remotes you mention should be the way. Here are a few general pointers.

With those special remotes you’d use git-annex commands to set up the remotes (siblings). To get the export tree, you need to set exporttree=yes when running git annex initremote. The Google Drive remote’s README explicitly says that exporttree is supported; I’m not sure about rclone, but there’s a good chance it is supported there too.
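As a rough illustration of the setup step above, here is a minimal Python sketch that only assembles and prints the initremote command line; it does not run anything. The remote name "gdrive" and the prefix "my-dataset" are made up for the example, and the externaltype= and prefix= parameters are what I’d expect from the special remote’s README, so please check them there. encryption=none is included on the assumption that export remotes can’t be encrypted.

```python
import shlex


def initremote_cmd(name, externaltype, prefix):
    """Build the argv for `git annex initremote` in export-tree mode.

    `name` and `prefix` are hypothetical values chosen by the user;
    `externaltype` selects the external special remote program.
    """
    return [
        "git", "annex", "initremote", name,
        "type=external",
        f"externaltype={externaltype}",
        f"prefix={prefix}",      # assumed parameter: folder on the remote side
        "exporttree=yes",        # keep the human-readable file layout
        "encryption=none",       # assumption: export remotes are unencrypted
    ]


# Hypothetical names, for illustration only.
cmd = initremote_cmd("gdrive", "googledrive", "my-dataset")
print(shlex.join(cmd))
```

Printing the command rather than executing it lets you review it first; inside the dataset you could then paste it into a shell or pass the list to subprocess.run.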

Just to be clear, using “export tree mode” means that the files on Google Drive will have the same layout as they do in your folder (i.e. human-readable). As a consequence, only a single version of a given file can be kept in the storage. The alternative (the default, without the exporttree option) is that files are stored in a layout organised by checksums. In that case multiple versions of a file can be stored and retrieved by git-annex (DataLad), but the layout isn’t very useful for browsing without DataLad.

Either way, these special remotes will store only the annexed contents (so no git history or availability information), and you’ll need to configure another sibling if you also want to publish the git part (with GitHub, GitLab, Gitea or similar). In this respect, the Google Drive workflow will be more similar to the ones described for AWS or Dropbox in the DataLad Handbook (the Dropbox workflow uses rclone).

If you set up an export remote (exporttree=yes), then the push command from DataLad core won’t work with it (it expects the default mode), and you’ll either need to call git annex export or use the updated push from the DataLad-Next extension (see this issue in DataLad).
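For the git annex export route, a minimal sketch along the same lines: it builds and prints the export command without running it. The remote name "gdrive" is a made-up example, and HEAD is used as the tree to export on the assumption that you want the currently checked-out state; a branch name would work in its place.

```python
import shlex


def export_cmd(remote, treeish="HEAD"):
    """Build the argv for exporting a tree to an exporttree=yes remote."""
    return ["git", "annex", "export", treeish, "--to", remote]


# "gdrive" is a hypothetical remote name, for illustration only.
cmd = export_cmd("gdrive")
print(shlex.join(cmd))
```

Run inside the dataset, the printed command should upload the files in their folder layout. After later changes, running the export again is meant to bring the remote in line with the new tree.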

The OSF workflow is unusual in that it not only uses the tree layout for export, but also packages the git part and puts it alongside the annexed contents (so everything you need for datalad clone is in one place). However, the create-sibling-osf method has been tailored specifically to OSF, and there is no direct analogue for Google Drive. A comparable method (similar in practice, but implemented differently), create-sibling-webdav, has recently been created for hosting services that support WebDAV and is included in DataLad-Next.
