In datalad, how to attach a url to an existing output file?

I’m looking for some guidance on the following flow in datalad.

Note: I am using the python api, not the command line.

I would like to create a file using the run command, upload that file to a remote service to obtain a url, and then attach that url to the output file, so that others can clone the repo and use datalad get to retrieve the file content.

I’m having trouble knowing how to attach the url.

I saw the addurls command, but it doesn’t seem like the right thing (tbh, I don’t understand what that does).


Not sure if there’s a way to do this in datalad, but I think you can use git annex directly:

git annex addurl --file $FILENAME $URL

This is based on reading the help text; I haven’t tried it.
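
Since you are on the Python API, here is a minimal sketch of calling that same git annex command from Python with subprocess. The helper names (build_addurl_command, attach_url) are made up for illustration and are not part of datalad; it assumes git-annex is on your PATH and that the file is already annexed in the dataset. Untested against a real dataset, same as the command above.

```python
import subprocess
from pathlib import Path


def build_addurl_command(filename, url):
    """Return the argv list for `git annex addurl --file FILENAME URL`."""
    return ["git", "annex", "addurl", "--file", str(filename), url]


def attach_url(dataset_path, filename, url):
    """Run the command inside the dataset directory.

    Hypothetical helper: raises subprocess.CalledProcessError if
    git-annex exits with a non-zero status.
    """
    cmd = build_addurl_command(filename, url)
    return subprocess.run(
        cmd, cwd=Path(dataset_path), check=True, capture_output=True, text=True
    )
```

You would then call something like attach_url("/path/to/dataset", "output.csv", url) once the upload has returned its url; both arguments there are placeholders.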


Thanks @effigies that worked!

Now I’m facing a new challenge.

I do not want the path on my local computer to show up in the git annex whereis ... output for my file. I only want the web url to be checked into the repo, because I want that to be the sole source of truth. But it seems that datalad keeps adding my local path. Is there a way to avoid or prevent that?

That “path” is just a “description” text field associated with your git repository location. By default it is servername:path, but you can provide an arbitrary one:

❯ datalad create --help 2>&1 | grep -e -D
Usage: datalad create [-h] [-f] [-D DESCRIPTION] [-d DATASET] [--no-annex]

(An analogous description argument should be available in the Python API.)
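
For a dataset that already exists, the description can also be changed after the fact with git annex describe. A rough sketch from Python, again via subprocess; the helper names are hypothetical, and it assumes git-annex is on your PATH and that “here” (git-annex’s name for the current repository) is the location you want to relabel.

```python
import subprocess


def build_describe_command(description, repository="here"):
    """Return the argv list for `git annex describe REPOSITORY DESCRIPTION`."""
    return ["git", "annex", "describe", repository, description]


def set_description(dataset_path, description):
    """Relabel the current repository location (hypothetical helper).

    Raises subprocess.CalledProcessError if git-annex fails.
    """
    cmd = build_describe_command(description)
    return subprocess.run(
        cmd, cwd=dataset_path, check=True, capture_output=True, text=True
    )
```

This only replaces the text shown by git annex whereis; the location itself is still listed.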

In principle, you can declare any of the instances “dead”:

❯ git annex dead --help
git-annex dead - hide a lost repository or key

Usage: git-annex dead [[REPOSITORY ...] | [--key KEY]]

but I am not sure whether that would have side effects if you continue to work in that repo.

Also you can announce repository clones private: