How do I add URLs that require authentication using datalad addurls?

question

How do I add URLs that require authentication to the repository using datalad addurls?

I am trying to add a specific URL to the repository using the “datalad addurls” command.
However, it does not work and I get an error.
The error message includes “unable to access url”.

The URL given to “datalad addurls” requires authentication, so I gave the following URL (example)
URL = http://username:password@domain.com/file/to/path/test_data.txt

In practice, the values passed to “datalad addurls” come from a CSV file.

CSV file.

who,link
to/path/test_data.txt,http://username:password@domain.com/file/to/path/test_data.txt

Commands executed

datalad addurls --nosave --fast .tmp/datalad-addurls.csv '{link}' '{who}'

Additional Information

Opening “http://username:password@domain.com/file/to/path/test_data.txt” in a browser or Postman confirmed that the data can be retrieved.

I have looked into how to use “datalad addurls” with URLs that require authentication, but have not found a good solution.

Version in my env

  • datalad 0.17.6
  • Ubuntu 20.04.4 LTS

Full Error

$ datalad addurls --nosave --fast .tmp/datalad-addurls.csv '{link}' '{who}'

["addurls(error): /home/jovyan/to/path/test_data.txt (file)
 [AnnexBatchCommandError(AnnexBatchCommandError: 'addurl' 
 [Error, annex reported failure for addurl (url='http://username:password@domain.com/file/to/path/test_data.txt'):
  {'command': 'addurl', 'success': False, 'input': ['http://username:password@domain.com/file/to/path/test_data.txt to/path/test_data.txt'], 'error-messages': ['  unable to access url: http://username:password@domain.com/file/to/path/test_data.txt'], 'file': 'to/path/test_data.txt'}])] [AnnexBatchCommandError: 'addurl' [Error, annex reported failure for addurl (url='http://username:password@domain.com/file/to/path/test_data.txt'): {'command': 'addurl', 'success': False, 'input': ['http://username:password@domain.com/file/to/path/test_data.txt'], 'error-messages': ['  unable to access url: http://username:password@domain.com/file/to/path/test_data.txt'], 'file': 'to/path/test_data.txt'}]]", "[ERROR] BatchedAnnex(command=['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'addurl', '--fast', '--with-files', '--json', '--json-error-messages', '--batch'], encoding=UTF-8, exception_on_timeout=False, generator=<datalad.runner.nonasyncrunner._ResultGenerator object at 0x7fa40c583a00>, output_proc=<function readline_json at 0x7fa40ce911f0>, path=/home/jovyan, return_code=None, runner=<datalad.runner.runner.WitlessRunner object at 0x7fa40c580ac0>, stderr_output=b'addurl: 1 failed\\n', stdin_queue=<queue.Queue object at 0x7fa40c5c7df0>, timeout=None, wait_timed_out=None) subprocess failed with CommandError: 'git -c diff.ignoreSubmodules=none annex addurl --fast --with-files --json --json-error-messages --batch' failed with exitcode 1 ", 'action summary:', '  addurls (error: 1)']

The datalad addurls command calls git annex addurl. Depending on the dataset configuration, a special remote then handles the URL (including authentication).

With no particular configuration, that would be the web special remote, which is enabled by default. I have never used it this way, but if this old comment is to be trusted, it should accept username:password URLs (note that this way the credentials would be recorded in the dataset). However, that comment is really old, so it might not be the case any more?

It could be worth trying a plain git annex addurl ... (maybe also with https). Judging from the error message, though, git-annex was called correctly and reported that it could not access the URL.
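A plain git-annex check along those lines could look roughly like this, run from inside the dataset. It is only a sketch: the URL and target path are the placeholders from the question, and the https variant assumes the server offers https at all. --debug is added so that git-annex prints more detail about the failing request.

```shell
# Run inside the dataset; URL and path are the placeholders from the question.
# --file sets the target path, --debug makes git-annex print more detail.
git annex addurl --debug --file=to/path/test_data.txt \
    'http://username:password@domain.com/file/to/path/test_data.txt'

# If that fails, the same attempt over https (assumes the server offers https).
git annex addurl --debug --file=to/path/test_data.txt \
    'https://username:password@domain.com/file/to/path/test_data.txt'
```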

There are two other ways that I would suggest. Both avoid including the username & password in the URL and use DataLad’s credential management system.
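For the CSV from the question, this means the link column should carry URLs without the username:password@ part. A minimal sketch of stripping it with sed is shown here. The scratch directory and the output file name are made up for illustration, and the pattern assumes the credentials contain no "/", "@", or "," characters.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing in a real dataset is touched.
workdir=$(mktemp -d)
csv_in="$workdir/datalad-addurls.csv"
csv_out="$workdir/datalad-addurls-nocreds.csv"   # hypothetical output name

# Sample input mirroring the CSV from the question.
cat > "$csv_in" <<'EOF'
who,link
to/path/test_data.txt,http://username:password@domain.com/file/to/path/test_data.txt
EOF

# Drop the "user:password@" part that directly follows http:// or https://.
sed -E 's#(https?://)[^/@,]+@#\1#g' "$csv_in" > "$csv_out"

cat "$csv_out"
```

Once one of the special remotes described below is enabled, the cleaned file could be passed to datalad addurls in place of the original CSV.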

Alternative 1 (plain DataLad): datalad special remote

To use DataLad’s credential system, you can enable the datalad special remote (note: the examples use git annex addurl for simplicity, but if that works, datalad addurls should too):

git annex initremote datalad type=external externaltype=datalad encryption=none
git annex addurl http://domain.com/file/to/path/test_data.txt

at which point you would be guided through setting up a providers configuration (saved in your ~/.config/datalad/providers/, but it can also be placed in the dataset’s .datalad/providers) and setting a credential (stored in your system keyring, see datalad wtf --section credentials for a clue). The configuration includes, for example, the URL pattern for which the credentials would be used.

Alternative 2 (DataLad-next extension): uncurl special remote

If you install the datalad-next extension (note: as of today I would strongly recommend the beta release, pip install datalad-next==1.0.0b3, over the “regular” 0.6 version), you will have access to the datalad credentials command for easier credential management, and to the uncurl special remote, which uses datalad-next’s credential workflow (see the docs for details on both). This would translate to:

git annex initremote uncurl type=external externaltype=uncurl encryption=none
git annex addurl http://domain.com/file/to/path/test_data.txt

at which point you would be prompted for credentials (stored in your system keyring), which would be associated with the given URL realm by a config entry in your ~/.gitconfig.

Note: you can optionally set autoenable=true for the remote, to enable it automatically for clones.